Get Your Database Under Version Control

A little over a year ago, I wrote about the importance of version control for databases.

This is a companion discussion topic for the original blog entry at:

Thanks!! Database under version control is a great idea.

I just think the best strategy is to make your application create and update the database itself. So all the DDL code is a PART of your application and therefore it is automatically version controlled.

Once you have used ActiveRecord Migrations it is very hard to go back!

I absolutely agree. I see three things you can do to avoid any problems with database inconsistencies:

  1. Keep your SQL scripts as part of the source code, under version control
  2. Using those scripts, automatically recreate the database as part of the build, in order to…
  3. Have integration tests that exercise the DAL code and run against the actual database.
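A minimal sketch of steps 1–3, assuming the version-controlled scripts live in a directory of `.sql` files and using SQLite purely for illustration (the directory layout and names are hypothetical):

```python
import sqlite3
from pathlib import Path

def rebuild_database(script_dir: str) -> sqlite3.Connection:
    """Recreate a throwaway database from the version-controlled SQL scripts."""
    conn = sqlite3.connect(":memory:")  # a fresh database for every build
    # Apply scripts in name order, e.g. 001_schema.sql, 002_seed.sql, ...
    for script in sorted(Path(script_dir).glob("*.sql")):
        conn.executescript(script.read_text())
    return conn
```

Integration tests (step 3) then run against the connection this returns, so every build proves the scripts actually produce a working schema.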

Do you suggest putting the data under version control, or the schema only?

I’m just printing your posts and casually leaving them on my boss’s desk. Keeping my fingers crossed :slight_smile:


Once you have used ActiveRecord Migrations it is very hard to go back!

AR’s Migrations have a problem scaling with the number of devs, though, since the intrinsic linear versioning model of migrations is, well, braindead (it’s a centralized model for a distributed reality, basically); as soon as you have two or three devs creating migrations, it breaks down very quickly.

That’s, I think, the worst problem with migrations: they’re really cool, as long as you have only one or two people creating them, or you ensure that no two people create a migration at the same time.

mmmm, delicious

His approach to change scripts is naive. We have a lot of problems in this area: nearly 60 developers working on one application, with a lot of schema churn. Big problems.

Quote: “You deploy the app, and you deploy the database. Like peanut butter and chocolate, they are two great tastes that taste great together.”

I’m allergic to peanuts.

Do you suggest putting the data under version control, or the schema only?

The part of your data that represents constant or configuration data should definitely be under version control as well. Every time you deploy the DB, anything that must be there for the app to run out of the box is a candidate for scripting and versioning. As for the variable part of your data, ideally have a synthetic data-generation plan for testing.
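For the constant part, a minimal sketch of an idempotent seed script (the table and values are hypothetical, and SQLite stands in for the real server):

```python
import sqlite3

# Hypothetical reference data that must exist for the app to run out of the box
SEED_COUNTRIES = [("US", "United States"), ("DE", "Germany")]

def seed_reference_data(conn: sqlite3.Connection) -> None:
    # INSERT OR REPLACE keeps the script safe to re-run on every deployment
    conn.executemany(
        "INSERT OR REPLACE INTO countries (code, name) VALUES (?, ?)",
        SEED_COUNTRIES,
    )
```

Because re-running it is harmless, the same versioned script can ship with every deploy.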

I’m allergic to peanuts.

I invoke poetic license!

like duh …

I mean, how else would you upgrade from one version to the next? SQL is completely scriptable, so there’s no excuse.

@anonymous - No, just schema changes, stored procedures, views, etc. The data doesn’t belong to you. You would also script data changes and new data if applicable (i.e. new reference tables, data migration, etc.)

We also script all stored procedures and views, and they’re recreated as part of a deployment script. Just like code, if it’s not under source control, it’s not guaranteed to work. Let the DBAs and project owners know this, so they don’t add or change things without your knowledge.

Another good trick is getting a daily restore of production to dev, and as a part of your daily build, run the upgrade script against it. If it fails, so does the build.
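That trick could be sketched roughly like this (the paths and function name are hypothetical, and SQLite stands in for the real server):

```python
import sqlite3
import sys
from pathlib import Path

def verify_upgrade(restored_db_path: str, upgrade_script: str) -> None:
    """Run the upgrade script against a restored production copy.

    Any error aborts with a non-zero exit code, which fails the build.
    """
    conn = sqlite3.connect(restored_db_path)
    try:
        conn.executescript(Path(upgrade_script).read_text())
    except sqlite3.Error as exc:
        sys.exit(f"Upgrade script failed, failing the build: {exc}")
    finally:
        conn.close()
```

Wired into the daily build after the production restore, a broken upgrade script surfaces the same day it’s committed instead of on release night.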

Ditch the silly UIs. I mean, seriously, it’s pretty easy, and it will save your bacon.

When I ask development teams whether their database is under version control, I usually get blank stares.

As already answered by you above in response to anonymous:
As someone who used to develop his share of DB-based apps, I can understand this stare :wink: For DBAs and DB developers, the database is the whole big thing; ideally packed into one file on disk, but it can also be a whole directory, a collection of directories, or scattered over several machines. How do you do version control for that, and does it make sense?
But since I started using VC (not thaaat long ago…), I’ve made a habit of checking in my DDL scripts along with all the SQL scripts that fill in initial (constant) data, test data, and migration scripts. That’s the stuff that benefits from VC (think of diffing your DDL to recover the schema changes you made and forgot to document). So you should be more precise with that question, like “is your schema under version control”…

Thanks for the links! I have some create scripts in my solution, but the versioning really needs a way to make changes, account for loss of data, handle deployment on a database that already has data, etc. It’s tough and I don’t know how to do it; I wish I had Ruby on Rails’ migrations.

We do something similar to Ruby migrations, basically creating a separate update script for each individual change to the DB. The database has a table which keeps a record of what update scripts it has run. Then we run a simple console program that updates the DB to the latest version.

This is run in a post-compilation phase on the developer machines so that each developer’s database is up to date with the latest committed changes. It is also run during continuous integration to set up the database for testing.

Works quite well for us. We are missing rollback scripts, but that hasn’t been a problem so far, and it would be easy to start writing them if we ever needed them.
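A minimal sketch of that scheme, assuming SQLite and migrations hardcoded inline (the real console program would load the update scripts from files):

```python
import sqlite3

# Each change to the DB gets its own numbered update script
MIGRATIONS = {
    1: "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);",
    2: "ALTER TABLE users ADD COLUMN email TEXT;",
}

def migrate(conn: sqlite3.Connection) -> list:
    """Apply any update scripts the database has not yet recorded as run."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations (version INTEGER PRIMARY KEY)"
    )
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_migrations")}
    ran = []
    for version in sorted(MIGRATIONS):
        if version not in applied:
            conn.executescript(MIGRATIONS[version])
            conn.execute(
                "INSERT INTO schema_migrations (version) VALUES (?)", (version,)
            )
            ran.append(version)
    return ran
```

Because the bookkeeping table travels with the database, the same runner is safe to fire on every build: already-applied scripts are simply skipped.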

We do a change to the DB, then script it and include it in a script file like 4.0.8DB, 4.0.9DB. If you want to upgrade an existing database, you run the script. These scripts are run on production databases, so we need to be careful… no dropping tables and recreating them. We use Red Gate in the pre-packaging phase to make sure the script doesn’t introduce discrepancies.

We keep backups of all previous versions for the occasions when we need to patch prior versions.

The DB data is also backed up in the test environment so developers can quickly do a restore to replicate the screen on which any error occurred.

Yes, we also use CruiseControl.

We version control our database schema. We do this as follows:

  1. We use Hibernate, so we can generate a full database schema from our Hibernate mappings.

  2. Whenever we make a change to our Hibernate mappings, we regenerate the full database schema.

  3. We then diff it against the previous version of the generated schema, and we use the diff to create a schema upgrade script.

  4. We then check in the new generated schema and the upgrade script in a version-numbered directory (0059, 0060, 0061, etc.).

  5. These directories have to be in a linear order, because there has to be a well-defined sequence in which they are applied. Sometimes we do get overlapping schema changes (e.g. someone on a release branch has to make a schema change for a high-priority bug), in which case we use a decimal point (e.g. 0059.1) for that schema change.

  6. We can then take production database snapshots, download them to our QA environment (data and all), and upgrade the QA database with the schema upgrade scripts. This is how we QA the schema upgrades before we do the production release.

  7. We only version-control the schema, not the data.

  8. We do need to have occasional downtimes, but we’re working on reducing the length of those. We’ve mostly been able to avoid large, expensive, touch-millions-of-rows table updates.
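The directory ordering in steps 4–5 can rely on plain string sorting, since zero-padded names like 0059, 0059.1, 0060 already compare in apply order (the directory layout here is hypothetical):

```python
from pathlib import Path

def upgrade_order(scripts_root: str) -> list:
    """Return the version-numbered directories in the order their scripts apply."""
    # '0059' < '0059.1' < '0060' under string comparison, so release-branch
    # hotfix versions slot in between the mainline ones automatically
    return sorted(p.name for p in Path(scripts_root).iterdir() if p.is_dir())
```

The zero padding is what makes this work; unpadded names like 59 and 60 would sort after 100 as strings.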


It seems it’s the perfect time for some shameless self-promotion. I wrote a simple tool exactly for this task some time ago and have used it for more than a year.

I now have ~80 schema revisions under control and it works just fine.

It’s somewhat primitive compared to Deseb or Migrations, but I find it very useful anyway.