Get Your Database Under Version Control

For anyone in Java land, Liquibase is a really great database migration tool. We’ve been using it to help manage our releases and it’s very flexible.

http://www.liquibase.org/

Good post and good links

Thanks a lot

Regards

Practically all the applications I work with are enterprise applications where the database is never redeployed; it gets upgraded in place. Tools like Red Gate’s SQL Compare really come in handy for generating update scripts for our next roll-out.

http://www.red-gate.com/products/SQL_Compare/index.htm

Hmmmm, Peanutbutter Chocolate!

The database should be kept under source control, but if we’re working in an environment where frequent changes and fixes are in progress, it’s too painful to maintain this many versions. And maintaining data along with the whole schema is tedious. There should be an automated tool set that can create a version automatically after a predefined interval.

WTF does the picture have to do with the article? You’re constantly breaking your own blogging rules/recommendations. Trash.

check out DBDeploy, http://dbdeploy.com/ , for db change script action :)

My DBAs see no value in source-controlling the db schema. And their manager outranks me. We just do compares and generate modify scripts.

@Masklinn

There are plugins to get around the AR Migrations problem when you have multiple developers.

@Denis, others - we’re not talking about version controlling the data.

@Robby … Why in Ford’s name would you create scripts to ‘uninstall’ your database to a previous version? I just can’t for the life of me figure out why you’d do such a thing! I don’t know where you come from, but NOT losing data is the whole point of having a db in the first place. If you can afford to lose the data, it arguably shouldn’t be in there in the first place (some exceptions apply). If you need an old version around to test, then save a backup, mate!

(just because it’s possible, doesn’t mean you should do it. Ugh…)


For the data portion, you back up the database. Cluster it. Ship logs. Offsite storage. Whatever. This is the dba’s job and they usually have pretty good tools for it.

What we’re talking about here is x developers working on a new release. You NEED to create a predictable and repeatable process in which to upgrade a ‘production’ copy of the database with all of the changes the dev guys have done so that the new code works as expected.

PS Don’t give me crap about ‘with 1 guy it’s too much effort’ … I mean what’s harder than writing a few lines of text in a text file? Now, just version control the upgrade file. Done. Easy.

Bonus: you can blame people because you can see their mistakes by doing diffs on the upgrade script.

– adding newCol for xyz, requirement 1.2.33a
ALTER TABLE dbo.MyTable ADD newCol INT NOT NULL REFERENCES … etc.
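To make the “just version control the upgrade file” idea concrete, here’s a minimal sketch (hypothetical names, using SQLite in Python rather than SQL Server for self-containedness): each change script carries a version number, and a tiny runner applies any scripts the database hasn’t seen yet, so re-running it is always safe.

```python
import sqlite3

# Hypothetical change scripts, in the order they were committed.
# In practice each script lives in its own version-controlled file.
CHANGE_SCRIPTS = [
    (1, "CREATE TABLE MyTable (id INTEGER PRIMARY KEY)"),
    # -- adding newCol for xyz, requirement 1.2.33a
    (2, "ALTER TABLE MyTable ADD COLUMN newCol INTEGER NOT NULL DEFAULT 0"),
]

def apply_change_scripts(conn):
    """Apply any change scripts the database hasn't seen yet."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER)")
    current = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()[0] or 0
    for version, sql in CHANGE_SCRIPTS:
        if version > current:
            conn.execute(sql)
            conn.execute("INSERT INTO schema_version VALUES (?)", (version,))
    conn.commit()

conn = sqlite3.connect(":memory:")
apply_change_scripts(conn)
apply_change_scripts(conn)  # re-running is a no-op: already at latest version
```

The `schema_version` table is the key trick: the database records which scripts have already run, so every environment converges on the same schema no matter where it started.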

Finally, I think it’s pretty safe to ignore baselining; I think this is only applicable to some developers based on the type of product, i.e. people who sell shrinkwrap software. For your in-house/large site, everything is just a change script. There’s usually only ever 1 copy of the app in production, and only 1 database. Don’t waste your time until somebody really needs a pristine database.

The main structure of the post goes like this: “Get your (ambiguously defined) database under version control or I’ll drop ad hominems on you; oh, and for supporting evidence leave this blog”. Not Jeff’s best work.

You’re so spot on, except for the peanut/butter combination. But I guess that one is personal ;)

I agree with others this post was trash.

To support simple schema changes I started an Open Source project called Crank http://codeplex.com/Crank

It’s still in its infancy, but it’s a sort of Declarative Short-Hand for Transact-SQL.

Any feedback is welcome

@Anon : the reason i bring up the concept of a different script for patching backwards vs. forwards, and its interaction with the data, is not because the data is important per se, but because in my experience there are times when the test database doesn’t react the same way from a performance perspective as the production database, so you may find yourself needing to roll back a schema update as part of the re-engineering. also, when i work with a test database, i’ll typically work with a subset of live data, because a developer’s test data is rarely a match for samples of real data when it comes to looking for problems in your code. in my most recent project that sample ran around 600-800mb – difficult to keep a backup of this set every time i make a schema change, and very slow to perform a full restore every time i need to roll back a schema change. you could argue that it would be easier to keep transaction log backups and just do a point-in-time restore, but that would drag all data modifications into the rollback, just like the full-restore concept.

the Rails migration method referenced by Masklinn sounds like a great solution; it’s still manual, but it sounds like it automates about as well as it needs to. the goal, as i understand it, is to be able to revert and roll forward through versions of your code without needing to think about the details, and be sure that all code associated with that revision number is internally self-consistent. the bonus is to have that rolling forward and backward be data-agnostic, i.e. it’ll work around whatever’s there. that way, it should also be safe to run on a production server without fear of data loss. fyi, the production database that shaped most of my opinions here was 85-100gb. backups and restores of any sort were out of the question, and schema modifications had to touch as little as possible.
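The Rails migrations idea sketched above boils down to: each revision declares both how to apply itself and how to undo itself, and a walker moves the schema to whatever revision matches your code. A minimal sketch (hypothetical table names, Python/SQLite standing in for Rails/Active Record):

```python
import sqlite3

# Each revision: (up_sql, down_sql), in the Rails up/down style.
MIGRATIONS = {
    1: ("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)",
        "DROP TABLE users"),
    2: ("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER)",
        "DROP TABLE orders"),
}

def migrate(conn, current, target):
    """Walk the schema from revision `current` to revision `target`."""
    if target > current:
        for v in range(current + 1, target + 1):
            conn.execute(MIGRATIONS[v][0])  # roll forward
    else:
        for v in range(current, target, -1):
            conn.execute(MIGRATIONS[v][1])  # roll back, newest first
    conn.commit()
    return target

conn = sqlite3.connect(":memory:")
version = migrate(conn, 0, 2)        # bring a blank database up to revision 2
version = migrate(conn, version, 1)  # revert to revision 1
```

Because every down script is just the up script’s inverse, checking out an old revision of the code and walking the schema back to its matching revision is one command, not an archaeology project.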

@Don - “If you revert the database scheme, of course you loose some data, because the new data is not compatible with the old scheme.” – i don’t know that i agree with that. if your schema change for a given revision requires new data, then yes, rolling back that change should lose data, and attempting to subsequently roll that forward again would leave some holes. but many changes are for reorganization or optimization purposes, and those (IMO) could/should be treated like any other refactoring - if your tools are up to the task, you can make some pretty sweeping changes with confidence. heck, if i could just hit CTRL-Z or CTRL-Y to undo/redo updates across my entire application (even if it wouldn’t let me scroll through changes that required data modification), i’d be Speedy Gonzalez the Developer.

@Don - “You could rollback database by installing major release and patching all the change scripts to the minor release you were in.” – This would be a pretty cool idea, especially if the ‘rollback’ script was just the “roll back to the most recent update that would cause data loss”, and then roll forward from there. it might be easier to automate an undo script against a fixed point of reference (from a schema perspective).

@Anon - “Why would you need to roll back? Just rebuild from scratch.” – this works great for a development database, ignoring any of the quality-of-test-data concerns from above. i’ve run into downtime issues with large production database modifications - doing anything to a high-availability db over maybe 30gb is tricky, and i have a hunch that most databases that size tend to be high-availability, simply because that concentration of data implies more frequent usage of that data.


Anybody annoyed by the weird threading limitations of this (and basically all) forums? Anyway…

@AnotherAnonDev: It’s weird, but the truth is if you want to be able to roll back the code to any previous version, you have to be able to roll back to the corresponding database schema. The schema and the code are joined at each version. The best way to do this is to have scripts that unALTER tables.

@Masklinn: Yup, Rails is neato.

The problem is that version control only has one dimension: time. (Okay, there’s branching, but really you just go forward and back). You can pluck out any version of your source code from any point in history, and it will work as well now as it did then.

But, that version of the source code won’t necessarily work with the current version of the database. And if you want to preserve the data in the database when rolling back, you have to go through and carefully reverse all of the changes between the versions. Most SCM systems don’t support this natively.

Jeff, I think you ought to go into a little more detail about putting the database in source control. It’s not as easy as putting the source code in source control. It’s not a simple controversy like putting the documentation or the images in source control. It’s an entirely different level of problem.
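Since the code and the schema are versioned along the same timeline but stored in different places, one pragmatic guard (a sketch of my own, not from the post; names are hypothetical) is to bake the expected schema version into each build of the app and refuse to start against anything else, so a code/schema mismatch fails loudly instead of corrupting data:

```python
import sqlite3

EXPECTED_SCHEMA_VERSION = 7  # hypothetical: hard-coded into this build of the app

def check_schema_version(conn):
    """Fail fast if the database isn't at the schema version this code expects."""
    actual = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()[0]
    if actual != EXPECTED_SCHEMA_VERSION:
        raise RuntimeError(
            f"code expects schema v{EXPECTED_SCHEMA_VERSION}, database is at v{actual}")
    return actual

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE schema_version (version INTEGER)")
conn.execute("INSERT INTO schema_version VALUES (7)")
check_schema_version(conn)  # matches: app starts normally
```

It doesn’t solve rollbacks, but it turns the silent “old code against new schema” failure mode into an obvious one.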

Let me chime in with support for everyone who has pointed out that DB rollbacks to a previous version are an absolute must for large enterprise systems. There’s no way someone with a 500gb+ database is just going to accept “wipe it and reload” as a fallback option if an enterprise application upgrade goes pear shaped. I made the mistake of suggesting that to an ops team handling a multimillion dollar enterprise app when I was a young DBA, still wet behind the ears. Suffice to say that I had that idea beaten out of me posthaste.

Speaking from hard experience, a version controlled DB schema must consist of the following at minimum:

  1. A script immediately available from source control to build a blank database for any given major/minor release. Either the DDL script itself or an equally available sister script must also load the basic fact data required for the app to run. The theory here is that anyone can arbitrarily pick a version of the DB schema that matches up with the version of the app they need to bugfix/test/demo and quickly install it to a working state.

  2. A script immediately available from source control to upgrade an existing database at version n to version n+1. If version n+1 introduces new columns that didn’t exist in version n, reasonable defaults must be populated if the app can’t cope with NULLs. Such scripts should be able to be chained together also, so a user at version n who wishes to go to n+3 can run three upgrade scripts in a row and arrive at n+3 in a functional state.

  3. A script immediately available from source control to downgrade an existing database at version n to version n-1. If version n-1 removes columns or whole tables that version n introduced, data must be both preserved in other schemas/tables that won’t affect the app n-1, and then the new data should be removed or adjusted to be what app n-1 needs to function.
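Point 3 is the tricky one. Here is a hedged sketch (Python/SQLite, hypothetical table and column names) of a downgrade that parks the newer version’s data in a side table before rebuilding the old shape, so the app at version n-1 sees the schema it expects while a later re-upgrade could still restore the data:

```python
import sqlite3

def downgrade_v2_to_v1(conn):
    """Version 2 added users.email; version 1 has no such column.
    Preserve the column's data before removing it."""
    # Park the doomed column somewhere app v1 will never look.
    conn.execute("CREATE TABLE archived_emails AS SELECT id, email FROM users")
    # Rebuild the table in its v1 shape (works even where DROP COLUMN isn't available).
    conn.execute("CREATE TABLE users_v1 (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO users_v1 SELECT id, name FROM users")
    conn.execute("DROP TABLE users")
    conn.execute("ALTER TABLE users_v1 RENAME TO users")
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'ada', 'ada@example.com')")
downgrade_v2_to_v1(conn)
```

The corresponding re-upgrade script would re-add the column and pull the values back out of `archived_emails`, which is exactly the kind of pairing that makes these scripts tedious to write by hand.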

The problem with all this is that I’ve yet to encounter a tool that really does all three of these things with any degree of reliability. Some DB-specific tools are floating around out there, but for someone like myself who supports large applications that can be backed by one of many possible db architectures (Oracle/DB2/etc) this is all a tedious, manual, error-prone PITA.

Our team has the database in version control, and it has given us a huge leap in confidence when delivering a release. A branch of the database exists for each branch of the code.

We’ve been using DBGhost from Innovartis. It lets us make a database package that will upgrade the database from one version to another. Static data, mostly lookup tables, and pre- and post-migration scripts can be included.

The automation toolkit for DBGhost allows us to include the database in our continuous integration builds. Every time a database change is checked in, a test database is created, and all database code is verified.

We’ve run into the occasional automation toolkit bug, all promptly fixed, but I can’t recommend this tool enough. We’ve come a long way from all the developers working with a shared, non-source-controlled development database.

hahahaha, we used to constantly have problems with programs and database structure, everyone would always disavow knowledge about why stuff changed. Now we have version control, and whenever anything gets messed up, the same people just blame the version control software. Never underestimate the ability of people to adapt their incompetence to any situation.