Get Your Database Under Version Control

TechZeus · February 4, 2008, 12:00am

This post has been added to TechZeus!

Ivn · February 4, 2008, 12:00am

You have projects like Apache DDLUtils [1] to work with your database in XML files.

Regards,

Ivn

James · February 4, 2008, 12:00am

the problem i’ve always had with db schema source control though is that application code can generally be rolled back to a prior version without issue, but db schema changes have to be undone, and i’m not aware of any automated way of generating the script to undo an arbitrary set of schema changes.

for example - how would you roll back a data normalization update, without a manual script, or loss of the data changes since the update?

Monkeyget · February 4, 2008, 12:00am

Martin Fowler made an interesting article on the subject. Evolutionary Database Design : http://martinfowler.com/articles/evodb.html

maeghith · February 4, 2008, 12:00am

Damn,… I mean,… I was doing that but I didn’t even realize about it, it just felt the natural way to do it. Or just How could you not be doing it?

But then again, if I got to work in an existing environment that doesn’t do it, I’m not sure if I will realize it’s missing (well, I’ll now pay a little more attention to this).

Great remainder.

NedB · February 4, 2008, 12:00am

What I’d like to know is, how do you best manage a production site where running the upgrades could take tens of minutes, but you don’t want the site offline for that long? How do large-scale sites manage their upgrades to avoid down time? Or do they simply accept that there will be 2am-4am windows where the site is planned to be down?

SameerA · February 4, 2008, 12:00am

Hi Ned. This is an idea. Upgrade a copy of the site/server, and re-point to the new one once its ready. Of course from the point of migration, any live data would also need to be “upgraded” which may or may not be possible.

asm5 · February 4, 2008, 12:00am

I work on a database that is embedded in a product, so it is probably smaller than most of your examples. The database was not baselined when I started. Someone would take an existing database delete the records make the changes and check the DB files into source control. There was no way to see what changed.

We are using Sybase SQL Server. It has a DB unload command that writes out a SQL script. I used that tool to create the baseline and put the script into source control. I found that each time you run this command the output is in a different order. The process I follow today is to prototype the changes in an instance of the DB and make the changes manually in the script. I test and when it is ready I checked in the changed script. I can then pull up a version tree and do differences of versions to see what changed.

We are working on a new version of the database code plus schema. In this new version the SQL script is the only thing checked in. When the product starts, if there is no database files it creates them with dbinit and then runs the script to create the database. We also have upgrade scripts so that we can update existing instances at customer sites.

As was suggested in some of the earlier posts, one of the first things I did after getting the new database running is to create a tool that generates test data. This is used by my Nunit tests that tests the application and the database schema. The unit tests round trip the data and compare what goes in with what comes out. There are also tests for the database triggers. This gives me the confidence to make any changes in the code that we want, to refactor, and to take patches from the database vendor.

RobbyS · February 4, 2008, 12:00am

Come on, Jeff! Version control systems cannot be used to manage database schemas without some serious fudging because they are short one dimension. As James points out above, you can roll back to previous versions of code, but how do you roll back to previous states of database scheme and data?

The easiest solution I know of is to maintain a seperate folder with a set of pairs of database scripts, whose names correspond to the version number in your SCM system. As part of the build process, every script between the current version and target build gets executed: forward scripts for new code, and reverse scripts for rolling back to previous versions.

Yes, this is a more work: you have to create a reversed version of every database patch you make that unALTERS TABLEs, but at least you can roll back to previous system states easily. And yes, you will lose data if you roll back to a version with with a poorer data model, but what would you expect to happen?

DavidC · February 4, 2008, 12:00am

I wrote up some details on how we approached this here:
http://www.uberconcept.com/articles/SQLServerVersionControl.htm

Joe_Beam · February 4, 2008, 12:00am

duh.

lucky for me the only version I care about is the most recent. All I need to keep are my create scripts that I back up at the end of every day.

pyalot · February 4, 2008, 12:00am

What’s all the fuzz about? I use sqlalchemy models and sqlalchemy migrate to manage my database and write migrators, it’s all under source control…

Masklinn · February 4, 2008, 12:00am

@Robby Slaughter
As James points out above, you can roll back to previous versions of code, but how do you roll back to previous states of database scheme and data?

Rails does that by integrating upgrade and rollback in each migration.

Basically, a migration has a method “up” and a method “down”, when you migrate forward “up” is invoked and alters the schema in a given way (adding columns, normalizing, filling values, …) and “down” is supposed to do the opposite and downgrade the database.

It doesn’t work if the migration creator doesn’t fill the down method of course, but as long as you do it should be possible to roll the db forward and backward.

Denis · February 4, 2008, 12:00am

This might work for a small or medium size database. My database is well over a terabyte, how would you keep that data under version control? Data gets modified every day. Some of the data is encrypted. Some of the data gets inserted into audit tables. Data gets replicated to other servers/databases. There are jobs that pull in data from real time systems every second. I do have different versions of DBs on staging and QA servers but only one on the production server. Changes have to go through change management, you have to open a ticket to do a change. This is not something you would do one two three

Frank · February 4, 2008, 12:00am

Jeff, I saw your first post on DB version control. My problem is that I haven’t the first clue HOW to do this properly. I work with a dev version of my DB when I’ve developing, so the data itself isn’t accurate. Plus how can you version control the data, SQL dumps? Those would be BIG files, cumbersome, etc.

I could export the schema as SQL commands, but then how do I import this schema to my live DB, with existing data? I’d have to rewrite the schema output to update the existing tables. That’s even more work than simply applying my changes when it comes time to go live.

Due to the hurdles above, I find it perfectly acceptable to make constant backups, and carefully apply DB changes when it comes time to go live. If you have some suggestions, I’m all ears.

Waz · February 4, 2008, 12:00am

The chocolates are from Trader Joes (I’m guessing). They’re as good as they look. http://traderjoes.com/

Guido · February 4, 2008, 12:00am

Funny, year’s ago I wrote an set of ant tasks that would bring the database up to the current version from ANY previous version. If multiple developers would each develop on their local installation, after a update they would get all the changes, including migration of data, including the creation of backups of the table, adding columns, migrating columns, etc. etc. This has worked perfectly for years. These same builds where used to upgraded the production databases too.

But at that moment I really didn’t think anything of it, just common sense. Great was my surprise that I met 1) people (developers) who thought this was TOTALLY awesome, 2) people (developers) from whom I would just get blank stares and who couldn’t for the love of god figure out WHY I would go through all this trouble.

PS Before you ask, i cannot share the tasks since they depend for a major part on the meta-data of our proprietary application. Otoh, it’s really not rocket science.

Catto · February 4, 2008, 12:00am

Hey Now Jeff,
From now on when we eat peanut butter cups I’m going to link of DB’s.
Coding Horror Fan,
Catto

RobertR · February 4, 2008, 12:00am

What is version control? Seriously though, I still have not gotten the hang of using SubVersion and we don’t use any version control at work. SQL Server does not offer any built-in support for version control. I just generate data dictionaries and SQL create scripts every once in awhile.

Mike_Hadlow · February 4, 2008, 12:00am

Hi Jeff,

I think I posted about this before in your comments, but since you’re talking about DB versioning again here goes:

I blogged about this a while back:
http://mikehadlow.blogspot.com/2006/09/how-to-do-database-source-control-and.html

Secret Geek had quite a popular post on the same subject with good comments:
http://www.secretgeek.net/dbcontrol.asp

I had a great experience working with the Db Ghost guys on a huge project which they helped to save. Check them out here:
http://www.innovartis.co.uk/