Get Your Database Under Version Control

I think its an good idea to version control the database scheme and basic lookup data. That way the database stays under control. Of course then the version control process needs to be very smooth so that you don’t have to curse the efforts.

@James: You could rollback database by installing major release and patching all the change scripts to the minor release you were in.

@Robby Slaughter: If you revert the database scheme, of course you loose some data, because the new data is not compatible with the old scheme. But lookup data and such is part of the database version, so you don’t loose everything. And if you have the reverted patch script left, you can try to extract as much of the reverted data as possible of it.

If you revert some random patch from the version control stack, the database might end up messy. But so could end up messy program code too, if you don’t know what you are doing.

If you use some test data for the database application, you have the test files version controlled too. You should not release a new version of the application before all the tests are upgraded to support the new version and are showing green lights in the unit tests. Each test case has its own test data. Before each test case, the test script wipes out all the data and inserts new test data into the database.

“His approach to change scripts is naive. We have a lot of problems in this area. We have nearly 60 developers working on an application. With a lot of schema churn. Big problems.”

Are they not making source code changes? Have you not figured out how to handle this? They are the same thing.

“As James points out above, you can roll back to previous versions of code, but how do you roll back to previous states of database scheme and data?”

Why would you need to roll back? Just rebuild from scratch.

“This might work for a small or medium size database. My database is well over a terabyte, how would you keep that data under version control? Data gets modified every day.”

Nobody is talking about version controlling data

Have you looked at dzo (http://sourceforge.net/projects/dzo), it have solution for some of the problems addressed here. It compares a file with native sql-code (create table …, create view …, grant …) for all database objects in the schema with the actual database-schema and shows (or executes) the needed sql to transform the database schema to the sql in the file. It have support for lookup data (called referencedata in dzo), ie data that the application just reads. Dzo supports at the moment MySql, Oracle and Sql Server.

I’ve been working on a database schema management system for the past six months. All approaches I’ve investigated fell short in one aspect or another. Support for branching is a big concern. Also, none of the solutions I’ve investigated made any distinction between tables and programmables. Concurrent changes are also important.

I’ve written a 2-tier java swing app that manages a repository of database artifacts, by organizing them into projects. The design in inspired by industry standard versioning techniques and is artifact driven.

I’ve made a lot of progress; I am putting the finishing touches on the oracle schema import feature. It will import and provide a base-line for tables, constraints, indexes, views, functions/sps, packages, and sequences… More work remains, certainly.

http://picasaweb.google.com/lh/photo/RmLccBaZbI5_f3fVwCiSNtdGpmyed42E2H64xNBONNA?feat=directlink

Here’s the upgrade console in action:

http://picasaweb.google.com/lh/photo/aAHA0Rfm_M9VhgE032r-pNdGpmyed42E2H64xNBONNA?feat=directlink

I agree with many of the others that commented here. It is amazing that issues with controlling and versioning database schemas still pervade the IT landscape.

There is an interesting new product called DBVS that addresses these issues in a unique way by translating database objects into XML and versioning those files rather than SQL scripts. This methodology also allows DBVS to deploy and rollback changes automatically.

Their website is http://www.dbvsys.com.

If you want to go one step further with version control of your database schemas, branching / merging your developments, automate the incremental delivery generation from version information you can take a look at neXtep designer :
http://www.nextep-softwares.com

It is a free GPL product which currently supports Oracle, MySql and PostgreSql (DB2 is on the way).

It is a complete database IDE based on Eclipse RCP. In the environment, you actually work on an “offline” version control repository which you synchronize with your development database. Once a development is made you can commit it and generate a delivery from version information. A standalone installer will then be able to deploy your delivery and will check your target database structure to validate the deployment.

Among other things, the IDE offers data model diagrams, dependency management, version control, reverse synchronization, modular data models, integrated SQL client, SQL editors, auto-completion, etc.

The product has already been adopted by some leading e-business companies in France. Any feedback would be highly appreciated.

Complete documentation, tutorials, demos, concepts overview could also be found in the wiki :
http://www.nextep-softwares.com/wiki

Christophe

Hi all,

I’m looking for 3 simple things… well, I thought of them as simple, but I’m realizing they’re not to be taken for granted!

  1. a reverse engineering tool that I can point to an Oracle schema and get a “baseline” script to re-create that schema from scratch, with decently formatted DDL files (1 per object) neatly organized in a directory tree (by object type) and called in the correct order. Icing on the cake would be an option to pass the tool a list of tables containing static data and get DMLs to populate (insert) those tables as part of the script.

  2. a diff tool that I can point to a pair of Oracle instances (source and target) containing a given schema and get a “delta” script to alter the target schema to become identical to the source schema. If data loss occurs on the target instance (i.e. drop a column) I would like to find a warning comment inserted in the script (e.g. “-- Attention: data migration DML needed here?”). Icing on the cake would be an option to pass the tool a list of tables containing static data and get DMLs to update (delete, update and insert) the data in the destination tables to become identical to the contents in the source tables without deleting and re-inserting all rows (or dropping and repopulating the table).

  3. I would like the above two tools (that, as you will have recognized, are basic to putting your database design under version control) to be open-source, with a command-line interface and a vibrant community backing them.

I must be one out of a couple million people asking for the same things over and over again: I’ve seen the questions all over the internet but I could find no straight answer. Please help!

Thanks and take care.

Hi John

Check out the tool Lure at http://earthly-software.com. The demos on the website will give you a quick introduction of how this tool works.

I believe it meets all your stated requirements except that it is new and commercial.

There’s a commercial tool called DataStar which is designed to get your static data / configuration data under version control - only supports SQL Server at the moment but integrates with Subversion and TFS. www.data-star.co.uk

If we are talking commercial tools, Databazoo Dev Modeler seems to get the job done rather well. It’s crossplatform and does much more than just database versioning. www.devmodeler.com