Warning: rant on database (in)dependence ahead.
Over the past decade or so I’ve worked with several RDBMSs: Oracle, Postgres, MS SQL, DB2, Sybase, and a little MySQL (wasn’t a fan). If there’s one thing I’ve learned, it’s that each database has a fundamentally different architecture, and database independence is NOT easy to achieve. I’m not even sure it’s desirable.
Switching databases has always been a big, big deal in my experience. And this is coming from a guy who started on the completely opposite end of the spectrum. I did everything generically - that’s the perspective I was taught in college and in the literature (black boxing and all that). The first way I ever used a database was by querying MS SQL with “select * from mytable” and iterating through the results. I realized that was stupid and learned this whole SQL thing. I learned about ANSI SQL and how I should try to conform to the standard to achieve independence. Then I realized I was using the lowest common denominator and getting poor results. Coding also became much more difficult - I had to check against the ANSI standard before using a particular feature: “Darn, x isn’t ‘ANSI’? Well, I’ll do it the hard way…”
If you just replace one database with another, plug-in style, you’re going to notice differences (perhaps “bugs”), even with the latest and greatest ORM. Except in trivial applications, these differences will affect your application, potentially drastically. For example, say you have two users running transactions on the same row(s) of data. Due to fundamental differences in concurrency mechanisms, Sybase might deadlock (so one user is rolled back and gets an error) while Oracle runs smoothly. Or MS SQL will give you one answer while Postgres literally gives you a different answer to the same query! And yes, I’m talking about setting the same isolation level in your ORM…
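To make the deadlock case concrete, here’s a minimal sketch of a very common “read it, then change it” pattern (table and column names are made up, purely illustrative). Picture two sessions both running this at the same time:

    BEGIN TRANSACTION;
    SELECT qty FROM stock WHERE item_id = 1;   -- both sessions read the row first
    UPDATE stock SET qty = qty - 1
     WHERE item_id = 1;                        -- then both try to change it
    COMMIT;

On an engine that holds shared read locks until the transaction ends (Sybase under its stricter isolation levels), each session waits for the other’s read lock before it can take the write lock, and one of them gets killed as the deadlock victim. On Oracle, plain SELECTs take no locks at all, so in its default mode the second UPDATE simply waits for the first COMMIT and carries on.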
If you use DB2, you’re going to have to use repeatable read isolation to get a consistent result set. Not the case with Oracle, where every query reads from a consistent snapshot by default.
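A rough sketch of what I mean, with a made-up table; the point is the extra statement DB2 needs:

    -- DB2: to stop concurrent writers from changing rows while your
    -- query is still scanning them, you raise the isolation level:
    SET CURRENT ISOLATION = RR;   -- repeatable read
    SELECT region, SUM(amount) FROM orders GROUP BY region;

    -- Oracle: the default read committed mode already gives every query
    -- a consistent snapshot via multi-versioning, so the same statement
    -- needs no extra locking:
    SELECT region, SUM(amount) FROM orders GROUP BY region;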
Database A may give you non-blocking reads; Database B does not. Therefore your app plugged into database A may run just dandy. You switch to B and your users complain of slowness. Maybe you then conclude that Database B is bad, but really the problem is that you perhaps unwittingly designed for Database A, thinking that you were database independent all along.
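Here’s the shape of that problem, again with invented names - a long report running alongside normal update traffic:

    -- Session A: a reporting query that takes a few minutes
    SELECT customer_id, SUM(amount)
    FROM   orders
    GROUP  BY customer_id;

    -- Session B: ordinary day-to-day work against the same table
    UPDATE orders SET status = 'SHIPPED' WHERE order_id = 42;

On a database with non-blocking reads, session B’s UPDATE goes through immediately and the report still sees a consistent picture. On a database where readers take shared locks, session B sits behind the report (or the report sits behind session B), and your users experience it as “the app got slow” the day you switched.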
Many developers (not all) want to treat the database as a black box - they feel they don’t need to know anything about it - they believe they should AVOID knowing anything about it in order to be good object-oriented programmers. The culture supports this notion with terminology such as “data store” and “persistence layer” - just a place to plop down your data and take it out again later, as if it were a spreadsheet. In reality, most RDBMSs are very robust, with countless features to manage, manipulate, safeguard, and serve your data to many concurrent users.
Why not use the features that make your database special? If you’re using a commercial database, you paid a lot of money for these features. Why use the lowest common denominator? Using Oracle’s analytic queries I’ve seen queries run literally orders of magnitude faster than lowest-common-denominator queries/code. I’ve seen an overnight job transformed into a sub-minute job by using Postgres’s native SQL that lets you say, in a single query (not row by row): “if a row with a matching ID exists, update it; otherwise insert it”.
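For the curious, here’s a minimal sketch of both ideas. The table and column names are invented, and the upsert is written the way current Postgres spells it (INSERT … ON CONFLICT, which requires a unique constraint on the id):

    -- The overnight job, conceptually: for each incoming row,
    --   SELECT by id; if found, UPDATE; else INSERT   (row by row)
    -- The single-statement version:
    INSERT INTO inventory (id, qty, updated_at)
    SELECT id, qty, now()
    FROM   staging_inventory
    ON CONFLICT (id)
    DO UPDATE SET qty = EXCLUDED.qty, updated_at = EXCLUDED.updated_at;

    -- And an Oracle-style analytic query: latest order per customer in
    -- one pass over the table, instead of a correlated subquery per row:
    SELECT customer_id, order_id, amount
    FROM  (SELECT o.*,
                  ROW_NUMBER() OVER (PARTITION BY customer_id
                                     ORDER BY order_date DESC) rn
           FROM   orders o)
    WHERE rn = 1;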
I find it much easier to rewrite the database API (stored procedures) entirely than to deal with the above issues. I’ve used an ORM in the form of Hibernate to solve the database independence problem (wasn’t my choice), and I found the cure worse than the disease - I’d rather translate my 100 PL/SQL functions to PL/pgSQL (maybe a bad example since they’re nearly identical - or maybe a good example of why it’s not always hard - actually both databases support Java stored procedures, so we could even go that route…) than deal with 50 Hibernate XML files. And guess what - even if using Hibernate were easier for me (I realize that’s an opinion), it never solved the aforementioned problems.
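As a toy illustration of how close the two dialects are (this is not one of the actual functions, just the general shape):

    -- Oracle PL/SQL
    CREATE OR REPLACE FUNCTION order_total(p_order_id NUMBER)
    RETURN NUMBER IS
      v_total NUMBER;
    BEGIN
      SELECT SUM(amount) INTO v_total
      FROM order_lines WHERE order_id = p_order_id;
      RETURN NVL(v_total, 0);
    END;
    /

    -- PostgreSQL PL/pgSQL
    CREATE OR REPLACE FUNCTION order_total(p_order_id numeric)
    RETURNS numeric AS $$
    DECLARE
      v_total numeric;
    BEGIN
      SELECT SUM(amount) INTO v_total
      FROM order_lines WHERE order_id = p_order_id;
      RETURN COALESCE(v_total, 0);
    END;
    $$ LANGUAGE plpgsql;

Mostly a mechanical translation: RETURN vs. RETURNS, NVL vs. COALESCE, and the $$ … $$ LANGUAGE plpgsql wrapper.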
So I typically push for database dependence, aka “application independence” or “using the database to its fullest potential”, not database independence. At one job my supervisor told me, “all database access must be done through the business objects in order to achieve database independence.” He wasted his time writing his own (buggy) security. Database tables lacked all but the most basic constraints and contained strings of pipe-delimited name-value pairs rather than using native database types. I asked what we would do if we wanted to hook up Crystal Reports or MS Access, or even (gasp) run an ad hoc query from the command line. Not allowed, I was told! He wanted database independence so badly that he sacrificed application independence. He was saving his spreadsheet. Thankfully, most perspectives are not quite as extreme…
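To illustrate what I mean by “saving his spreadsheet” (the schema here is invented, but faithful in spirit):

    -- What the tables looked like: one big string, no constraints,
    -- parsed and "validated" by the business objects:
    --   data = 'color=red|size=10|active=Y'
    CREATE TABLE settings (data VARCHAR(4000));

    -- What the database would happily enforce on its own:
    CREATE TABLE settings (
      color  VARCHAR(20) NOT NULL,
      size   INTEGER     NOT NULL CHECK (size > 0),
      active CHAR(1)     NOT NULL CHECK (active IN ('Y','N'))
    );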
Think of how many application technologies/languages/paradigms have come and gone. What do you think is more important to companies now: their 12-year-old web apps (remember when “CGI/Perl dynamic web pages” was the hot thing to put on your resume?) or their RDBMS? How about their 10-year-old VB/MS Access/C++ client/server apps vs. their RDBMS? Their 20-year-old green screen apps vs. their RDBMS? I developed web apps using JSP/Servlets 3 to 5 years ago and was almost laughed at for not taking advantage of EJBs. The author of this Hibernate book claims most EJB projects failed and that ORM is the wave of the future. You gotta keep up with the times - i.e., scrap the all-exalted paradigm of 5 years ago for the new godsend. So… your app vs. your database: which are you more likely to redesign/throw out within 10 years, and which are you going to reuse/depend on?
As always, just my opinion. I apologize if the above post isn’t the best organized…
Angva