Object-Relational Mapping is the Vietnam of Computer Science

I had an opportunity to meet Ted Neward at TechEd this year. Ted, among other things, famously coined the phrase "Object-Relational mapping is the Vietnam of our industry" in late 2004.


This is a companion discussion topic for the original blog entry at: http://www.codinghorror.com/blog/2006/06/object-relational-mapping-is-the-vietnam-of-computer-science.html

I’m not sure why everyone keeps trying to cram everything into tables. I guess it’s that “if you have SQL then everything starts to look like a set” syndrome…
I’ve been lucky enough to work on projects where I got to work on both sides of the ORM issue. This meant I could fix the tables, views, procedures, or the code, depending on which part gave me more long-term profit. I can’t imagine having to build either side around an existing other.
I’m currently researching db4o and find it very useful. It’s got all I need for my next project. The project has a nice tutorial which you can use to quickly scan through db4o’s features. The hardest part for me is letting go of some relational axioms when designing the system. For example, I keep trying to use IDs to store relations between objects.
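
To make the "IDs versus references" habit concrete, here is a minimal sketch in plain Python. It does not use db4o at all; the class names (`Author`, `BookById`, `Book`) and the sample data are made up purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Author:
    name: str


@dataclass
class BookById:
    """Relational habit: the relation is a foreign-key-like integer."""
    title: str
    author_id: int


@dataclass
class Book:
    """Object habit: the relation is a direct reference to the other object."""
    title: str
    author: Author


# Hypothetical lookup table standing in for an "authors" relation.
authors_by_id = {1: Author("Ada")}

relational_style = BookById("ORM notes", author_id=1)
object_style = Book("ORM notes", author=authors_by_id[1])

# With IDs, every traversal needs an explicit lookup step.
name_via_id = authors_by_id[relational_style.author_id].name
# With references, the traversal is just attribute access.
name_via_ref = object_style.author.name
print(name_via_id, name_via_ref)
```

In an object store the second form is presumably the natural one, since the store persists the reference graph itself rather than asking you to maintain keys by hand.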

Is this to say that Java/.NET does not have a good ORM solution? :wink:
And what is “good”?

There is a horrible cliche of people saying stuff along the lines of “blah blah blah Ruby solves this” or “blah blah blah Rails blah” in response to almost any technical blog post.

But in all honesty ActiveRecord, the ORM part of Rails, is really good at what it does, partly because Ruby is a scripting language so it doesn’t have to know the structure of the database ahead of time. It even lets you inherit your relational-objects in a slightly scruffy but very effective way. I’ve been very impressed in all my dealings with it.

Like most of the things about Ruby it’s not really doing anything hugely clever, just simple things with great attention to detail that leave you thinking “why wasn’t everyone doing this already?”
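
For anyone who hasn't seen the "doesn't have to know the structure ahead of time" trick, here is a rough sketch of the idea in Python with the standard `sqlite3` module. This is not how ActiveRecord is implemented; `make_record_class`, the `person` table, and the `find` method are hypothetical names chosen only to show attributes being derived from the schema at runtime.

```python
import sqlite3


def make_record_class(conn, table):
    """Build a class whose attributes mirror the table's columns at runtime.

    The table name is interpolated into SQL, so it must come from trusted code.
    """
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

    class Record:
        def __init__(self, **values):
            for col in columns:
                setattr(self, col, values.get(col))

        @classmethod
        def find(cls, record_id):
            # Assumes the table has a primary key column called "id".
            row = conn.execute(
                f"SELECT {', '.join(columns)} FROM {table} WHERE id = ?",
                (record_id,),
            ).fetchone()
            return cls(**dict(zip(columns, row))) if row else None

    Record.__name__ = table.capitalize()
    return Record


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("INSERT INTO person (name, email) VALUES ('Ada', 'ada@example.com')")

# No column is named anywhere in the class definition; all come from the schema.
Person = make_record_class(conn, "person")
ada = Person.find(1)
print(ada.name, ada.email)
```

Adding a column to the table changes the class the next time it is built, with no mapping file to update, which is roughly the convenience being praised above.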

Have you guys even looked at DLINQ yet? Problem solved.

So should I make another “oo sucks” t-shirt Jeff and send it to you? It’s great for baiting “you’ll-take-objects-from-my-cold-dead-fingers-I-never-saw-a-pattern-I-didn’t-like” team-mates. http://jcooney.net/archive/2005/09/24/6824.aspx

Interesting how this conflict still rages. The reality on the ground is that for large enterprises, anything but a relational data store is going to seriously hamstring your access to that data. Object to disk storage techniques that don’t use a relational database simply silo off that data so it is nearly useless outside the original program.

However, even those tools that can serialize to a relation successfully make the mistake of coupling the object too tightly to the data model. A better idea for scalable solutions is to use an ORM tool to create a data access layer. Then build your business logic on top of those objects as needed to express more complex object behaviors. Personally I use LLBLGen because of the strong typing of the entities it produces and the fact that it avoids using text strings in favor of field enumerations. You really avoid a lot of runtime errors that way.

However, the exact tool is unimportant. What is important is that your business logic layer creates the abstractions above the data access layer that your ORM produces, thus avoiding that close coupling.

The downside is that you can’t simply dump state by serialization. The upside is you gain more version independence and interoperability with other tools. I think that’s worth the extra work on the business logic layer.
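
A minimal sketch of that layering, in Python with `sqlite3` rather than LLBLGen, so every name here (`OrderField`, `OrderEntity`, `OrderDataAccess`, `CustomerAccount`, the `orders` table) is an assumption made up for illustration. The point is only the shape: plain entities and enumerated fields in the data access layer, with behavior living in a separate business layer.

```python
import sqlite3
from dataclasses import dataclass
from enum import Enum


class OrderField(Enum):
    """Field enumeration: a typo is an AttributeError, not a bad SQL string."""
    ID = "id"
    CUSTOMER = "customer"
    TOTAL_CENTS = "total_cents"


@dataclass
class OrderEntity:
    """Data access layer entity: one plain object per row, no business rules."""
    id: int
    customer: str
    total_cents: int


class OrderDataAccess:
    def __init__(self, conn):
        self.conn = conn

    def fetch_where(self, field: OrderField, value):
        # Only enum values reach the SQL text; the filter value is a parameter.
        sql = f"SELECT id, customer, total_cents FROM orders WHERE {field.value} = ?"
        return [OrderEntity(*row) for row in self.conn.execute(sql, (value,))]


class CustomerAccount:
    """Business layer: an abstraction built on entities, not on tables."""

    def __init__(self, dal, customer):
        self._orders = dal.fetch_where(OrderField.CUSTOMER, customer)

    def lifetime_spend_cents(self):
        return sum(order.total_cents for order in self._orders)

    def qualifies_for_discount(self, threshold_cents=10_000):
        return self.lifetime_spend_cents() >= threshold_cents


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total_cents INTEGER)"
)
conn.executemany(
    "INSERT INTO orders (customer, total_cents) VALUES (?, ?)",
    [("acme", 7_500), ("acme", 4_000), ("globex", 1_200)],
)

account = CustomerAccount(OrderDataAccess(conn), "acme")
print(account.lifetime_spend_cents(), account.qualifies_for_discount())
```

If the schema changes, only `OrderDataAccess` and the entity need regenerating; `CustomerAccount` keeps its interface, which is the decoupling argued for above.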

“the only workable solution to the ORM problem is to pick one or the other”

Come on Jeff - you’re starting to sound like one of those developers that believes in absolutes. Each of the six options has merit in certain situations. We should be open minded enough to pick the right solution for the problem at hand.

I find that too often all problems are generalized into one solution. In fact, taking out “reporting” and all aspects of a project that look like reporting (anything with rollups or pivots, for example), and using specialized tools or an alternative approach to ORM for those is quite a good way to clean up a project. Take out difficult slicing and dicing of the data, and suddenly ORM solutions are very workable for all the other rote work. It’s a simple yet highly neglected concept.

Wow, there is a whole bunch of “not getting it” here.

People, having to model your data twice is bad. BAD!

Stop denying it. Stop pretending it’s somehow a good thing. It’s not. It’s tedium; it impedes productivity, and it bloats and slows down code. It’s a source of bugs, of performance problems, of general frustration.

And for what? Why do we keep trying to shove our objects into relations? What’s the big payoff? WHAT?

Ted’s point is not that “there is no good solution to the object/relational mapping problem”. If that’s your takeaway, you’ve completely missed the point.

ORM tools can be used successfully. The point is that they only cover about 80% of the O/R mismatch problem - beyond that they produce diminishing returns. Our mistake as an industry is viewing these tools as a silver bullet, rather than as a partial solution to a problem that will never have a perfect solution.

Speaking of Ruby, there seems to be something even better than ActiveRecord. Where ActiveRecord maps a relational schema to objects, this one takes the opposite approach and uses the RDBMS purely as storage for the objects:

http://www.nitroproject.org/rdoc/og/index.html

Did anyone here try Cache from Intersystems?

the notion that a mismatch exists rests on a false assumption about how computers really work. that assumption is that method text must be stored with data. real computers don’t keep multiple copies of method text: one copy of method text, multiple copies of data (data is what distinguishes one object from another). method text is identical for all instances.

there is, therefore, nothing to be gained by redundantly storing (in a language and application restricted way) method text.

a concerted review of the Relational Model, as told by Codd, shows that this is really an object oriented view of the world: data and all that manages it reside together in an open (in the sense of universal access) manner. data integrity is part of the data store, and thus supports open access. remember, Codd’s paper was titled “A Relational Model of Data for Large Shared Data Banks”.

point 2 is just fancy words for saying “we like locking up our data in our code, just like we did back in the 1960s with COBOL/VSAM” (for the MF-ers). this is what, in the world of politics, is called reactionary. moreover, and this is the part that really ticks me off, “bean paradigm” programming is not distinguishable from COBOL copybook programming. this is not progress.

real progress is being made with approaches that generate the UI (which is just a fungible artifact of the datastore, and therefore irrelevant © ) from the database schema. i swear, people who yap about mismatch can’t possibly know how either the RM or SQL data really work. and probably think that there is such a thing as “an object model”. i feel better now.

“And for what? Why do we keep trying to shove our objects into relations? What’s the big payoff? WHAT?”

Umm-- so that the data can be stored in an easily accessible manner in a commodity RDBMS server?

I see a whole bunch of “not getting it” here, too. “Introspect” the schema from the RDBMS. Don’t represent the schema as XML, or as code. Generate your UI dynamically based on the schema. If you need more metadata than “introspecting” the schema from the RDBMS gives you, define a standardized way of storing that metadata in the RDBMS.
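
Here is a small sketch of what I mean, using Python and SQLite because they are easy to run anywhere. The `customer` table, the `ui_metadata` side table, and the `describe_form` function are all hypothetical; a real system would agree on its own metadata layout and use its own catalog views instead of SQLite's `PRAGMA table_info`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        credit_limit REAL
    );
    -- Hypothetical side table for UI metadata the catalog cannot express.
    CREATE TABLE ui_metadata (table_name TEXT, column_name TEXT, label TEXT);
    INSERT INTO ui_metadata VALUES ('customer', 'credit_limit', 'Credit limit (USD)');
""")


def describe_form(conn, table):
    """Derive a form description from the live schema plus stored metadata."""
    labels = dict(conn.execute(
        "SELECT column_name, label FROM ui_metadata WHERE table_name = ?",
        (table,),
    ))
    fields = []
    # table_info rows are (cid, name, type, notnull, dflt_value, pk).
    for _cid, name, col_type, notnull, _default, pk in conn.execute(
        f"PRAGMA table_info({table})"
    ):
        if pk:
            continue  # surrogate keys are not shown as editable fields
        fields.append({
            "name": name,
            "label": labels.get(name, name.replace("_", " ").capitalize()),
            "widget": "number" if col_type in ("INTEGER", "REAL") else "text",
            "required": bool(notnull),
        })
    return fields


form = describe_form(conn, "customer")
for field in form:
    print(field)
```

Nothing about the form is written down twice: add a column and the form grows a field, add a row to `ui_metadata` and the label changes.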

I feel silly, but to expand on my prior comment:

Buggy Fun Bunny has it again. Inheritance hierarchies can be modeled in an RDBMS. It may be difficult with SQL, but SQL isn’t the relational model-- it’s just a query language. Don’t judge the relational model by SQL.

Even staying inside the SQL box, though, look at PostgreSQL and the inheritance support there. Anything you can do with PostgreSQL’s inheritance you can do with any other RDBMS that supports triggers and stored procedures, if you want to code up the infrastructure.
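
As a rough illustration of the "code up the infrastructure" route, here is one common way to model a two-level hierarchy with ordinary tables and a view. It runs on SQLite through Python's `sqlite3` and deliberately does not use PostgreSQL's own inheritance feature; the `person`, `employee`, and `employee_full` names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Base "class": every person has a row here.
    CREATE TABLE person (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- "Subclass": shares the parent's key and adds only its own attributes.
    CREATE TABLE employee (
        person_id INTEGER PRIMARY KEY REFERENCES person(id),
        salary INTEGER NOT NULL
    );
    -- The view presents the subclass with its inherited columns.
    CREATE VIEW employee_full AS
        SELECT p.id, p.name, e.salary
        FROM person p JOIN employee e ON e.person_id = p.id;

    INSERT INTO person (id, name) VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO employee (person_id, salary) VALUES (2, 90000);
""")

# Querying the base table returns every person, employees included.
everyone = [row[0] for row in conn.execute("SELECT name FROM person ORDER BY id")]
# Querying the view returns only employees, with inherited and own columns.
employees = conn.execute("SELECT name, salary FROM employee_full").fetchall()
print(everyone, employees)
```

Making the view writable is where triggers would come in; this sketch leaves that part out.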

“Umm-- so that the data can be stored in an easily accessible manner in a commodity RDBMS server?”

So the reason for converting object to relations and back – with all the bugs, performance problems and tedium that come with it – the reason that makes all the problems worthwhile, is the ability to store the data in a relational database?

I’m not sure what tedium and bugs are being talked about. You have your ORM generated mapping layer that does the mapping the simplest way possible: one entity type per relation. It is automagically generated from the schema, including views, procedures, etc. You have business logic that needs to manipulate those entities… that manipulation would be the same (supporting business rules) whether the data was stored via serialization, in relations or in a rutabaga. Finally you have your UI, which talks to the business layer, oblivious to the storage choice made.

Now there are some types of apps that don’t need to co-operate with anything else in the world. In those cases, store your data in a rutabaga if it makes for happiness! In large data storage systems (you know, the ones relational databases are designed to address) I would prefer not to require a specific language (or vegetable) to read the data. I know my clients are happy they can use comfortable reporting tools and analysis tools instead of going through import/export hurdles.

I do not agree. I use Gentle.NET and it solves more problems than it creates. (www.mertner.com/Confluence)

It’s not a magic solution, but overall you can develop 30-40% less code once you understand the framework (quite simple to be honest).

New 2.0 version (for .NET 2.0) is on its way and it looks very promising. There’s a tradeoff, of course, but once you go OR all the way, you can’t go back :wink:

Ok, I’ve been a programmer for 16 years, mostly straight-up client server / Windows applications and have more recently been moving into “web applications”.

In this amount of time, I have seen these frameworks come and go every couple of years. They all seem to preach one or more of these points:

  1. Your programmers (who are apparently stupid) don’t need to know anything about the underlying data model!

  2. Writing native SQL in your applications is too hard to maintain and write! (again, stupid programmers)

  3. UI should never never ever talk directly to your database. (whatever…)

How, exactly, is adding ANOTHER layer between your application and your database going to make maintaining anything EASIER? And how is it going to make ANYTHING faster than a well-written piece of SQL?

If someone makes a change to the db schema that is going to throw something off down the line, SOMEONE has to go and either fix the application or the framework, or whatever.

Here are a couple of tips:

  1. Yes, separate your DB and UI code into separate classes/objects, if possible. It makes things much easier to debug and maintain.

  2. Make your classes/objects work as a logical group of related functions or temporary data stores.

  3. If you have to process data that the client doesn’t need to process, do it in a stored procedure on your database, if possible. Don’t select 1,000,000 records from your database into your application/“framework” for processing if you don’t need to.
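
To illustrate tip 3, here is a small sketch comparing the two approaches. It uses Python with SQLite, which has no stored procedures, so a plain `GROUP BY` query stands in for the server-side processing; the `sale` table and both function names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sale VALUES (?, ?)",
    [("north", 100), ("north", 250), ("south", 75)],
)


def totals_in_app(conn):
    """Pulls every row across the connection, then adds them up client-side."""
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM sale"):
        totals[region] = totals.get(region, 0) + amount
    return totals


def totals_in_db(conn):
    """Lets the database aggregate; only one row per region comes back."""
    return dict(
        conn.execute("SELECT region, SUM(amount) FROM sale GROUP BY region")
    )


print(totals_in_app(conn))
print(totals_in_db(conn))
```

Both return the same totals. With three rows it makes no difference, but with 1,000,000 rows the first version moves all of them to the client just to throw them away.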

Yes, this is some simplistic stuff, but it’s just a few points that I live by. If you are a decent programmer and can keep your code logical and well-commented, no other (decent) programmer that steps in after you should have a problem maintaining your code.

I think programmers need to learn more about how databases work, particularly how to write stored procs and triggers in their database of choice (if they are available).

It has always been my view that programmers should not try to isolate themselves into a black hole, not knowing or understanding how their own data is stored.

http://www.hibernate.org