All Abstractions Are Failed Abstractions

I’ve written my own C# Stored Procedure classes using generics, which for me, work very well. I simply create an instance of a stored procedure, add parameters, and call the execute method which returns an array of objects of the desired type.

e.g.

StoredProcedure sp = new StoredProcedure(“get_users”);
sp.AddParameter("@age", 21);
User[] users = sp.ExecuteWithReturnObjects();

I’ve spent no more than 10 minutes learning about Linq since it was launched, it looks far too messy for my liking. I’m sure it has it’s advantages, but from what I’ve seen, I’ve no desire to learn it as I’m happy with my own classes.

The linq example is not literally identical to sql. An ORM has quite a bit more overhead than raw sql. Maybe the linq example happens to generate the same sql but aside from that they do pretty different things. Maybe I’m just being pedantic, but you did take the effort to bold and italicize the word.

I think you missed the point of Joel’s original post. The valuable information I got from Joel’s post wasn’t that abstractions are doomed, worthless things because they can’t perfectly hide, but instead that you should be aware of what can go wrong. If you use an abstraction, you need to know what’s going on underneath to some extent, so you can deal with the inevitable leak.

What I find most amusing with this post is that you seem to have failed to grasp this from Joels post and so wrote your own version of it using an updated abstraction :slight_smile:

The whole point about abstractions is that they leave stuff out. This isn’t “leaky” (which sounds like a fault) but the raison d’etre. In this case, the abstraction tells you what the result of your query should be, not how long it will take. If you’re interested in speed (and good if you are), you simply can’t expect SQL/LINQ to tell you.

I’m surprised that only very few people are interested in the actual SQL issue itself. It seems to me that there must be something wrong with the way Jeff measured the queries. My experience is that SQL Server (as any other modern database) is much smarter in most cases that we are, and that if you come up (according to you) with an optimal query plan, SQL Server usually does not perform worse, and usually it does something better. There are definitely edge cases, but simple queries like the ones Jeff is talking about, should be straightforward.

First, I hope that Jeff is intentionally omitting the ORDER BY clause from the simplified snippets:

select top 48 Id from Posts

select top 48 * from Posts

Without the ORDER BY, it’s completely different story. But since it’s so trivial, I assume that Jeff just omitted it for clarity.

What I don’t get mainly is the following. According to the Jeff’s measurements, doing something like:

declare @ids table (id int PRIMARY KEY);
insert into @ids (id) select top 48 * from Posts order by SomeField;
select * from Posts in @ids;

should take the 404 ms. But, as I said before, it’s likely that SQL Server uses something along these lines (or pehrpas something more efficient) when executing this:

select top 48 * from Posts order by SomeField;

So there must be something fishy in the measurements:

  1. How was the timing done? In SQL? In a client app?

  2. What the time actually is? CPU? Entire query?

  3. Jeff, did you make sure you used DBCC DROPCLEANBUFFERS before every test?

I would be also good to see query plans. I’m sure the explanation of this issue must be something simple.

@Przemek and others: Chill. Jeff didn’t say that LINQ should not be used. He didn’t denounce it, rather the opposite.

So the lessons I see here is this:

  1. LINQ2SQL is an abstraction in that it wants you to consider databases as object collections. It is leaky, because there exists cases where you need to know that this is not really true. That’s the whole meaning of “leaky abstraction”.

Note that it doesn’t matter that you can write Linq-statements that gets better results, because in order to do so you need to bypass the abstraction that Linq provides. It’s still leaky.

Also note that “leaky” is not an insult! SQL itself is also a leaky abstraction: sometimes you need to be aware that your data is not really all in RAM but resides on a disk, and that disks are spinning metal platters with reading arms that need to move.

  1. The point of this is that you should use abstractions, since they are easier to use and often good enough, but you should also be aware of their limitations when you reach the edge cases.

Oh, and of course:
3) SQL is weird, it’s counter-intuitive that top 48(*) would be slower than first getting just the IDs then looping over the result to get the rest.

Good post.

@Console thank you for your explanation, I might have missed the point initially

I heart stored procedures.

Pre-compilation has to be a bonus, business rules at the DB level ensures integrity across multiple apps, plus you can offload to your DBAs for performance tweaks.

I may be missing something though with regards to the benefits of OO access to a relational DB, maybe for smaller apps.

@JessicaBoxer:

"You are correct that slow pages loose users, however 404s and Web page exceptions loose a lot more.

The obvious solutions to the problem you state are:

  1. Buy another server.
  2. If that doesn’t work, buy another one.
  3. Repeat 1 and 2 until your problem goes away.

Servers are much cheaper than fixing even a few dozen lines of program code."

I understand what you mean and don’t want to sound like an environmental extremist, but I really hate comments like this. Surely I can understand that one cannot spend hundreds of hours of tuning if a single extra server will do the same job. However, I would expect that people put effort and competence into tuning, and that buying a server is not the default solution. Learn to utilize the existing power of your hardware. If you screw up your SQL, fix it, don’t buy a new server to cover up the performance problem, which in essence was a competence problem.

It is this whole mentality of “unlimited resources” and “commodity economics” that gets us humans into trouble. You clearly see it in operating systems too. My current hardware is hundreds of times more performant than 10 years ago, still as an end user I have to wait just as long. Capacity is wasted at an obscene level.

I will not bother to go into the environmental cost of it all, as I doubt anyone will care.

Bit of a straw-man argument. Joel’s warning that useful abstractions tend to be leaky was surely not intended to say “don’t use abstractions”.

Very apropos – this sort of thing just came up on Stack Overflow. “Which key is more optimized?”

http://stackoverflow.com/questions/1058765/mysql-which-key-is-more-optimized/1058915#1058915

Ignoring the non-answer of “Test it and see”; in order to really start somewhere you have to KNOW how SQL, and in this case MySQL, indexes are constructed so you can generalize to an optimal answer.

This is the sort of thing that won’t be easily abstracted away any time soon. Even Microsoft SQL Server’s Index Tuning Wizard (now Database Engine Tuning Adviser) that has been in development for MANY YEARS often gets it godawfually wrong.

For the forseeable future developers will need knowledge of SQL DBMS fundamentals.


Matt

@Ferdy: I couldn’t agree more.

@c (and others): Linq to SQL can be used with stored procedures.

http://weblogs.asp.net/scottgu/archive/2007/08/16/linq-to-sql-part-6-retrieving-data-using-stored-procedures.aspx

Burrrrrn! The commentators really ate you up with this one and this time i actually have to agree with them.

Call me a stick-in-the-mud if you want, but I confess I’m yet to be persuaded that LINQ and the entity framework are anything more than a stored-proc replacement written by those who don’t fully understand databases. I’ve tried both ways on plenty of big projects and no way do I want to embed any more direct table access SQL in my app code than I absolutely have to - it just gets fragile and harder to keep track of far too quickly. I can live with and deal with the lack of typing far more easily, and if the SP is properly set up it’s not going to crash at runtime due to a malformed query anyway is it?

It’s got better since, but I remember getting very annoyed with ASP.Net 1.x, feeling like it was written by a VB team to address the problems of a beginner web dev - as an experienced web dev, it felt like it utterly missed the point and made some very silly base assumptions seem routine.

Would be nice to see a new .Net toy that felt like it was designed by the domain experts and not the interns.

Well, duh. The devil always has been, and always will be, in the details.

There’s no such thing as general code, or general actions. It’s just a large morass of specific code and specific actions.

The minute you generalize or abstract, you’re hiding relevant, and important details. They’re both the same thing, really.

There’s another problem with this article that no one else has noticed (based upon my skimming of the comments–I apologize in advance if I’m wrong).

There’s a very good reason (and it has nothing to do with LINQ) that:

SELECT TOP 48 * FROM X ORDER BY ID

would be slower than
SELECT * FROM X WHERE ID = n

where n goes from 1 to 48.

The first query does more.

The second looped query makes an assumption (perhaps a valid one…I don’t know) about the data that the first one can not. The assumption is that there are no holes.

what if there’s no record with ID = 24? Then to get the top 48, you’ll have to look at ID’s 1 to 49. The first query does that. The second loop does not.

I have no problem with using a slower query (LINQ or otherwise) that’s actually going to return all of my data, over using a faster one that isn’t.

I think many of the criticisms of this entry are missing the point. That said, I’d like to add my own point:

The problem with LINQ is the same as that of any attempt at a “general” or “generic” interface to multiple database systems. You’d have the same sorts of issues using ODBC, Pythons ABI, JDBC, etc. Jeff is right that LINQ is a “leaky abstraction”, but it’s just the latest in a long series of them.

The real problem is that every database is different when it comes to performance optimization. The “select id” may be more efficient for many databases (and it should be), but you can’t in general count on any query being optimal for all target databases. One example is that SQL Server and Informix like you to write several queries and link them with temporary tables, but Oracle much prefers you to use sub-queries aggressively and let the optimizer handle it.

So, if you want ease of programming, use LINQ, or ABI, or JDBC. If you want performance, use native SQL. Understand the optimizer. Try different queries. Resort to optimizer hints if necessary. And, of course, design the physical and logical database structures properly.

I believe LINQ to SQL is an unnecessary abstraction. It just adds another layer to your database and doesn’t let you control how many calls you send to the database. Just remove it already and stick your SQL in stored procedures. You’ll probably get 3x the speed just doing that.

Even Microsoft has relized LINQ to SQL is a lost cause.
http://oakleafblog.blogspot.com/2008/05/is-adonet-team-abandoning-linq-to-sql.html

I think LINQ is better suited for objects and xml. LINQ to Objects and LINQ to XML are both very good implementations.

“We can’t just pretend all our data is formatted into neat, orderly data structures sitting there in memory.”

I don’t really understand what this is supposed to mean. Of course your data is formatted in neat orderly data structures. In your case that happens to be inside the database, but either way the most efficient way to access it is to address the actual structure of that data.