All Abstractions Are Failed Abstractions

codinghorror · June 30, 2009, 12:00am

In programming, abstractions are powerful things:

This is a companion discussion topic for the original blog entry at: http://www.codinghorror.com/blog/2009/06/all-abstractions-are-failed-abstractions.html

pm100 · June 30, 2009, 12:00am

Not true. “All abstractions that you notice are failed abstractions.”

The fact that the database stores blobs of data on disk, not tables or columns or indexes, is an abstraction. This one you don’t notice.

The mirage that the disk really has blocks or files or anything except long streams of ones and zeros is an abstraction too.

The mirage that the disk has long streams of ones and zeros is an abstraction too: in fact it has only areas of varying magnetism spinning on platters at high speed. Unless of course it’s a USB drive, or a CD, or a RAM disk.

RobertB · June 30, 2009, 12:00am

I’m no fan of LINQ, but, as other commenters already stated, your example is wrong. You need to select the ID field in your LINQ query.

I don’t disagree that one needs to know everything about how their backing store works to get the best performance. Given that, I see no reason at all to use LINQ.

If I have a DAL that can get me data into a DataTable in one function call, and that data can be passed up the stack (unlike the LINQ result, which can’t, because it’s an anonymous type), I don’t see how LINQ buys me ANYTHING at all.

Someone mentioned using a background thread to refresh a cache. Wow, talk about swatting a fly with a cannon. How about just putting a timeout on the cache item and checking it upon retrieval: if it’s expired, re-query the database; otherwise return the cached item? No other threads involved, no possible multi-threaded mistakes, etc. Easy-peasy. Of course, this assumes one actually designed a DAL and the DAL’s API, and didn’t just hack together a mess of code “to make it work.”
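The check-on-retrieval idea can be sketched roughly as follows. This is only an illustration in Python rather than .NET, and the names `ExpiringCache`, `get`, and `load` are hypothetical, not part of any DAL mentioned in this thread. The `load` callback stands in for whatever re-queries the database.

```python
import time
from typing import Any, Callable, Dict, Tuple


class ExpiringCache:
    """Hypothetical cache whose entries are refreshed lazily, on read."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the expiry logic can be tested
        self._items: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str, load: Callable[[], Any]) -> Any:
        """Return the cached value, calling load() only if it is missing or expired."""
        now = self._clock()
        entry = self._items.get(key)
        if entry is not None:
            expires_at, value = entry
            if now < expires_at:
                return value  # still fresh: no database round trip
        # Missing or expired: re-query the backing store and restart the timer.
        value = load()
        self._items[key] = (now + self._ttl, value)
        return value
```

Note the assumption of single-threaded access, as in the comment above. If several threads can call `get` at once, two of them may both see an expired entry and both reload it, which is usually harmless but does mean extra queries.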

jasonmray · June 30, 2009, 12:00am

I agree with Joel’s statement that all non-trivial abstractions are leaky in the sense that, by their very nature, abstractions hide details that are considered less important at that particular level. An abstraction has “leaked” the details that it abstracted away.[1]

I disagree, however, with your statement that all abstractions are “failed” abstractions, mostly because you’ve not defined failure. You’ve even provided a pragmatic point-of-view where abstractions are not failures (usability, and ease of understanding).

Basically, if an abstraction hides the details that we are concerned with, then it is our job as developers to choose a different (less abstract) abstraction. If you are concerned with query timing and LINQ doesn’t give you fine-grained control over that, then LINQ is the wrong abstraction to use.

Abstractions only fail when they are the wrong tool for the job, but in that case it’s the developer that has failed.

[1] Of course, it is possible to prove that any given abstraction has a one-to-one correlation with what it is abstracting, thus making it a “leakless” abstraction. For instance, one might be able to prove that the SQL grammar is fully covered by the LINQ grammar. But in Joel’s argument, I’ll assume he considers such an abstraction to fall into the “trivial” category.

AregS · June 30, 2009, 12:00am

Very interesting post.
I am curious to know if you use LINQ or the SQL blob method to implement Ruby on Rails style “counter caches” for, let’s say, “the number of votes on an answer” in Stack Overflow.
Doing it with LINQ, you have to retrieve the parent answer row for the child vote rows, change the count on the retrieved answer object, and then save it back while handling any concurrency issues.
On the other hand, using a SQL blob just entails wrapping an insert into the votes table and an update to the vote count in the counter cache column of the parent answer row in a transaction.
It seems like using LINQ would be much more complicated and less efficient.
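For illustration, the transactional version described above might look roughly like this. It is a sketch using Python's built-in `sqlite3` module rather than SQL Server, and the table names, column names, and the `add_vote` helper are all assumptions made up for the example, not Stack Overflow's actual schema.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE answers (
            id INTEGER PRIMARY KEY,
            vote_count INTEGER NOT NULL DEFAULT 0  -- the counter cache column
        );
        CREATE TABLE votes (
            id INTEGER PRIMARY KEY,
            answer_id INTEGER NOT NULL,
            user_id INTEGER NOT NULL
        );
        """
    )
    return conn


def add_vote(conn: sqlite3.Connection, answer_id: int, user_id: int) -> None:
    """Insert a vote and bump the parent's counter in one transaction."""
    # "with conn" commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute(
            "INSERT INTO votes (answer_id, user_id) VALUES (?, ?)",
            (answer_id, user_id),
        )
        # The increment happens inside the database, so the application
        # never reads the old count and writes back a stale one.
        cur = conn.execute(
            "UPDATE answers SET vote_count = vote_count + 1 WHERE id = ?",
            (answer_id,),
        )
        if cur.rowcount != 1:
            raise LookupError("no such answer: %d" % answer_id)
```

Because the update is expressed as `vote_count + 1` in SQL, there is no read-modify-write cycle in application code, which is the concurrency handling the LINQ version would have to do by hand.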

Kibbee · June 30, 2009, 12:00am

I’ve said it for a while, that what they should have done instead of Linq to SQL, is add support for inline SQL, like they have support for inline XML in the newest VB.Net. This way, you could have the compiler check the sql for you, and even have dynamic objects returned with the correct properties set up, but you wouldn’t have to learn some backwards SQL language, and you could fine tune your queries exactly the same way you had been doing for decades.

Murthy · June 30, 2009, 12:00am

The goal of every abstraction is “Reducing Complexity”. The price we have to pay is “Implementation Overhead”.

As long as the “Overhead” is acceptable, abstractions are “Not leaky”.

Another reason an “Abstraction” can be “Leaky” is the “Level” at which it operates: the higher the “Level”, the more “Leaky”. (LINQ vs. TCP: one operates at a very high level, the other at a very low level.)

JessicaB · June 30, 2009, 12:00am

The essence of your argument is that Linq can sometimes be slow. The solution you pose is to increase the complexity of your code to make it quicker. To do this you propose getting into the guts of SQL Server, and the abstraction layers. However, there is a huge cost associated with doing that – namely maintainability and complexity. The best code is the code that you don’t have to write.

You are correct that slow pages lose users; however, 404s and Web page exceptions lose a lot more.

The obvious solutions to the problem you state are:

1. Buy another server.
2. If that doesn’t work, buy another one.
3. Repeat 1 and 2 until your problem goes away.

Servers are much cheaper than fixing even a few dozen lines of program code.

In the absence of this:

1. Profile your code to find the 1-5% that is the bottleneck.
2. Rewrite that database access as an SP.
3. Modify your Linq to use the SP.

Nij · June 30, 2009, 12:00am

Jeff, you need to be very careful about Top(n) queries, because if you don’t have the right index on the appropriate columns, the database has to scan the complete result set before it can sort it, and only then take your ‘top’ results.

Based on your post, and to second someone’s suggestion, why not just select the body column (rather than the Id, as others have suggested)? Selecting the Id implies an extra step (getting the data that goes with that Id).

Tracing and Explain Plan are so easy in SQL Server that I would strongly recommend that you learn how to use them. As you are using Linq, keep SQL Profiler running against your test DB instance and watch all the commands you are running against that DB. It’s educational… especially as you will get to see any behind-the-scenes queries that Linq is doing - perhaps of the Data model.
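As a rough illustration of the index point, here is a small sketch that asks the database for its plan before and after an index exists. It swaps in SQLite (via Python's `sqlite3` module and `EXPLAIN QUERY PLAN`) for SQL Server, so the plan text is SQLite's and differs between versions. The `posts` table, its columns, and the `query_plan` helper are hypothetical.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return SQLite's plan for a statement as one line of text."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the fourth column of each plan row.
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts ("
    "id INTEGER PRIMARY KEY, score INTEGER NOT NULL, body TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO posts (score, body) VALUES (?, ?)",
    [(i % 97, "body %d" % i) for i in range(1000)],
)

TOP_N = "SELECT id FROM posts ORDER BY score DESC LIMIT 10"

# Without an index on score, the whole table is scanned and then sorted.
plan_without_index = query_plan(conn, TOP_N)

# With the index, rows can be read already ordered by score.
conn.execute("CREATE INDEX idx_posts_score ON posts (score)")
plan_with_index = query_plan(conn, TOP_N)

print("no index:  ", plan_without_index)
print("with index:", plan_with_index)
```

In the first plan, SQLite reports building a temporary B-tree for the ORDER BY, which is the "sort everything, then take the top" step. In the second, that step goes away and the plan names the index instead.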

Finally, therefore, I don’t agree with you that the abstraction is leaky in the way you think (I don’t know Linq well, but am pretty reasonable with SQL). You are just experiencing the joys of performance tuning! Just because it is slow to do string concatenation in C# (compared to StringBuilder) does not make strings a bad abstraction for, ahem, strings; the implementation just makes it sub-optimal in certain circumstances.

Rob · June 30, 2009, 12:00am

I agree that you cannot, and should not, as a developer just blindly use an abstraction and expect it to always be the optimal method for every case. Some abstractions are simple, and even if I don’t understand them 100% I can be confident they are relatively optimal for most cases (e.g. the string.length method). But something as extreme as abstracting your entire data access layer REQUIRES you to understand what it’s doing under the covers, or to just live with any inefficiencies that occur. I just don’t understand developers who implement some pattern or architecture and think that just because it makes sense and is fast for some cases, it will always handle any situation in the most optimal manner.

Steve · June 30, 2009, 12:00am

Ummm, it’s called “there’s no free lunch”. The trade-off for LINQ to SQL is less control, less speed, etc.

Gr8Ray · June 30, 2009, 12:00am

I hate it when programmers use ToList() when it isn’t necessary. Then inevitably, other programmers see it on blogs like this and the next thing you know, it’s everywhere.

MrPhil · June 30, 2009, 12:00am

I’d say the only failed abstractions are ones that either bring nothing to the table or don’t hide any details. In your particular example, LINQ to SQL allows the compiler to help you identify query mistakes, which increases productivity. The programmer is still free to identify pain points and bottlenecks and then address the particulars at the point of a problem. Solving existing problems over the long run is better than solving all those potential problems you don’t have yet. Abstractions let us ignore details until they are important. That’s their job.

fschwiet · June 30, 2009, 12:00am

To the people who ask why they should use LINQ if it still requires understanding SQL:

The answer is that writing Linq-to-SQL code is much less work than alternative approaches (like creating/reading from SqlCommand objects).

I really don’t get why this example was chosen; it’s pretty trivial to change the LINQ query to only grab the ID column.

RobertB · June 30, 2009, 12:00am

@fschwiet: Once you’ve written a class that can read data from a stored procedure into a DataTable in a generic way, you’ll never have to do it again (I haven’t had to update mine in years). One function call and I’ve got data from the database. And I didn’t have to use an anonymous type or that horrible lambda expression nonsense.

Darrin · June 30, 2009, 12:00am

Oracle Pro C/C++ does the same thing using native SQL. I wonder why Microsoft won’t let programmers use SQL.

Dave · June 30, 2009, 12:00am

Interesting post, and something I do keep coming back to when I consider higher level web frameworks such as Ruby / ActiveRecord Vs PHP / SQL. I suppose it depends on how much complexity the abstraction saves you having to deal with (and perhaps security it provides / doesn’t provide) versus the potential efficiency “leaks”. Difficult call to make?

Dave · June 30, 2009, 12:00am

Erm, should have said “higher level web programming languages” there.

Matt · June 30, 2009, 12:00am

I’m not sure why you blame this behavior on Linq when you’ve proven that it is unexpected behavior that happens even if you aren’t using Linq. If you haven’t proven that, then I’ve misunderstood you and you need to go back and do a little more research. Anyway, it seems that your point is that Linq chooses a default method that is sometimes less than optimal in “weird” situations. That’s not saying much in my opinion.

Usman · June 30, 2009, 12:00am

Jeff, I think I followed the solution you proposed in a recent project. We started with the default LINQ to SQL operations, but then restricted the set of columns each operation retrieves by using custom Value Objects. So we optimized where we needed to and kept the defaults where we knew it did not matter (in the end very few such cases were left, though, lolz).
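A minimal sketch of that "restrict the columns with a value object" approach, again in Python with `sqlite3` rather than LINQ to SQL. The `posts` table, the `PostSummary` type, and both helper functions are hypothetical names invented for the example.

```python
import sqlite3
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PostSummary:
    """Hypothetical value object carrying only the columns a listing needs."""
    id: int
    title: str


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory table with a wide body column we want to avoid reading."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE posts ("
        "id INTEGER PRIMARY KEY, title TEXT NOT NULL, body TEXT NOT NULL)"
    )
    return conn


def recent_summaries(conn: sqlite3.Connection, limit: int) -> List[PostSummary]:
    """Fetch the newest posts, selecting id and title but not body."""
    rows = conn.execute(
        "SELECT id, title FROM posts ORDER BY id DESC LIMIT ?", (limit,)
    )
    return [PostSummary(id=row[0], title=row[1]) for row in rows]
```

Unlike an anonymous type, a named value object such as `PostSummary` can be passed up the stack, which also speaks to the objection raised earlier in the thread.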