Embracing Languages Inside Languages

Your Subsonic example hits close to home because I’ve been using another .NET ORM that’s about 3 times as verbose. We were actually thinking about switching to Subsonic because it was LESS wordy. These days I feel like dumping the ORM altogether.

But what about situations where a DSL can be much less repetitive? For the basic CRUD stuff, the same ORM I hate for being so complex can also be very concise in situations where I’m saving data for just one table. Do think there’s a place for abstracting away the underlying language if it’s much less work?

For example, the form helpers in Rails write a lot of the repetitive HTML code for you. If you already know HTML and you’re not doing anything out of the ordinary is there an advantage to writing it by hand when you have no reason to?

@Haacked,
I’m not sure that LINQ is a good example of “abstracting away the database” from a db platform perspective, since LINQ to SQL is entirely SQL Server specific (my biggest disappointment with LINQ).

I agree pretty much with you on the RegEx example, probably because regexes are usually very brief and terse, and that Joshua example is… quite an explosion of characters.

However, while the SubSonic example might not be super pretty, I do think it’s a very good idea to abstract the database layer somewhat - no pure SQL strings littered in the code, please. Being able to switch database provider can be valuable especially during the early phases of development, but it also makes it easier to use features like placeholders (who likes SQL injections?) and, say, memcached.

Imho any coder littering their code with raw SQL queries, especially in php/asp, should be shot on sight.

This really should be 2 posts.

I thought about that, but SQL and regex have more in common than most people realize. They truly are miniature domain specific languages-- and they’re both absolutely core to any modern development software toolkit. Furthermore, they do vary quite a bit in syntax per environment/platform, so you could make the counter case for abstraction as well.

You might want to run my blog engine on MySql while I run it on SQL Server. If I embed SQL all over my app, then I’m stuck.

I’d argue that the kind of simple SQL you need for a blog engine is all SQL-92 compliant anyway. Of course you’d have some kind of data layer; I’d just choose an extremely minimalistic one.

http://troels.arvin.dk/db/rdbms/

a = (b + 1) * a ^ 2
is a lot better than
a.valueOf(b.plus(1).times(a.pow(2))

Totally agree! It’s like we have this hammer called “objects” and we can use it to solve every problem, readability and comprehension be damned! The XML integration in VB is a good example too.

http://blogs.msdn.com/xmlteam/archive/2006/02/21/536197.aspx

I think of LINQ as “providing abstractions for the ideas of sorting, filtering, projecting and grouping which work across arbitrary data”.

True-- but bringing the database to the language is a reasonable way to explain that, too.

IDataReader rdr=new Query(“Customer”).WHERE("Customer.Columns.Country, “USA”).OrderByAsc(Customer.Columns.CompanyName).ExecuteReader();

IDataReader rdr = QueryDb(“SELECT * FROM Customer WHERE Country = ‘USA’ ORDER BY CompanyName”);

Which is more likely-- developers that understand Subsonic, or developers that understand SQL? Which is more desirable?

I will agree it’s a total travesty that we don’t have IDEs that can understand SQL strings and provide automatic intellisense and parsing of the SQL statements in SQL strings.

intangible,

To answer your questions

  1. How often does your app change underlying database?
    For many projects, never. For many projects, often. As someone else pointed out, if you are releasing something for more mass market distribution on their own servers, they often want to choose between SQL Server, Oracle, MySQL…

  2. When it does change, doesn’t a giant portion of the code end up having to be rewritten anyway?
    Not if done correctly. A small portion may need to be rewritten if that portion had to be optimised for a particular DB platform.

  3. How much time is spent abstracting things to the point where you see no SQL, would that time be better spent elsewhere?
    No time is spent. If you begin the project with the particular framework and methodology then no time is lost.

  4. What kindof performance hits do you take by not utilizing performance enhancing / time saving features of your DB are you avoiding in the name of abstraction?
    It depends. In most cases, no performance hit that is noticeable by the user. Of course if you have a function ‘update all stock prices on 200,000 items by 10%’ then you may want to put that in a stored proc to speed things up. The framework should allow for that so that it is not painful to change later.

I think there is a clash of minds sometimes between ‘code’ guys and ‘DB’ guys when developing applications. DB guys (and girls) think the database is the center of the universe and to take anything out of the DB would be sacrilege. Pure code guys think the DB is just a data store and not much more. Good develops hover somewhere between.

To answer some points raised by an earlier poster:

How often does your app change underlying database?

Shrink-wrap webapps have to target ALL databases.

When it does change, doesn’t a giant portion of
the code end up having to be rewritten anyway?

No. A decent abstraction layer can avoid this entirely.

What kindof performance hits do you take by not
utilizing performance enhancing / time saving
features of your DB are you avoiding in the
name of abstraction?

An abstraction layer can take advantage of DB specific features.

I use Hibernate (Java ORM toolkit) and its “dialect” classes make the creation of shrink-wrap applications that target multiple databases a doddle.

The question of whether to abstract away the database and of how query the database are orthogonal.

Hibernate provides a set of OO classes for accessing the data, which nobody uses. It also provides something called “HQL” which is a SQL-like set-oriented language for querying the object relationships persisted into the database, and everybody uses this even though it’s got some rough edges. Jeff is right - sometimes you really do need a language that matches the problem domain.

Jeff,

“I’d argue that the kind of simple SQL you need for a blog engine is all SQL-92 compliant anyway.”

Seriously Jeff, this is an amateurs mistake.

“I would argue that most of these benefits could be realized with smarter IDEs that actually understood native SQL strings (or regular expressions), rather than relying on a slew of generated code and special purpose object syntax…”

I don’t know about regexp’s but I do know from poking around in lexx and yacc that implementing a parser for SQL is not such a trivial task as it might look (for such a “simple” language the SQL spec is huge). And there are so many DBMS specific “features” to consider.

This is a subject I feel strongly about. Let languages specialise and merge elegantly so that we can switch paradigm accordingly using a clean syntax.

Fluent languages are like going back doing math the way Bertrand Russel and friends did it before he came up with the current mathematical notation.

Sometimes I wish we could take the next step and go beyond the 7-bits ASCII notation when illustrating code. I know, I know, this means a whole new bunch of keyboard escape sequence to remember, but on the other hand, at least, it might sufficiently annoy enough anglo-saxons with UTF-8 issues they might finally decide in unison to support that charset universally in all of their applications, much to the benefit of all cultures who make a normal use of all sorts of strange little characters.

…but I disgress.

The problem with SQL is that its abstraction capabilities are pretty crappy, and the code is accordingly verbose, hard to understand, bug prone and tightly coupled. ORM’s (atleast the good ones) present you with a way to abstract common relational operations into simpler easily composable terms.

Heres an example in SQLAlchemy, a Python ORM, although this probably works the similarly in other ORMs. (sorry for the long post, but I think the illustrative examples are worth it)

Simple selecting is not all that much different:

print Customer.query.filter_by(country=“USA”).order_by(Customer.company_name)
SELECT customer.id AS customer_id, customer.name AS customer_name, customer.country AS customer_country, customer.company_name AS customer_company_name
FROM customer
WHERE customer.country = %(customer_country)s ORDER BY customer.company_name

But lets say we need only the rich customers that have more than 1M total cash on their accounts:
print Customer.query.filter_by(country=“USA”).filter(select([func.sum(Account.balance)]).where(Account.customer_id == Customer.id).as_scalar() 10**6).order_by(Customer.company_name)
SELECT customer.id AS customer_id, customer.name AS customer_name, customer.country AS customer_country, customer.company_name AS customer_company_name
FROM customer
WHERE customer.country = %(customer_country)s AND (SELECT sum(account.balance)
FROM account
WHERE account.customer_id = customer.id) %(literal)s ORDER BY customer.company_name

This is pretty much the same on both sides, with ORM code being terser but not exactly less complex or that much shorter. But important part is, we probably need to know the total worth of our customers in many places, so we want to abstract that part out. To do this we add to our mapper the following property declaration:

total_cash=column_property(select([func.sum(account_table.c.balance)],
account_table.c.customer_id == customer_table.c.id).label(‘total_cash’), deferred=True)

Now that query becomes:
print Customer.query.filter_by(country=“USA”).filter(Customer.total_cash 10**6).order_by(Customer.company_name)
SELECT customer.id AS customer_id, customer.name AS customer_name, customer.country AS customer_country, customer.company_name AS customer_company_name
FROM customer
WHERE customer.country = %(customer_country)s AND (SELECT sum(account.balance)
FROM account
WHERE account.customer_id = customer.id) %(literal)s ORDER BY customer.company_name

Now I’d say the ORM has a clear advantage in clarity, if this is used in many places code duplication is lessened and the definition of total cash is in one single place so we can update the definition of that with little effort.

This also helps the separation of concerns. For example the predicates that a resultset has to match are the concern of the business logic side of things, while additional entities that need to be joined are determined by how we are going to display that resultset.

For example, the presentation layer might know that it is going to also display all the account details of the returned customers so we can add join for that independently of the rest of the query:
print Customer.query.filter_by(country=“USA”).filter(Customer.total_cash 10**6).order_by(Customer.company_name).options(eagerload(‘accounts’))
SELECT customer.id AS customer_id, customer.name AS customer_name, customer.country AS customer_country, customer.company_name AS customer_company_name, account_1.id AS account_1_id, account_1.customer_id AS account_1_customer_id, account_1.name AS account_1_name, account_1.balance AS account_1_balance
FROM customer LEFT OUTER JOIN account AS account_1 ON customer.id = account_1.customer_id
WHERE customer.country = %(customer_country)s AND (SELECT sum(account.balance)
FROM account
WHERE account.customer_id = customer.id) %(literal)s ORDER BY customer.company_name, account_1.id

The presentation also might deal with paging the results and know that we only need the first 10 customers, so add that too:
print Customer.query.filter_by(country=“USA”).filter(Customer.total_cash 10**6).order_by(Customer.company_name).options(eagerload(‘accounts’))[0:10]
SELECT customer.id AS customer_id, customer.name AS customer_name, customer.country AS customer_country, customer.company_name AS customer_company_name, account_1.id AS account_1_id, account_1.customer_id AS account_1_customer_id, account_1.name AS account_1_name, account_1.balance AS account_1_balance
FROM (SELECT customer.id AS customer_id, customer.company_name AS customer_company_name
FROM customer
WHERE customer.country = %(customer_country)s AND (SELECT sum(account.balance)
FROM account
WHERE account.customer_id = customer.id) %(literal)s ORDER BY customer.company_name
LIMIT 10 OFFSET 0) AS tbl_row_count, customer LEFT OUTER JOIN account AS account_1 ON customer.id = account_1.customer_id
WHERE customer.id = tbl_row_count.customer_id ORDER BY tbl_row_count.customer_company_name, account_1.id

Now that SQL could be cut down a bit by using select-star and formatted better, but it definitely isn’t easier to write, understand or maintain than the ORM query. And that isn’t even a particularly large query.

God I hope there are Java programmers reading this, they believe everything (and I do mean everything) to be done though API’s. With properties and events, LINQ, XML literals etc. in competing languages it’s getting really depressing to work in a pure Java shop still stuck with technology from the 90’s.

Man, I’d give a lot to have Regex literals in C#. Regex literals are easily my favorite part of Perl or Ruby, and I cringe every time I have to go through the whole “new Regex” dance.

I believe that object representations of queries are extremely helpful when you’re trying to create a user interface that provides the language model, for example the query expression, being able to construct the query in the native language is very nice.

Jeff - how would you write this example (from your comment), taking in an argument from a variable to avoid injection:

“IDataReader rdr = QueryDb(“SELECT * FROM Customer WHERE Country = ‘USA’ ORDER BY CompanyName”);”

I know the point your getting at, but it’s holding you over a cliff. I think you didn’t dive deep enough into this subject and see there are reasons to put the shotgun in the gun closet my friend.

Which is more probable? A developer that learns SubSonic, or a developer who writes inline SQL ripe for injection? Which is more desirable?

I believe your topic relates to the same example MS and other IDE vendors have been throwing at us for years. “Using Datagrids (datasets, datawindows, flexgrid, ultragid, whatever) you will turn all of your sql over to the control to take care of for you!”

And that always works perfectly in their examples…that deal with 1 table, with no joins, etc. Throw in even a single join, and that automatic SQL breaks down.

Not to mention if you want to give a user the ability to alter multiple rows in a table (say, set the price column of all 10 penny nails to .07 instead of .08), then the paradigm completely breaks down.

I like SQL. I like C#. I prefer to have 100% control over my sql…how it selects/updates/deletes/inserts…I still believe that I know how to tune a query much better than any annoying grid.

Oh, and if the DB changes underneath, then the grid completely breaks down – some lose columns and you have to recreate them (including all formatting, many other properties), or they just start giving compile errors. If I change a column name, I can do a very fast search and replace (helps if the IDE supports REAL regular expressions).

Because I know SQL so well, I can determine very quickly with 100% accuracy how to find all my statements that refer to those columns.

Ugh. Make the compiler better and faster…make my IDE AWARE of my SQL structure, but not try to generate it for me in some black box that I can’t get to. Give me 100% compatibility between code I wrote 5 years ago and today.

Don’t spend time trying to save me time that just ends up getting in my way.

This is another case for using the best tool for the job. SQL Queries, for instance, should probably be put behind a layer of abstraction if not for the simple fact that it helps prevent programmer error. Anyone who feels they’re confident enough to embed SQL where ever they feel like hasn’t been burned by DELETE * yet.

I agree that attempting to abstract another language via an API generally works out badly, whether the API in question is fluent or not, and the main thrust of your argument is sound. Imagine writing VB.Net through the CodeDom! Etc…

However your opening preamble gives the impression that you are using arguing against fluent interfaces as a concept, in which case you’ve used a straw man. Two bad examples doesn’t prove the case (though perhaps the absence of good ones does).

So… every language should implement every other language.

Sounds like the problem is recursive.

I love RegEx support in languages but putting SQL intellisense into a language only solves an extremely small part of the problem.

Relations, Populating your complex Business Objects, Lazy Loading as need, SQL Optimization, Database Abstraction, Strong Typing, Type Mapping, etc. – I’m only brushing the surface.

Do you really want to go back to being a full time plumber? I don’t, I prefer to outsource anything that requires a whole lot of elbow grease near the porcelain.

Jeff, I disagree with this. You say that the solution is to make domain specific languages part of the language we’re working in. For about 99% of us adding arbitrary powers to the compiler is just not in the cards. Why be afraid of abstraction? Isn’t that our job as developers?