Embracing Languages Inside Languages

@Jeff - I think you miss the distinction between a fluent interface that adds no value beyond the nebulous quality of being more fluent and the SubSonic case, where a lot more value is being added than just a fluent interface. If SubSonic were just about putting a fluent interface on SQL then you might have a point, but it is not, which even you acknowledge by your reference to the need to create a DAL.

The argument for fully understanding SQL is about as valid as claiming that every C# programmer should fully understand machine language, assembly language and MSIL, since C# is really just an abstraction of those lower-level concepts. That is the point of abstraction: you don’t have to fully understand the lower level. I think it is pure folly to think that I will ever be able to craft optimized SQL queries even close to the same level as a DBA who lives and breathes SQL. We are already putting too many languages on the individual programmer. A typical ASP.NET developer is forced to deal with VB/C# (sometimes both), HTML, CSS, JavaScript, SQL and possibly even some XSLT. At some point you are forced to either 1) hire a bunch of specialists who understand how to write efficient code for each of the "DSL"s, or 2) abstract out all of the lower-level languages to make it possible for most programmers, who are generalists, to still develop apps. I would argue that few development teams have the luxury of a bunch of specialists at their disposal and thus there is absolutely a need for higher-level abstractions - even leaky ones - over some of these DSLs.

“I’d argue that the kind of simple SQL you need for a blog engine is all SQL-92 compliant anyway. Of course you’d have some kind of data layer; I’d just choose an extremely minimalistic one.”

Well, good luck getting paged results from SQL Server 2000 then. LINQ to SQL does handle this, and the syntax is wonderful as well (something like):

skip(100)
take(10)
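
For anyone who has not seen the pattern, here is a minimal Python sketch of the skip/take idea. It is only an in-memory analogue: `page` is a hypothetical helper, not part of LINQ, and a real LINQ to SQL provider translates the paging into SQL on the server rather than slicing rows on the client.

```python
from itertools import islice


def page(rows, skip, take):
    """Return `take` items after skipping the first `skip` items."""
    return list(islice(rows, skip, skip + take))


# A stand-in for an ordered result set of 1000 rows.
rows = range(1, 1001)
print(page(rows, skip=100, take=10))  # rows 101 through 110
```

The point is that paging is stated as two small declarative steps, and whatever sits underneath decides how to carry them out.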

Why/how is LINQ better than, worse than, or different from list comprehensions as seen in Haskell and Python? The trivial examples I’ve seen don’t demonstrate a great difference.

Very good subject; this is why I always read your blog. You always seem to discuss things that matter to me.

I am one of the “bad guys” who made an abstraction for SQL. I pretty much did a Linq-alike implementation.

My Data Access Layer needs to work with different solutions, both web and Windows applications. And it should be able to work with several different database instances.

It should be possible to support other databases like Oracle, MySQL, etc.

All database tables are represented as classes, so that I can refactor my code and see if the code matches the database it has to work with. If I change the table structure, the compiler finds all places where it is touched.

I wanted smooth handling of null values, no possibility of creating SQL injection vulnerabilities, and no type conflicts between code and database. So I can do something like …

query.Where(table.Created DateTime.Now table.ProductID == Guid.NewGuid() table.Type == ProductTypesEnum.UsedItem)

It should of course support joins, aggregates, subqueries, etc., without worrying about parentheses and the like. For example: query.Where(table.ID.In(subQuery));
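
A rough Python sketch of how this style of query object can work, assuming operator overloading on column handles. `Expr`, `Column` and `in_` are hypothetical names for illustration and are not the commenter's actual API; the sketch only shows why values cannot leak into the SQL text.

```python
from datetime import datetime


class Expr:
    """A SQL fragment plus the parameter values bound to it."""

    def __init__(self, sql, params=()):
        self.sql = sql
        self.params = tuple(params)

    def __and__(self, other):
        return Expr(f"({self.sql} AND {other.sql})", self.params + other.params)

    def __or__(self, other):
        return Expr(f"({self.sql} OR {other.sql})", self.params + other.params)


class Column:
    """A handle for one table column (illustrative, untyped)."""

    def __init__(self, name):
        self.name = name

    def _compare(self, op, value):
        # Values never enter the SQL text; they travel as bound parameters.
        return Expr(f"{self.name} {op} ?", [value])

    def __eq__(self, value):
        if value is None:
            # Null handling: SQL needs IS NULL, not "= NULL".
            return Expr(f"{self.name} IS NULL")
        return self._compare("=", value)

    def __lt__(self, value):
        return self._compare("<", value)

    def in_(self, values):
        values = list(values)
        marks = ", ".join("?" for _ in values)
        return Expr(f"{self.name} IN ({marks})", values)


created = Column("created")
kind = Column("type")

# Parentheses are required because & binds tighter than < and == in Python.
where = (created < datetime(2007, 1, 1)) & (kind == "UsedItem")
print(where.sql)     # (created < ? AND type = ?)
print(where.params)
```

Because conditions are objects rather than strings, renaming a column handle in code is something tooling can follow, which is the refactoring benefit described above.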

It should be possible to execute every query either directly, with the SQL parsed at run time, or as an automatically generated stored procedure.

Results are materialized into object containers, like query.Construct(typeof(ProductItem))

And so much more…

I know this is far from the perfect solution, and for some it would probably be a disaster, but for me it has worked great. Our product (and database structure) has changed, and is still changing, all the time. I am not sure I would have been able to keep an overview of both a large codebase and a huge number of stored procedures through all the scenarios I have worked through. I very strongly believe in DRY, and of course you can’t ignore SQL, and TDD can save you from so many things.

But for me it was not about building a fancy tool; it was about implementing a rule set that would force me to code the way I thought it should be coded, without being lazy and taking shortcuts.

Sorry for the long post…

The “language-like flow”, in my opinion, can be made to work.
It’s the examples you gave which are problematic, because they replaced another language with just a bunch of function and method calls. While technically the goal of “language-like flow” is preserved, it is an extremely poor syntactic way of expressing another language.

There are examples which are in-between the solutions as a bunch of method calls and the solutions which actually modify the “outer” language syntax. They all have in common the following idea: hijack syntactic constructs of the “outer” language to represent operations in the “embedded” language; the syntax stays the same, the language-flow is not disrupted, while semantics is substantially different. The semantics of syntactic constructs used must surely be similar in spirit, using some metric, to the original semantics of the same constructs in the “outer” language, to prevent potential confusion. The degree of similarity may vary.
Of course, “a bunch of method calls” vaguely fits this description, but it really is an outlier.
A concrete example that I have in mind is how queries are done in Erlang’s Mnesia database. It hijacks syntax used for list comprehensions. Anyone familiar with list comprehensions in Erlang has very little trouble mentally adjusting the same syntax to the database query semantics.
Using similar ideas, and inspired by Mnesia, I have created a Perl module, DBIx::Perlish, which similarly hijacks Perl’s own syntax to represent database queries (which ultimately translate to SQL). The syntax is the one of Perl, the semantics is of a declarative query language.

Cheers,
Anton.

The problem with ‘real’ language embedding as in the Perl and LINQ examples is that there are just too many domain-specific languages that you would like to include: SQL, Shell, XPath, LDAP, HQL, and regular expressions are just a few examples. So, what we really need is a language that allows you to include arbitrary domain-specific languages as special kinds of literals.

There has been some research on this in the meta-programming and code generation community: typically meta-programmers would like to manipulate/generate programs using the concrete syntax of the language they generate, and have some basic guarantees that the generated program is syntactically correct or even compiles. This area of research is known as “meta-programming with concrete object syntax” or “statically safe program generation”.

We have extended those approaches recently to mainstream programming. In particular, the article “Preventing Injection Attacks with Syntax Embeddings. A Host and Guest Language Independent Approach” discusses a generic approach for extending a language with arbitrary domain-specific languages. The main goal of that paper was to illustrate how one can prevent injection attacks.
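
To make the injection problem concrete, here is a small Python sketch using the standard `sqlite3` module. It is not the syntax-embedding technique from the paper; it only contrasts string splicing with ordinary parameter binding. The table and the hostile input are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 1), ("bob", 0)])

# Hypothetical user input crafted to change the query's structure.
hostile = "nobody' OR '1'='1"

# Unsafe: the input is spliced into the SQL text and becomes part of the syntax.
unsafe_sql = f"SELECT name FROM users WHERE name = '{hostile}'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the input is bound as a value and can only ever be compared as data.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (hostile,)
).fetchall()

print(len(unsafe_rows))  # 2
print(len(safe_rows))    # 0
```

The spliced query returns every row because the guest language (SQL) was assembled as an untyped string in the host language, which is exactly the gap that syntax-aware embeddings aim to close.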

I can’t speak to the usefulness of LINQ, since my company has steadfastly resisted the move to .NET 2.0 (yet, though I’m pushing). But I can comment on the usefulness of fluent interfaces, since NValidate uses them.

The point of a fluent interface should be to help a developer write LESS code that is CLEARER and EASIER TO MAINTAIN. If the end result of your API is more verbose than what they had to begin with, you’ve failed to achieve your objective.

A fluent interface brings to the table the ability to exploit intellisense, and (in the case of NValidate) to reduce many hard-to-read lines of code to one readable line of code. If that’s not happening, it’s time to go back to the drawing board. Chances are pretty good that no one’s going to want to use the product, and that those who do use the product will be those who’ve had it foisted upon them. In those cases, there’s bound to be lots of grumbling.

“Ahhh. Remember Cobol: Subtract X From Y Giving Z”

I was thinking that. Of course, in some (can’t speak for all, I only ever COBOLed on IBM and VAX) implementations we could say:

COMPUTE Z = Y - X

I look forward to getting and playing with LINQ and all the other fun-looking stuff. I’m particularly relishing putting together some more complex queries and seeing how well-optimised they are at the other end. I’m like that.

Thank you for bringing this up. When I was in school, one teacher was expounding on the virtues of object-oriented rendering. While I “got” the idea behind it, I still couldn’t understand why it was better than using simple string manipulation.

I wonder – how deep could we nest languages? Maybe some C# code using SQL to store javascript in a database that includes a regular expression?

While I can take or leave fluent interfaces, I think that this post unfairly represents the motivations for making query objects instead of strings. If you are successful (and, yes, that is very hard), you can have queries you can reason about. You can create an algebra for your objects and combine them in interesting ways, build higher-level abstractions and so forth.

I will agree that the vast majority of attempts to do this fail miserably and provide no value over embedding straight SQL as a string. Still, when somebody has their domain figured out and they have managed to construct an abstraction that allows the programmer to do more than before (e.g., build user-driven queries at runtime without making StringBuilder your best friend), I think it is a great success.
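
As a rough illustration of that last point, here is a Python sketch of a tiny condition "algebra" for building user-driven queries at runtime. All names (`eq`, `all_of`, `any_of`, `build_search`, `ALLOWED_COLUMNS`) are hypothetical, and the column whitelist is an assumption added so that only values, never identifiers, come from the user.

```python
# Column names come from code, never from user input.
ALLOWED_COLUMNS = {"category", "color", "brand"}


def eq(column, value):
    """One condition: a SQL fragment and its bound parameters."""
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unknown column: {column}")
    return (f"{column} = ?", [value])


def _join(word, conditions):
    sql = f" {word} ".join(f"({fragment})" for fragment, _ in conditions)
    params = [p for _, ps in conditions for p in ps]
    return (sql, params)


def all_of(*conditions):
    return _join("AND", conditions)


def any_of(*conditions):
    return _join("OR", conditions)


def build_search(filters):
    """Turn optional user-supplied filters into one WHERE condition."""
    chosen = [eq(col, val) for col, val in filters.items() if val is not None]
    return all_of(*chosen) if chosen else ("1 = 1", [])


sql, params = build_search({"category": "books", "color": None, "brand": "acme"})
print(sql)     # (category = ?) AND (brand = ?)
print(params)  # ['books', 'acme']
```

Conditions are plain values, so they nest: `any_of(all_of(a, b), c)` yields a correctly parenthesized fragment with no string surgery in the calling code.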

The point is never whether you can build an object-oriented design that does something trivial – it is what can that design allow you to do that the grungy text version couldn’t… If the answer is “nothing,” then I agree that the whole exercise was futile. I’m just pointing out that it doesn’t have to be so.

Talking of misconceptions, @Louis Kessler, slagging APL on the basis of whether it can be compiled or not is a dangerous game: virtually all of the ancient programming language esoterica that newbies love to disparage because they can’t imagine an efficient implementation (think Lisp, Smalltalk, APL, etc.) actually have very efficient implementations. The fact that it didn’t come in the box with Visual Studio does not mean it doesn’t exist!

Good point for writing small apps. Wait until you’ve written the same query in who knows how many points in your application, then we’ll talk… Oh, and wait until you’ve changed your DB and have to go back to all those places and change your query… Good luck!

<joke>
jeff’s juss jealous that rob got assimilated and he didn’t.
</joke>

all very good and clever points. replacing regex like that is just a brutal crime. to paraphrase Justice Gray:

“Regular expressions don’t suck – but you do.”

Man, I’d give a lot to have Regex literals in C#. Regex literals are easily my favorite part of Perl or Ruby, and I cringe every time I have to go through the whole “new Regex” dance.

Of course, with .NET 3.5’s extension methods, you will be able to write your own regex method on strings so you could just say “foo”.Matches(“yourregexphere”).

“…code that looks like exactly what it does.”

And that’s the bottom line. Code is an arbitrary interface to the system. Nothing more. Nothing less. It can help or hinder.

Regex, whatever else one thinks about it, fails here. Perl too. Like VB-ish languages, SQL is just English-like enough to do the job, but not so verbose that it gets in the way. Following the “code that looks like exactly what it does” standard, C-like languages would get lower marks than something like VB, PHP, Ruby, etc.

(Newsflash: braces and semicolons are NOT inherently superior to any other form of delimiter)

The Linq example is unfair, since the compiler DOES turn the embedded sql syntax back into the object soup you so hate and calls that.

You can even write the object soup directly if you desire.

I’m also betting people unfamiliar with RegEx will understand the wordy multi-line example long before they even figure out that the alphabet-symbol soup is anything other than a head roll on the keyboard.
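
There is also a middle ground between symbol soup and a wordy object model. As a sketch in Python (the phone-number pattern is just an illustrative choice), the standard `re.VERBOSE` flag lets the same expression be written across several commented lines:

```python
import re

# The terse form: compact, but opaque to newcomers.
terse = re.compile(r"^(\d{3})-(\d{4})$")

# The same pattern with whitespace and comments allowed by re.VERBOSE.
verbose = re.compile(r"""
    ^            # start of input
    (\d{3})      # three-digit prefix
    -            # literal hyphen
    (\d{4})      # four-digit line number
    $            # end of input
""", re.VERBOSE)

for pattern in (terse, verbose):
    print(pattern.match("555-1234").groups())  # ('555', '1234')
```

Both patterns compile to the same matcher; only the readability for someone unfamiliar with regex changes.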

SQL and “regular expressions” are similar because they are both declarative languages. You describe the result you’re looking for, not the steps to get it.

PowerBuilder has supported SQL directly in the language for more than 10 years, with type checking and without passing strings of SQL statements (like just about everything else I’ve seen, other than the LINQ examples above–I’m going to have to look into LINQ). You have procedural access to results in the PowerScript language, and if you want object-notation access to the result instead, use a DataStore/DataWindow.

Strangely, PowerBuilder has had regular expression support in the editor for quite some time, but it is not available in the PowerScript language (unless PB11 has it–I haven’t used it yet).

–dang, former Certified PowerBuilder Instructor, circa 1993

I agree with some of what you say here but I think you miss a few points.

Object models that represent domain-specific languages (for example your SQL example or the CodeDom) have two distinct benefits over simply writing the code itself.

1.) It allows the code to be crafted in an implementation-agnostic way. That means you could build up your objects and then convert them into MySql, SqlServer or Oracle SQL quite easily, and for the CodeDom you get VB.NET, C#, F# or whatever.

2.) Dynamic generation. No one would ever write all of their classes with the CodeDom, for the exact reasons you specified above: it would be stupid. However, if you need to dynamically generate SQL or a class based on some metadata, then using an object model to do it is a really big help.
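
A small Python sketch of the first benefit, using paging as the example because its syntax differs between databases. `PagedSelect` and `render` are hypothetical names, the dialect list is deliberately tiny, and table and column names are assumed to come from trusted code rather than user input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PagedSelect:
    """A dialect-neutral description of one paged query."""
    table: str
    order_by: str
    skip: int
    take: int


def render(query, dialect):
    """Render the same query object as SQL text for one dialect."""
    base = f"SELECT * FROM {query.table} ORDER BY {query.order_by}"
    if dialect in ("mysql", "postgresql", "sqlite"):
        return f"{base} LIMIT {int(query.take)} OFFSET {int(query.skip)}"
    if dialect == "sqlserver2012":
        # OFFSET/FETCH is available from SQL Server 2012 onward.
        return (f"{base} OFFSET {int(query.skip)} ROWS "
                f"FETCH NEXT {int(query.take)} ROWS ONLY")
    raise ValueError(f"unsupported dialect: {dialect}")


query = PagedSelect(table="posts", order_by="created", skip=100, take=10)
print(render(query, "mysql"))
print(render(query, "sqlserver2012"))
```

The calling code describes the query once; only `render` knows about dialect differences, which is where support for an additional database would be added.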

But I agree with most of your criticisms in general and would add that there are additional problems related to ORM, where you map objects to a relational model, but that is probably beyond the scope of just this blog post.

You might be interested in reading about NBusiness however, which is an interesting solution to the problems you’re describing. For example:

http://www.codeplex.com/NBusiness/Wiki/View.aspx?title=A%20simple%20entity&referringTitle=Home

It uses a language called E# to allow you to define your business layer and uses templates to author much of the code and SQL. So in this case you have both the generated SQL/objects you are railing against AND a succinct domain specific language that you are using to declare it. What do you think about that??

You might try reading about “Intentional Programming” also if you want to stretch your brain for the day. It has some interesting ideas that I interpret as dynamic layers of domain specific languages.

What about TCL?

I’ve worked with a lot of scripting languages, using regex and SQL,
blending sh, awk, perl, etc…
but never felt the ‘language within a language’ as much as when I dipped into Expect with TCL as its embedded language.

It wasn’t terribly difficult to learn, but having to learn all the gotchas and caveats of a new language was a pain (unbuffered input coming in unpredictable chunks based purely on timing for example).

To this day I don’t know how to feed an array of args to ‘spawn’ and get it to use them as separate args… I had to auto-gen a new expect script instead whenever I wanted to feed a pile of args to ssh.

I know that loading an expect perl module from CPAN would have helped,
but I had already coded 95% of the functionality in TCL before I entertained that option. I was of the position that expect was built to handle the interaction, whereas the perl module seemed less mature (at the time). I also never saw the need to put everything in the same language.

Perhaps this is a case where it would have paid off to keep it all in Perl.

Good post. For years I’ve been saying something very similar. I’ve noticed more and more layers between typing code in an IDE and actually querying data. I’ve always said it seems like people are more and more afraid to actually do a query.

As usual however, you said it much better.