Who Needs Stored Procedures, Anyways?

It's intended as sarcasm, but I believe this Daily WTF entry on Stored Procedures should be taken at face value:


This is a companion discussion topic for the original blog entry at: http://www.codinghorror.com/blog/2004/10/who-needs-stored-procedures-anyways.html

Having worked with both Oracle and SQL recently, and having seen dynamic SQL being used in both, I know that performance is definitely an issue, but SQL server developers probably understand the performance implications less than their Oracle developer counterparts. In Oracle, not using bind variables is one of the big no-no’s because of SQL parsing and SQL statement caching, etc.

Dynamic SQL is definitely necessary in some cases, but where you can, you should definitely use parameterized SQL or stored procs. Stored procs are generally precompiled so you know that, as it is created in the database, that the statements are valid, so you don’t have SQL syntax errors. Knowing these things should save you headaches later on. Jeff, you comment on the lack of ability to debug stored procs in the same environment as the rest of the development that you do, but I would argue that if you used stored procs, you are less likely to have to debug the SQL. Not only that, if you had stored procs, it’s unit-testable.

I personally don’t like to have a lot of business logic on the database side either, but I think the logic relating to data integrity, etc. and having to do with data storage must definitely be on the database side. For example, and I think most people will agree with me, if you have a data table and then you have a hist table to keep track of changes to rows in the data table, that “business logic” should stay on the database. For me, I personally like to keep the database access to storage and retrieval and move most of the business logic out of the database into a middle tier. But this really depends on the project requirements.

The project I am currently working on is extremely heavy with database-side business logic, but also has a lot of logic in the web code (ASP.NET). This is because the database is actually owned by our business unit. We have separate database developers who work on the database and the data is imported nightly from different locations. We use stored procedures almost exclusively, on SQL server, and we create dynamic SQL in the stored proc. and - get this, we pass XML in order to get around the limitation on complex data structures. So not only does the stored proc have to parse XML, it also has to create dynamic SQL string and execute it. In our case, I think it makes sense given the functional requirements.

SQL Server 2005 to the rescue then, eh?

CLR? Check
Stored Procedures have the ability to be created in a .NET language like C#, VB.NET, etc? Check

Sold? Check

Lets use a hypothetical real-world scenario to understand why stored procedures are not always the best approach:

You have 3 database platforms on 3 servers. One server in Washington runs Windows 2003 with SQL Server 2005. One server runs Oracle on Linux or a Sun (sorry don’t use Oracle to know if they need some outrageous beast OS/Hardware). One server runs MySQL on a Linux box.

All 3 databases share a table and must synchronize that table. You develop an application that checks each table on each database platform to make sure the synchronization is working right. Would you rather use .NET and let the Data Providers handle all of the SQL/Oracle/MySQL work? Or would you rather have to know and access 3 different stored procedures, knowing each callback and database platform UP FRONT.

I’d use .NET with 3 data providers and using common code that generates the SQL needed to use all 3. This way you don’t have to code around any inconsistencies in the underlying databases because they will all return the same .NET structures you can easily use. Can you honestly say you can use the same stored procedure and make it work exactly alike on every database platform you put it on? Simple stored procedures maybe, but they’re so different that you’re bound to have some kind of differences that you’ll have to code around.

Then again I could just be pulling all of this out of my ass. I’ve never really bought into stored procedures because I want apps that work when you plugin a new data provider. I don’t want to have to think “Oh you have to add this stored procedure, throw salt behind your back, cluck like a chicken 3 times, and pray that it happens to work exactly as it should”.

I know it’s not quite like that but I like having all of the SQL in one place. I just edit my code where the code is, not having to remember to edit both the code and the stored procedure on whatever database I happened to be working on at the time. Oracle stored procedures can’t be accessed in VS.net by default, and I’m way to cheap to spend some outrageous amount of money so that I could edit all stored procedures in the IDE I’m doing my application in. SQL Server 2005 may change my thinking about all of this, but for now I’m just not seeing how stored procedures will benefit me personally. If I were dealing with customer databases and their own DBA BOFH then I may be forced to use stored procedures but I could slap the DBA whenever the procedure wasn’t working right. It does add an extra layer of slappage not currently present in my workflow.

Yes, yes yes! You hear these arguments all the time, and I’ve never really bought into them. Using stored procedures in an app basically means that you’ve a) added another layer, b) added another language and c) lost all database portablility. At the very least. I don’t think I’ve ever heard anyone go into the details around the ambiguity of SQL error messages and debugging problems before, though (kudos).

In my mind, they’re just not worth the added effort in maintenance, unless you encounter one of those situations where you can actually get a real, calculable performance gain from them. Otherwise, I’m all for KISS. Use a good OR Mapper instead.

I am all for abstractions and separations of logic. I definately do not believe that T/SQL or PL/SQL are expressive enough languages to describe that logic. Sticking critically meaningful code in the DB from what I’ve always encountered leads to the Clusterous Fucokous Anti-pattern.

This is true regarless of where you put it. Procs or embedded.

Hurray for ORM.

hear hear! The last big project I did I mixed the usage of Stored Procedures and dynamic SQL, and the result is absolutely great.

You get the performance or adaptability where you need it (it was for an ISV) and can keep the maintenance to a minimum (we even supported both SQL Server and Oracle).

Besides the performance aspect, adaptability was our main reason for using SPs at all, because you can tweak the client’s configuration just a bit more using out-of-application SQL statements :wink:

but where you can, you should definitely use parameterized SQL or stored procs

Parameterized SQL should ALWAYS be favored over dynamic SQL. Parameterization prevents SQL injection and generates more generic (and therefore more cachable) execution plans.

Jeff, you comment on the lack of ability to debug stored procs in the same environment as the rest of the development that you do, but I would argue that if you used stored procs, you are less likely to have to debug the SQL

Not in my experience. We still need to look at the tables and understand the data to do our work; having SQL helps us do that. Procs are totally opaque.

but I think the logic relating to data integrity, etc. and having to do with data storage must definitely be on the database side.

Definitely. The database should be relationally coherent and have a clean, understandable design. Fundamentally bad db design shouldn’t be able to hide behind a bunch of stored procs. The data should be good at… being coherent data. Not being a “customer”.

We use stored procedures almost exclusively, on SQL server, and we create dynamic SQL in the stored proc. and - get this, we pass XML in order to get around the limitation on complex data structures. So not only does the stored proc have to parse XML, it also has to create dynamic SQL string and execute it. In our case, I think it makes sense given the functional requirements

Urgh, I dunno, that sounds like a lot of extra conversion work. If you’re going to do that much work, why not do a little more and write a web service API. Then you could pass true objects over HTTP, and not even care how it is handled on the database side. If you are writing stored procs that themselves build dynamic SQL, might as well cut out the middleman and move this to a web service API that generates dynamic sql. It seems your goal is abstraction, not performance, and stored procs are a terribly leaky place to build your abstraction layer, IMO.

Performance issues aside, the security implications are worth using stored procs. I don’t buy the argument of convenience of development or portability - it’s just laziness. It is a pain to debug apps when the environments are not integrated. I suggest you use parameterized dynamic SQL during development to get your SQL right and then create the stored proc based on that.

In some cases, when you are not the one who develops both the stored proc. and the calling code, you don’t necessarily want or need to know what the stored proc. is doing. If something goes wrong with the app, it should be pretty obvious if the problem is in the stored proc. or in your code. If not, then it’s either a problem in the design or the code that you can’t figure it out (of course, you are not always going to be looking at your own code or code that’s easy to maintain and understand).

Stored procs should be used to perform most database operations because it allows better structure. It enforces it. It’s basically the same idea of creating procedures/methods in procedural languages.

The main problems, as noted, are when you want complex data structures (objects) for parameters or variable number of parameters (such as a search function with many criteria).

In larger projects where there are many developers, using stored procs will make sure that it will remain easy for everyone in the team to be able to use and reuse the same stored procs. Also in larger projects, tracking and finding SQL statements in your code can be ugly and error prone.

Basically, I agree that there are some limitations of stored procedures that I wish didn’t exist, but overall, I believe stored procs. are much better for the purpose of designing and developing applications and if the performance is even marginally better than dynamic SQL, then it’s well worth it.

Stored Procedures are written in big iron database “languages” like PL/SQL (Oracle) or T-SQL (Microsoft).

Stored Procedures typically cannot be debugged in the same IDE you write your UI.

Naked SQL statements clumsily embedded in other languages use the same ‘big iron’ database languages. But wrapped up as text strings like many programmers do, they’re not debuggable at all, in any IDE.

Stored Procedures don’t provide much feedback when things go wrong.

Plenty of support for input validation and raising of errors exists. If you choose not to take advantage of it and then complain that it’s not being done…

Stored Procedures can’t pass objects

Create a class that formats an object’s properties as a stored procedure call, then use the class and forget for the rest of yout project’s development cycle that either SQL or stored procedures even exist.

Stored Procedures hide business logic.

Yes! N-tier design calls this a feature, not an issue. Your code isn’t supposed to know about or depend on the details of the stored procedure implementation. It’s just supposed to use the interface provided.

Stored procedures and triggers also allow for input validation, preventing a typographical or design error in other code from dumping erroneous data into the tables.

Stored procedures allow multiple sections of code to call the exact same steps for database interaction, and allows the developer to easily edit those steps and effect all dependent code at once.

Disallowing direct table access and allowing only stored procedures limits what can be done to the data to that fucntionality provided by and validated by sstored procedures. This disallows inappropriate or erroneous usage by authorized users, and severely limits what an unauthorized user can do, since they can’t even list the stored procedures that exist or the code behind them.

Even arguments that ‘it’s too much typing’ fail to hold water once you start taking into account having to surround the naked SQL calls with string quotes and line-continuation characters or string concatenation methods, factor in editing duplicated code in each place it occurs and the writing of additional client-side code to handle the input verification that could in most cases be more easily handled by the database server itself.

Hi, Jeff: You assert that “none of these things are true in practice,” but present no evidence to refute these points:

  • Stored procedures can be individually secured within the database. A client can be granted permissions to execute a stored procedure without having any permissions on the underlying tables.

  • Stored procedures result in easier maintenance because it is generally easier to modify a stored procedure than it is to change a hard-coded SQL statement within a deployed component.

These seem to me like valid reasons to use stored procedures.

Also, VS.NET does allow SQL Server stored procedures to be “debugged in the same IDE you write your UI.”

I read Jeff’s post last night and was going to comment this morning, but Wen and Eric covered pretty much all the points I wanted to. I am of the mindset that coming up with justifications for removing layers of abstraction in development is sheer laziness on the part of the architect; stored procedures are there for a reason (several actually)…not only for performance gains, but to separate business logic from (in what is often the case with web development) the presentation layer.

I’m actually astounded that the first point (re: “so-called languages are archaic”) was even mentioned: I don’t know about you guys but all the stored procedures I’ve written are in T-SQL on MS SQL, which is pretty much line for line the same code I would “inline” into the web application code, if I so chose to go that route.

We are in the midst of a 3 month project with another company who is responsible for maintaining a huge Lawson HR system, and transmitting job requistion and hiree data to and from our custom hiring application. Stored Procedures are practically a must in this scenario: Their lawson developer sends us the data and we deal with it as we need, sending it back and forth via stored procedures. I need to have less than a rudimentary understanding of what is going on in Lawson, and quite frankly, it’s not my responsibility with this project. I am responsible for handling the data on our end; that’s exactly why they have a Lawson developer…he has his role and I have mine.

Business logic hidden, data transported, end of story.

Stored procedures result in easier maintenance because it is generally easier to modify a stored procedure than it is to change a hard-coded SQL statement within a deployed component.

Maintaining our procs is a nightmare. We have a handful of guys who know PL/SQL and the Toad IDE well enough to actually edit the procs, much less debug a complex one. Things might be easier on the SQL Server side due to the MS integration, but until they get CLR in the engine (which I think is fantastic), it’s still Yet Another Layer we have to deal with and keep in sync, etc. This isn’t easier to maintain: it’s harder!

Stored procedures can be individually secured within the database. A client can be granted permissions to execute a stored procedure without having any permissions on the underlying tables.

Can’t you just set up database logins for those users that grant access appropriately?

Of course, as in all things: it depends. You may have a project where the “all stored procs” approach makes sense. What I really object to is this blanket (and INCORRECT) perception that procs are “just better” when my practical experience indicates they are WORSE on every project I’ve worked on. Even for performance! You need a really compelling reason to use procs-- critical performance junctions, remote vendor security lockdown, etc.

Naked SQL statements clumsily embedded in other languages use the same ‘big iron’ database languages. But wrapped up as text strings like many programmers do, they’re not debuggable at all, in any IDE.

Regular expressions are another language “clumsily embedded” in my code, yet it makes my life much easier. Go figure.

Yes! N-tier design calls this a feature, not an issue. Your code isn’t supposed to know about or depend on the details of the stored procedure implementation. It’s just supposed to use the interface provided.

The database layer is not the only place to do this. I would argue it’s one of the WORST places to do it because of all the reasons I discussed in my post. There are many many ways to isolate your data access layer. My current feeling is, if you REALLY care about building a reusable, abstracted API, then you need to be building a web service. End of story. That’s the only abstraction layer that, IMO, currently means anything. Everything else gives you the illusion of abstraction (remoting, stored procs) with few of the benefits. But once the web service is built, all my abstraction is taken care of-- I can call it from Mac, Dos, Unix, whatever, any platform, any language. It’s work, but you get REAL abstraction, a huge benefit. And the back end can do procs, parameterized sql, black magic, whatever. Who cares at that point, because it’s fully abstracted away.

Stored procs should be used to perform most database operations because it allows better structure. It enforces it. It’s basically the same idea of creating procedures/methods in procedural languages.

But again, are procs the only way to achieve this isolation? I don’t think so, and they also have some serious downsides.

I couldn’t agree more!

JavaKid

Jeff,

I would agree with you in the case where a database will only be used by a single application being developed by a single programmer for a single customer because all of the available knowledge is in one brain and there is only one skillset to work with. However, as soon as you start adding programs, adding programmers, or adding customers I switch to being firmly in the camp that says “all interaction with the database will be through stored procedures” and here is why:

  1. As a rule VB programmers are terribly inefficient T-SQL coders. They are row-based thinkers, not set-based thinkers, and that leads to very slow code. I spend a lot of my time “fixing” code written by VB programmers that “works” but takes forever. I just finished rewriting an app that used to run overnight or over a weekend (depending on the options selected) that now runs in 15 - 20 minutes. The efficiency did not come because of parameterized queries or cached queries, it resulted from using set logic on the server instead of executing hundreds of thousands of queries from the client and processing the logic on the client row by row.

  2. Anyone that has had to research a proposed schema change will tell you what a nightmare it is to identify all of the code that could be affected by the proposed change. How do you know if you have found all of the code? Do you just wait and see what breaks after the change is made? Requiring that all interaction with the database be done through stored procedures makes it easy to identify the stored procedures that will be affected.

  3. Stored procedures can be designed, written, and tested separately from the VB code. The VB programmer and the database programmer reach an agreement on what the parameters will be and what the output will be and they both write their code in accordance with that “contract”. This allows for parallel development and independent testing.

In closing I would like to throw in a couple of other general observations:

  1. You probably would not want a contractor to build your house for you if they only used carpenters. It is true that the majority of the work should be done by carpenters, but you would probably want some electricians to do the wiring and some plumbers to handle the pipes and toilets. Does that mean that carpenters couldn’t build a house without electricians or plumbers? No. Would you want to buy that house? Probably not.

  2. I have found that most problems with performance have their root cause in poor database design. Spending more time upfront thinking through the design and understanding how it will be used will give you more performance gain than parameterized queries vs. dynamic queries.

it resulted from using set logic on the server instead of executing hundreds of thousands of queries from the client and processing the logic on the client row by row

That just sounds like bad coding, pure and simple. Hundreds of thousands of queries? How could that ever be a good idea?

Anyone that has had to research a proposed schema change will tell you what a nightmare it is to identify all of the code that could be affected by the proposed change.

An illusion, as we have had table changes which broke dozens of stored procedures. It’s the same problem, just on a different (and harder to debug) tier. Reality is, changing the DB will break everyone, which is why I favor total transparency whenever possible.

Stored procedures can be designed, written, and tested separately from the VB code.

So can API layers. Stored procs are not the only form of database abstraction, just one of the more limited ones.

Does that mean that carpenters couldn’t build a house without electricians or plumbers? No. Would you want to buy that house? Probably not.

I agree you need developers that understand basic relational database design. I have worked with developers (I am looking at you, Shawn Holmes) who didn’t even index their tables.

I have found that most problems with performance have their root cause in poor database design. Spending more time upfront thinking through the design and understanding how it will be used will give you more performance gain than parameterized queries vs. dynamic queries

I agree. One of the advantages of parameterized SQL is that it provides much greater transparency into the database. It’s a lot easier to hide a bad database design by sticking it behind a bunch of stored proc “interfaces”. The DB schema should ALWAYS be well designed, and the more eyes you have on it, the more likely it is to be well designed. If 4-5 developers can’t figure out the crazy queries necessary to do basic business work in your schema, maybe it isn’t very good?

Jeff,

“That just sounds like bad coding, pure and simple. Hundreds of thousands of queries? How could that ever be a good idea?”

From the VB programmer’s point of view it seemed like a good idea. The purpose of the application was to determine the eligibility of thousands of people for health care benefits based on a complex set of rules based on number of hours worked, hire date, where they worked, when they worked, and other personal information. The programmer executed a series of queries to get the necessary information for one worker for one time period. They then used that information to calculate whether or not that person had earned eligibility and executed an update query to save the eligibility information. They put that block of logic inside 2 loops - one to iterate through all of the workers and one to iterate through all of the time periods for each worker. To the VB programmer that made perfect sense - it was just like using arrays. To a database programmer that is insane, but to a VB programmer it made perfect sense.

“An illusion, as we have had table changes which broke dozens of stored procedures. It’s the same problem, just on a different (and harder to debug) tier. Reality is, changing the DB will break everyone, which is why I favor total transparency whenever possible.”

You seem to have missed my point. If all of the code referencing a table is contained in stored procedures I can find all of the affected stored procedures by doing one query against the syscomments table (where all stored procedure source code is stored by SQL Server). I do not know of any way that I can find all of the VB code that may exist that references a table because there is not a single known place where all of the source code resides. The issue is not how many pieces of code will be affected, the issue is how do I find all of the affected pieces of code?

"Stored procedures can be designed, written, and tested separately from the VB code.

So can API layers. Stored procs are not the only form of database abstraction, just one of the more limited ones."

Once again I think you missed my point. Whether you execute queries directly from VB or you put them into stored procedures the queries themselves must be tested. I believe that it is better (and easier) to write and test stored procedures separately so that when you are testing and debugging your VB code you have eliminated the dynamic SQL as a possible source of problems.

“It’s a lot easier to hide a bad database design by sticking it behind a bunch of stored proc “interfaces”. The DB schema should ALWAYS be well designed, and the more eyes you have on it, the more likely it is to be well designed. If 4-5 developers can’t figure out the crazy queries necessary to do basic business work in your schema, maybe it isn’t very good?”

In discussing the pros and cons of using stored procedures vs. dynamic queries from VB we should both be assuming the presence of skilled programmers on both the VB side and the database side. It is somewhat disingenuous to argue that you need to put the queries in the VB code because, if you don’t, the database people will write a bunch of crappy code and stick it in stored procedures that you can’t understand and are stuck with. It would be equally wrong for me to argue that VB programmers should be forced to use stored procedures because if they are allowed to write queries against the database they will be very inefficient and bog down the database server or write update queries that screw-up the data. Even though that may be true given the talent levels on your projects, the philosophical argument should assume that both sides have talented people.

The question is, if given equally talented people on both sides, what is the best way to work with a database and why?

Think about accessing the database through stored procedures (only) as a means to using your database like a service. It provides whatever (and only) the functionality implemented by your stored procedures, regardless of what application is accessing it. It you have one application and 4 import (applications) all using the database, stored procedures will help ensure that they all do it in a consistent way. New applications will already have a good baseline of database logic to choose from. You database becomes more like an entity, or single object that has specific functionality available.

Change the word “database” to “web service” and I agree. That’s a meaningful layer of abstraction for apps that need a formal API layer (note that not all do).

If you’ve never tried stored procedures, you owe it to yourself to do so and formulate an opinion based on your experience

Oh, believe me, my opinion is based on extensive use of stored procs in many projects. They are the source of much continuing pain.

Check out the latest proc “submitted by Jeff Atwood” at
http://www.thedailywtf.com/ShowPost.aspx?PostID=29337

When you have worked on a database that has over 2 TB of data, and literally more than 3k tables/views you will understand not only the power, but NECESSITY of them

Sure, YMMV. What I object to most is the argument that stored procs must ALWAYS be used, no matter what, which is a pretty giant net negative.

and the plan analyzer shows that it’s taking 19+ seconds trying to figure out what to do with it, you know it’s time for an SP. Actually, up-front would have been the time

What database are you using? As mentioned in the post above, parameterized SQL query plans are automatically cached by most modern database systems. Fex, in SQL Server as of version 7.0 or greater. So the query plan will only be calculated the first time the query is sent; all subsequent submissions of that same parameterized SQL will be looked up in the query plan cache.

When you have 70+ projects sitting on one giant shared database + schema you simply cannot have developers mucking about directly with tables or views unless ABSOLUTELY necessary

Again, are stored procs the only way to accomplish this goal? If you want an API that really means something, I’d put a Web Service on top of that database and disable access from any port other than port 80.