Continue Discussion 230 replies
August 2008

WileC

As many people have already pointed out, the advantage of nolock is higher performance, the risk is the possibility of reading uncommitted data.

Prior to implementing with (nolock) or Set Transaction Isolation Level Read Uncommitted, you should ask 2 questions:

  1. What (bad things) can happen if my application reads dirty data?
  2. How often will a dirty read occur?

For most applications (except banking, airline reservations, etc) the answer to #1 is nothing and the answer to #2 is hardly ever.
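
For reference, the two forms mentioned above look like this (table and column names are illustrative, not from the post):

```sql
-- Per-query hint: only this read of Posts sees uncommitted data.
SELECT Id, Title
FROM Posts WITH (NOLOCK)
WHERE ParentId = 1234;

-- Session-wide: every following statement on this connection reads uncommitted.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT Id, Title
FROM Posts
WHERE ParentId = 1234;
```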

August 2008

Syd

This is probably WAY too far down for anyone to read it now, but another possibility is that it ISN’T a deadlock - the message just SAYS that it is. Maybe (deep down in the SQL stack) an exception is thrown, and something slightly higher up just ASSUMES that all exceptions thrown by x() are deadlocks, and spits out the message.

August 2008

Jasmine

No Syd - SQL Server does not falsely report a deadlock. As I mentioned earlier, it is a very specific condition and simple to identify, but impossible to solve. So it kills one of the queries. If SQL Server says you have a deadlock, that’s what it is.

August 2008

Mladen

oh and Jeff, this line:
With nolock / read uncommitted / dirty reads, data may be out of date at the time you read it, but it’s never wrong or garbled or corrupted in a way that will crash you.
is plain wrong.

read this to see why:
http://blogs.msdn.com/craigfr/archive/2007/06/12/query-failure-with-read-uncommitted.aspx

August 2008

DMR

So… why are you not just using MySQL ???

August 2008

LeighR

Just for the record, unlike SQL Server and MySQL, Oracle does not have a READ UNCOMMITTED isolation level. As Tom Kyte says: "The goal of providing a dirty read is to supply a nonblocking read, whereby queries are not blocked by, and do not block, updates of the same data. However, Oracle Database doesn’t need dirty reads to achieve this goal, nor does it support them. Dirty reads are an implementation other databases must use to provide nonblocking reads…"

August 2008

Puneet

Simplest case (Val standing in for any column):

tran1:
begin tran
select * from [A]
update [B] set Val = Val + 1
commit tran

tran2:
begin tran
select * from [B]
update [A] set Val = Val + 1
commit tran

These would deadlock if the two transactions happen to run simultaneously.

I suspect that the framework you are using is initiating transactions even if you are not doing so explicitly. I ran into this when I created a middle layer using .NET and the end users (developers) were facing this even though they were not using transactions for their SQL sets.

I say this because deadlocking on selects vs. updates alone is very rare (if it happens at all) and certainly nowhere near the rate that you are experiencing.

August 2008

Steve

Scoble is a moron

August 2008

CT2

Not sure if this has already been submitted above (I didn’t read each post in that much detail). You could be seeing an index deadlock. While deadlocks between tables are easier to understand, you can get into the same situation with indexes as well.

For example, let’s say you have two indexes, A and B. Say that A is a clustered index on Id, so it contains all the data (and B is an index on ParentResultId and LastActivityDate).
So the select statement starts with index B, puts a read lock on it, and then tries to go get the rest of the data from the data page (trying to get a read lock on the page). The update statement does the opposite: first it gets a lock on the data page (index A) and then tries to get a lock on index B so it can update it. Classic deadlock.
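
A hypothetical setup matching that scenario (index column names follow the comment; the table name is assumed):

```sql
-- Index A: the clustered index on Id; its leaf pages are the data rows.
CREATE CLUSTERED INDEX IX_Results_Id
    ON Results (Id);

-- Index B: the nonclustered index the SELECT seeks on first.
CREATE NONCLUSTERED INDEX IX_Results_Parent_Activity
    ON Results (ParentResultId, LastActivityDate);

-- Reader:  lock on B (seek)       -> lock on A (bookmark lookup)
-- Writer:  lock on A (row update) -> lock on B (maintain the index keys)
-- Opposite acquisition orders on the same two structures can deadlock.
```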

Not sure if this is what is happening here but it is a less straight forward version of deadlock that sometimes you miss.

Also, I think the claims above of two select statements deadlocking are mistaken - that should be impossible. You need some process requesting exclusive access to a data page to get a deadlock. Any number of connections can hold read locks on the same page, which makes sense because one process reading the data shouldn’t prevent another process from reading it.

August 2008

Charlie

I’d say that the reason SQL Server deadlocks returns twice as many search results as MySql deadlocks is simply that SQL Server has many more users than MySql.

August 2008

mgb27

The stupid question was why my database should need to lock on a read. If calling that out flips the bozo bit, I’m happier not interviewing with your organization.

August 2008

mgb28

To elaborate slightly: Everything works this way. Try to get a read lock on a file locked for writing. By default you can’t. Try to get a read lock on a semaphore-controlled variable. By default you can’t. Seeking a read lock by default before reading is one of the most common patterns of concurrency that exists.

August 2008

JamesY

I bet you do not have an index on [Posts].[ParentId]. I would also not be surprised at all to find that adding that index fixed your problem.

August 2008

someguy2

Personally I’m pretty darn happy that Sql Server has good defaults that are concerned with data integrity, rather than hoping programmers will know what to do. Maybe you’re happy otherwise, but people would scream bloody murder if it were any other way.

Automatic retries … you really don’t want that do you? Retry it yourself. If you had the system software do it for you, it most certainly isn’t going to make the correct choice. Additionally, should every layer retry between your app and the database? What if n classes wait rand(m) seconds between p retries? You press a button and the app goes away for 5 minutes because a failure takes n * m * p seconds to get back to you? No thanks. Fail fast and allow the application to retry if it can at the appropriate level. As long as you have concurrency detection mechanisms in place then there should be nothing wrong with retrying the write.
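
A bounded, fail-fast retry of the kind described above can even be sketched in T-SQL itself (table name assumed; THROW needs SQL Server 2012+, older versions would use RAISERROR):

```sql
-- Retry only deadlock victims (error 1205), a limited number of times.
DECLARE @retries int = 3;
WHILE @retries > 0
BEGIN
    BEGIN TRY
        BEGIN TRAN;
        UPDATE Posts SET AnswerCount = AnswerCount + 1 WHERE Id = 1234;
        COMMIT;
        BREAK;                            -- success: stop retrying
    END TRY
    BEGIN CATCH
        IF XACT_STATE() <> 0 ROLLBACK;
        IF ERROR_NUMBER() <> 1205 THROW;  -- anything else fails fast
        SET @retries = @retries - 1;
        IF @retries = 0 THROW;            -- out of retries: surface the error
        WAITFOR DELAY '00:00:00.100';     -- brief back-off before retrying
    END CATCH
END
```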

From an application standpoint, read-committed locks are only necessary if you plan to update the database using that data, because you must first see a snapshot of the data you need to change, and then change it. If you’re reading just to grab replies to a post, then you should be able to use nolock. Make sure your code can handle the case when the count is not equal to what was returned.

Also to avoid locking, you could read from views and you can partition the hot data out of the table (ie statistics, counts, etc) so you’re not locking pages on a heavily accessed table to update a count that is, for most purposes, probably useless.

But is this a premature optimization? Is your goal in storing the count to save time, when in effect it is causing the deadlocks and slowing you down? The statistics are pretty simple, and I have to ask, why aren’t you using a view? It’s what they’re meant for; let Sql Server work it out.

Sounds like you also need some application-level caching. While it’s good to optimize the lower layers, you should be preventing every request from hitting the database without reason. Even a light cache is better than nothing. e.g. if you’re getting 1000 hits a minute and the data is not expected to change in that time range, even a 30 or 60 second cache lifetime will save you 500 to 1000 roundtrips to the database during that cache period. That’s something to think about.

Writing software is easy. Writing good software isn’t.

August 2008

John

A great read on isolation levels, concurrency, and locking:

http://www.oracle.com/technology/oramag/oracle/05-nov/o65asktom.html

While Oracle centric, this article is not necessarily Oracle specific. It compares how Oracle performs (or avoids) locking using multi-versioning to other databases. Detailed examples are provided.

Actually - I think this article is even more relevant to developers NOT using Oracle, since with Oracle, deadlocks are rare. In any case, understanding the basics can go a long way to solving application issues.

Regarding SQL Server vs MySQL deadlocks, doesn’t MySQL use some form of multi-versioning?

August 2008

langel

wow + suck

August 2008

MikeW

I think there’s a good reason that Microsoft put SQL Server where it is on the Mistic Scale (opti-pessi) - it’s better to design to support the high-risk environment at the expense of the low-risk one than vice versa. Since MS can’t know how smart their clients’ DBAs are (and I’ve met some donkeys responsible for databases running billions of dollars’ worth of trading positions - even been one myself) they make it ultra-pessimistic out of the box.

Better financially, say, that stackoverflow.com gets some tricky deadlocks than a trader makes the wrong decision because he doesn’t have full-on query consistency. Especially if the only problem is a configuration option.

Of course, MS should be providing better out-of-the-box configuration options - reasonable safe setups for several common types of application, with explanations of the pros and cons of each, would be a start.

But simplicity of installation and configuration is something Microsoft often gets wrong. Rick Brewster’s exposition of Paint.Net’s worst-case installation scenario is a case in point.

August 2008

dave12

Why doesn’t each philosopher take the chopstick to his immediate left, break it in half, and use two small chopsticks?

August 2008

AndrewT

We use NOLOCK a fair bit here for reporting - given that our production systems are doing quite a few writes per second… that said, I’ve found good indexing and using views to normalise data, rather than using normalised tables is a good way to go.

You can also throw the nolock hint inside the view - and for a few situations like you describe, it might be better to use a view, or to cache a user’s last update, x answers, etc. into their own table rather than keeping them in a denormalised structure - that way you aren’t doing the reads directly on the tables.

Also, for any detailed stored procs, etc, I’d go about using the WITH keyword and use CTEs rather than doing any temp/in memory/@table tables.

I’ve found that they speed my queries up to no end.
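
For instance, the hint can live inside the view, and a CTE can replace a temp table (object names here are made up):

```sql
-- The hint travels with the view, so callers don't repeat it.
CREATE VIEW dbo.PostSummary
AS
SELECT p.Id, p.Title, p.LastActivityDate
FROM dbo.Posts p WITH (NOLOCK);
GO

-- A CTE instead of a #temp/@table variable for a per-parent answer count.
WITH AnswerCounts AS
(
    SELECT ParentId, COUNT(*) AS AnswerCount
    FROM dbo.Posts WITH (NOLOCK)
    WHERE ParentId IS NOT NULL
    GROUP BY ParentId
)
SELECT s.Id, s.Title, c.AnswerCount
FROM dbo.PostSummary s
JOIN AnswerCounts c ON c.ParentId = s.Id;
```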

Also, remember when using Transaction scope in LINQ, you have to wrap the first call to the data inside your transaction USING, otherwise the call to get the data is fired outside of the transaction scope:
http://www.madprops.org/blog/linq-to-sql-and-nolock-hints/

Looking forward to getting my sign up approved for the site! :slight_smile:

August 2008

Craig_Dunn

But in practice adding nolock to queries that you absolutely know are simple, straightforward read-only affairs never seems to lead to problems.
With nolock / read uncommitted / dirty reads, data may be out of date at the time you read it, but it’s never wrong or garbled or corrupted in a way that will crash you

That hasn’t been true in our experience: although infrequent, we do get the following error:

[SqlException] Could not continue scan with NOLOCK due to data movement.

It’s an Exception - the query fails and the user sees an error. So NOLOCK can still break your read query and give the user no result rather than just an out-of-date one.

NOLOCK may still be useful - but it’s not guaranteed to work every time…

August 2008

Jay

Views are just stored SQL. Given that, a view will not change a deadlock situation.

August 2008

Nick_Sivo

Jeff,

Can you please show us some query plans so we can offer meaningful advice? SQL server maintains a cache of the plans for all parameterized queries by default.

I’d give instructions, but I’m not at my computer right now.

I’ve run into some hairy stuff at Loopt, and might be able to at least explain why it’s happening, if I can’t offer a real solution. I view NOLOCK as a hack and successfully eliminated it from all our DB code.

August 2008

RonaldP

Well, Microsoft SQL Server is based on Sybase, which I know used to default to dirty reads (meaning that, for instance, your bank statement could show a transaction as having occurred, but the balance from before the transaction occurred). It sounds like MS went and bandaided this behavior, with the results you see.

Try something that has always done the right thing, such as Oracle or Postgres. Reads never block (and thus never deadlock).

August 2008

Andrew

Seriously you don’t know why you need a lock when reading a database which conflicts with an update??? And you are blogging about it???

Say you are taking orders for the dining philosophers. Let’s say philosopher B wants what philosopher A is having - you need to ensure that philosopher A’s order is complete before philosopher B gets to read it.

August 2008

RickT

Hey Jeff, in response to

I get that, but I don’t understand why simple read SELECTS need to lock at all. I mean, maybe if that database was storing my bank account balance, in which case transactional orders are hugely important.

You should consider two items:

Firstly, Read locks are much looser than other lock types (e.g.: Update), and won’t usually cause blocking, let alone deadlocks. Several comments have suggested good articles on locking, and I’d recommend you read SQL Server Books Online (a misnomer, as you can install them locally.)

Secondly, if you read dirty data, the risk you run is of reading the entirely wrong row. For example, if your select reads an index to find your row, then the update changes the location of the rows (e.g.: due to a page split or an update to the clustered index), when your select goes to read the actual data row, it’s either no longer there, or a different row altogether! How would it appear to your users if every time someone else added a comment on your site, the comments they retrieved came from other conversation threads? That’s what Read locks prevent; they make your Read atomic, so that while following indexes, retrieving pages, and performing joins, you don’t have invalid data creeping in (or valid data creeping out) of your results.

As to the root of your problem, I would suggest that either you are creating explicit transactions that you didn’t tell us about, or that you need to turn off implicit transactions for LINQ. And while I always enjoy reading your blog, I’m afraid the old "a poor workman blames his tools" quote is rather appropriate here. If you really are getting deadlocks with simple SQL statements and no transactions, then your code is going to perform terribly as usage scales up. No amount of (NOLOCK) or Read Uncommitted will fix bad code.

Always entertaining, and often enlightening. Thanks Jeff, and good luck!

August 2008

RickT

Ronald Pottol:
Try something that has always done the right thing, such as Oracle or Postgress. Reads never block (and thus never deadlock).

Oracle accomplishes this with snapshotting, which SQL Server has supported for a few years. It’s off by default because it’s expensive in terms of both CPU and memory. Essentially, your RDBMS is managing multiple versions of the truth, which is what we try to normalize away from.

And there are ways to get deadlocks with reads in Oracle, depending on your transactions and locking levels. I know, I’ve done it. :frowning: I’ve never used Postgres, so I will defer to you on that one.

August 2008

AdamT

The reason for the deadlock in this case is straightforward, if my obvious-seeming assumptions are correct.

I’ll bet that the rendered page shows the # of answers followed by the answers themselves. Any non-sloppy developer would try to make the numbers match. The obvious thing to do is put the SELECT of the question in the same transaction as the SELECT of the answers. That’s the source of the problem.

Suppose I post an answer to a question that you are currently loading. You SELECT the parent [Post], taking a shared lock on the row. I INSERT my answer, taking an exclusive lock on the new child row. I try to update the number of answers to the question, but you have a lock, so I’m blocking on you. You start SELECTing all child [Posts], get to my new answer and block because I have an exclusive lock. You wouldn’t want to read a record that may get rolled back. That wouldn’t be stale data, it would be non-existent data! So, you’re blocking on me. Now we have a deadlock. I’ve assumed the [Posts] table is a tree. The problem still exists if it’s not.

MVCC would fix this particular problem. You’d never see my INSERTed but uncommitted record, so your SELECT would finish, you’d release the shared lock on the question, and then I’d UPDATE it and commit, making my new record visible at the same instant that my UPDATEd answer count takes effect.

Stale data isn’t so bad in certain specific cases, but using non-existent data (that got rolled back) is just amateurish. READ UNCOMMITTED is for suckers. Maybe your SELECTs don’t need to take shared locks, but, for heaven’s sake, don’t ignore the exclusive write locks. Or switch to MVCC. I don’t know how much of this is possible on SqlServer. It was news to me that it doesn’t always use MVCC.
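
That interleaving, written out as two sessions (schema is assumed, and this presumes the page-render SELECT’s shared lock is still held, e.g. inside a transaction at REPEATABLE READ or with HOLDLOCK):

```sql
-- Session 1: rendering the page
BEGIN TRAN;
SELECT AnswerCount FROM Posts WHERE Id = 1;     -- (1) shared lock on the question

-- Session 2: posting an answer
BEGIN TRAN;
INSERT INTO Posts (ParentId, Body)
VALUES (1, N'a new answer');                    -- (2) exclusive lock on the new row
UPDATE Posts SET AnswerCount = AnswerCount + 1
WHERE Id = 1;                                   -- (3) blocks behind (1)

-- Session 1 again
SELECT Body FROM Posts WHERE ParentId = 1;      -- (4) blocks behind (2): deadlock,
                                                --     and one session is killed
```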

August 2008

S114

Jeff, you appear to have forgotten that transactions are independent. SQL Server will, by default, do its utmost to preserve transaction independence (anything less is simply unacceptable). What this means is that the world is frozen when the transaction begins; looking around at this frozen world, nothing will change except what you move.

Working without locks is okay so long as you know what it means. Working at read uncommitted means that if you do ‘begin transaction, select from table, select again from same table’, the second select may have different results than the first. Whether this is a problem is up to you; but SQL Server has to assume (by default) it might be, so by default it is more cautious.

It’s not ‘silly’ that SQL Server treats your website as a banking operation; personally I would expect nothing less. Correctness is vastly more important than performance. Only the developer can trade off correctness for performance, and it is rarely needed. I suspect you’d be the first to complain about Microsoft’s apparent shoddy coding when running the ‘count of comments’ query and the ‘text of comments’ query in the same transaction came up with different numbers!

As to why select locks records: if it didn’t, then the world wouldn’t be frozen. Your program logic may behave differently if the changes were visible (there’s now an extra comment and you have a ‘no more than 500 comments’ rule or something) or, as I mentioned, you do a second query and it gets a different result.

August 2008

MGB_stinks

mgb said, It’s always a little disturbing to see a well-known coder ask a dumb question, but come on, database locks? This is very basic stuff!

Shut up. Seriously. What a techy-snob! You’d get negative 10,000 on stackoverflow.com. It’s because of folks like you that others have a hard time asking questions. You should know better. Tool.

August 2008

John

And there are ways to get deadlocks with reads in Oracle, depending on your transactions and locking levels. I know, I’ve done it. :frowning: I’ve never used Postgres, so will defer to you on that one.

I’m pretty sure this is impossible. With Oracle, writes cannot block reads, and reads cannot block writes.

In my experience with Oracle, almost all deadlocks are deletes in a parent table where a child table has an unindexed foreign key to the parent (a full table lock is required in this case). This is of course an easy thing to fix - just add an index.

http://asktom.oracle.com/tkyte/unindex/index.html

John

August 2008

Claude

Lots of chatter over a simple SQL keyword that folks should be aware of and using if they use SQL regularly.

This is kind of like saying I know the framework will garbage collect, but things work better when I dispose of my own trash.

August 2008

Mitch

I had a very similar problem - except I was reading a large amount of data from a few tables to create a report. And on another webpage, data was being inserted into the table, but the insertion was chosen as the deadlock victim.

I think a sql server deadlock support group is needed! lol

August 2008

JohnS

I know this isn’t stack overflow - but I thought I’d point to the correct answer in this thread.

Robert G points to the trace flags described in http://support.microsoft.com/kb/832524 that give you more than enough detail on the deadlock participants.

Having gone through several of these before, this deadlock information typically highlights an application call (or rather a pair of simultaneous calls) that are not as precise as they could be - very often causing unnecessary lock escalation.

Query hints or application-level retries are bandaid solutions to address the symptoms, not the problem.
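
For completeness, the trace flags from that KB article are enabled like this:

```sql
-- Write deadlock details to the SQL Server error log, server-wide (-1):
DBCC TRACEON (1204, -1);  -- classic lock-graph output
DBCC TRACEON (1222, -1);  -- more detailed output, SQL Server 2005 and later
```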

Hope that helps,

August 2008

OracleGuy

  1. Why do you need an AnswerCount field? Just compute it and avoid the possibility that it contains the wrong value. My suggestion is to show your schema to a good data modeler. Focus on your app code that touches anything the modeler circles in red.

  2. The root cause is not using Oracle. If you are using Oracle and still get a deadlock, you have an unindexed foreign key and failed to heed the warnings in TOAD or OEM. I only need one hand to count the number of deadlocks I’ve encountered in a decade of using Oracle. All had obvious causes and were discovered early in development.

  3. Anybody recommending nolocks should be on an IT no-hire list. You are making bugs. Why not simply store all your data in ASCII flat files and get your wrong answer even faster?
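
Point 1, computing the count on demand rather than storing it, might look like this (assuming the self-referencing Posts table from the article):

```sql
-- AnswerCount derived at query time: it can never be stale or wrong.
SELECT q.Id,
       q.Title,
       COUNT(a.Id) AS AnswerCount
FROM Posts q
LEFT JOIN Posts a ON a.ParentId = q.Id
WHERE q.Id = 1234
GROUP BY q.Id, q.Title;
```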

August 2008

Chris

Anybody recommending nolocks should be on an IT no-hire list. You are making bugs. Why not simply store all your data in ASCII flat files and get your wrong answer even faster?

Funny you should bring that up. I was wondering if a DB was even needed in a design like this. Sure, you need a durable store, but do you really need a full ACID guarantee? You could go a long way with a simple Paxos implementation rather than a full DB.

insert argument about using a sledgehammer to drive a thumb tack here

August 2008

otherguy

@Mitch

SET DEADLOCK_PRIORITY

from (http://msdn.microsoft.com/en-us/library/ms186736.aspx)
Which session is chosen as the deadlock victim depends on each session’s deadlock priority:

If both sessions have the same deadlock priority, the instance of SQL Server chooses the session that is less expensive to roll back as the deadlock victim. For example, if both sessions have set their deadlock priority to HIGH, the instance will choose as a victim the session it estimates is less costly to roll back.
If the sessions have different deadlock priorities, the session with the lowest deadlock priority is chosen as the deadlock victim.

And from (http://msdn.microsoft.com/en-us/library/ms178104.aspx)
Resources that can deadlock: Locks, worker threads, memory, parallel execution-related resources, and Multiple Active Result Sets (MARS)

There’s a plethora of msdn articles on minimising and preventing deadlocks.
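
For example, a background job that can safely lose can volunteer itself as the victim:

```sql
-- This session will be picked over normal-priority sessions in a deadlock.
SET DEADLOCK_PRIORITY LOW;

SELECT COUNT(*) FROM Posts;  -- stand-in for a long-running report query
```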

August 2008

otherguy

@OracleGuy - you’re a zealot. The nolock hint is there for a reason, not just some bandaid for idiots. The only reason for a ‘no-hire’ is if someone can’t see any possible danger of using nolock and prefers it ‘for speed’.

I’ll concede that if you’re using nolock often, then it’s a bad smell indicator that you’re doing something wrong.

August 2008

Kevin

first post!

August 2008

Tom_Willis

I just went through this on a system. Row level locking for updates may also be worth looking into for your situation as well.

August 2008

mph

A Google search for [sql server tables] returns 216 times as many hits as [mysql tables]. A search for [sql server database] returns more than 6 times as many hits as [mysql database]. I’m not sure that I would draw any pro-MySQL conclusions from the statistics you present.

August 2008

Kris

You could try caching (if not already) to eliminate some of the database reads, assuming that it’s not dynamically changing data.

I’ve never run into this sort of issue and I have some database heavy sites… apparently not heavy enough though.

August 2008

SoMoS

As mph said, regarding this claim:

I think it’s incredibly telling that a Google search for SQL Server deadlocks returns nearly twice the results of a query for MySql deadlocks

you would need the number of deadlock entries per user found on Google, not the overall total.

August 2008

o_s57

Great post jeff and thanks for the explanation on the blogging frequency; I was becoming a little antsy about it. I understand now. I’ve seen this exact problem on production systems and I had no idea what was wrong, now I at least have a chance to fix it.

August 2008

James_B

You’ve probably got a highly normalised database and a long running SELECT, which is backing up the UPDATE, which, in turn, is backing up the other random SELECTS.

By the time you get around to looking at the queries, the long running guy is finished, the UPDATE is more important and your subsequent SELECT gets killed.

It’s a common DB design problem.

Take a look at the query log and see if you can find the root cause SELECT.

August 2008

BobMolb

Your drawing shows something like 6 philosophers and a guy who is maybe serving. There are not enough chopsticks. I’m afraid you might not be able to draw, or to choose good drawings, anymore after the Sql Server battle :slight_smile:

I already read an article where you were complaining about Sql Server. Do you think Mysql will be any better?

And yes, I saw that you have had less time to blog.

good luck

August 2008

Kyle

Why not post it as a question on StackOverflow, Jeff?

August 2008

KenH

A couple of days ago, I watched a talk on the University of Washington public access channel about some research into resource locking. The talk is specifically about resource locking in the context of multi-core processors, but it applies to the whole class of problems just as readily. The solution proposed by this professor (Dan Grossman) is to build the handling of resource locking into the compiler, the same way that garbage collection has been integrated into compilers over the last 15-20 years (do you remember the days before garbage collection? Blech!)

Jeff, I know this doesn’t help you guys now, but it is food for thought.

Summary from the UW website:

With multicore processors bringing parallel computing to the masses, there is an urgent need to make concurrent programming easier. Software transactions hold great promise for simplifying shared-memory concurrency, and they have received enormous attention from the research community in the last couple years. This talk will provide an overview of work done at the University of Washington to help bring transactions to the next generation of programming languages.

The program is available for download at http://www.uwtv.org/programs/displayevent.aspx?rID=22341&fID=1471.

August 2008

ceejayoz

@mph - That’s hardly fair. MySQL is itself a SQL server.

August 2008

mgb29

It’s always a little disturbing to see a well-known coder ask a dumb question, but come on, database locks? This is very basic stuff! RDBMSs, in particular, are intended to preserve data integrity in addition to enabling fast read/write access to large data collections. Otherwise file systems would be sufficient.

In your case, since your innocuous SQL statement doesn’t grab the parent along with the child posts, you could easily have a difference between the answer count and the # of children actually returned. Of course it’s not a huge problem that you should spend sleepless nights on, but it will lead to a bug that is just shy of impossible to reproduce or debug.

August 2008

stussy

Wasn’t the diner a drive system for some ship in one of the Hitchhiker’s Guide to the Galaxy books?

August 2008

Noah_Yetter

I can’t really think of a justification for the out-of-the-box behavior of SQL Server as you described. As far as I know Oracle always uses snapshots for reading, so a simple SELECT can never cause a deadlock.

Come to think of it, why are you using SQL Server at all, and not something free (beer or speech, take your pick)?

August 2008

Ira

Nolock is probably a reasonable solution in this case, but I’d like to point out that a deadlock can only happen when multiple resources (usually tables) are involved in simultaneous transactions. If you’ve got a SELECT that operates on tables A and B and an UPDATE that operates on the same ones, then the SELECT will grab a lock on table A, the UPDATE will grab table B, and then both will sit there while they try to get a lock on the other table. Or if you’ve got a block of statements in a single transaction, you’ll get the same effect.

If your app is denormalized to the point where you only need a single table for most queries (as many simple web apps are), you shouldn’t see deadlocks. You may want to check out your transactions and make sure they’re not doing more than necessary within a transactional block.
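
The usual fix for that shape of deadlock is to make every transaction touch the shared tables in the same order; a sketch with placeholder tables:

```sql
-- Deadlock-prone: txn 1 goes A then B while txn 2 goes B then A.
-- Fixed: both acquire locks in the same order (A first, then B),
-- so the second transaction queues instead of deadlocking.

-- Transaction 1
BEGIN TRAN;
UPDATE A SET Val = Val + 1 WHERE Id = 1;  -- locks A first
UPDATE B SET Val = Val + 1 WHERE Id = 1;  -- then B
COMMIT;

-- Transaction 2 (same order)
BEGIN TRAN;
UPDATE A SET Val = Val - 1 WHERE Id = 1;
UPDATE B SET Val = Val - 1 WHERE Id = 1;
COMMIT;
```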

August 2008

mph

ceejayoz: So bitch at Jeff, since I modeled my queries on his. If my sql server queries are contaminated with MySQL results, then so are his, and my point stands.

August 2008

Damien_Katz

You could use CouchDB, which uses optimistic commits and never ever deadlocks.

Just sayin.

August 2008

Jason

The issue is probably going to come down to design. Is the Id/ParentId a GUID, and does the table have a lot of rows? Does that select resemble the real select that is happening? Do you have indexes, including a clustered one? Are you sure the deadlock is caused by the select? Are you issuing the select and update on different connections?

The idea that MySQL has fewer deadlock issues than SQL Server, based on a google search returning fewer results, does not tell much. What it tells me is that SQL Server is used for many more real applications than MySQL.

August 2008

Mladen

Hey, Jeff.
If you haven’t found the exact reason why your deadlocks are occurring, maybe you should try this method:
http://weblogs.sqlteam.com/mladenp/archive/2008/07/18/Immediate-deadlock-notifications-without-changing-existing-code.aspx

this way you don’t have to worry about running the profiler or anything else. when the deadlock happens you get the entire deadlock graph saved to a table and get a notification by mail if that’s what you wish.

This method has helped me trace a few bastardly deadlocks with great success.

hope it helps.

August 2008

Grant

If your database allows you to set the isolation level as serializable, and supports MVCC, you can gain all of the safety you would need for a full banking application without getting any of the locking problems.

If the update that updates the count was still running, as long as you were careful to have the actual update of the comment in the same transaction, it would always be in sync, the comment just would not appear until the count had been updated.

A better question, however (Codd would be proud), is why in the world are you storing the count of the replies on the post record??? The last activity and count are easily calculated using aggregate SQL functions.

There are, in effect, two issues.

  1. Your database either cannot support concurrent use well, or it is not configured correctly.
  2. Your schema design looks to have some issues.
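
On SQL Server, the MVCC-style behaviour described here is opt-in (database name assumed):

```sql
-- Statement-level row versioning: readers see the last committed version
-- instead of blocking on writers.
ALTER DATABASE StackOverflow SET READ_COMMITTED_SNAPSHOT ON;

-- Or full snapshot isolation, chosen per transaction:
ALTER DATABASE StackOverflow SET ALLOW_SNAPSHOT_ISOLATION ON;
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
```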
August 2008

Hoffmann

I’m no DBA, but this feels like a hack and hacks usually lead to headaches later on.

August 2008

DennisG

I’m a big fan of using the with (nolock) hint by default.
That doesn’t mean that all queries should be written with nolock, but most SQL queries should, especially for a web app.

In some situations you want to be sure that your data is consistent no matter what. In this case you should use different locking hints. Which ones – would heavily depend on the situation.
You also need to clearly understand what exactly locking does.
The locking model is relatively simple; the problem is that most locking tutorials are just terrible.

I wrote a tutorial about how to learn SQL Server locking:
http://developmenttips.blogspot.com/2008/08/cure-for-deadlocked-learning-to-use.html

August 2008

daghf

I recommend this episode of software engineering radio:

http://se-radio.net/podcast/2008-05/episode-99-transactions

This episode takes a close look at transactions from different angles, starting with their fundamental properties of Atomicity, Consistency, Isolation, Durability but also investigating advanced topics like distributed or business transactions.

August 2008

codinghorror

If you’ve got a SELECT that operates on tables A and B and an UPDATE that operates on the same ones, then the SELECT will grab a lock on table A, the UPDATE will grab table B, and then both will sit there while they try to get a lock on the other table. Or if you’ve got a block of statements in a single transaction, you’ll get the same effect.

I get that, but I don’t understand why simple read SELECTS need to lock at all. I mean, maybe if that database was storing my bank account balance, in which case transactional orders are hugely important.

I guess what I’m saying is that Starbucks Doesn’t Use Two-Phase Commit

http://www.eaipatterns.com/ramblings/18_starbucks.html

August 2008

JC111

There are a couple of things you might want to look into:

August 2008

DennisG

  1. It’s OK to store the number of replies for every post.
    But instead of storing it in the Post table, store it in a PostCommentCount table.
    (PostId uniqueidentifier, CommentCount int)

One record in the Post table would match one or zero records in PostCommentCount.

  2. Another problem with locks in SQL Server is that it typically locks not just one record, but a whole page or even the whole table.
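A sketch of what that could look like (the column types come from the description above; everything else is my assumption):

```sql
-- Keep the hot, frequently-updated counter out of the Post table entirely
CREATE TABLE PostCommentCount
(
    PostId       uniqueidentifier NOT NULL PRIMARY KEY,
    CommentCount int              NOT NULL DEFAULT 0
);

-- Writers now touch only this narrow row...
UPDATE PostCommentCount
SET    CommentCount = CommentCount + 1
WHERE  PostId = @PostId;

-- ...so readers of Post itself never wait on the counter update
SELECT p.*, COALESCE(pcc.CommentCount, 0) AS CommentCount
FROM   Post p
LEFT JOIN PostCommentCount pcc ON pcc.PostId = p.Id
WHERE  p.Id = @PostId;
```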

P.S.: I’m really missing ability to edit my comments (or at least delete them so I can replace them with corrected ones).

August 2008

codinghorror

is actually follow the advice from the error message, and retry the transaction.

OK, but why doesn’t SQL Server re-issue the transaction on my behalf, then? Automatic retries of some kind?

August 2008

Josh_Hurley

Jeff:

I agree with Hoffmann; this sounds like a hack. I’ve deployed dozens of enterprise-level .NET/SQL and Classic ASP/SQL applications and have not come across this issue.

It sounds like a transaction is being created on the call to the database, either by LINQ or in code, and it is not being released. I would pose the question to Scott Guthrie; I bet he can point you to one of the LINQ developers who will shed more light on this subject.

Josh.

August 2008

Jasmine

I think you misunderstand what a deadlock really is. A deadlock is not a lock that is taking a long time. It is a lock that will never be released until one of the two processes is killed. Here’s what happens in a deadlock:

  1. Query 1 begins by locking resource A
  2. Query 2 begins by locking resource B
  3. Query 1, in order to continue, needs a lock on resource B, but Query 2 is locking that resource, so Query 1 starts waiting for it to release
  4. In the meantime, Query 2 tries to finish, but it needs a lock on resource A in order to finish, but it can’t get that because Query 1 has the lock on that.

So, you have two queries fighting over SEPARATE objects, not fighting over the same object (in which case one query simply has to wait). That’s the important part - these queries will never finish. This is quite different from something that is simply taking a long time. There is no solution to the deadlock race condition other than to kill one of the processes.
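Those four steps are easy to reproduce in two query windows (TableA/TableB are stand-ins for any two tables):

```sql
-- Session 1:
BEGIN TRAN;
UPDATE TableA SET Col = 1 WHERE Id = 1;  -- step 1: lock resource A

-- Session 2:
BEGIN TRAN;
UPDATE TableB SET Col = 1 WHERE Id = 1;  -- step 2: lock resource B

-- Session 1:
UPDATE TableB SET Col = 2 WHERE Id = 1;  -- step 3: blocks, waiting on session 2

-- Session 2:
UPDATE TableA SET Col = 2 WHERE Id = 1;  -- step 4: blocks, waiting on session 1
-- Within seconds the lock monitor picks a victim and kills it with
-- error 1205; the surviving session then completes and can COMMIT.
```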

For this reason, I would suggest focusing only on queries which involve multiple rows or multiple tables. Single-row, single-table queries, such as the one in your example, can never be deadlocked. They can be waiting on a locked resource which is involved in a deadlock between two further queries, but in that case, killing the single-lock query still won’t fix the problem.

I do not advise using the NOLOCK hint, but you are correct that it can’t really hurt anything on website reads. Someone will see out of date information, but in most cases, that should be ok.

August 2008

Jasmine

SQL Server does not re-issue your command because the command itself could be dependent on the data changes that were being made by the query that wasn’t killed. This gives the client application a chance to run the transaction again, including any reads which may need to be done to data that was changed and is now in a consistent state.

August 2008

Tae_Kim

SQL Server 2005 sucks but you only find out after you’ve got several millions records and it’s too late to migrate off. I’ve lost all faith in SQL Server as an enterprise DB when performance started going in the toilets with heavy reads despite the fact that the behemoth it’s running on has almost no disk and CPU activity.

August 2008

TM119

The Stack Overflow beta started already?! How did I miss it? I hope I can still get in. hurries to send a sign up request

August 2008

Joel_F

I get that, but I don’t understand why simple read SELECTS need to lock at all. I mean, maybe if that database was storing my bank account balance, in which case transactional orders are hugely important.

The description of the pessimistic behavior and the description of ‘read committed snapshot’ make the latter sound like multi-version concurrency control. Near the beginning of the post, when you said that a SELECT was part of the problem, I was thinking, Why? If it was using MVCC then a SELECT wouldn’t even lock.

Then you got to the part about snapshots. It seems SQL Server (which I do not use*) can do MVCC, but not by default.

August 2008

Justin

Been there, done that. Anyone using SQL Server in a large system will run into this. As a rule, when we select from a table we use with (nolock). Here’s another mostly unknown gem you should know about:

[QUERY] OPTION (MAXDOP 4) – use a max of 4 processors

The hint above tells SQL Server the maximum number of processors it can use on a query. We ran into a query that would run MUCH MUCH faster on 1 processor than on multiple. Our 8-core superserver was trying to be too smart with the query, I guess.

Maximum Degree Of Parallelism (MAXDOP):
http://www.sqlmag.com/Articles/ArticleID/97044/97044.html?Ad=1
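Spelled out (the table name here is made up), the query-level hint and its server-wide counterpart look like this:

```sql
-- Per query: cap this statement at 4 schedulers
SELECT COUNT(*)
FROM   dbo.BigTable
OPTION (MAXDOP 4);

-- Or instance-wide, via the 'max degree of parallelism' setting
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism', 4;
RECONFIGURE;
```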

August 2008

osp70

I agree with Jasmine. To add to it: because you don’t show the comments in conversation style but rather in order of ‘votes’, the appearance of out-of-date data will be minimal as well.

August 2008

Jacob

It should always be okay to be viewing out-of-date info on the web, shouldn’t it? That’s the nature of a web page - it’s as stale as the last refresh. This web page may have had new comments added since I started viewing it two minutes ago… what’s the point of using fancy locking in that case?

August 2008

Steve

Wasn’t the diner a drive system for some ship in one of the Hitchhiker’s Guide to the Galaxy books?

Yup. It was from Douglas Adams’s Life, the Universe and Everything:

Bistromathics:

The most powerful computational force known to parascience. Bistromathics is a way of understanding the behavior of numbers. Just as Einstein observed that space was not an absolute, but depended on the observer’s movement in time, so it was realized that numbers are not absolute, but depend on the observer’s movement in restaurants.

The first nonabsolute number is the number of people for whom the table is reserved. This will vary during the course of the first three telephone calls to the restaurant, and then bear no apparent relation to the number of people who actually turn up, or to the number of people who subsequently join them after the show/match/party/gig, or to the number of people who leave when they see who else has turned up.

The second nonabsolute number is the given time of arrival, which is now known to be one of those most bizarre of mathematical concepts, a recipriversexclusion, a number whose existence can only be defined as being anything other than itself. In other words, the given time of arrival is the one moment of time at which it is impossible that any member of the party will arrive. Recipriversexclusions now play a vital part in many branches of math, including statistics and accountancy, and also form the basic equations used to engineer the Somebody Else’s Problem field.

The third and most mysterious piece of nonabsoluteness of all lies in the relationship between the number of items on the check, the cost of each item, the number of people at the table and what they are each prepared to pay for. (The number of people who have actually brought any money is only a subphenomenon in this field.)

Numbers written on restaurant checks within the confines of restaurants do not follow the same mathematical laws as numbers written on any other pieces of paper in any other parts of the Universe…

August 2008

J_Liles

I see JC already suggested this, but I have worked on two financial apps using SQL Server and both have had issues with deadlocks (and yes, we tried all the usual deadlock analysis). We arrived at two approaches:

  1. NOLOCK hint. As Jeff mentions, this is rarely as bad as people make it out to be. Gee, so the data is 500 ms out of date. Tough. There are only a few cases where this actually matters.

  2. Put in your own retry logic (detect the deadlock error and resubmit the query). This may feel like a hack, but guess what, it works. And would you rather have this hack, or have your users see stupid database errors? If you have centralized database access methods, it is fairly simple to add retry logic.

You can’t fix SQL Server, so you just have to deal with the real world here and do whatever it takes to make sure your users don’t experience database errors.
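A rough sketch of that retry logic in T-SQL itself (SQL Server 2005 TRY/CATCH; the UPDATE and @PostId are placeholders for whatever statement keeps getting picked as the victim):

```sql
DECLARE @retries int;
SET @retries = 3;

WHILE @retries > 0
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;
        -- placeholder for the real work that keeps getting killed
        UPDATE Posts SET CommentCount = CommentCount + 1 WHERE Id = @PostId;
        COMMIT TRANSACTION;
        SET @retries = 0;                    -- success: stop looping
    END TRY
    BEGIN CATCH
        IF XACT_STATE() <> 0 ROLLBACK TRANSACTION;
        IF ERROR_NUMBER() = 1205 AND @retries > 1
            SET @retries = @retries - 1;     -- we were the deadlock victim: retry
        ELSE
        BEGIN
            SET @retries = 0;                -- some other error, or out of retries
            RAISERROR('Transaction failed after deadlock retries.', 16, 1);
        END
    END CATCH
END
```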

August 2008

Greg

Jeff, before you post on things like SQL Server locking you should do your homework beyond Google and blogs. Books still have a place, you know.

http://www.insidesqlserver.com/

My guess, without having access to your logs and such, is that your deadlocked transaction (the one with the SELECT) is probably part of a longer outer transaction that acquired a shared lock on the rows that are being UPDATEd by the other one.

Setting your database transactions to snapshot isolation mode can still result in problems, since it will raise an error if you try to update rows that have changed since the beginning of your transaction (version mismatch) and also takes out X locks on writes, so you can still get deadlocks.

The correct way to fix this problem beyond using snapshot isolation mode (which will help greatly, because it removes the use of shared locks) is to a) minimize the length of your transactions to minimize the number of locks and b) always set up your web server to re-run requests in the case of a update conflict or deadlock.

August 2008

Matt_Davis

We’ve built a large service (Earth Class Mail) pretty much exclusively on LINQ to SQL. It’s worked great, except that we’re still battling all the transaction issues almost a year later (deadlocks, leaky transactions, etc). Make sure you FULLY understand the intersection between TransactionScope (if you’re using it) and SqlTransaction. We didn’t, and we’re still paying for it (with daily deadlocks just like what you describe).

Make especially sure you’re aware how Read Committed Snapshot (and any other tx isolation level) can leak into your connection pool. Otherwise, you’ll end up running stuff at the wrong isolation level, often causing more deadlocks and “how the he*l did that happen? we’re not even in a transaction!” moments. We ended up building a bunch of infrastructure into our own DataContext-extended base class to deal with it, and to ensure that the connection pool stays free of connections stuck in transactions.

Good luck!

August 2008

Double_K

OK, but why doesn’t SQL Server re-issue the transaction on my behalf, then? Automatic retries of some kind?

Think of the banking application… you wouldn’t want to re-issue anything if you know something slipped… it IS better that the SQL Server team be pessimistic by default.

August 2008

JonathanH

I believe what we are seeing here is the result of using ORM magic and now the resulting pain of trying to debug the SQL the ORM is doing.

I’ve never encountered a deadlock situation, but I use my own handwritten sprocs, so I know exactly what the DB is doing at all times.

August 2008

Arkh

Jeff I get that, but I don’t understand why simple read SELECTS need to lock at all. I mean, maybe if that database was storing my bank account balance, in which case transactional orders are hugely important.

Because the selected data could be useful later during the transaction. That’s why you don’t use a transaction if you don’t care about your selected data.

August 2008

KevinF

I’m sure you could set all of your SELECT to read uncommitted by default, but – really – who cares?

If you have something mission-critical where you can’t afford a dirty-read, isolation is good. For the rest of us, NOLOCKS are wonderful as long as you know when to apply them.

Playing it safe by default is fine. Forces you to actually think about what you’re doing.

–Kevin

August 2008

will10

The reason most people recommend accepting dirty reads (with nolock) is that you get better performance (I have not seen whether this has changed with MS SQL 2008). However, don’t use it when the results really matter: for instance, if you are transferring money between two rows and you do a dirty read in the middle, you could get the money counted twice, once in the original row and again in the row it is being transferred into.

As for Oracle, by default it only reads information that was committed at the time the statement started running.
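Concretely (Accounts is a made-up table):

```sql
-- Session 1: move $100 between two rows inside one transaction
BEGIN TRAN;
UPDATE Accounts SET Balance = Balance + 100 WHERE Id = 2;  -- credit lands first

    -- Session 2, at exactly this point, does a dirty read:
    -- SELECT SUM(Balance) FROM Accounts WITH (NOLOCK);
    -- The $100 shows up in row 2 already, and is still in row 1,
    -- so the reported total is $100 too high.

UPDATE Accounts SET Balance = Balance - 100 WHERE Id = 1;  -- debit lands second
COMMIT;
```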

August 2008

Syd

Re. row-level locks on UPDATE statements: I can demonstrate a scenario where this consistently and reliably doesn’t work - it is a very small table, and the WHOLE table ALWAYS gets locked. Microsoft’s semi-official response was (paraphrased): row locking on update is a hint to the server, not an instruction.

August 2008

Steve

I switched to using (nolock) years ago – I would say that it is the right choice at least 90% of the time.

August 2008

BobW

One should build one’s application correctly first, then optimize. The database’s default behavior is paranoid locking for banking-level correctness; this is the sane default. In a web messaging app, slightly stale data is perfectly acceptable, but for banking, stale data is a fatal error.

August 2008

Greg

Just thinking of DB deadlock (Oracle background), the only time I’ve seen them is when there are multiple statements in a transaction.

Like…
begin tx
insert
update
commit

I’ve not worked sufficiently with SQL Server or AutoCommit to know if this could happen.

Just something to think about.

August 2008

TaylorG

I’ve got a couple quick thoughts flying around in my head…

OK, but why doesn’t SQL Server re-issue the transaction on my behalf, then?
I can think of many ways to argue this point, but really, it comes down to practice… If you know the problem and solution, why not just implement the code to test for that condition and re-submit the batch when it occurs? It’s not rocket science… in fact, it’s good practice for any data access layer.

Instead of relying on a whole database change or dealing with (nolock) on lots of tables, have you considered using SET TRANSACTION ISOLATION LEVEL on the problematic queries? For example, SET TRANSACTION ISOLATION LEVEL SNAPSHOT: Except when a database is being recovered, SNAPSHOT transactions do not request locks when reading data. SNAPSHOT transactions reading data do not block other transactions from writing data. Transactions writing data do not block SNAPSHOT transactions from reading data.

Maybe not applicable in your specific case, but just for completeness, I want to mention using SET DEADLOCK_PRIORITY in some instances… maybe where the read is killing the write.
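For reference, the statements involved (the database and table names are placeholders):

```sql
-- One-time, per database: allow snapshot isolation
ALTER DATABASE MyForumDb SET ALLOW_SNAPSHOT_ISOLATION ON;

-- Then, in the session running the problematic query:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
SELECT * FROM Posts WHERE Id = @PostId;

-- And/or let a low-value reader volunteer as the deadlock victim:
SET DEADLOCK_PRIORITY LOW;
```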

There are a lot of things you can do to troubleshoot the actual cause of the deadlock, but I really hate to see someone change a whole-database setting just because they don’t understand the problem and want to assuage the symptoms.

August 2008

MaxK

What is surprising is that SQL Server thinks there is a deadlock. It doesn’t wait for the writer to finish and then just do the read? You’re not even referencing other tables in your UPDATE or doing an INSERT SELECT.

Anyhow, the ANSI-standard way to do your nolock thing is to set the transaction isolation level, either globally or at the time that the transaction starts. (It sounds like SQL Server defaults to SERIALIZABLE for its transactions? That’s a very strange default.)

-Max

August 2008

DennisG

Jeff,

I would recommend not to use LINQ if you really want to make your site work.
LINQ is a raw and not very reliable technology.

Unfortunately not everything that ScottGu promotes is good enough for production use.

ASP.NET 2.0 was really good. LINQ, Silverlight, and Atlas – not so much.

August 2008

Paul_N

Jeff, tell us about any foreign keys you have connected to the page table. A lot of times a deadlock can be caused by SQL server attempting to reconcile an FK.

Nolock is the ON ERROR RESUME NEXT of SQL. It’s not a good practice to get into unless you understand fully why you’re doing it.

Are your primary keys autonumbers? If so, take a look at how SQL Server treats locking at a page level instead of a row level. If you have a page that holds id #10, and id #11 is being heavily updated, it’s possible that the updates to id #11 are locking the entire data page. Maybe you should change field names or something and post the entire SQL here. I get the feeling something is being abstracted out. You should only be seeing a deadlock situation in the real world maybe one out of a million transactions. Otherwise, there’s a code problem in there somewhere.

August 2008

lix1

http://msdn.microsoft.com/en-us/library/ms173763.aspx

August 2008

Mark

It ain’t rocket surgery

Nice pericombobulation there :slight_smile:

August 2008

Robert_G

I totally agree with Justin on the parallelism business. After turning it off, the performance more than doubled in my system.

In addition, parallelism can also lead to a phenomenon known as CPU starvation. It works like this: let’s say you have 20 different queries in the queue and an 8-CPU box. Typically many of them would execute simultaneously, since they all hit different tables. However, with parallelism, SQL Server breaks up the first query into 7 pieces and sends them off to 7 different CPUs. Now most CPUs are busy, and the ones that have completed processing are waiting for the results from other CPUs to trickle in.

Meanwhile, the queries in the queue, are sitting around waiting. Then, if the queue wait timeout is hit, SQL Server dumps the queries into tempdb, to be retrieved milliseconds later, when CPUs are available. So all of a sudden, your system is going to hell because of all the writes to tempdb.

As far as the deadlocks are concerned, I’d throw several trace flags on the system - they will tell you exactly why your deadlocks are happening (and what keys, indexes, etc… are involved).

Try this for starters,

http://support.microsoft.com/kb/832524
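The usual flags for this are 1204 and 1222 (the latter is new in SQL Server 2005 and gives a much richer report):

```sql
-- Instance-wide: write deadlock details to the SQL Server error log
DBCC TRACEON (1204, -1);  -- classic lock-chain report
DBCC TRACEON (1222, -1);  -- verbose XML-style deadlock graph (2005+)
```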

then if you need some more help, contact me via the blog.

Regards
Robert

August 2008

pete

Weird problem. I have never experienced it, especially not a few times a day as you describe. I might have missed it if it happened once a month, but not a few times a day.

Are you, as I have noticed in your previous posts, using lots of dynamic SQL as opposed to stored procedures?

Even for a quick select or update statement, the query plan compilation can take time on a busy server; perhaps this is causing record locks to last longer, or to lock more than necessary.

August 2008

Keng

Oh, you’re using SQL-Server 2005…they fixed that in 2008…

August 2008

JimM

The only reason you should be getting a deadlock is if you are waiting on someone who is in turn waiting on you. You have to ask yourself two questions: why, exactly, are they waiting on you? and why, exactly, are you waiting on them?

If your read transaction is waiting on some other transaction’s write and you deadlock, that means they are waiting on you. Usually because of your own write lock. Do you really need to hang on to that lock (aka, why not just commit what you have)?

August 2008

TiTi

OMG! I was shocked when reading this: How can a read be blocked by a write? What possible contention could there be from merely reading the data?

Like in one comment: It’s always a little disturbing to see a well-known coder ask a dumb question
and another: I think you misunderstand what a deadlock really is. A deadlock is not a lock that is taking a long time. It is a lock that will never be released until one of the two processes is killed.
I totally agree…

I’ve scratched my head enough to know that threads / synchronization = pain in the ass
…but that is basic stuff; you disappoint me, Jeff :frowning:

If you need to read data coming from a probe, maybe you don’t care whether the write to your buffer has finished when you read it to display it directly… [but it’s bad to do it that way]
Usually you don’t want to read invalid stuff.
The reader needs to wait until the writer has done its job, or else you are going to read old/invalid data… !!

Never heard of the NOLOCK option before; seems like another ??!!? stupid bit of M. $ql $erver

August 2008

NickB

You really need to make sure you understand what read committed snapshot does and if you should apply it to your application. I have put together this quick informational post on read committed snapshot to help non-DBA types to understand the process.

http://www.coderjournal.com/2008/08/deadlocked-read-committed-snapshot-explained/

It can cause breaking changes in your application if you apply this without knowing the effects first. Your application might rely on row-level locking for specific business rules. So be careful.
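For completeness, the switch itself looks like this; note that it needs exclusive access to the database, hence the SINGLE_USER dance (the database name is a placeholder):

```sql
ALTER DATABASE MyForumDb SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
ALTER DATABASE MyForumDb SET READ_COMMITTED_SNAPSHOT ON;
ALTER DATABASE MyForumDb SET MULTI_USER;
```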

August 2008

Steve

The last place I worked had a strict nolock on every query policy.