Primary Keys: IDs versus GUIDs

Long-time readers of this blog know that I have an inordinate fondness for GUIDs. Each globally unique ID is like a beautiful snowflake: every one a unique item waiting to be born.


This is a companion discussion topic for the original blog entry at: http://www.codinghorror.com/blog/2007/03/primary-keys-ids-versus-guids.html

I agree for the most part. Another advantage of integers however is that they produce friendlier URLs. For web apps, I favor the HiLo table key pattern.

Doesn’t Pros point 4 conflict with Cons point 3?

One downside of integer ids on websites is that they are predictable, so anyone who wants to scrape data from your site can do so more easily.

(is it just me or is the word always “orange”)
I like using guids in a section of an unsecured part of a website. I am working on a contract management system that has multiple companies logging into one system, and to keep certain things from being able to just guess (like system messages to users) i use a guid. But for everything that is admin level, i just us Identities.

“(is it just me or is the word always “orange”)”

It is, he’s using a “retarded” capchga as he calls it. There was a post a couple monthes back explaining why.

Cullen

@Backwarlink: Not only does Cons point 3 conflict with Pros point 4, but it also conflicts with Pros points 1 and 2.

On the bright side, though, Cons point 3 is a con of existing database software, not a con of GUIDs per se. If your hashtable performs better on sequential keys, then you’re not using the right hash function. (And if you’re not using a hashtable, why are you writing database software?)

Doesn’t Pros point 4 conflict with Cons point 3?

Partially, yes. Using completely random GUIDs can lead to insertion problems (and precludes physical clustering of the index).

Here’s a great little comparison between standard GUIDs and sequential GUIDs in SQL Server 2005:

http://www.sqljunkies.com/Article/4067A1B1-C31C-4EAF-86C3-80513451FC03.scuk

Yes, Jeff uses orange: http://www.codinghorror.com/blog/archives/000712.html

More to the point of the original post (grin /), I got to say that, for some odd reason, my instinct is to go with GUIDs. I think the drawbacks are marginal.

1~ size: disk space is not free, but it’s cheaper by the day
2~ readability: I think GUIDs are unreadable when presented ‘raw’; I always consider developing tools to help administrators and developers in their daily tasks, which would resolve the GUIDs to something more readable for them;
3~ generating randomness: for the time being, I have never hit a case where the cost of generating a random GUID was the performance bottleneck; same applies to the index clustering aspect, although I’m sure that if I was developing database systems I’d be much more concerned with this one.

The only problem I can see with GUIDs is that they are not naturally ordered (so I have to create a mechanism to keep track of order, where needed), and that there’s always that sinking suspicion that one day you will get a duplicate GUID (since, I believe, most GUID generating libraries I’ve seen don’t guarantee 100.00% uniqueness, but something like 99.9999…%). Call it Murphy’s Curse :stuck_out_tongue_winking_eye:
F.O.R.

I don’t like using GUIDs as primary keys because when you deal with high volumes, you pay a big price for the extra bytes. You pay it in the join operations, in the index seeks, and in the number of records in the cache.
I found that a good way is to combine the two approaches. That is - use an integer as the primary key and add a separate GUID as a replication id. Do all joins on the primary key and use the GUID only for replication. This ensures high performance reading and writing the table, but also allows to uniquely identify each record. The downside is that the same record on different servers may have a different primary key, but in my opinion this is a minor issue.

I think it’s possible to have a sequentially generated GUID on the middle tier (eg, without contacting the database server to generate the GUID). You could do this by factoring date and time into the GUID calculation. Each new GUID generated is guaranteed to be “higher” in sequence than ones generated earlier in time.

Here’s one such approach for SQL Server 2000:

http://codebetter.com/blogs/scott.bellware/archive/2006/12/27/156671.aspx

Of course, this assumes the date and times are in sync on said machines…

Sequentially assigned GUID values. Seems pointless.

I don’t see any actual benefit to this at all. Is it all about geek factor?

I do hope no-one is suggesting that one build web site URLs off GUIDs.

http://www.codinghorror.com/blog/archives/C87FC84A-EE47-47EE-842C-29E969AC5131.html

is a bit manky, and not at all user friendly.

I love GUIDs.

We actually just started using them in one of our applications at work and they’ve been pretty neat to work with.

BTW, been reading your blog for a while, first time commenting. Just gotta say though, keep it up!

I’ve been using GUIDs for a long time. The real problem is that they don’t index well in large databases. Basically when you use integers the newest row (at the bottom of the table) is the biggest number. That alone saves a lot on index fragmentation. However, I do like being about to create a primary key without talking to the database. I’ve been using CombID for a while. Basically it is a guid, but it is not completely random, part of the guid contains a date/time stamp. With the date/time stuck in there (in the right place), the guids now sort correctly in the DB! The newest row is the highest guid number, yea!

http://www.informit.com/articles/printerfriendly.asp?p=25862rl=1

IMHO, GUIDs should not be “in” a URL but definitely are better than raw ids in a query string… such as /Products.aspx?Id=C87FC84A-EE47-47EE-842C-29E969AC5131 is better than /Products.aspx?Id=4. Why? Well continuing with the “product” idea, users could figure out the URLs for products created around the item they are viewing. (Id=3, Id=5, Id=6) What if it’s been deleted? What if it’s been deactivated? Hopefully you’ve handled this but if you haven’t it sucks to have someone purchase 300 of a product discontinued five months ago. It also exposes things like how many products you have, how often you have them, and worst comes to worst, someone could write a script to “crawl” your whole database, suck it down, leading to your whole database (including parts you don’t want others to see) replicated or perhaps a DDoS, etc… It all depends on your own circumstances but for some using auto_increment as a primary key and then showing that to the world can be very bad.

One of the reasons I sort of hate blog entries like this are that it tends to let the less-experienced/educator keep believing that most/every table in a database needs a integer OR a GUID-based key… The third, and better (as even Celko will tell you, as much as I hate to give him as an example – he’s really a troll who doesn’t even use ANSI 92/99-standard join syntax) is to pick a true candidate key, and not a synthesized one. Jeff, I’ve even seen people create a “Year” table with, you guessed it, a YearID (1,2,3…) next to a “YearName” column. It made me cry. The only thing worse would be to see a GUID next to “YearName”. Maybe I just need to take a vacation.

Don’t get me wrong here. I’m not poo-pooing GUIDs, but I do think they’re overkill for a lot of (especially smaller) applications. BTW: They’re awesome in handheld applications where you’re totally disconnected from your server.

Clustered SQL Server indices are used to physically order the pages.

Also, any non-clustered indices imbed the clustered index value, so “fatter” primary keys propogate the fatness in the non-clustered indices.

Thumbs down on GUIDs as PKs.

I rarely use GUID’s, let alone as PK’s. One instance recently that I did use them was for linking to AD User Objects within a SQL database, I wanted to replicate a users name and email from AD into a sql database so I could quickly lookup up common information in an app that I just created.

Their name and e-mail would be looked up very frequently, so instead of burdening AD with this task, I put the information in SQL using the ObjectGuid attribute from AD and making that my FK in SQL.

This is one of the few times that I have actually used a GUID for a valid reason other than “the geek factor”.

Just looking at Microsoft’s Membership/Roles Provider creating guid’s for everything (AppId, RoleId, UserId), sent shivers up my spine. Very difficult to work with and not needed in that instance.

OTOH, there is very valid reasons for using GUID’s, I just don’t think as often as people think.

@David Barrett - “I do hope no-one is suggesting that one build web site URLs off GUIDs.”

There is at least one fairly-large software company out there that already has:
http://www.microsoft.com/downloads/details.aspx?FamilyID=B533619A-0008-4DD6-9ED1-47D482683C78displaylang=en

How does that “proof by counter-example” thing go again?