URL Shortening: Hashes In Practice

Also, Base64 is commonly used (adding ‘/’ and ‘+’ to your 62 alphanumerics) to shorten text strings. You get an alphabet of 64 glyphs, which is nice because 64 is a power of 2. Note that a GUID encoded in Base64 is 22 characters long instead of the usual 32 hex characters (36 with hyphens).

With Base64, you need five (case-sensitive!) glyphs to key one billion elements.
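A quick sanity check of that claim, sketched in Python (the function name is mine, just counting 64^n keys):

```python
# Capacity of an n-glyph key over a 64-character (Base64) alphabet.
def capacity(glyphs, alphabet_size=64):
    return alphabet_size ** glyphs

# Five case-sensitive Base64 glyphs cover just over a billion keys:
print(capacity(5))            # 1073741824 (= 2^30)
print(capacity(5) >= 10**9)   # True
```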

It’s also interesting to note that only one of the services listed (qurl) is case-sensitive. The rest redirect no matter how you capitalize the letters. This is IMO a Good Thing, as casual users shouldn’t be expected to distinguish capitalization in URIs.

Someone mentioned the possibility of “temporary” shortened links. This is the plan for the future of http://www.linkpot.net (also mentioned previously) - by default all links will be temporary and expire after a fixed limit.

A couple of years ago I was annoyed enough by the total lack of context in links generated by all the other shortening services, that I created my own http://lnk.nu that keeps some context (specifically, the domain name and file extension, if any). It’s not the shortest link you’ll ever see, but it sure is helpful to see where the link is likely to go:

http://lnk.nu/codinghorror.com/fl2.html

“My guess is the aggressive URL shortening services are doing a simple iteration across every permutation of the available characters they have as the URLs come in.”

Hey, why didn’t you just check this by obtaining two different URLs in short succession, before bothering us about it?

@Jeff

“Am I Wrong”

Yes, you are. A hash is a one-way function. That means that you can’t take the result and calculate the original value from it.

If I take the string “Hello world” and apply, say, the MD5 hash to it to produce the digest “f061a…”, there is no function that will accept “f061a…” and spit out “Hello world”.

Collisions are not the issue. They /would/ be the issue if hashes were two-way. Since they are not, what you would have to do to figure out which url your hash value corresponded to would be to try hashing random urls until you found one that produced the correct hash value. Which is, of course, a completely ridiculous thing to try to do.
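To make the one-way point concrete, here’s a minimal Python sketch using the standard hashlib module; the only generic way “backwards” is brute-force guessing:

```python
import hashlib

# Forward direction: cheap and deterministic.
digest = hashlib.md5(b"Hello world").hexdigest()  # a fixed 32-hex-character string

# There is no inverse function; the only generic "reverse" is to
# hash candidate inputs until one matches the target digest.
def brute_force(target, candidates):
    for c in candidates:
        if hashlib.md5(c).hexdigest() == target:
            return c
    return None

print(brute_force(digest, [b"Hello", b"Hello world"]))  # b'Hello world'
```

With a toy candidate list this works instantly; against the space of all possible URLs it is, as the comment says, completely ridiculous.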

I appreciate your blog a lot but on this one I believe you are totally wrong. You are talking about compression, not hashing.

We’ve written a linking service, but we don’t use a database or a hashtable. We simply use a mutable array. When we add a new URL to the array, we base-36-encode the index and append it to the end of the shortened URL. When a request is received, the end of the URL is converted back into a number, which is the index of the URL in the array.
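A minimal sketch of that counter-plus-base-36 scheme (Python; the array and function names are mine, not the service’s):

```python
import string

ALPHABET = string.digits + string.ascii_lowercase  # base 36, single case

def encode36(n):
    """Convert an array index into a short base-36 key."""
    if n == 0:
        return "0"
    out = []
    while n:
        n, r = divmod(n, 36)
        out.append(ALPHABET[r])
    return "".join(reversed(out))

def decode36(key):
    """Convert a key back into the array index."""
    return int(key, 36)

urls = []  # the mutable array of long URLs

def shorten(url):
    urls.append(url)
    return encode36(len(urls) - 1)

def expand(key):
    return urls[decode36(key)]
```

Note this is an index, not a hash: collisions are impossible, and keys stay as short as the current array length allows (three glyphs cover the first 46,656 URLs).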

* <a href="http://qurl.net/1YU">http://qurl.net/1YU</a>
* <a href="http://rurl.org/808">http://rurl.org/808</a>
* <a href="http://jtty.com/cuy">http://jtty.com/cuy</a>
* <a href="http://elfurl.com/li4na">http://elfurl.com/li4na</a>
* <a href="http://shurl.org/pHbnD">http://shurl.org/pHbnD</a>
* <a href="http://shrinkster.com/s9y">http://shrinkster.com/s9y</a>
* <a href="http://tinyurl.com/yvvtag">http://tinyurl.com/yvvtag</a>
* <a href="http://clipurl.com/?PAP269">http://clipurl.com/?PAP269</a>
* <a href="http://shorl.com/dihyfradiduba">http://shorl.com/dihyfradiduba</a> 

Wow, no test of Snipurl? Of all the various services, that is my favorite. It allows you to create your own key, and it also has a JavaScript bookmarklet you can place on your shortcut bar. Click the shortcut button and it automatically calls up Snipurl, creates the shortcut for your current page, and copies it to your clipboard. A one-step process. Snipurl gave this:

http://snipurl.com/1pvef

Five characters, but we should count the full length of each shortened URL too.

* <a href="http://qurl.net/1YU">http://qurl.net/1YU</a> - 19 characters
* <a href="http://rurl.org/808">http://rurl.org/808</a> - 19 characters
* <a href="http://jtty.com/cuy">http://jtty.com/cuy</a> - 19 characters
* <a href="http://elfurl.com/li4na">http://elfurl.com/li4na</a> - 23 characters
* <a href="http://shurl.org/pHbnD">http://shurl.org/pHbnD</a> - 22 characters
* <a href="http://shrinkster.com/s9y">http://shrinkster.com/s9y</a> - 25 characters
* <a href="http://tinyurl.com/yvvtag">http://tinyurl.com/yvvtag</a> - 25 characters
* <a href="http://clipurl.com/?PAP269">http://clipurl.com/?PAP269</a> - 26 characters
* <a href="http://shorl.com/dihyfradiduba">http://shorl.com/dihyfradiduba</a> - 30 characters

* <a href="http://snurl.com/1pvef">http://snurl.com/1pvef</a> - 22 characters

Not too shabby. (Snipurl.com can be abbreviated as Snurl.com to save a couple of characters).

No mention of http://rubyurl.com either?

Dude, RubyURL is totally the best-designed URL shortener on the web!

(I know, because I designed it)

Just a quickie: I take it you’ve seen the http://hugeurl.com/ ?

Wow, is everyone and their mom designing URL shortening services these days?

HugeURL.com is funny: I once wrote a script that loops a URL between tinyurl and hugeurl. You only see the URLs redirecting in Safari.

http://rafb.net/p/cxAPGG68.html

To help stop email clients from chopping up links in email messages, surround the links with angle brackets.

Great post Jeff, very interesting. Though it seems a lot of people are missing the point (esp. the ones claiming you’re “totally wrong”, like Mats Helander above), I think you’re right on. It makes perfect sense to me.

Hashes don’t fix the “if the service goes under, the link is useless” issue. How about a simple compression algorithm? It certainly creates bigger URLs than these services do, but at least it’s reversible.
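A sketch of that reversible approach in Python, using zlib plus URL-safe Base64 (function names are illustrative): anyone can decode the token even if the service disappears.

```python
import zlib, base64

def compress_url(url):
    """Reversible 'shortening': deflate the URL, then Base64 it URL-safely."""
    packed = zlib.compress(url.encode("utf-8"), 9)
    return base64.urlsafe_b64encode(packed).rstrip(b"=").decode("ascii")

def restore_url(token):
    """Invert compress_url without any server-side lookup table."""
    pad = b"=" * (-len(token) % 4)  # restore stripped Base64 padding
    packed = base64.urlsafe_b64decode(token.encode("ascii") + pad)
    return zlib.decompress(packed).decode("utf-8")

long_url = "http://www.codinghorror.com/blog/archives/000987.html"
token = compress_url(long_url)
print(restore_url(token) == long_url)  # True
```

For typical short URLs the deflate header overhead often makes the token longer than the original, which is exactly the size tradeoff the comment concedes.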

You can notice this with the “urltea” shortener: if you submit many URLs in a short time span, you get similar three-letter permutations.

If you could register the shortest domain possible (the shortest one I know of is http://o2.ie/), you could have up to 6 letters after the domain and it would be the same length as the shortest URL you posted in your article (12 characters, not counting the http:// prefix). If you could have A-Za-z0-9, that would give you 62^6 possible combinations, or 56,800,235,584.
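The maths above actually checks out; a quick verification in Python:

```python
# 6 glyphs over a case-sensitive alphanumeric alphabet (A-Z, a-z, 0-9 = 62):
print(62 ** 6)                    # 56800235584

# http://o2.ie/ plus 6 glyphs matches the 19-character qurl.net links:
print(len("http://o2.ie/") + 6)   # 19
print(len("http://qurl.net/1YU"))  # 19
```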

My maths is probably wrong :P

Seeing how much spam is on these url-shortening services, it would be interesting to see what would happen if one tapped into something like Akismet or used some other kind of spam-filtering technology.

I feel sorry for you Jeff, some people who read this blog are stoopid…

Doing some quick back-of-the-envelope math: a full 128-bit md5 hash rendered as a base-36 (single-case) alphanumeric string runs to about 25 characters, so getting a consistent 4-5 character string means truncating the hash to roughly 20-25 bits.

As Rich Skrenta stated in the post, md5 yields 2^64 possible websites, which is more than enough for the foreseeable future.
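A hypothetical sketch of such a truncated-md5 key in Python (the 25-bit cutoff and function name are my choices, picked to land in the 4-5 character range):

```python
import hashlib, string

ALPHABET = string.digits + string.ascii_lowercase  # base 36, single case

def short_key(url, bits=25):
    """Keep the top `bits` bits of the md5 digest, rendered in base 36."""
    n = int.from_bytes(hashlib.md5(url.encode("utf-8")).digest(), "big")
    n >>= 128 - bits  # truncate the 128-bit digest
    out = []
    while n:
        n, r = divmod(n, 36)
        out.append(ALPHABET[r])
    return "".join(reversed(out)) or "0"

print(short_key("http://www.codinghorror.com/"))  # at most 5 characters
```

Truncating this hard means collisions arrive quickly (birthday bound around 2^12.5, roughly 6,000 URLs, for 25 bits), so a real service would still need a collision check on insert.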

When tinyurl first started out, you could enter a url of arbitrary size. This allowed you to basically use their service to store any type of data. You just had to write your own conversion methods.

Too bad they had to set a limit on the url size.
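The trick described could be sketched like this (hypothetical; the .invalid domain and helper names are mine): wrap arbitrary bytes in a syntactically valid URL, hand it to the shortener, and decode on the way back out.

```python
import base64

# .invalid is a reserved TLD (RFC 2606) and will never resolve;
# the shortener just sees one long opaque URL to store.
FAKE_PREFIX = "http://data.invalid/"

def pack(data: bytes) -> str:
    """Smuggle arbitrary bytes into a URL-shaped string."""
    return FAKE_PREFIX + base64.urlsafe_b64encode(data).decode("ascii")

def unpack(url: str) -> bytes:
    """Recover the original bytes from the fake URL."""
    return base64.urlsafe_b64decode(url[len(FAKE_PREFIX):])

url = pack(b"any payload at all")
print(unpack(url))  # b'any payload at all'
```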