URL Shortening: Hashes In Practice

There’s also the Swedish shortening service called x, www.x.se, which produces URLs as short as http://x.se/ab.

And if you think that short URLs are far too easy… why not make an ugly one: http://maul.ubermutant.net/

@Jeff,

Me:

“Yes you are” (Wrong)

You: (In a reply to another poster)

“We still need the Table part of the HashTable”

My apologies, I completely misunderstood you. It was the part of your post where you mentioned the problem of getting URLs back if the service was lost that threw me into thinking you were going for a table-less solution. Now it is obvious you didn’t need anyone pointing out that hashes are one-way.

However, as those who understood what you meant have pointed out, the scheme of using a hash to produce primary keys has (apart from the collision issues) no real advantage over just using an auto-incrementing counter.

/Mats

TinyURL used to sequentially increment the URLs it returned, until people started exploiting this fact.

This is far too complicated a solution. Here’s a much simpler, more compact, and faster way to get there:

  1. Don’t use the URL as a shortening target; hashes are good on a one-way basis (text to hash) but not the best way to access a database in the other direction. Instead, use an integer index to the row in a database (or array, dictionary, or whatever you please) as the value to express.

  2. The next step is simply to express that number in base X: if you only accept the characters A to Z, express it as a base-26 number using letters instead of digits; lowercase, uppercase, and numbers? Base 62. (A sketch follows below.)

This way, you always benefit from the full density of the expressive range, without collision testing, hash searches, or sparse use of this valuable resource.
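
To make that concrete, here is a minimal sketch in Python; the base-62 alphabet and the encode/decode names are my own illustration, not any particular service’s code:

```python
# Minimal sketch of the base-X scheme described above
# (illustrative names, not any particular service's code).

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
BASE = len(ALPHABET)  # 62

def encode(row_id: int) -> str:
    """Express a non-negative integer row ID as a base-62 token."""
    if row_id == 0:
        return ALPHABET[0]
    digits = []
    while row_id:
        row_id, remainder = divmod(row_id, BASE)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))  # most significant digit first

def decode(token: str) -> int:
    """Turn a base-62 token back into the integer row ID."""
    row_id = 0
    for char in token:
        row_id = row_id * BASE + ALPHABET.index(char)
    return row_id

# encode(125) == "21" and decode("21") == 125: every ID maps to exactly
# one token and back, so there are no collisions and no wasted keyspace.
```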

Guys, take a look at

http://www.vividurl.com

I created it for you all.
I am the developer of the site.

Just check it out to believe it. None better than this.

http://vividurl.com/DashDefyTab
hmm, this does seem easier to remember

or not :)

Qurl.net uses (or rather, used) base 62 and an auto-incrementing integer; arbitrary alphabet/base encoding is all of about 30 lines of code. Records have an SHA-1 hash of the URL used for dupe checking (since the URL itself can be up to 64k long; you can encode a fair bit in a data: URL, and several people have).
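
The actual qurl.net code isn’t shown here, but the dupe-check idea is roughly this (a dict standing in for the real table; the names are mine):

```python
# Rough sketch of the dupe check: key records by a fixed-length SHA-1
# digest of the URL rather than by the URL itself, which can be up to
# 64k long. The dict stands in for the real table; names are mine.
import hashlib

records: dict[str, str] = {}  # SHA-1 hex digest -> long URL

def store(long_url: str) -> str:
    digest = hashlib.sha1(long_url.encode("utf-8")).hexdigest()  # always 40 chars
    if digest not in records:  # dupe check on the short, indexable digest
        records[digest] = long_url
    return digest
```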

Ultimately I turned qurl.net off; there weren’t many users, and assholes mostly used it to support their spam operations. No thanks.

Maybe I’ll bring it back with a “report this link as spam” button to discourage them. Somehow I’m not convinced spammers would notice…

My best shortener is:

http://smarturl.eu

Personally, I use base 65 (A-Z, a-z, 0-9, _, -, and .) on http://www.inethdd.com/shortenURL.php

I put all of those characters into an array, shuffle it, select between 1 and 12 of them at random, and use the result if it doesn’t already exist. Otherwise it runs again.

That gives me roughly 65^12 possibilities (every string of 1 to 12 characters over a 65-character alphabet). More than I would ever need, and a lot simpler than some suggestions here.
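
Sketched in Python (my names; the site’s actual PHP isn’t shown):

```python
# Sketch of the random-token approach: shuffle the 65-character
# alphabet, take a random-length prefix, retry on collision.
import random

ALPHABET = list(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789_-."
)  # 26 + 26 + 10 + 3 = 65 characters

def random_token(existing: set) -> str:
    """Shuffle the alphabet, take 1-12 characters, retry on collision."""
    while True:
        chars = ALPHABET[:]               # copy; shuffle works in place
        random.shuffle(chars)
        token = "".join(chars[:random.randint(1, 12)])
        if token not in existing:         # "if it doesn't already exist"
            existing.add(token)
            return token
```

The retry loop trades the guaranteed uniqueness of a counter for unguessable tokens; it just loops more often as the table fills up.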

Don’t forget about:

http://xhref.com

(in the URL-shortening world)

http://sn.vc is only 2 characters, making it one of the shortest domains, and it lets you preview. How useful do you think that is?

One URL shortening service I like is http://urltea.com

They support appending a descriptor to the short URL.

For example:

http://urltea.com/1agx?404-pages

If I were to implement one of these things I wouldn’t even bother with a “hash”. I’d just have a single database table with an integer auto-increment primary key and the corresponding URL. When you tinyify a URL, you scan the table for that URL and grab the ID; if it isn’t there, you insert it.

Then just take the integer ID and convert it into “base 38” with 0-9, a-z, _, and +, maybe more. Incoming short URLs are converted back to the int, looked up, and forwarded to the corresponding URL.

Of course, with a true hash, you could convert to your int and scan the table on the primary key rather than a character field, which arguably would be a bit faster. But you generally would sacrifice your URL length to do that.
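
In sketch form, the whole round trip might look something like this (SQLite and Python standing in for whatever you’d actually use; the schema and names are illustrative):

```python
# Sketch of the table-plus-counter scheme: an auto-increment integer
# primary key, with the ID rendered in base 38. Schema and names are
# illustrative, not a real service's code.
import sqlite3

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz_+"  # 38 characters
db = sqlite3.connect("tiny.db")
db.execute("CREATE TABLE IF NOT EXISTS urls ("
           "id INTEGER PRIMARY KEY AUTOINCREMENT, url TEXT UNIQUE)")

def tinyify(url: str) -> str:
    """Find (or insert) the URL, then render its integer ID in base 38."""
    row = db.execute("SELECT id FROM urls WHERE url = ?", (url,)).fetchone()
    row_id = row[0] if row else db.execute(
        "INSERT INTO urls (url) VALUES (?)", (url,)).lastrowid
    db.commit()
    token = ""
    while row_id:  # AUTOINCREMENT IDs start at 1, so this terminates
        row_id, remainder = divmod(row_id, 38)
        token = ALPHABET[remainder] + token
    return token

def resolve(token: str):
    """Convert the token back to an integer ID and look up the URL."""
    row_id = 0
    for char in token:
        row_id = row_id * 38 + ALPHABET.index(char)
    row = db.execute("SELECT url FROM urls WHERE id = ?", (row_id,)).fetchone()
    return row[0] if row else None
```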

I’m with you Jeff…my first thought would be a hash. After all, who wants to lug around a DB solution for something as trivial as string translation? Using a DB seems like one of those “only tool is a hammer” type inclinations.

No mention of http://rubyurl.com either?

Dude, RubyURL is totally the best-designed URL shortener on the web!

(I know, because I designed it)

Status: 500 Internal Server Error
Content-Type: text/html

We’re sorry, but something went wrong.

We’ve been notified about this issue and we’ll take a look at it shortly.

New to Coding Horror. While Jeff may have been mistaken about the algorithms behind URL shortening services at the start, it’s nice to see so many ideas come together, not only to figure out how these sites work but to make them better. That AOL-style naming convention would prove very usable. Granted, I feel dirty even commenting on AOL =P

In all seriousness, Mr. Magoo’s point about the non-zero likelihood of a collision occurring is valid.

I’m confused about how someone can consider using a hash to “encode” a URL and then worry about collisions without realizing that collisions can only come about because a hash is non-reversible. Lossy compression implies a loss of information, which in the case of URLs seems kinda bad.

I’m also more than a little surprised that folks didn’t just know that it has typically been an incrementing counter, but maybe I just started seeing TinyURL links early enough in its life that it was obvious. But geez, I’m not even a web guy.