The Incredible LinkTron 5000(tm)!

I talked in a previous post about Unbreakable Links, that is, stating every URL in terms of a Google search rather than an absolute address. It's a great concept, but how do you determine which words on a web page are most likely to generate a unique search result? Well, wonder no more:


This is a companion discussion topic for the original blog entry at: http://www.codinghorror.com/blog/2004/08/the-incredible-linktron-5000tm.html
  • LinkTron wasn’t prefixing http:// in front of URLs entered into the textbox. Now it is.
  • temporarily disable gzip page retrieval (bug)
  • fixed and re-enabled gzip support
  • added two additional options
  • changed default to dictionary only (faster)
  • warn non-English readers
  • better caching
  • gracefully handle HTTP errors (timeout, DNS resolution, etc.)
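Several of the fixes above (prefixing http://, gzip and deflate retrieval, tolerating HTTP errors) come down to a small amount of fetch logic. LinkTron's own code isn't shown here, so the following is only a minimal Python sketch under assumptions: `normalize_url`, `decode_body`, and `fetch_page` are hypothetical names, and the two-way deflate fallback addresses a well-known ambiguity in how servers encode "deflate", not necessarily the bug LinkTron hit.

```python
import gzip
import http.client
import urllib.request
import zlib
from typing import Optional


def normalize_url(url: str) -> str:
    """Prefix http:// when the user typed a bare address such as msdn.microsoft.com/express."""
    url = url.strip()
    if not url.lower().startswith(("http://", "https://")):
        url = "http://" + url
    return url


def decode_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Undo gzip or deflate compression according to the Content-Encoding header."""
    encoding = (content_encoding or "").strip().lower()
    if encoding == "gzip":
        return gzip.decompress(body)
    if encoding == "deflate":
        # "deflate" is ambiguous in practice: some servers send a zlib-wrapped
        # stream, others send raw deflate data, so try zlib first, then raw.
        try:
            return zlib.decompress(body)
        except zlib.error:
            return zlib.decompress(body, -zlib.MAX_WBITS)
    return body  # identity (or an encoding we don't handle): pass through


def fetch_page(url: str, timeout: float = 10.0) -> Optional[str]:
    """Fetch a page as text, returning None on HTTP, DNS, timeout, or decoding errors."""
    request = urllib.request.Request(
        normalize_url(url), headers={"Accept-Encoding": "gzip, deflate"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = decode_body(response.read(), response.headers.get("Content-Encoding"))
            charset = response.headers.get_content_charset() or "utf-8"
    except (OSError, EOFError, http.client.HTTPException, zlib.error):
        # URLError/HTTPError (including DNS failures) and socket timeouts are
        # OSError subclasses; corrupt gzip data raises OSError or EOFError.
        return None
    try:
        return body.decode(charset, errors="replace")
    except LookupError:
        # The server named a charset Python doesn't know; fall back to UTF-8.
        return body.decode("utf-8", errors="replace")
```

For example, `fetch_page("msdn.microsoft.com/express")` would request `http://msdn.microsoft.com/express` and return its text, or `None` if anything along the way fails.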

I was waiting for this sooo bad… but now I am kind of disappointed, because it does not seem to work right. I tried two simple random URLs that came to my mind and it didn’t link to the right page.

The ones I tried:
msdn.microsoft.com/express
http://www.franklins.net/dotnetrocks

Well, you linked to the front page of a dynamic website (dotnetrocks), which is going to change. For example, if you linked to the front page of this blog, two weeks from now the content would be totally different! Not a good idea.

Try generating keywords from PERMALINKS (e.g., individual articles) rather than front pages… I think you will be very happy with the result.

Also, the other one works fine (http://msdn.microsoft.com/express).

I get…

http://www.google.com/search?q=hobbyists+novices+complements+lightweight+enthusiasts&btnI=1

Which brings me directly to that page!
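The link above is an ordinary Google search URL. The chosen keywords go in the `q` parameter joined by `+`, and `btnI=1` is the "I'm Feeling Lucky" flag, which asks Google to redirect to the top result instead of showing a results page. Here is a minimal Python sketch of building such a link from a keyword list; `unbreakable_link` is a hypothetical helper, not part of LinkTron.

```python
from typing import Iterable
from urllib.parse import urlencode


def unbreakable_link(keywords: Iterable[str]) -> str:
    """Express a page as a Google "I'm Feeling Lucky" search instead of an absolute URL."""
    # urlencode quotes with quote_plus by default, so spaces become "+"
    # and the two parameters are joined with "&".
    query = urlencode({"q": " ".join(keywords), "btnI": "1"})
    return "http://www.google.com/search?" + query
```

Calling it with `["hobbyists", "novices", "complements", "lightweight", "enthusiasts"]` reproduces the link above.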

  • faster rejection of words shorter than 5 chars
  • fixed small alternate dictionary bug
  • updated links to current blog entries
  • reject pages with fewer than 20 words of plaintext
  • found a much more sophisticated HTML regex replacement (http://concepts.waetech.com/unclosed_tags/) that can deal with HTML tags that include “” as an element.
  • force the use of non-dictionary words if the number of unique words on the page is fewer than 60.
  • now stores plaintext as a continuous space-delimited string (for Markov chain generation, eventually)
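The filtering rules in this changelog can be sketched as a small pipeline: strip tags into one space-delimited string, tokenize, drop short words, reject short pages, and restrict to dictionary words only when the page has enough unique words. This Python sketch assumes the thresholds work exactly as stated. How LinkTron then ranks candidates isn't described, so `pick_keywords` and its `hit_count` callable (for example, something returning a word's search result count) are hypothetical. The tag regex is a common pattern for quoted attributes that contain `>`, not a copy of the one at the linked page.

```python
import re
from typing import Callable, List, Optional, Set

# Matches one tag while allowing quoted attribute values to contain ">",
# e.g. <a title="1 > 0">. A common pattern, not the one from the linked page.
TAG_RE = re.compile(r"""<(?:"[^"]*"|'[^']*'|[^'">])*>""")
WORD_RE = re.compile(r"[A-Za-z]+")

MIN_WORD_LENGTH = 5       # changelog: reject words shorter than 5 chars
MIN_PLAINTEXT_WORDS = 20  # changelog: reject pages with fewer than 20 words
MIN_UNIQUE_WORDS = 60     # changelog: below this, allow non-dictionary words


def html_to_plaintext(html: str) -> str:
    """Strip tags and collapse whitespace into one space-delimited string."""
    return " ".join(TAG_RE.sub(" ", html).split())


def candidate_words(plaintext: str, dictionary: Set[str]) -> Optional[List[str]]:
    """Return sorted lowercase candidate keywords, or None if the page is too short."""
    words = [w.lower() for w in WORD_RE.findall(plaintext)]
    if len(words) < MIN_PLAINTEXT_WORDS:
        return None
    candidates = {w for w in words if len(w) >= MIN_WORD_LENGTH}
    # Assumption: "unique words on page" means distinct words of any length.
    if len(set(words)) >= MIN_UNIQUE_WORDS:
        candidates &= dictionary  # enough text: keep dictionary words only
    return sorted(candidates)


def pick_keywords(candidates: List[str], hit_count: Callable[[str], int], n: int = 5) -> List[str]:
    """Hypothetical ranking: keep the n words with the fewest search hits."""
    return sorted(candidates, key=hit_count)[:n]
```

Keeping the plaintext as one space-delimited string means the same output can feed later word, phrase, or Markov-chain processing without re-parsing the HTML.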

Hi Jeff,
you are right. It works fine if you don’t use it for sites whose content changes a lot.

I think it would be really cool to have a LinkTron web service, so everybody could start using unbreakable links on their websites by parsing dynamic content for anchors and replacing them with unbreakable anchors!!!
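No LinkTron web service exists in this post, but the client-side half of the idea is easy to sketch: find each anchor's `href` and swap in a keyword search. In this Python sketch, `keywords_for` is a hypothetical callable standing in for the proposed service, and the regex only handles double-quoted `href` attributes; a real HTML parser would be more robust.

```python
import re
from typing import Callable, List, Optional
from urllib.parse import urlencode

# Captures the double-quoted href of each <a> tag in three parts:
# everything up to the opening quote, the URL itself, and the closing quote.
HREF_RE = re.compile(r'(<a\b[^>]*?\bhref=")([^"]*)(")', re.IGNORECASE)


def lucky_url(keywords: List[str]) -> str:
    """Google "I'm Feeling Lucky" search for the given keywords."""
    return "http://www.google.com/search?" + urlencode({"q": " ".join(keywords), "btnI": "1"})


def make_links_unbreakable(
    html: str, keywords_for: Callable[[str], Optional[List[str]]]
) -> str:
    """Rewrite each anchor's href as a keyword search when keywords are available."""

    def replace(match: re.Match) -> str:
        keywords = keywords_for(match.group(2))
        if not keywords:
            return match.group(0)  # no keywords: leave the link untouched
        # "&" must be escaped as "&amp;" inside an HTML attribute value.
        new_href = lucky_url(keywords).replace("&", "&amp;")
        return match.group(1) + new_href + match.group(3)

    return HREF_RE.sub(replace, html)
```

Links for which the service has no keywords (front pages, for instance) are left pointing at their original addresses.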

  • implemented phrase counting*
  • show processing time in milliseconds
  • allow url= querystring param
  • I am unclear how to use the data generated from the phrase frequency count… suggestions? This is much, much more complicated than a word frequency count.
  • deny non-English domain suffixes *
  • fix Int32 overflow due to Google doubling index size
  • fix Deflate bug
  • incorporate latest shared libraries
  • flush alternate (user website generated) dictionary
  • sorry, results will always suck due to the use of an English dictionary… and there was too much abuse.
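Phrase counting differs from word counting only in what gets counted: runs of consecutive words (n-grams) instead of single words, taken from the space-delimited plaintext. Here is a minimal Python sketch of bigram counting with hypothetical function names; it only produces the counts and doesn't settle how LinkTron should use them.

```python
from collections import Counter
from typing import List, Tuple


def phrase_counts(plaintext: str, n: int = 2) -> Counter:
    """Count every run of n consecutive words in a space-delimited plaintext string."""
    words = plaintext.lower().split()
    # range() is empty when the text has fewer than n words.
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def repeated_phrases(plaintext: str, n: int = 2, min_count: int = 2) -> List[Tuple[str, int]]:
    """Return phrases seen at least min_count times, most frequent first."""
    counts = phrase_counts(plaintext, n)
    return [(" ".join(p), c) for p, c in counts.most_common() if c >= min_count]
```

The number of distinct phrases grows much faster than the number of distinct words, which is part of why the phrase frequency data is harder to use than a simple word count.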

http://www.google.com/search?q=gotchas+unbreakable+irreversible+kidney+generates&btnI=1
This is what I got for http://www.codinghorror.com/linktron5k/Default.aspx
The LinkTron is kind of funny:
"gotchas unbreakable irreversible kidney generates"
Is this the newest superhero?
Gotcha, The Irreversible, Unbreakable, Kidney generator!

Ironic: 404 error on the linktron5k link