URL Rewriting to Prevent Duplicate URLs

ISAPI_Rewrite has a free version, but the rewrite rules will get applied to every site on the server. It’s perfect for the development environment.

I have had great luck with ISAPI_Rewrite and would recommend it to everyone. It’s well worth the $90, and their support is pretty good.

I just wish I could figure out a way to make ISAPI_Rewrite work with the VS.NET built-in web server.

I’m a huge fan of ISAPI Rewrite. The bundled regular expression tester is a fairly useful little tool too. I’ve used it to debug numerous regexps for various languages.

I only wish I had a good way to automagically generate expressions for building applications in Visual Studio 2005. I believe in the W3C’s philosophy of Cool URIs Don’t Change (http://www.w3.org/Provider/Style/URI) and would prefer it if my .NET apps were future-proofed for the next big extension change (e.g. from .asp to .aspx).

I agree, the ISAPI Rewrite tester is handy.

I grok regular expressions fairly well, but what trips me up with URL rewriting is that I often don’t know exactly what’s coming into the regex: is it the complete URL, or just the path? There are definitely some oddities.

For example, this is directly from the ISAPI_Rewrite forum moderators:

RewriteCond Host: (.*)
RewriteRule ([^.?]+[^.?/]) http://$1$2/ [RP]

Where the heck is $1 coming from? The Host: match?
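It does look like the Host: match. The rule only makes sense if the captures from RewriteCond and RewriteRule share one numbering, condition first, so $1 is the Host header and $2 is the rule's group. Here is a small Python simulation of that reading. The pattern is treated as matching the whole URL, which is what lets the final [^.?/] mean "doesn't end in a slash". The numbering is an assumption inferred from the forum example, not checked against the ISAPI_Rewrite docs, and `forum_rule` is a hypothetical name:

```python
import re
from typing import Optional


def forum_rule(host: str, url: str) -> Optional[str]:
    """Hypothetical simulation of the forum rule:

        RewriteCond Host: (.*)
        RewriteRule ([^.?]+[^.?/]) http://$1$2/ [RP]

    ASSUMPTION: captures are numbered across the condition and the rule,
    condition first, so $1 is the Host capture and $2 the rule's group.
    """
    cond = re.fullmatch(r"(.*)", host)             # RewriteCond Host: (.*)
    rule = re.fullmatch(r"([^.?]+[^.?/])", url)    # no dot, no ?, no trailing /
    if cond is None or rule is None:
        return None                                # rule does not fire
    dollars = cond.groups() + rule.groups()        # ($1, $2)
    return "http://{}{}/".format(*dollars)         # http://$1$2/


print(forum_rule("www.test.com", "/folder"))     # http://www.test.com/folder/
print(forum_rule("www.test.com", "/folder/"))    # None: already ends in a slash
print(forum_rule("www.test.com", "/page.aspx"))  # None: contains a dot
```

Read this way, the rule simply appends a trailing slash to extensionless folder URLs, and it needs the Host capture to rebuild an absolute redirect target.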

And in this one, also lifted from the forums for the Ionic filter:

RewriteCond %{HTTP_REFERER} ^(?!HTTP_REFERER)

What I’m trying to do is say “allow blank referers”, i.e., “only apply this rule if the referer is not blank”. I have NO idea how I would have figured this out without seeing someone do it first. The traditional way to say “allow blank referers” with mod_rewrite is this:

RewriteCond %{HTTP_REFERER} !^$
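Running both patterns side by side shows why the Ionic version looks so odd. The lookahead happily matches an empty string, so it can only exclude blank referers if the filter expands an empty or missing %{HTTP_REFERER} to the literal text HTTP_REFERER. That is a guess at what's going on, not documented behavior, and the function names below are hypothetical:

```python
import re


def mod_rewrite_not_blank(referer: str) -> bool:
    # RewriteCond %{HTTP_REFERER} !^$
    # The leading "!" negates the pattern: true whenever the referer is NOT empty.
    return re.search(r"^$", referer) is None


def ionic_forum_cond(referer: str) -> bool:
    # RewriteCond %{HTTP_REFERER} ^(?!HTTP_REFERER)
    # Zero-width negative lookahead: true for any string that does not
    # begin with the literal text "HTTP_REFERER", including the empty string.
    return re.search(r"^(?!HTTP_REFERER)", referer) is not None


for value in ["", "HTTP_REFERER", "http://example.com/page"]:
    print(repr(value), mod_rewrite_not_blank(value), ionic_forum_cond(value))
```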

Also… the spelling of referer has always bothered me. Shouldn’t it be “referrer”? WTF.

“ISAPI_Rewrite is unnecessary with ASP.NET 2.0, as it’s built-in”

Really? So you can redirect http://test.com/folder/ ?

I don’t think so.

I am not a fan of the “native” ASP.NET 2.0 redirection because it’s fraught with limitations and holes. Unless you want all of your redirections to end in .aspx or some other file extension that the ASP.NET IIS handler understands, you have to redirect all file formats through the ASP.NET ISAPI filter. This is a nasty hack with lots of side-effects, and it still doesn’t address the issue of redirecting folders.

If you can only redirect URLs that end in “.aspx”, or if you have to remap all files to go through the ASP.NET ISAPI filter, then you don’t have a true URL redirection solution IMO.

Pretty URLs originally became important because some search engines would not index files with querystring variables, or with more than a few querystring variables, probably because you could wind up indexing an infinite number of URLs on one site (trust me, I had this happen on a local copy of Inktomi back in the bad old days before I bought a clue).

Subsequently, they’ve taken on a life of their own and everyone has their own rules about them (file extensions always, file extensions never). They appeal to geeks because of their cleanliness. The real justification, however, is that properly crafted URLs mean you can completely re-engineer your site and have everything still live in the same place (so my preference is for no file extension in that case; it’s also a free bit of security through obscurity).

You can use Google’s webmaster tools to let Google know to treat several of your domains as one (i.e. codinghorror.com and www.codinghorror.com are the same page). If you have never used Google webmaster tools I highly recommend it.

Very nice topic… I’d like to point out the mod_rewrite code for Apache that forces the www.domain.com form (or vice versa), and other .htaccess SEO tips, at http://www.askapache.com/2006/htaccess/htaccesselite-ultimate-htaccess-article.html

Some pretty bad practice here: absolute paths!

RewriteRule ^/(.*) http://www.test.com/$1 [RP]

RewriteRule .*\.(?:gif|jpg|jpeg|png) /images/block.jpg [I,O]

Why is it bad? Off the top of my head (30 seconds):

  • cannot change the domain name easily (or move to a new country: www.codinghorror.fr, www.codinghorror.co.jp)
  • bad internationalization (all languages point to the same (English?) file /images/btnNew.gif)
  • cannot test it properly on a local file system, cannot make a CD with a functional image of your site, etc. (OK, that only works for plain HTML, not ASP, JSP, PHP, etc.)

ISAPI_Rewrite is unnecessary with ASP.NET 2.0, as it’s built-in. You can rewrite URLs in global.asax.

I find the author is absolutely correct that one should redirect to one site only. It’s a must, and so simple to do.
But one should check which route is best to take when redirecting: to www.mysite.com or just mysite.com.
There are tools out there that will tell you what’s best for you.
In my case I used the www route, as most of my pages were indexed with www, and within a matter of days my non-www pages all disappeared from the search engines.

I can see you’ve got http://www.codinghorror.com and http://codinghorror.com and http://www.codinghorror.com/index.html and http://codinghorror.com/index.html (and a few others) all resolving to http://www.codinghorror.com

Any reason why you would choose to use the www for the final URL or is it just personal preference?
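Whichever host you pick, the logic behind collapsing all those variants is the same: compute one canonical URL for each request and send a 301 for anything that differs. Below is a minimal Python sketch of that mapping. It assumes www.codinghorror.com as the canonical host and index.html as the only default document. Both are hypothetical settings; swap the hosts if, like a later commenter, you'd rather strip the www:

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical settings: the host search engines should index, the alias
# that gets folded into it, and default documents that collapse to "/".
CANONICAL_HOST = "www.codinghorror.com"
ALIAS_HOSTS = {"codinghorror.com"}
DEFAULT_DOCUMENTS = {"index.html"}


def canonical_url(url: str) -> str:
    """Map every duplicate spelling of a page onto one URL.
    Ports, credentials and fragments are dropped for brevity."""
    parts = urlsplit(url)
    host = parts.hostname or ""          # .hostname is already lowercased
    if host in ALIAS_HOSTS:
        host = CANONICAL_HOST
    path = parts.path or "/"
    head, _, last = path.rpartition("/")
    if last.lower() in DEFAULT_DOCUMENTS:
        path = head + "/"                # /index.html -> /
    return urlunsplit((parts.scheme, host, path, parts.query, ""))


def needs_redirect(url: str) -> bool:
    # A 301 is only needed when the request isn't already canonical.
    return canonical_url(url) != url


print(canonical_url("http://codinghorror.com/index.html"))  # http://www.codinghorror.com/
```

In a real deployment this decision lives in the rewrite rules themselves; the sketch just makes the "many URLs, one page" mapping explicit.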

Oddly… I put this in my .htaccess and it throws a 500 error for the entire site!

The only part I used was:

# force proper www. prefix on all requests

RewriteEngine on
RewriteCond %{HTTP_HOST} ^treehugger.com [I]
RewriteRule ^/(.*) http://www.treehugger.com/$1 [I,RP]

Nick, make sure you’re using the right syntax. Only one of the lists I posted is (mostly) Apache mod_rewrite compatible: the top one.
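Two things in Nick's rule trip up Apache. First, [I] and [RP] are ISAPI_Rewrite flags that mod_rewrite doesn't recognize (its rough equivalents are [NC] and [R=301,L]). An unrecognized flag in .htaccess is the most likely source of the 500. Second, in per-directory (.htaccess) context Apache strips the directory prefix, slash included, before matching, so a pattern beginning with ^/ never matches there. The following Python model covers only that second point; the helper names and the /blog/some-post path are hypothetical:

```python
import re
from typing import Optional


def per_directory_subject(url_path: str, directory: str = "/") -> str:
    # Mimics mod_rewrite in .htaccess: the directory prefix, including its
    # trailing slash, is removed before the pattern is tested.
    if not url_path.startswith(directory):
        raise ValueError("path is outside this directory")
    return url_path[len(directory):]


def redirect_target(pattern: str, url_path: str,
                    host: str = "www.treehugger.com") -> Optional[str]:
    # Returns the redirect URL if the pattern matches, else None.
    m = re.search(pattern, per_directory_subject(url_path))
    return None if m is None else "http://{}/{}".format(host, m.group(1))


print(redirect_target(r"^/(.*)", "/blog/some-post"))  # None
print(redirect_target(r"^(.*)$", "/blog/some-post"))  # http://www.treehugger.com/blog/some-post
```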

Jeff, there’s something interesting here.

For a long time, the W3C pushed the idea of URLs like http://www.example.org/pics/cairo instead of http://www.example.org/pics/cairo.jpg, under the assumption that web clients and servers could use content negotiation to decide on the best format to deliver.

The W3C has indeed pushed content negotiation (see Web Architecture), and it does work, depending on what you want to achieve. But content negotiation is part of HTTP, a spec published at the IETF by the Web community (yes, including W3C-affiliated people).

The idea of content negotiation is not owned by the W3C. :)
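For anyone curious how the server side of that works: the client lists the formats it will accept in the Accept header, with optional q-values, and the server picks among the representations it has. The sketch below is a simplified Python version of that selection. It assumes /pics/cairo is stored as both a JPEG and a PNG. The function names are hypothetical, and the sketch ignores media-type parameters other than q, as well as the other Accept-* headers:

```python
from typing import List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media-range, q) pairs."""
    if not header.strip():
        header = "*/*"                   # no preference: anything goes
    prefs = []
    for item in header.split(","):
        fields = [f.strip() for f in item.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0                          # default quality
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0              # malformed q: treat as not acceptable
        prefs.append((media, q))
    return prefs


def quality(media_type: str, prefs: List[Tuple[str, float]]) -> float:
    """q of the most specific range matching media_type (0 if none)."""
    main = media_type.split("/")[0]
    best_rank, q = -1, 0.0
    for media_range, range_q in prefs:
        if media_range == media_type:
            rank = 2                     # exact type/subtype
        elif media_range == main + "/*":
            rank = 1                     # type/*
        elif media_range == "*/*":
            rank = 0                     # anything
        else:
            continue
        if rank > best_rank:
            best_rank, q = rank, range_q
    return q


def negotiate(accept: str, available: List[str]) -> Optional[str]:
    """Pick the stored representation the client rates highest;
    ties go to the server's own order in `available`."""
    prefs = parse_accept(accept)
    best, best_q = None, 0.0
    for candidate in available:
        q = quality(candidate, prefs)
        if q > best_q:
            best, best_q = candidate, q
    return best


# /pics/cairo might exist on disk as cairo.jpg and cairo.png:
print(negotiate("image/png, image/*;q=0.5", ["image/jpeg", "image/png"]))  # image/png
```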

The only issue I’ve had with ISAPI Rewrite is when there are too many redirects in the httpd.ini file, which can slow down the request quite a bit.

Jeff: Great post and comments!

Glad to see URL design finally getting some respect, especially from the ASP.NET crowd, who have been the ‘have-nots’ when it comes to clean URLs (vs. Apache web apps, which have mod_rewrite built in).

Anyway, for those interested in this sort of thing, I discuss URL-related issues in depth over at The Well Designed URLs Initiative. I am also working to produce a comprehensive set of patterns and best practices, but I need lots of input from web developers, as I don’t know it all. That’s why I started the URLQuiz series, where you get to test your URL knowledge and give your opinions on our URL design questions. Check it out at:

http://blog.welldesignedurls.org/urlquiz/

That’s funny, I was just trying out both and was only able to get Ionic Rewrite working; ISAPI_Rewrite only worked when I put the server into IIS 5 isolation mode.

I’m not sure if the Apache extension works the same way, but with ISAPI Rewrite on IIS the instructions file must be at the root of the site, so it won’t work under a virtual directory. This is only really an issue if you develop under XP, where you can only run a single web site.

Force www? Bleh. I STRIP www. An acronym that uses more syllables than the phrase it abbreviates. And who needs it? What’s wrong w/ http://google.com?

“What’s wrong w/ http://google.com?”

Old people freak out.