The Importance of Sitemaps

So I've been busy with this Stack Overflow thing over the last two weeks. By way of apology, I'll share a little statistic you might find interesting: the percentage of traffic from search engines at stackoverflow.com.


This is a companion discussion topic for the original blog entry at: http://www.codinghorror.com/blog/2008/10/the-importance-of-sitemaps.html

Seems like the time to develop a sitemap.xml (and the strategy for maintaining it as it grows) is a heck of a lot cheaper and more cost-effective than building the search engine yourself. Since you already wanted to just leverage Google for that feature, the cost of a sitemap.xml doesn’t seem all that bad.

So, rather than being “a little aggravated that we have to set up this special file”, you should just go back to being happy that the problem of search was solved so easily for you.

Be an optimist.
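For anyone who hasn’t written one: a sitemap.xml is just a small XML file listing URLs. Here’s a minimal sketch in Python of generating one; the second question URL and the lastmod dates are made-up examples, not real Stack Overflow data:

```python
# Minimal sketch: build a sitemap.xml from a list of (url, last_modified) pairs.
# The second URL and the dates are hypothetical, purely for illustration.
from xml.sax.saxutils import escape

pages = [
    ("http://stackoverflow.com/questions/189855/n-ary-trees-in-c", "2008-10-12"),
    ("http://stackoverflow.com/questions/1/example-question", "2008-10-25"),
]

lines = ['<?xml version="1.0" encoding="UTF-8"?>',
         '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
for url, lastmod in pages:
    lines.append("  <url>")
    lines.append("    <loc>%s</loc>" % escape(url))
    lines.append("    <lastmod>%s</lastmod>" % lastmod)  # optional hint for crawlers
    lines.append("  </url>")
lines.append("</urlset>")

with open("sitemap.xml", "w") as f:
    f.write("\n".join(lines))
```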

Yay Jeff posted!

@Matt Cutts:
For instance, searching for “n-ary trees in c” and every other variant I’ve tried thus far fails to return http://stackoverflow.com/questions/189855/n-ary-trees-in-c#189900. I wouldn’t feel too bad about it though; stackoverflow’s own search can’t seem to find it either :wink:

Great work on your blog, by the way (you too, Jeff)!

Sorry, the link should be http://stackoverflow.com/questions/189855/n-ary-trees-in-c (I don’t expect Google to actually link to my post directly :wink:)

Meta-comment: The fact that I don’t have to create an account is great, but the fact that you can’t edit posts makes for a lot of comment chaff (like this). Is there any way one could allow post editing based on e.g. possession of a cookie?

Hi Jeff,
What do you think about the relevance (or irrelevance) of using meta tags in web pages?

Has it gone out of vogue or is it still useful?

I thought sitemaps would be one of the first things a webmaster would build when launching a website (whether dynamic or not).

I certainly never needed a sitemap on codinghorror.com.

‘There are also limits on size. The sitemaps.xml file cannot exceed 10 megabytes in size, with no more than 50,000 URLs per file. But you can have multiple sitemaps in a sitemap index file, too. If you have millions of URLs, you can see where this starts to get hairy fast.’

I can see how these constraints will lead to some difficult-to-maintain hacks for the Stack Overflow sitemap file. There has got to be a simpler way for the Googlebot to work correctly. Hairy indeed.
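For what it’s worth, the protocol’s own answer to those limits is the sitemap index: split the URLs into files of at most 50,000 entries each and list those files in a sitemapindex document. A rough sketch of that in Python (the URL list, file names, and host here are just assumptions):

```python
# Rough sketch: split a large URL list into <= 50,000-entry sitemap files
# plus one index file. all_urls and the file/host names are hypothetical.
all_urls = ["http://stackoverflow.com/questions/%d" % i for i in range(1, 120001)]
CHUNK = 50000  # per-file limit from the sitemap protocol

index_entries = []
for n, start in enumerate(range(0, len(all_urls), CHUNK), start=1):
    name = "sitemap-%d.xml" % n
    with open(name, "w") as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        f.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
        for url in all_urls[start:start + CHUNK]:
            f.write("  <url><loc>%s</loc></url>\n" % url)
        f.write("</urlset>\n")
    index_entries.append("http://stackoverflow.com/%s" % name)

# The index file is what gets submitted / referenced from robots.txt.
with open("sitemap-index.xml", "w") as f:
    f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    f.write('<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
    for loc in index_entries:
        f.write("  <sitemap><loc>%s</loc></sitemap>\n" % loc)
    f.write("</sitemapindex>\n")
```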

SO is nice, but CodingHorror is still my crack of choice. Welcome back.

It doesn’t sound very scalable - that file must be a real hotspot for a site with the amount of activity that Stack Overflow gets.

Also, how are you determining the changefreq and priority for individual questions?
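In the sitemap protocol, changefreq and priority are just optional per-URL hints. Purely as a guess at how one might derive them (not what Stack Overflow actually does), a heuristic based on question activity could look like this:

```python
# Hypothetical heuristic for the optional <changefreq> and <priority> hints.
# This is a guess at one reasonable approach, not Stack Overflow's actual logic.
from datetime import datetime, timedelta

def sitemap_hints(last_activity, answer_count):
    age = datetime.utcnow() - last_activity
    if age < timedelta(days=1):
        changefreq = "hourly"     # very active question
    elif age < timedelta(days=7):
        changefreq = "daily"
    else:
        changefreq = "monthly"    # dormant question
    # Nudge priority up for well-answered questions; clamp to the 0.0-1.0 range.
    priority = round(min(1.0, 0.5 + 0.1 * answer_count), 1)
    return changefreq, priority

print(sitemap_hints(datetime.utcnow() - timedelta(days=3), answer_count=4))
# -> ('daily', 0.9)
```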

Scalability is (should be) a non-issue for sitemap.xml. The purpose of this file for a large site isn’t to list hundreds of thousands of unique URLs at once, but rather to allow spiders to discover those URLs ONE TIME. Once the initial discovery has happened, Google should (for a high-traffic, widely-linked-to site) continue to spider those URLs, which in turn link to neighboring URLs, and so forth. In this way an entire site of many tens of thousands of URLs can be spidered from just a few thousand URLs in the sitemap.
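If discovery is the only goal, the sitemap doesn’t even have to list every individual question; a few thousand paginated listing pages that link out to everything would do. A sketch of that idea in Python (the URL pattern, page size, and question count are assumptions):

```python
# Sketch: seed crawler discovery with paginated listing ("hub") pages rather
# than every individual question. URL pattern and counts are assumptions.
TOTAL_QUESTIONS = 100000   # hypothetical
PER_PAGE = 30              # hypothetical questions per listing page

hub_urls = ["http://stackoverflow.com/questions?page=%d" % page
            for page in range(1, TOTAL_QUESTIONS // PER_PAGE + 2)]

print(len(hub_urls))  # a few thousand sitemap entries instead of 100,000
```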

'I certainly never needed a sitemap on codinghorror.com.'
Jeff, I think in the case of Coding Horror there were plenty of trackbacks and other blogs linking to your posts, making it easier for the somewhat dimwitted Googlebot to find them.

Good post, interesting!

On a side note, if I click the www.codingwheel.com author name in the comment above in Firefox 3, I get a content encoding error and the page cannot be displayed. Just a heads up =p

Welcome back!

You don’t need the sitemap – you can wait till Googlebot gets around to indexing your site. Apparently, that wasn’t good enough for you. Don’t blame your impatience on the poor bot :wink:

You may be drawing causality from coincidence on the sitemap.

The Google algorithm usually displays new sites high in the rankings immediately, then sandboxes them for a few days or weeks until they gain PageRank. Finally, they pop back to an accurate position.

During that sandboxed period, it’s normal to search for unique terms and find other (not sandboxed) sites, yet not your own.

I’ve seen that pattern play out with every new site I launch, independent of SEO efforts (including sitemaps).

That’s very interesting. I’d heard vaguely of the idea of sitemaps but had no idea they could make such a huge difference.

I do find it vaguely disturbing that my first instinct after reading this was to find and click the Upvote button.

At last. Something I already knew that Jeff didn’t!

Hi Jeff,

wouldn’t links like (naive example, I know)

http://stackoverflow.com/questions/page/2/sort/hot

instead of

http://stackoverflow.com/questions?page=2&sort=hot

also convince Google to follow all your links?

I think Google is not happy with the dynamic parts of the URL, e.g. ? or …
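For what it’s worth, serving path-style URLs like that is mostly a routing question. A tiny sketch of the idea in plain Python WSGI, just to illustrate the URL shape (Stack Overflow actually runs ASP.NET, and this handler is hypothetical):

```python
# Tiny illustration of path-style URLs versus query strings, using only the
# standard library. The route and handler are hypothetical; the point is that
# /questions/page/2/sort/hot can serve the same data as /questions?page=2&sort=hot.
import re
from wsgiref.simple_server import make_server

ROUTE = re.compile(r"^/questions/page/(?P<page>\d+)/sort/(?P<sort>\w+)$")

def app(environ, start_response):
    match = ROUTE.match(environ.get("PATH_INFO", ""))
    if match:
        body = "Listing page %(page)s sorted by %(sort)s" % match.groupdict()
        start_response("200 OK", [("Content-Type", "text/plain")])
    else:
        body = "Not found"
        start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [body.encode("utf-8")]

if __name__ == "__main__":
    make_server("", 8000, app).serve_forever()
```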