I first saw Yahoo's 13 Simple Rules for Speeding Up Your Web Site referenced in a post on Rich Skrenta's blog in May. It looks like there were originally 14 rules; one must have fallen off the list somewhere along the way.
This is a companion discussion topic for the original blog entry at: http://www.codinghorror.com/blog/2007/08/yslow-yahoos-problems-are-not-your-problems.html
If you don’t take rule #1 seriously, “pinging the server for a new version and getting a 304 Not Modified header back in response” can become quite costly, even if you are not Yahoo.
One round trip can be more than 1KB in traffic; five such requests and you have 5KB of lost traffic. If the content and markup of your page take 10-20KB, that’s quite a substantial part of the traffic.
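To make the mechanism concrete, here is a rough sketch in Python of the decision a server makes on one of these conditional GETs. The function name and the plain-dict request shape are illustrative, not any real framework’s API:

```python
from email.utils import formatdate, parsedate_to_datetime

def respond(resource_mtime, body, request_headers):
    """Return (status, headers, body) for a (possibly conditional) GET.

    resource_mtime is a Unix timestamp; request_headers is a plain dict.
    """
    last_modified = formatdate(resource_mtime, usegmt=True)
    ims = request_headers.get("If-Modified-Since")
    if ims is not None:
        try:
            if parsedate_to_datetime(ims).timestamp() >= resource_mtime:
                # Client's copy is still current: headers only, no body --
                # but the round trip itself still happened.
                return 304, {"Last-Modified": last_modified}, b""
        except (TypeError, ValueError):
            pass  # unparseable date: fall through to a full response
    return 200, {"Last-Modified": last_modified}, body
```

The 304 response is small, but the client still pays a full round trip of latency for it, which is the cost the comment above is totting up.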
If you’re experimenting with Cache-Control headers (set in IIS via the file/folder properties dialog, HTTP Headers tab, “Enable Content Expiration” checkbox) there’s a great summary here:
Also, remember that F5 forces the browser to re-check all files, even those that would normally be cached. Not that I made that mistake or anything…
“the browser won’t even ask for a new version”
IE 5 on the Mac used to always go to the cache regardless of the expires date. We had to pass a random number in the querystring to force it to GET the page from the server. Sucked. Thank God that browser tanked.
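For anyone who never had to do this: the workaround was to make every request look like a brand-new URL so the broken cache never matched. A minimal sketch in Python (the `nocache` parameter name is arbitrary):

```python
import random
from urllib.parse import urlencode, urlparse, urlunparse, parse_qsl

def bust_cache(url):
    """Append a random token to the querystring so the browser
    treats the request as a never-before-seen URL."""
    parts = urlparse(url)
    query = parse_qsl(parts.query)
    query.append(("nocache", str(random.randrange(10**9))))
    return urlunparse(parts._replace(query=urlencode(query)))
```

The downside, of course, is that this defeats caching for every browser, not just the broken one.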
re: F5 - F5 does force the browser to re-check the files, but in some cases changes to static and some dynamic pages may not show up. Ctrl-F5 (Cmd-R on a Mac) forces the browser to bypass the cache entirely.
This is a fantastic critique of the methodology used by Yahoo. Thanks.
The reason I advocate people opt in to setting cache control headers and the like for “real” projects (I must take the time to fix up jcooney.net too) is not the network traffic - it’s the latency all those extra requests add to your page load.
Great post! You’ve done a wonderful job on this site - it renders very speedily on an iPhone over EDGE, as well.
Good stuff. I’ve used YSlow to optimize my blog and to compare it with other blogs, including yours, and I’ve come to realize that this is pretty apples and oranges. For the comparison to mean anything, the two blogs would have to have nearly the same content and the same number of posts per page. So it’s nice to know how optimized a blog really is, but using YSlow to compare blogs is just wrong IMHO! By the way, I totally agree with your analysis that YSlow targets high-traffic sites that have all of their assets at their disposal to optimize. I’ll post my own YSlow analysis on my blog later as well; not to compare with or rank other blogs, but to show how it can optimize a blog based on its guidelines, though I doubt I can even tweak some of them on the Blogger platform… lol! Anyway, good post Jeff, and your blog really has blown my socks off with how fast it is now!
ETags aren’t necessarily checksums. You can put any kind of value in them: an ID, a cryptographic hash (e.g. MD5 or SHA-1), an actual datetime like the modified timestamp in your database table, the file size, or a compound of any of the above.
What is important when using ETags is that you generate the same ETag value for an unmodified resource and a different ETag once it has been modified.
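As a sketch of that property, here is a content-hash ETag in Python together with the comparison a server would make against the client’s If-None-Match header (MD5 is just one of the many valid choices listed above):

```python
import hashlib

def make_etag(body):
    """A strong ETag derived from the response body."""
    return '"%s"' % hashlib.md5(body).hexdigest()

def not_modified(body, if_none_match):
    """True if the client's cached copy (per If-None-Match) is current."""
    return if_none_match is not None and if_none_match == make_etag(body)
```

Same bytes, same tag; change one byte and the tag changes, which is exactly the contract the comment describes.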
As for Expires, I can’t confirm at the moment, but I believe the “must-revalidate” directive of Cache-Control will force the browser to check with the server despite the presence of the Expires header.
I prefer to use the max-age directive of Cache-Control rather than Expires. It is simpler to just say “max-age=86400” than “Expires: Fri, 18 Aug 2007 02:55:13 GMT”
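The difference is easy to see if you generate both headers. A sketch in Python (the helper name is made up): max-age is a relative lifetime, so the same header value works for every response, while Expires is an absolute date the server has to recompute each time.

```python
import time
from email.utils import formatdate

def freshness_headers(seconds, now=None):
    """Two equivalent ways to say 'cache this for `seconds` seconds'."""
    now = time.time() if now is None else now
    return {
        "Cache-Control": "max-age=%d" % seconds,   # relative: always the same
        "Expires": formatdate(now + seconds, usegmt=True),  # absolute: changes
    }
```

An Expires date also goes stale if the server’s clock drifts or the response sits in an intermediate cache, which is one more reason HTTP 1.1 clients prefer max-age.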
I personally use the expires and cache-control settings, and even with on the order of 10K visitors a day I think it makes a pretty big difference. We do it because our price comparison pages are pretty heavy on the server, and we only set it for about an hour.
Other than that I totally agree-- CDN for most people means offload your images.
Thanks for the ETags explanation I hadn’t seen much on that previously.
Far-future expires headers are great for the fact that once the client has the resource (img, js, or css) they don’t need to check again. Almost every build process at Yahoo! uses date-stamped files for these types of resources. That way when a new build is pushed out to production the filename itself changes. This way instead of having the browser checking for new versions of these files it can just passively go get them when it’s told to. Removing the need for those roundtrips to the server does make a very measurable difference for sites that get a lot of use.
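The renaming step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Yahoo’s actual build tooling; a content hash is used here, though a datestamp works the same way:

```python
import hashlib
import os

def versioned_name(path, content):
    """Embed a short content digest in the filename so a new build
    produces a new URL, making far-future caching safe."""
    root, ext = os.path.splitext(path)
    digest = hashlib.md5(content).hexdigest()[:8]
    return "%s.%s%s" % (root, digest, ext)
```

Because the URL changes whenever the content does, the old copy can be cached forever and the browser fetches the new file only when a page actually references it.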
I prefer to use the max-age property of the Cache-Control rather than Expires.
Expires is a HTTP 1.0 header; Cache-Control is a HTTP 1.1 header. Most modern web servers use the Cache-Control header, which does the same thing Expires did, and much more…
What we need is a BitTorrent-esque web browser.
There may be special cases where CDNs are worthwhile even for small sites. If you’re using YUI, for example, the only good reason not to use the Yahoo-hosted versions is if high security is required. It saves you bandwidth, and it saves your visitors time because it dramatically increases the percentage of cache hits.
“All you’re really saving here is the cost of the client pinging the server for a new version and getting a 304 not modified header back in the common case that the resource hasn’t changed. That’s not much overhead… unless you’re Yahoo.”
That’s not exactly true. Part of the savings here is realized not by the person hosting the content but the person requesting it. If you’re an Indian visitor to a US web site, even a conditional GET is quite expensive. You’re making the request over a high latency, often flaky connection. We’ve seen this in Yahoo! Mail and it has forced us to squeeze requests and responses into as few packets as possible. The damage caused to user perceived performance by a single dropped packet on these networks, where round-trip latency is high, is quite noticeable to the end user, less so to Yahoo!.
Jeff, as some already did I strongly disagree with your Expires header part for a simple reason: latency.
And the fact that most browsers only open 2 parallel connections per host by default. This means that every time the browser has to revalidate the cache headers for two files at the same time, everything else is blocked until those requests come back, a small but significant fraction of a second later, especially over high-latency connections (e.g. 3G or, worse, EDGE).
By using Expires on everything static (images, CSS, JS) you ensure that those requests never happen at all, and the browsing experience will be much smoother.
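A back-of-the-envelope sketch of that blocking effect, with made-up but plausible numbers: with only 2 parallel connections, N revalidation round trips serialize into roughly ceil(N / 2) waits of one RTT each, even when every answer is just a 304.

```python
import math

def revalidation_delay_ms(num_resources, rtt_ms, connections=2):
    """Rough lower bound on time spent revalidating cached resources
    when requests are limited to `connections` parallel connections."""
    return math.ceil(num_resources / connections) * rtt_ms
```

Ten cached files over an EDGE-like 400 ms round trip is about two full seconds of doing nothing; with far-future Expires those round trips simply don’t occur.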
I don’t exactly follow Yahoo’s rules on the subject, though. They suggest putting the Expires date in the far future for everything and using version numbers in the filenames. Unless I have automated the build of the website (which also happens), I usually prefer a far-future Expires for images, which rarely need debugging once they’re in production, and a nearer-future one for resources that may still need to evolve or be debugged (CSS and JS).
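That per-file-type policy might look like the sketch below. The lifetimes are made-up placeholders, not a recommendation from the comment above:

```python
import os

# Illustrative lifetimes in seconds: far future for images,
# shorter for CSS/JS that may still need debugging in production.
LIFETIMES = {
    ".png": 365 * 86400, ".gif": 365 * 86400, ".jpg": 365 * 86400,
    ".css": 7 * 86400, ".js": 7 * 86400,
}

def max_age_for(filename, default=0):
    """Pick a Cache-Control max-age based on the file extension."""
    return LIFETIMES.get(os.path.splitext(filename)[1].lower(), default)
```

Dynamic pages fall through to the default of zero, so only the static assets get long-lived caching.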
You saved me!! I was realizing something like what you wrote in this great post.
The problem isn’t that a CDN will slow your site down, it’s that Coral cache isn’t a CDN.
Coral cache is a p2p caching system.
I agree that using a CDN is overkill for all but the largest websites, and it probably shouldn’t be one of YSlow’s rules… but saying that it’ll slow down your website is no different than changing all your links to Google’s cache and claiming you’re using a CDN.
Just wanted to let you know: when I tried to print the Aug 15th article in Firefox, it tried to print 78,009 pages. That was with the article and comments selected, printing the selection. Might be on this end, but I’d rather not try to duplicate this bug on my way out the door.