Twitter: Service vs. Platform

Twitter is a victim of its own success. The site has massive scaling problems, to the tune of 11,000 pageviews per second. According to this interview with a Twitter developer, a lot of the scaling problems are attributable to Twitter's choice of platform:


This is a companion discussion topic for the original blog entry at: http://www.codinghorror.com/blog/2007/04/twitter-service-vs-platform.html

Maybe Twitter is already taking this advice to heart:

http://twitter.com/blog/2007/03/now-hiring-senior-engineer.html

I’m betting people on the Java side of the Java vs Ruby debate will jump all over this and a name-calling-palooza will erupt over at places like The Server Side.

It’ll be interesting to follow this. I love the concepts and ease of programming that Ruby and Rails give you, but when you look at the total cost of ownership over the life of an application, programmer time isn’t always the most expensive part of that. Ask somebody trying to run an enterprise application on .NET right about now about how much downtime they have to schedule in to deal with the endless security patching. That adds up no matter how cool Microsoft makes the IDE.

Same deal here, essentially. The sysadmins and upper management types at the Fortune 10 I work for are scared to pieces of change. They think of their JVM as reliable, tested over a number of years, and scalable. No way Ruby gets past an architecture review in its current form. I’m really hoping that either Grails or JRuby gets some more traction so we can have our JVM cake and eat our programmer productivity tools too.

jeremiah,
You notice that assembly is not on the list, right?
The reason is that C++ compilers are better than humans in understanding the CPU.
In the same manner, the JIT is good enough to produce code that competes head to head with C++ code.

Python is interpreted, but it’s faster than Ruby because it’s compiled to bytecode first. Ruby doesn’t yet have a bytecode compiler, but it’s coming - I think within a year.

This article isn’t up to your usual standards of care on the details. In fact it’s really quite horrendous. I respect Code Complete as much as the next person, but I shudder to think of the uninformed person learning from this or the informed person quoting this disingenuously.

First I must say, I lately mostly code in Java, spend a lot of time in Perl, and 2 years ago rejected Ruby for a performance intensive project due to its failure at the metrics we wanted to achieve. I try my hardest not to be biased, but you know how that goes. But hey, I try.

  1. The obvious: “interpreted” code typically goes through multiple compilation processes into the pseudo-VM that will eventually run the pseudo-bytecode. It isn’t all that freaky different from C# and Java, presuming that it’s a well-honed interpreter.

  2. Also obvious: Java and C# only perform as fast as C++ in certain metrics. Any metric that broadly states that they are 1:1 is clearly (a) flawed, (b) biased, or (c) misquoted. Interestingly, they can both beat even C in some tests due to the very simple fact that using a very well written library which is commonly available in those languages, as opposed to rolling your own, is a great way to optimize '-). Surprisingly enough there are some tests where Perl will come close.

And that’s the real point here, and the point that this article misses. I know it’s annoying to keep stating the obvious, but performance tests are about the most flawed and difficult thing that can be done with languages, and it’s just our lot in life to keep admitting that fact every time we make them or quote them.

  1. Everyone who isn’t a fanboy knows that the Ruby interpreter doesn’t win prizes for speed. The bottom line is that generally, yes, Python is significantly faster. It’s not a big deal. I dislike Python syntax. I rarely code in Python. I am annoyed by Ubuntu, which I otherwise love, due to its Python bent. I like Ruby syntax and ideas a whole lot more. But objectively, I know that Python is faster, because of direct experience with (a) scripts I’ve made, (b) apps I’ve run, and (c) others’ trusted statements.

I admit that I personally don’t particularly like the Rails tidal wave. But it’s mostly not for language reasons; it’s more that far too many emotionally charged decisions are being made on popularity and initial coding ease rather than on a fully objective analysis of each target project. But you can’t argue with success, even if it is Flickr dumbing down its features supposedly because of performance (a lot of that’s on PHP), or Twitter’s performance sucking as they realize that script-generated database code isn’t cutting it for them.

But hey, this isn’t new. MS came out with VB years ago with its own comparable set of trade-offs (donning flame-retardant suit), and I’m sure none of us enjoy remembering all those horrid yet supposedly useful little shareware apps that made you download the VB DLLs. (That’s the real reason people originally complained so much about the .NET libs: those repressed memories of VB DLLs.)

And the objective conclusion is, they have been working on Ruby speed for years, and it still ain’t there yet. One day, I’m sure (not being sarcastic here).

In any case, I love your blog, but yeah, c# a 1:1 with c++? heh.

I don’t think this is a “Rails problem”. I’d like to have seen an ASP.NET/VB.NET app running Twitter. I bet you’d have similar (if not the same) problems to the ones they are having. The language speed comparison you’ve shown has minimal impact, so I don’t know what the point of having it there is. You’ve forgotten to talk about the web server, the database server, the hardware and network setups. That’s obviously where the bottlenecks lie. But making it a language dispute doesn’t make sense. So what if PHP is 100 times slower than others? It is happily used by very many web sites that handle very large traffic and loads.

What I find particularly amusing is the performance comparison with Python. It’s hard to believe that Python is that much faster than Ruby. Python, like Ruby, is an interpreted language, and interpreted languages are so slow that if you have to ask how much performance you’re giving up, you can’t afford it.

Actually, Python is byte-compiled, just like Java or C#. The reason it’s slower than C#/Java is because it’s dynamically typed. Dynamically typed languages will always be slower than statically typed.
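The byte-compilation step is easy to see from CPython itself. Here is a minimal sketch, assuming a stock CPython interpreter (the function name `add` is only for illustration). The standard `dis` module lists the bytecode, and the single generic add instruction hints at where the dynamic-typing cost comes from, since operand types are only resolved at run time.

```python
import dis


def add(a, b):
    # No type declarations: the same bytecode serves ints, floats, strings...
    return a + b


# CPython compiled the function body to bytecode when the def statement ran.
bytecode = add.__code__.co_code
print(type(bytecode).__name__, len(bytecode), "bytes of bytecode")

# Human-readable listing; exact opcode names vary between Python versions.
dis.dis(add)

# One generic add instruction handles every operand type at run time.
print(add(1, 2), add(1.5, 2.5), add("py", "thon"))
```

This only shows that a bytecode stage exists; it says nothing by itself about how fast that bytecode runs compared with Ruby, Java or C#.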

Python has about the same performance advantage over Ruby as C++ does over C#/Java - 5x.

Python vs. Ruby:
http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=python&lang2=ruby

C++ vs. C#:
http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=gpp&lang2=csharp

“C# is byte code, just like Java is. And its not as fast as C++, either. Like Java, its very close, but its not as fast overall as C++.”

C# is actually FASTER than C++ in many benchmarks I’ve run, and if it is slower, it’s only by very small amounts. The explanation is that C# is bytecode on the disk, but it IS compiled to machine code on-the-fly, and the compilation only happens once (unless there is a change, obviously). With this on-the-fly compilation, the code is actually optimized for the SPECIFIC processor that the system is running, as opposed to the C++ method of general 386/Pentium optimization, as is the most common.

Alex expands on his comments in a discussion on DHH’s blog here: http://www.loudthinking.com/arc/000608.html.

For those arguing about C# vs C++, again, would you PLEASE stop pretending it’s so clear-cut. There’s no “end of story”.

See http://www.codinghorror.com/blog/archives/000299.html for some REAL discussion.

I find the interesting bottleneck the inability to “talk to more than one database at a time”. You can optimise the hell out of your code, reverse some of your ruby wizardry and clean code for performance optimised magic, but you will still keep falling at this hurdle.

The “Random Reader” beat me to the link. No one pretends that that example covers all cases, but it certainly shows one case where C# outperforms C++ without the kind of hard work that few are willing to put in.

@diego - You know that MySpace runs on ASP.NET, right? They’re probably around 2 billion page hits a day by now.

@jon, it wasn’t my intention to suggest that ASP.NET couldn’t handle large loads. What I meant was that it’s about other factors, like database, web server, server farm setups etc. So whether you have your app in Ruby or ASP.NET, you don’t have an easy ride with either when it comes to handling large loads. It’s something that takes work and has to be planned. I wasn’t knocking ASP.NET :) That also goes the other way, when people outright suggest that Rails can’t handle large amounts of traffic.

How hard would the specific features he mentions be to port to another platform? I’ve had performance issues with Python in the past, but one nice thing about Python is that it can interface with compiled code fairly painlessly.
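As a rough illustration of that last point, the standard `ctypes` module can call a function in an already-compiled C library with no wrapper code at all. This is a sketch under assumptions: a POSIX-like system where the C library can be located, with `strlen` chosen only because it is present everywhere.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX-like system where the C library can be found.
libc_name = ctypes.util.find_library("c")
# CDLL(None) falls back to symbols already loaded into the current process.
libc = ctypes.CDLL(libc_name) if libc_name else ctypes.CDLL(None)

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

# The call below runs compiled C code, not Python bytecode.
length = libc.strlen(b"twitter")
print(length)
```

For a real hot spot you would more likely write a small C extension or wrap an existing library, but the calling pattern is the same: declare the signature, then call it like any Python function.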

Amidst this “my language is faster” flame, it’d be good to remember that performance issues are best left until after the first iteration of development. Once you’ve gotten that far then it usually becomes pretty clear where the performance bottlenecks are, that old adage “___ % of your execution time is in ___ % of your code”, the 80/20 rule, blah blah blah. Who’s to say whether Twitter would have ever materialized had they started out with a platform as cumbersome to develop as C++.
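Finding that hot fraction of the code is usually a profiling job rather than guesswork. A minimal sketch using the standard `cProfile` and `pstats` modules follows; the functions `cheap_step`, `hot_step` and `handle_request` are made up for the example, with `hot_step` deliberately quadratic so that it dominates.

```python
import cProfile
import io
import pstats


def cheap_step(n):
    return sum(range(n))


def hot_step(n):
    # Deliberately quadratic so that it dominates the profile.
    total = 0
    for i in range(n):
        for j in range(n):
            total += i ^ j
    return total


def handle_request():
    cheap_step(1000)
    return hot_step(300)


profiler = cProfile.Profile()
profiler.enable()
handle_request()
profiler.disable()

# Sort by cumulative time and keep only the top few entries.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

In the report, `hot_step` should account for nearly all of the time spent in `handle_request`, which is the signal to optimize that function and leave the rest alone.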

Also, there’s no way I believe that Java is only 50% slower than C++. That’s gotta be a typo.

Interesting debate but a little flawed on some aspects.

Firstly, this is about web applications, which Twitter is (if it’s not, please explain why not!).

Bringing C++ into the equation is a little short-sighted, and those people who are defending C++ really have no argument here: when was the last time someone created an entire web application in C++?

If they did then god help them, as it’s not an area that C++ is designed for. C++ in certain environments, such as technical or scientific areas, is the king and there is no doubt about it, but it comes down to business requirements.

At the end of the day, as a business you will want to bring an application to market as quickly as possible, hence the need for middle- to high-level languages, which is where PHP, C#, Java, Ruby and to some extent the more esoteric brands like Perl and Python fall. The argument about performance is a bit too vague as, again, you have so many factors to consider, e.g. is the language in question being tested for the development of a web application or a desktop/embedded application?

Raw compiling charts don’t mean anything except to the neurotically charged individuals who care only about these facts.

What has been highlighted is the state of play of the web application domain. There are still many areas where web application design and development is limited by the technology available; however, this also offers a unique challenge where the designers have to architect a solution based on the limited resources (including scalability). Developing, say, a desktop application is not a big deal anymore. At the end of the day, if it’s far quicker and easier to develop a C#/.NET based application than a C++ equivalent then I am afraid I would hire a C# developer. The average computer is incredibly powerful, and as such most coders don’t have to worry about things like performance, as they know the system will offer enough bang to cover the limitations.

What is important is how web development proceeds in the future. It’s obvious RoR has limitations that Twitter has encountered, and it would be good to see the devs taking this opportunity to expand out of the box and develop the design further to handle other database systems and improve the language overall.

But as someone mentioned, most performance issues in a web environment are to do with networking (internet speeds, hardware), server platforms and the client system that is being served.

We are essentially running systems that are incredibly powerful down a network that is barely capable of handling the demand. Until this area improves, we will always have issues that are outside of the control of the coding language of choice.

The only thing that I can think of for why Code Complete says that Java is Byte Code where C# is compiled might have something to do with the fact that Java was originally compiled to byte code and then that byte code was interpreted during execution. Things have changed and that byte code is now JIT compiled just like C#'s MSIL is JIT compiled at program startup.

this is why i don’t believe in ‘one does everything’ monster frameworks. Too much abstraction is bad programming in the end because now you have a slow pig website. there has to be a somewhat personal, low-level approach to certain tasks. and a lot of them suck to do. but that’s the nature of this work. suck it up and code. -1 for RoR.

As somebody who began life on an 8-bit CPU . . .

IT’S THE ALGORITHM, STUPID!

This can be the hardest thing to admit, that the fundamental design is flawed . . .

While I’m not super familiar with the specifics behind Twitter, I did write most of the code behind another very large Rails site (www.penny-arcade.com) and I am fairly familiar with the performance of the Rails stack. I realize that we have vastly different data access patterns, but in my experience with Ruby, scaling issues can be broken down and identified fairly quickly by simply benchmarking different parts of the application (usually through unit tests).
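To make "benchmarking different parts of the application" concrete, here is a small sketch using the standard `timeit` module. It is in Python rather than Ruby, and the two stages (`load_rows` as a stand-in for a database query, `render_template` as a stand-in for view rendering) are invented for the example; they are not anything from Twitter or Penny Arcade.

```python
import timeit


def load_rows(n):
    # Hypothetical stand-in for a database query.
    return [{"id": i, "text": "update %d" % i} for i in range(n)]


def render_template(rows):
    # Hypothetical stand-in for view rendering.
    return "".join("<li>%s</li>" % row["text"] for row in rows)


rows = load_rows(500)
stages = {
    "load_rows": lambda: load_rows(500),
    "render_template": lambda: render_template(rows),
}

# Time each stage in isolation; take the best of three runs to reduce noise.
timings = {
    name: min(timeit.repeat(stage, number=200, repeat=3))
    for name, stage in stages.items()
}
slowest = max(timings, key=timings.get)

for name, seconds in sorted(timings.items(), key=lambda item: -item[1]):
    print("%-16s %.4f s" % (name, seconds))
print("slowest stage:", slowest)
```

Dropping timings like these into a test suite gives an early warning when one stage starts to dominate, well before it shows up as a production outage.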

It sounds like their bottleneck is at the DB level. While it isn’t a magic bullet, I failed to see the word memcached mentioned anywhere in the articles I’ve seen on Twitter. Looking at their access patterns, memcached seems like it would make a lot of sense; I’d venture to guess 80% of their traffic hits only the most recent data (5%-10% of their data). In my experience memcached can really help improve scaling of a site that is DB bound. It probably isn’t feasible to implement page caching, or even action caching, for most of their pages, but I think using a memcached cluster for fragment caching will save them a lot of DB ops.
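The fragment-caching idea boils down to a cache-aside lookup: check the cache, and only hit the database on a miss. The sketch below is an assumption-laden illustration, not Twitter's or Rails' code. `TinyCache` is an in-process stand-in for a memcached client, and `load_recent_updates_from_db` is a hypothetical query that counts its own calls.

```python
import time


class TinyCache:
    """In-process stand-in for a memcached client (get/set with expiry)."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, time.monotonic() + ttl_seconds)


cache = TinyCache()
stats = {"db_queries": 0}


def load_recent_updates_from_db(user_id):
    # Hypothetical expensive query; counted to show how often it runs.
    stats["db_queries"] += 1
    return ["update %d for user %d" % (i, user_id) for i in range(3)]


def recent_updates_fragment(user_id, ttl_seconds=30):
    key = "recent_updates:%d" % user_id
    fragment = cache.get(key)
    if fragment is None:
        # Cache miss: query the database once, then store the rendered result.
        fragment = "\n".join(load_recent_updates_from_db(user_id))
        cache.set(key, fragment, ttl_seconds)
    return fragment


for _ in range(100):
    recent_updates_fragment(42)
print(stats["db_queries"], "database query for 100 page views")
```

The trade-off is staleness: with a 30-second expiry a reader may see data up to 30 seconds old, which is often acceptable for "most recent updates" style pages but needs explicit invalidation where it is not.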

Twitter is not the first social networking site to run into this problem. Facebook runs PHP (comparable to Ruby) and they switched long ago to using memcached to keep their hot data ready to serve to the client. LiveJournal, Slashdot, Wikipedia, and even SourceForge use memcached to prevent their database servers from getting overwhelmed. I would have preferred to see a post about the importance of caching in web applications, rather than restating that compiled languages will outperform scripting languages.

So performance doesn’t matter, except for when it does. That sounds about right. The good news about this is that there are lots of ways to optimize Ruby (and Rails). I’ve seen scripts where you can embed C code into your Ruby code and a parser will compile the C code and create a loadable module and modify the Ruby code to call that C function. Optimizations like this though are by nature quite esoteric.

The real question to ask is: was the work to develop the site in RoR, plus the time to optimize RoR for the massive increase in traffic, greater than the work it would have taken to develop the site in C++ (or similar), plus the time to optimize the C++ site for the same increase in traffic?

Also, I’m surprised they can’t put the database on a cluster where incoming database connections get routed to one of a cluster of database machines. There may be some obscure technical reason why that is difficult, or maybe that is the route they are planning on taking.
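One common variant of that routing idea is read/write splitting: writes go to a single primary and reads are spread round-robin over replicas. The sketch below shows only the routing decision. The host names and the `ReplicaRouter` class are hypothetical, it assumes at least one replica, and it is not necessarily what the commenter or Twitter had in mind.

```python
import itertools


class ReplicaRouter:
    """Send writes to one primary and spread reads over replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Round-robin iterator; assumes at least one replica is given.
        self._replicas = itertools.cycle(replicas)

    def route(self, sql):
        # Naive classification: anything starting with SELECT is a read.
        is_read = sql.lstrip().lower().startswith("select")
        return next(self._replicas) if is_read else self.primary


router = ReplicaRouter("db-primary", ["db-replica-1", "db-replica-2"])
queries = [
    "SELECT * FROM updates",
    "INSERT INTO updates VALUES (1)",
    "SELECT 1",
    "SELECT 2",
]
targets = [router.route(query) for query in queries]
print(targets)
```

The "obscure technical reason" this can still be hard is usually consistency: replicas lag behind the primary, so a user may not see their own write immediately unless reads that follow a write are pinned to the primary.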