Twitter: Service vs. Platform

On c++ vs c# - http://blogs.msdn.com/ricom/archive/2005/05/10/performance-quiz-6-chinese-english-dictionary-reader.aspx

Hi, I have a question about the chart from Code Complete 2.0. Which Python version does it consider? It’s a very important factor, since Python’s average speed has improved dramatically with the latest releases:

Python 2.3 (2003): 30% faster than 2.2
Python 2.4 (2004): 5% faster
Python 2.5 (2006): 10% faster

Source: Alex Martelli http://www.aleax.it/Python/py25.pdf

Moreover Python 2.6 and Python 3000 are in the making…

Simon,

This link sure looks like ASP.NET, it is even an aspx extension

http://browseusers.myspace.com/browse/browse.aspx?MyToken=88edcb37-903d-4f00-b741-acf2f58d550c

Yes parts of it run BlueDragon but that is still running on .NET, it is just CFML compiled to .NET

Re: MySpace, well I think Scott Guthrie knows best:

Handling 1.5 Billion Page Views Per Day Using ASP.NET 2.0

http://weblogs.asp.net/scottgu/archive/2006/03/25/441074.aspx

See Erik’s post earlier in this thread - memcache (or something similar) is 90% of the answer and should be priority 1.

Optimizing queries and db schema should be priority 2.

Priority 3 - look into mcluster or another clustered solution if you have lots of concurrent writes and the 2 steps above aren’t quite getting you where you need to be. (down side === $$$)

Profiling and optimizing code execution is always good, but won’t help much with the db access problems. The execution speed of the run time or interpreter is usually not gonna be a problem - esp when throwing more front ends at it will take care of that issue.

I think your overall point: What’s more important the service or the implementation? May have gotten lost when you gave so much space to a side note on language performance.

If RoR allowed them to seize the business opportunity, it’s hard to argue with it as an initial choice. The question is whether Twitter will become a Friendster or a MySpace. Neither started with a scalable architecture, but one of them found a path to it.

I’d say the issue with Ruby is less about language speed, and more about the architectural choices it influenced, which now must be revisited to handle scale.

How can people opine about performance without load testing and isolating the bottleneck first?

Why is Craigslist (PHP and MySQL, I think) pretty dang fast?

Many developers have little know-how when it comes to big database design. Are hardware load balancers being used? Why focus on the development language?

Once wrote a VB5 front end to a DB2 app – every request took 1 second – small, medium, or large. It was the middle-ware. Not the P-code VB.

In very specific instances, having a garbage collector can be faster than C++.

For example, the fact that you don’t have to delete little itty bits individually.

But usually straight C++ is much faster.

“I think your overall point: What’s more important the service or the implementation? May have gotten lost when you gave so much space to a side note on language performance.”

I think you’re right. :)

I am as mystified as anyone else why Alex P thinks the language matters. It just seems irrelevant, not just because the bottleneck is probably elsewhere, but because we’re comparing two interpreted / dynamic languages anyway, which aren’t known for their speed.

This comparison only makes sense if you think of language as part of the platform.

For example, consider why the Reddit folks switched from Lisp to Python. Not because the language was better (Paul Graham would probably have a coronary, and he funded Reddit), but because the platform around the language was better.

http://blog.reddit.com/2005/12/on-lisp.html
http://www.aaronsw.com/weblog/rewritingreddit

Similarly, the Rails platform seems to make assumptions about the way the database works that can make it hard for the Twitter service to scale.

It pains me to say this especially since Twitter has been so successful. The twitter case seems to be one of choosing the wrong technology for the job. Sure initial development time may have been shorter by using RoR but now (hindsight of course:) it seems that choosing another framework may have been better in the end.

“Similarly, the Rails platform seems to make assumptions about the way the database works that can make it hard for the Twitter service to scale.”

Hmm, the default configuration assumes you have one database connection per model object. That connection could be to a load balanced cluster or to sqlite. I don’t think the assumptions made by ActiveRecord are that different than those made with any other Object Relational Mapping engine.

I have to agree with those asking about the lack of data caching via memcached or MySQL clustering/load balancing, it seems like a serious oversight. Even more amazing to me is that, reading up on this, they appear to be using a hosting provider versus running their own systems.

Probably the most amazing thing about Rails is that it allows developers who don’t appear to know much about scaling to write apps that scale to 10000 tps.

For the guy who said that more stuff needs to be done in the database, you are just wrong. Look up how eBay has scaled J2EE (throw away everything except servlets and jdbc connection pooling, do all sorting, fk constraints, etc. outside of the partitioned databases) for proof of that. In any case, this is about transaction volume, not complex queries.

I think when you are talking about huge transaction loads, most frameworks have to adapt in some way. It looks like a guy has written a plugin to support the approach Twitter wanted in 75 lines of Ruby, to be followed up with two more plugins that let you add read only slave DBs by adding two lines to your database config file. In other words, the moaning by one developer has just resulted in ActiveRecord getting a capability very few existing ORM frameworks have. I’ve got to try that (moaning) next time I have a hard scaling problem at work. Easier than solving the problem myself! ;)
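For anyone wondering what "read only slave DBs" means in practice, the idea is roughly read/write splitting. The following is a hedged Python sketch of that idea only, not the Ruby plugin mentioned above: ReadWriteRouter and the connection names are made up for illustration, and connections are represented by plain strings.

```python
import itertools


class ReadWriteRouter:
    """Route writes to the master and spread reads across replicas."""

    def __init__(self, master, replicas):
        self.master = master
        # Fall back to the master if no replicas are configured.
        self._readers = itertools.cycle(replicas or [master])

    def connection_for(self, sql):
        """Pick a connection based on the statement's first keyword."""
        words = sql.split()
        is_read = bool(words) and words[0].upper() == "SELECT"
        return next(self._readers) if is_read else self.master


router = ReadWriteRouter("master", ["replica1", "replica2"])
```

Here router.connection_for("SELECT ...") rotates through the replicas, while any other statement goes to the master. A real implementation would also have to deal with replication lag and with reads that happen inside a transaction, both of which this sketch ignores.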

Blah, blah, blah.

Jesus f’ing Christ. We’re talking like we’re experts when everybody seems to have missed the real problem, including the dorks at Twitter.

Here’s the damned problem:

The damned site is synchronous. That’s the problem. Threads are waiting for the database, and those threads are tied up until the database can respond back.

I have no idea if Ruby has an asynchronous programming model. If it does, then it’s Twitter’s own fault for totally missing the boat. If it’s not in Ruby, then Ruby sucks. Either way, somebody blew it.
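To illustrate the synchronous-versus-asynchronous point, here is a small sketch in Python (not Ruby, and it says nothing about what Ruby itself offers). fake_query is a hypothetical stand-in for a database call, with a sleep simulating the wait; the point is only that the waits overlap instead of each one tying up a thread.

```python
import asyncio
import time


async def fake_query(delay):
    """Hypothetical database call; the sleep stands in for waiting on the DB."""
    await asyncio.sleep(delay)
    return delay


async def handle_requests(n, delay):
    """Serve n requests concurrently; the waits overlap instead of adding up."""
    return await asyncio.gather(*(fake_query(delay) for _ in range(n)))


def timed_run(n=20, delay=0.05):
    """Run n simulated requests and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(handle_requests(n, delay))
    return results, time.perf_counter() - start
```

With 20 requests of 0.05 seconds each, a strictly one-at-a-time handler would need about a second, while the version above should finish in roughly the time of a single wait. It does nothing to make the database itself faster.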

Things like this make me sometimes wonder if a relational database is best for data storage. Are they really using all the features of a relational database or is it 99% “SELECT foo FROM bar WHERE userid = 12345”? If so then there must be a more efficient way for web servers in a server farm to share that data such that lookups are fast and local.
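One way to picture the "lookup by userid" alternative is a key-value store partitioned by a hash of the user id. This is a rough Python sketch under that assumption: ShardedStore is a made-up name, and in-memory dicts stand in for what would really be separate servers.

```python
import zlib


class ShardedStore:
    """Key-value store split across shards by a hash of the user id."""

    def __init__(self, shard_count):
        # Each dict stands in for a separate server in this sketch.
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, user_id):
        # crc32 is stable across processes, so every web server
        # picks the same shard for the same user id.
        index = zlib.crc32(str(user_id).encode()) % len(self.shards)
        return self.shards[index]

    def put(self, user_id, value):
        self._shard_for(user_id)[user_id] = value

    def get(self, user_id, default=None):
        return self._shard_for(user_id).get(user_id, default)
```

Each lookup touches exactly one shard, which is what makes it cheap. The cost is that anything resembling a join or a query across users has to be done in application code.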

There has to be a drawback of interpreted languages. I really wonder if the designers of twitter would have used another platform if they knew that the service would get that success. Did they just use Ruby because it was the quickest way to get that thing up and runnin’?

“Hmm. 11k page views / sec = 28,512,000,000 views / mo. That’s 28.5 billion. I call bullsh*t.”

11k/sec meant the peak(s), not an average.
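The arithmetic behind the 28.5 billion figure can be checked with a few lines of Python. The 30-day month is an assumption that reproduces the quoted number; the function names are only illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
DAYS_PER_MONTH = 30             # assumption that reproduces the quoted figure


def monthly_views(views_per_second):
    """Monthly total if the given rate were sustained around the clock."""
    return views_per_second * SECONDS_PER_DAY * DAYS_PER_MONTH


def average_rate(views_per_month):
    """Average views per second implied by a monthly total."""
    return views_per_month / (SECONDS_PER_DAY * DAYS_PER_MONTH)


sustained = monthly_views(11_000)  # 28,512,000,000
```

So 28.5 billion per month only follows if 11k/sec is held every second of the month. If it is a peak, the real monthly total is lower, by however much the average rate falls short of that peak.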

Hmm. 11k page views / sec = 28,512,000,000 views / mo. That’s 28.5 billion. I call bullsh*t.

Java is faster provided it’s JIT compiled (i.e. the code is optimized at run time, as it executes). In fact, it can be faster than C++! Yes, I can see people’s eyebrows rising, but it’s a fact…

Anyone tried Monorail?

It’s a Rails-style framework for .NET that we have had great luck with.

It includes an ActiveRecord implementation and a nice IoC container.

http://www.castleproject.org

Thank you:

“The damned site is synchronous. That’s the problem. Threads are waiting for the database, and those threads are tied up until the database can respond back.

I have no idea if Ruby has an asynchronous programming model. If it does, then it’s Twitter’s own fault for totally missing the boat. If it’s not in Ruby, then Ruby sucks. Either way, somebody blew it.”

I just came from a startup attempting to use rails for high amounts of traffic (with apache reverse proxy and mongrel). Ruby has primitive threading at best and RoR is NOT threadsafe. Rails has a big lock around all the db access code and no connection pooling. The ‘default’ xml parser for ruby has been described as a ‘new kind of slow’. Dunno… maybe they’ve fixed it by now.

Compared to Tomcat, jdbc pools and plain old jsp, RoR is pretty bad from a performance perspective. It makes up for it with a lot of pretty nifty cache behavior given the right type of workload (low complexity db queries). But it’s nothing that a decent java framework doesn’t give you… Nothing similar to webflow, spring-mvc or the like… erb (embedded ruby) is better than jsp… but that’s not saying much is it?
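For readers who haven’t worked with connection pooling, here is a minimal sketch of the idea in Python. It is not how jdbc pools or Rails are implemented: ConnectionPool is a made-up class, and connect is whatever hypothetical function opens a connection.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Keep a fixed set of open connections and lend them out one at a time."""

    def __init__(self, connect, size):
        # Open all connections up front; connect is a caller-supplied factory.
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout=None):
        """Borrow a connection; blocks while all of them are in use."""
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Always hand the connection back, even if the caller raised.
            self._idle.put(conn)
```

A request handler would wrap its queries in "with pool.connection() as conn:", so the cost of opening a connection is paid once per pool slot rather than once per request, and the pool size caps how many queries hit the database at the same time.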

RoR IS faster to throw together a two tier web app with. But the java based frameworks have a much better scalability story. And maintainability IMHO.

I think we’re going to be seeing a lot of startups hit the RoR scalability issues and attempt to work around them with db tuning, schema tuning, and various hacks to RoR… All until it becomes essentially just as complex as a java based framework.

After all… the core of the scaling issue is language independent…

“There has to be a drawback of interpreted languages. I really wonder if the designers of twitter would have used another platform if they knew that the service would get that success. Did they just use Ruby because it was the quickest way to get that thing up and runnin’?”

Interpreted v. compiled has nothing to do with it (mostly). It’s about architecture. RoR is currently not scalable for general purpose use. For particular uses it’s just fine… even really good for low complexity - low frequency db access (Basecamp and the like). Complex/heavy queries and/or large update patterns will kill your RoR based system.

Of course this can all be fixed… java used to have the same problems. Only took a dedicated multi-billion dollar company 5 years to fix it.

I think developers have a tendency to run toward whatever shiny silver bullet of the week shows up. RoR has the hype meter at full stop right now and people never learn.

Had they built twitter around established frameworks they’d have been much better off right now… but they wouldn’t have had as much ‘fun’… or have been as ‘cool’.