The Cloud is Just Someone Else's Computer

codinghorror · February 18, 2019, 2:59am

I used to be afraid of that too, but we started deploying Discourse instances on Docker containers at scale in 2014 – as an open source project free to anyone and everyone – and those servers have been remarkably stable over the years. I even upgraded Ubuntu Server versions with little incident.

(I know this because I personally installed and supported ~325 of them as part of our early $99 Discourse install plan, which ran from 2014 to 2016. We don’t do that any more, but you can head to https://www.literatecomputing.com/, who does.)

kkztno · February 18, 2019, 5:21am

Recommendations on memory and storage for self-purchase and installation in the Partaker B18? Compatibility specs are a bit unclear on the websites. I’m going to try this fun experiment. Looking for 32gb memory and 1tb nvme.

Your blog says B19 but the sites say B18. Minor typo?

CumpSherman · February 18, 2019, 5:36am

I run IT for a small non-profit with big computing needs. I inherited a setup consisting of a boatload of colocated ‘ancient’ servers and network gear, alongside a corporate attitude that we would naturally migrate everything to the cloud once contracts/warranties were up. We had already started putting new systems into AWS piecemeal, so it felt inevitable. The colo bills seemed frightening, as did the software licensing costs per core, etc. I was no stranger to bitching about these myself.
Then I decided to run the numbers, due diligence, you know, of a COMPLETE hardware replacement, colocated, financed over 3 years, versus the required cloud servers to do the same job. Er… Absolutely no contest. We could buy and colocate all the hardware we needed for those first 3 years for the cost of 1 (one) year in the cloud. And in years 4+, we’d have no hardware cost (above support contracts). Now, admittedly, by then we’d be running on ‘old’ hardware, but we’re vanishingly close to 100% uptime and hitting performance goals on kit from when God was a boy, so this stuff would be around for a long time, living in its perfectly air conditioned, power smoothed environment.
And, y’know, no money, so we’re used to making do.
So, much as I would have loved to be able to put all our stuff in a crusher, there was no viable case I could make. The numbers were simply overwhelmingly stacked towards our own kit. Once it hits you that there are 8765 hours in a year, and you’re running 200 VMs, that colo bill turned into a per VM hour figure? Forget about it.
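The per-VM-hour conversion mentioned above can be sketched as follows. The hour count and VM count come from the post; the annual bill is a made-up placeholder, since the post gives no actual figure.

```python
HOURS_PER_YEAR = 8765  # figure quoted in the post (a 365-day year has 8,760)
VM_COUNT = 200         # number of VMs quoted in the post


def vm_hours_per_year(vm_count: int = VM_COUNT, hours: int = HOURS_PER_YEAR) -> int:
    """Total VM-hours delivered in one year of continuous operation."""
    return vm_count * hours


def cost_per_vm_hour(annual_bill: float, vm_count: int = VM_COUNT,
                     hours: int = HOURS_PER_YEAR) -> float:
    """Spread a yearly bill evenly across every VM-hour."""
    return annual_bill / vm_hours_per_year(vm_count, hours)


if __name__ == "__main__":
    hypothetical_annual_bill = 100_000.0  # placeholder, NOT a number from the post
    print(vm_hours_per_year())  # 1753000
    print(f"${cost_per_vm_hour(hypothetical_annual_bill):.4f} per VM-hour")
```

Plugging in a real colo invoice gives a number that can be compared directly against a cloud provider's hourly instance price.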

codinghorror · February 18, 2019, 6:34am

Ah sorry about that, fixed the typo.

Any DDR4 SO-DIMM 2400 should be fine, and any M.2 NVMe PCIe drive. I went with Corsair Vengeance 32GB and Samsung 970 Pro 512GB.

I tend to agree that you want to spec up on boxes you plan to have in colocation service for 3 years. I’ve never regretted going for the slightly bigger / better boxes in these roles, as they tend to age better when you do. Note that 64GB isn’t possible, I looked. Nobody makes DDR4 SO-DIMMs in 32GB capacity quite yet

(That said if you do want to save a few $, I’d possibly go with the 970 Evo series SSDs, and maybe the 4 core / 8 thread CPU in the B18. Those are viable tradeoffs.)

Let us know how your experiment goes. Reply here, I will definitely see it, even years later. I’m in this kind of stuff for the long haul.

snowcrash · February 18, 2019, 7:09am

Just curious, but has anyone here tried to run a globally distributed set of services which need 3 nines of availability in multiple geographies and must be able to handle peak qps of between 1000 and 10,000 on a couple of PCs you bought from Newegg?

I always find these comparisons between the cost of the cloud vs DIY approaches kind of hilarious.

codinghorror · February 18, 2019, 7:28am

For nominal meteor strike protection, and true global availability, you definitely need the cloud, no question.

The funny thing is, today’s hardware is so crazy fast – particularly if you’re not paying Intel’s “Xeon Platinum” enterprise tax – that these kinds of super high throughput numbers are kinda not that tough to hit on just a few colocated boxes like the one in this blog post.

snowcrash · February 18, 2019, 7:55am

My point is… the cloud is not built for a couple of guys in their basement. It’s built for large orgs who need multiple levels of redundancy, have to deal with all sorts of data sovereignty and HIPAA, GDPR (etc.) compliance issues and quite frankly aren’t in the business of operating servers because they’re a shoe manufacturer or an insurance company. Whilst I don’t doubt that a single hardcore server can crank out some QPS, when you start adding globally consistent data stores, external APIs or dependent systems etc. you quickly realize that you’re actually I/O bound and the fattest machine on the planet isn’t going to help unless you can scale up to a few thousand instances of your app in a few seconds, then scale back down to nothing so you’re not paying out the ass for machines you’re not using.

And don’t get me started on security. Sure you can run your own kit and save some pennies, until you’re pwned by the next Linux kernel vulnerability and you have to take your entire system down to patch the OS, and I’m sure it’s fine that you just run the OSS version of every single cloud service you’d want and that it won’t have any security holes that mean your customers’ personal information ends up being downloaded by your second cousin using a Russian VPN and a Raspberry Pi. I mean, how hard is it to run a messaging bus, database, search index, map reduce cluster, index service, logs analysis and data storage system, all securely and using a common set of control credentials? Should be pretty easy, right?

Comparing the cost of a server you buy against the cost of a cloud provider is nonsensical unless you plan to actually measure it correctly, and when you do generally you’ll find the cloud is the biggest bargain out there.

</end_rant>

codinghorror · February 18, 2019, 8:20am

A few thousand? I don’t think so. As an example, Stack Overflow, which I co-founded, ran on 25 servers in 2014:

http://highscalability.com/blog/2014/7/21/stackoverflow-update-560m-pageviews-a-month-25-servers-and-i.html

Servers, drives, and CPUs have all gotten larger and faster since then, for basically the same prices.

As long as you take OS defaults, use the auto-update mechanism built into the OS, and reboot every 6 months or so, you’re fine. Remember those ~325 Discourse $99 cloud installations from 2014-2016 I referenced, above? Guess how many got p0wned with security issues? Zero. I’ve lived this test (literally, as I was personally responsible for all those installs) and I’m comfortable with what the real world data showed me in that two year period.

Note that I’m definitely not arguing Stack Overflow should switch to these boxes in the blog post. That’d be ridiculous, of course! But it is downright amazing how far you can get with a few basic modern colocated boxes today.

snowcrash · February 18, 2019, 8:40am

Fair point, machines are faster now no doubt, but I’ll still maintain that the title of this post is kind of misleading. Look, I don’t care, and if you can run your site on less than a cloud provider charges then more power to you, but I think it’s a little unfair to compare the raw cost of a machine with the cost of a VM (or other compute form) in a public cloud. Full respect for Stack Overflow, but number of servers and amount of downtime are not necessarily equivalent (also 560M per month is really only about 200 qps and presumably much of it was cached at a CDN, so maybe half that?). As may be apparent from my comments, I work for a major cloud provider and we have lots and lots of customers who routinely run 1000s of QPS with peaks in the 10s of 1000s, and while they could no doubt do that on a few hundred machines, it makes 0 sense for them to do that themselves given all the limitations and risks I mentioned. Stack Overflow is a great example because it’s NOT a shoe company, but 90%+ of businesses haven’t a hope of running at scale with a DIY approach. Now that said, your average shoe company is not running 1000s of QPS either, so your point is valid, although some of us believe that if they’re not eventually software companies then they won’t be shoe companies for much longer.
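As a quick check of the "560M per month is about 200 qps" estimate above, here is a minimal sketch. It assumes a 30-day month and treats each pageview as a single request, which is a simplification.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate_per_second(events_per_month: float, days_in_month: int = 30) -> float:
    """Average events per second, assuming traffic is spread evenly over the month."""
    return events_per_month / (days_in_month * SECONDS_PER_DAY)


print(round(average_rate_per_second(560_000_000)))  # 216
```

This is an average only; real traffic has daily peaks well above it, and CDN caching lowers the rate that actually reaches the origin servers.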

Ksec · February 18, 2019, 11:34am

Yes. I could only wish Discourse gets anywhere close to that. Even just 50% of the load with 100% more resources would be quite an achievement.

codinghorror · February 18, 2019, 12:05pm

We host thousands of Discourse sites on maybe ~40-50 servers at the moment. So the ratio isn’t that far off, but it is true the .NET Framework is extremely efficient in ways that Ruby + Rails is … uhhh… not? Postgres has gotten quite competitive with SQL Server in the last decade, as well, so that’s good news, particularly in the “if you have to ask how much the licensing of SQL Server is gonna cost, you can’t afford it” department.

Our yearly software licensing bill is basically $0, though we do give back when we can. I can assure you Stack Overflow’s yearly software licensing bill is more on the order of $100,000 … perhaps a whole lot more. It wouldn’t surprise me if it was closer to $250k.

sunk818 · February 18, 2019, 5:21pm

Other than the convenience of managed servers, the only convincing argument I’ve conceded to for cloud is being able to scale up resources (CPU, RAM, storage, or bandwidth) on demand: for example, if you run an ecommerce site (Black Friday), get hit by the Slashdot effect (sudden web traffic), or are the USGS after a major earthquake.

For static load, I’ve used baremetal servers for years on online.net (now scaleway) for under $20/mo. Hetzner auction for i7 servers around $25/mo. OVH/kimsufi for personal dediservers (low end CPU or storage ARM server) for $4-10/mo. For some reason, Europe seems to have the best deals on VPS & baremetal. Granted, they aren’t RackSpace, but I’m not paying RackSpace prices either.

For personal servers, I’ve contemplated signing up for dynamic DNS on Hurricane Electric. Then, I could potentially run a server at home with port forwarding. I’d have to manage the uptime and backups myself, but I’d only pay for hardware upfront and ongoing electricity.

_maxking · February 18, 2019, 5:23pm

I really like the ASRock Deskmini 110, which comes with an LGA 1151 socket and hence allows you to put in a desktop CPU, whichever one you choose. The size is small enough to be a mini scooter PC, I guess.

obijan42 · February 19, 2019, 5:47pm

I think somebody totally missed the “cloud” train.

Where are your actual application requirements?
Where are the bottlenecks? Are you IO or RAM or CPU bound? How does this scale with the number of users?

“modern, fast” and “speedy” are not serious requirements.
“a dedicated virtual private server” : That’s a contradiction

The extreme hardware geeking is fine if that is what you want to do with your life, but it has nothing to do with cost/performance optimizing an application to run on premises / in cloud / hybrid.

Here’s how you do it in the real world:

1. Build test-cases for your application (including performance indicators)
2. Create a load simulator that can imitate X number of users
3. Deploy on a variety of platforms / combinations
4. Stress the application, monitoring behaviour. Increase the load till it cries. Note which part hurt the most (memory, network, CPU,…)
5. Pick the infrastructure that gives you most bang for the buck.
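The "load simulator" and "increase the load till it cries" steps above might be sketched roughly like this. It is a toy under stated assumptions: `fake_request` is a hypothetical stand-in for a real call to your application, "users" are approximated by worker threads, and the only signal watched is 95th-percentile latency. Working out which resource hurt the most (memory, network, CPU) needs separate monitoring that is not shown here.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional


def timed_call(target: Callable[[], None]) -> float:
    """Run one request and return how long it took, in seconds."""
    start = time.perf_counter()
    target()
    return time.perf_counter() - start


def run_step(target: Callable[[], None], users: int,
             requests_per_user: int = 5) -> List[float]:
    """Issue users * requests_per_user calls with at most `users` in flight at once."""
    with ThreadPoolExecutor(max_workers=users) as pool:
        futures = [pool.submit(timed_call, target)
                   for _ in range(users * requests_per_user)]
        return [f.result() for f in futures]


def p95(latencies: List[float]) -> float:
    """95th-percentile latency using the nearest-rank method."""
    ordered = sorted(latencies)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def ramp(target: Callable[[], None], user_steps: Iterable[int],
         p95_limit_s: float) -> Optional[int]:
    """Raise the load step by step; return the first user count that breaks the limit."""
    for users in user_steps:
        if p95(run_step(target, users)) > p95_limit_s:
            return users
    return None  # the limit was never exceeded


def fake_request() -> None:
    """Hypothetical stand-in for a real request to the application under test."""
    time.sleep(0.01)


if __name__ == "__main__":
    print(ramp(fake_request, [1, 5, 25], p95_limit_s=0.5))
```

Running the same ramp against each candidate platform gives comparable breaking points, which is what the last step needs in order to weigh performance against price.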

If you are really looking for the cheapest solution for you, you should not care what the physical/virtual infrastructure is.

Starting off with “Hey, I need this particular Mac hardware” is the opposite of the best practice.

watertankk · February 20, 2019, 4:48pm

What was your reasoning behind the intel processor and not a competitor like AMD or even an ARM based chip?

tristan_juricek · February 23, 2019, 6:29pm

When I see things like “The cloud is just someone else’s computer” I cringe a little bit, because it isn’t. It’s someone else’s computer run by a very good operations team, with services run by thousands of developers learning edge cases, and building an ecosystem of distributed applications that handle tons of automatic scaling.

Pretty powerful stuff… if you can pay for it and don’t mind sharecropping off someone else’s land.

The risk of the cloud is the One Vendor To Rule Them All. It’s just a new version of commercial software, but an even more evil one, because these new commercial vendors embrace and extend open source projects. I’m sure Redis and MongoDB just love AWS right now with ElastiCache and DocumentDB. This is the thing that drives me nuts about the cloud. It isn’t that you’re somewhat at risk to AWS “crushing you” by cranking up costs randomly. It’s that AWS might be killing software freedom. So your ability to even colocate using open source products is in doubt, because Amazon may be hurting the ability for the open source business ecosystem to thrive.

This is why I’d like to see fewer flippant comments reducing the cloud to this simple “VMs on steroids” argument, because it is not that. Cloud services on AWS, Azure, or GCP are very easy to use, powerful, and, depending on the use case, generally aren’t much more expensive. But your approach of colocating a server… just has so much less institutional risk. I’d really like to see it thrive as an option, because I want the option in the future to negotiate with places like Redis, Elasticsearch, or MongoDB with support contracts, instead of all my money going to a massive company like Amazon or Google.

codinghorror · February 24, 2019, 11:56pm

This is stated in the first lines of the blog post: 1GB RAM, fast dual core CPU, solid state drive with 20GB+. That’s an absolute minimum, though. Depending on the size of the community you’ll need substantially more. Database size is a significant factor; ideally the whole DB should fit in RAM, at least the indexes and a big chunk of cache, so if you have a 16GB DB…

It is fair to note that if you only need a $20/month VPS – this gets you 4GB RAM, 2 vCPUs, 80GB disk at current pricing – it’d take 50 months (over 4 years!) to recoup that initial $1000 investment in the hardware.
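The break-even arithmetic above, as a minimal sketch. It compares only the one-time hardware cost against the monthly VPS price, as the paragraph does, and ignores any monthly colocation fee.

```python
import math


def months_to_break_even(hardware_cost: float, monthly_vps_price: float) -> int:
    """Months of VPS rental needed to equal a one-time hardware purchase."""
    return math.ceil(hardware_cost / monthly_vps_price)


months = months_to_break_even(1000, 20)
print(months, "months, about", round(months / 12, 1), "years")  # 50 months, about 4.2 years
```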

It would be very cool to have Ryzen in this NUC form factor! Does anyone offer it to your knowledge?

Obviously I agree, though I completely understand why people use cloud services because of the incredible flexibility (literally push a button on your screen and it happens), plus cloud prices have declined somewhat over time. What people tend to overestimate is the difficulty of colocating, and the risk of hardware failure. Today’s hardware is extremely reliable, largely due to advances in power supplies and SSDs. And 90% of reliability concerns can be mitigated by simply dumping more super cheap, super reliable hardware in the rack – which gives you even more capacity per dollar as well while reducing your overall risk at the same time.

I think it is dangerous for colocation options slowly to die off over time, so that all server hardware eventually belongs to “the big five clouds”

picofarad · February 25, 2019, 4:58am

In 2011 i was in charge of testing out new Dell R820 servers that cost about $35k each. One of the proofs of concept we ran at my suggestion was taking our 4000 QPS Postgres database and forcing it to run in 640KB of RAM. The machines could do it, as long as you were able to first sync the entire database into tmpfs. I believe the database was around 180GB at that time, and this was a top 500 site on Alexa (at that time).

The main thing i learned is that QPS has a lot more to do with filesystem bandwidth than anything else, and the second thing is that while my artificial constraints on postgres were cute, it increased the amount of slow queries by at least an order of magnitude.

The Partaker B18 would absolutely run circles around those Dell R820 for 95% of all use cases. I have a system that’s nearly identical to those old R820s (40 xeon HT cores, 192GB of ram, 8x10gbit NIC, SSDs, etc) and i really only use it for workloads that are either embarrassingly parallel, or those which would require more than about 15 minutes to run on my desktop, but are too large to practically get out to external servers like the cloud. My desktop specifications are vastly weaker in core count (2) than the partaker B18, although my single threaded speeds are probably faster. All of this is to say, the “cloud” isn’t a one size fits all prospect. Certainly newer hardware beats older hardware, especially since 7th generation Intel processors. And i just found out i can colocate for about 33% less than codinghorror, so now i am seriously considering just putting a few of these B18s together and ditching AWS altogether.

gmile · February 28, 2019, 7:03am

That’s $2,044 for three years of hosting

That’s a typo, right? Should be $1,044?

codinghorror · February 28, 2019, 7:45am

It is an all in cost.

$1,000 for the computer, plus $29 x 12 x 3 = $1,044 for hosting.

$1,044 + $1,000 = $2,044
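The same all-in arithmetic as a small sketch, using only the figures quoted above ($1,000 hardware, $29 per month, three years):

```python
def hosting_cost(monthly_fee: int, years: int) -> int:
    """Colocation fees over the whole period."""
    return monthly_fee * 12 * years


def all_in_cost(hardware: int, monthly_fee: int, years: int) -> int:
    """One-time hardware purchase plus colocation fees."""
    return hardware + hosting_cost(monthly_fee, years)


print(hosting_cost(29, 3))       # 1044
print(all_in_cost(1000, 29, 3))  # 2044
```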