Vampires (Programmers) versus Werewolves (Sysadmins)

Kyle Brandt, a system administrator, asks “Should Developers have Access to Production?”


This is a companion discussion topic for the original blog entry at: http://www.codinghorror.com/blog/2010/08/vampires-programmers-versus-werewolves-sysadmins.html

also, hat tip to Bess Sadler for the concept!

http://code4lib.org/conference/2010/sadler

The chasm between developers and sysadmins is really nothing more than a responsibility boundary issue (i.e., who gets to wake up at 3AM to fix something) which has grown out of proportion in way too many companies, unfortunately.

Fortunately, people are starting to notice how this can negatively affect their companies. Check out the DevOps movement (I'm not associated with them, though I find their ideals really inspiring):

http://www.jedi.be/blog/2010/02/12/what-is-this-devops-thing-anyway/

Agreed. I’m a programmer by trade, but I tend to get along better with system administrator people. Weird, huh?

I also feel that people should develop a wide range of skills spanning programming, system administration and project planning. This tends to give people more empathy for the other “roles” in the company.

As a former sysadmin and aspiring developer (so that makes me Michael from Underworld? An abomination?), I have to ask: why do you think you need access to production servers? And what level of access do you think you need? The werewolves are (were?) the daylight guardians of the vampires.

When both the developer and the sysAdmin are excellent at their jobs, there is no problem.

The real problem is crappy sysadmins getting in the way of good developers, or vice versa.

It was an excellent presentation. The video doesn’t seem to be linked yet on the code4lib pages, but volunteers have been adding the videos to the Internet Archive (and possibly elsewhere).

http://www.archive.org/details/Code4Lib2010VampiresVsWerewolves

Sadly, looking at the video now, it seems the slides didn’t turn out very well, but the slides themselves are at the previous link.

There is apparently debate on the supernatural archetypes of each side ;).

Most of the time developers need just read access to production. This is to allow them to know that the code is really the code they expected. I have had a hard time getting even that.

Write access just leads to accusations that something was changed at 3 AM. Not having write access to production, along with the separation of duties, is a nice point for audits as well.

Note: I work on large financial systems which, if gamed, could pipe money to me. Not having that ability is good in that it removes suspicion that anyone is doing it.
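The read-only check described above (confirming that production is really running the code you expected) could be sketched roughly like this. It is only an illustration: the function names and the idea of comparing SHA-256 digests of a known-good build directory against the deployed directory are assumptions, not something the commenter described.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its digest. Read-only."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_manifests(expected: Dict[str, str], deployed: Dict[str, str]) -> Dict[str, List[str]]:
    """Report files that are missing, unexpected, or changed in the deployment."""
    common = set(expected) & set(deployed)
    return {
        "missing": sorted(set(expected) - set(deployed)),
        "unexpected": sorted(set(deployed) - set(expected)),
        "changed": sorted(n for n in common if expected[n] != deployed[n]),
    }
```

Nothing here writes to the deployed tree, so it fits the "read access only" arrangement: a developer could run `diff_manifests(build_manifest(build_dir), build_manifest(prod_dir))` and hand any discrepancies to the sysadmins.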

I think you can’t be the best in either field without mastering the other, personally. The best programmers are expert systems and networking people, and the best systems people are expert programmers (not just scripters, programmers). To truly excel at either job, you need a deep understanding of the other.

Once you reach that point, it’s really just a matter of which job role you’ve signed up for (and I do believe that separation of job role concerns is important). Back to the original point of the article: Sysadmins letting developers fool around making changes in production (in non-emergency scenarios) is about as bad as developers who think making occasional “.bak” files on their hard drive is an effective form of version control. It’s really bad practice.

Production systems shouldn’t even be fooled around with by sysadmins. They should be automatically deployed and version controlled (puppet, chef, etc), as should all of the deployed code. The only reason anyone (developer or sysadmin) should be logged into a production machine is to deploy a set of tested changes via puppet, or to investigate unique production support issues (in a readonly fashion if at all possible). This is why we have dev/qa environments and test suites. Any scenario you can present which seems to justify a manual hack in production is in fact just an indication that you have more development and testing work to do (in the developer’s code, or the system’s configuration) back in your dev/qa environment.
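To make the "automatically deployed and version controlled" idea a bit more concrete, here is a toy sketch of the principle behind tools like puppet and chef: you declare the desired state, and applying it repeatedly is harmless because nothing changes once the machine already matches. This is not the API of either tool; `ensure_file` and `converge` are hypothetical names, and managing plain text files is an assumption made to keep the example small.

```python
from pathlib import Path
from typing import Dict, List


def ensure_file(path: Path, desired: str) -> bool:
    """Make `path` contain `desired`; return True only if a change was made."""
    if path.exists() and path.read_text() == desired:
        return False  # already in the desired state, so do nothing
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(desired)
    return True


def converge(root: Path, desired_state: Dict[str, str]) -> List[str]:
    """Apply every entry under `root`; return the relative paths that changed."""
    return [
        name
        for name, content in sorted(desired_state.items())
        if ensure_file(root / name, content)
    ]
```

The `desired_state` mapping is what would live in version control. A manual hack on the box shows up as a non-empty result on the next run and gets overwritten, which is exactly why the hack belongs in dev/qa instead.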

To avoid problems, I think both groups should collaborate from the very first steps of the project. Most problems derive from the fact that developers test software in the production environment only in the final phase of the project. This leads to all sorts of annoying problems, and in the worst case leads to installing Visual Studio or a similar development tool on a production machine to understand what is wrong.

To minimize this, each project should have a Continuous Integration machine that periodically deploys the project automatically to a production-like environment (usually QA). If you handle a project this way you will face less friction between developers and sysadmins, because problems arise one at a time and there is plenty of time to resolve them without a “manual hack”. Having CI in place also forces developers or sysadmins to write scripts that automate installation to production environments, which greatly simplifies life for everyone on the team.

I wonder how cloud computing will affect this issue.

Theoretically, if a company hosts all its custom development in a cloud, it would need less hardware, and thus, fewer sysadmins.

The cloud computing model also plays into the original question. It demonstrates that developers don’t need access to production hardware to get things done.

http://www.google.com/trends?q=vampires,+werewolves
Clearly programmers are more popular. :wink:

Here is a great presentation about developer and operations cooperation at Flickr given by John Allspaw and Paul Hammond at Velocity 2009

Slides

http://www.slideshare.net/jallspaw/10-deploys-per-day-dev-and-ops-cooperation-at-flickr

Video

http://velocityconference.blip.tv/file/2284377/

As a developer, I have a ton of respect for sysadmins who are good at their jobs. Unfortunately, I’ve worked with many sysadmins who were clueless (often the case at small businesses where the IT guy is also the sysadmin but doesn’t know what he’s doing), in which case I think it’s necessary for me to have root access to production. When I get the chance to work with a knowledgeable sysadmin, it makes my life so much easier and I’m happy to not touch their server.

I’m happy to leave the management of the box and the network to our sysadmins, though less so when I find that I’m unable to do simple tasks. My beef is with those, fortunately few at my place of work, sysadmins who seem to think that keeping the machines running is the point of the exercise and not simply a pre-requisite to the real work the system is doing (my app). “Yes, I know that those three VMs all have low CPU usage and it would be easier to manage just one rather than all three, but that’s not a good enough reason to introduce otherwise non-existent service dependencies. I’d rather that my billing service not go down just because your licensing service went wonky.”

The best sysadmins, and developers for that matter, are the ones who remember that the end user is the real customer and think about their requirements from the user’s perspective and not with a purely “machine”-centric mindset.

Another aspect which, perhaps, blurs the lines is the role of an application administrator. Application administrators often need more leeway than developers with respect to system access. Often I wear both hats – app admin and developer – on the same project and if I know my app isn’t behaving properly and I need a system reboot or IIS reset to clear it, I’d prefer to be able to do it in the situations where my sysadmin isn’t readily available. The best scenario in that case is to have the ability but also to have a defined protocol for what I will and won’t do with it and when.

All-in-all both sysadmins and developers need to learn to trust one another to do their jobs and put the customer first. I’m lucky to have a group of system (and database) admins that get this and are nearly always :slight_smile: a joy to work with.

I disagree with the developers as vampires notion. It reinforces the dangerous stereotype that programmers do better when they stay up late. Dangerous because most (read: neurotypical) programmers are better off keeping normal hours-- and all programmers should try to be well-rested, regardless of their preferred hours.

When I’ve had a really good night’s sleep, I’m at least five times as productive as normal. When I haven’t slept well, I’m more likely to add bugs than to fix them. And if I work late, there’s an event horizon after which I can save time by going to sleep.

There’s a lot of scientific research on the effects of sleep on attention/focus. Programmers would do well to keep up with it. Especially this: just as the first casualty of war is the truth, the first casualty of sleep deprivation is the part of your brain that can notice sleep deprivation.

Jeff, could you expand a little on what access Kyle has to code source (TFS/SVN/etc), by means of comparison?

Jeff, love ya man, but it seems to me you wimped out on this one. You wrote a lot of words, but concluded with “there is an art to it” and “can’t we all just get along” sentiments.

These are wishy-washy. Sure, they’re true, but so vaguely and universally true that they don’t help anyone resolve the specific issues on the ground.

I’m not trying to say you need to specify that devs always have some specific level of access to the production environment - of course situations differ and so solutions must also differ. But to really move this discussion forward, we need to start thinking algorithmically - what are the factors involved? How does increasing or decreasing factor X move us closer to or farther away from devs having full access to the prod systems?

“Theoretically, if a company hosts all its custom development in a cloud, it would need less hardware, and thus, fewer sysadmins.”

Sorry Todd, your theory is bad, but your conclusions are fine. I know because I’m in an org that is three years into an internal cloud system.

Systems administrators do not spend their days playing with hardware.

Now, this may be true in some humpty organizations that go cheap on the hardware, or stick their equipment in an unventilated closet next to the boiler. Most of us buy gear that will last and at least stick it in a space with a fan or two.

A component breaks, the system sends you email. You set up an appointment with the vendor, yank out the old part, shove in the new. It’s all hot-spare stuff so nobody is affected.

The days of having to obsess over hardware, having to babysit servers so they don’t melt are over.

What we do is deal with systems and how they interact with each other. Cloud does not reduce the complexity of those interactions, it makes them even more complex.

It requires a methodical mind, troubleshooting skills, a willingness to get out of one’s comfort zone, to synthesize. It requires - God help us - a system administrator who can code, for a lot of interactions are mediated by logic and scripts.

This is a change from the way things have been. Managing servers by hand, never a good idea, will be flat-out unsupportable. Guys who even try that will be fired for incompetence.

We saw a similar change over about the time that the title ‘LAN Administrator’ went extinct. People who made comfortable livings running a LAN for a department or two (think NetWare 3) suddenly had to operate at the organization level. If they couldn’t, they found employment elsewhere. Your LAN team went from a herd of guys to a small team.

As the LAN Administrator of 1998, so goes the system administrator of 2010. In the future there will be fewer of them, they’ll manage more ‘stuff’, and they’ll be more skilled than their run-of-the-mill peers today.

No human should have access to production systems. All work should be done in development. This should be pushed to test systems. Once QA signs off on the code changes, then the code should be pushed onto the production boxes from test, with a plan to roll back if there are any issues. All changes should go through these steps because nobody is perfect, and every change should be carefully monitored. And of course if a production box goes away you should be able to just set up new hardware and push all your code onto the box in a few minutes and be up and running again.
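The flow in that last comment (push to test, QA signs off, promote to production from test, roll back if needed) can be modelled in a few lines. This is a sketch under assumptions: the `Pipeline` class, its method names, and the idea of identifying builds by a string label are all made up for illustration, and a real pipeline would move artifacts rather than labels.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Pipeline:
    """Tracks which build is on test and production, plus production history."""
    test: Optional[str] = None
    production: Optional[str] = None
    signed_off: Set[str] = field(default_factory=set)
    history: List[str] = field(default_factory=list)

    def push_to_test(self, build: str) -> None:
        self.test = build

    def sign_off(self, build: str) -> None:
        # QA can only approve the build that is actually on test.
        if build != self.test:
            raise ValueError(f"{build} is not the build on test")
        self.signed_off.add(build)

    def promote(self) -> None:
        # Production only ever receives the signed-off build from test.
        if self.test is None or self.test not in self.signed_off:
            raise PermissionError("QA has not signed off on the build on test")
        if self.production is not None:
            self.history.append(self.production)  # remembered for rollback
        self.production = self.test

    def roll_back(self) -> None:
        if not self.history:
            raise RuntimeError("no earlier production build to roll back to")
        self.production = self.history.pop()
```

The point of the design is that there is no method for changing production directly: the only ways in are `promote()` and `roll_back()`, so every change has passed through test and has a known previous state to return to.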