Developing The Devop


Within the hidden war inside software companies between agitated sysadmins and frustrated developers, Paul Hammond and John Allspaw from Flickr want to stop the madness. Jono Bacon and Stuart ‘Aq’ Langridge explore whether two old warring enemies can find peace and what needs to be done.

Remember, we are just the very, very start of the conversation! What do you think? Are you a sysadmin? Are you a developer? How do you think we can help code writing and server deploying friends seek peace, or don’t you think a problem exists in the first place? Do you think it can be solved or will the personalities never meet in the middle? Share your thoughts in the shot comments below…

27 Comments to “Developing The Devop”

  1. marxjohnson 23 February 2010 at 12:28 pm #

    I’m only in a small organisation, but one of our technical strengths is that the sysadmins and the developers are all under the same department. As a result of this, we work together a lot. I’m primarily a web developer, but I also chip in some knowledge when they set up servers. In return, they trust me with root access to my web servers, and let me roll out improvements and fixes myself whenever I decide they’re ready.

    I think part of it depends on your skills. I can only do what I do because I’ve been hacking PHP since I was at school and playing with Linux for years, so I have relatively broad knowledge in both areas.

    If sysadmins and developers sat down and taught each other a bit about how to do what they do, I think it would make things a lot smoother, since they’d not only be able to chip in ideas but also be aware of the issues that affect each other.

    If a sysadmin realises why it’s important for a dev to be able to install a PHP extension when they need it, and a dev realises why it’s important for you not to run sudo rm -rf /etc when you’re SSHed into your production server (almost did that last week), we’d make each other’s lives a lot easier.

  2. allspaw 23 February 2010 at 12:49 pm #

    Good work, guys!

    • sil 23 February 2010 at 12:51 pm #

      Aha, one of the two inspirations for this discussion shows up — hi, John :) I’d be interested in hearing more of your thoughts on the devops idea, since your talk last year that kicked off the discussion in a serious way.

  3. gmb 23 February 2010 at 12:49 pm #

    Speaking as a Launchpad dev, I’d say we already have a pretty good relationship with the sysadmins. Our methods for rolling out have evolved to the point where we as devs understand why such-and-such a feature can’t be rolled out right now and our sysadmins broadly understand which bits of code do what and how to deal with them when they’re misbehaving.

    I’d definitely like to learn more about how the sysadmins work, but I’m glad to say that we’re already well down the path towards the devop.

    That said, I think that there will always be a need for dedicated sysadmins and dedicated developers; I don’t think that we should try too hard to merge the two roles.

  4. Varg 23 February 2010 at 1:53 pm #

    Having worked for several large organisations, I can see why it happens. The Devs are mostly responsible ppl but there are always a few cowboys in the mix, and the Ops are beaten by the management to ensure 100% uptime. The cowboy devs don’t seem to understand that the “1 line change” they want to drop live can screw 10 million customers, usually while the devs are at home sleeping, giving the Ops a nightmare to back out changes and rollback databases before the customers start ranting. I’d love to think these attitudes could be sorted, but I just don’t think it can happen in a large enterprise environment.

    • marxjohnson 23 February 2010 at 2:06 pm #

      Do you think it’s affected by whether your users are paying customers?

      I work in education, and while it’s an inconvenience if our VLE system (that I work on) goes down, lessons can still continue.

      I can see sysadmins being a bit more hard line (or at least, being made to be) when downtime affects income in real time.

      This could be down to a management issue as much as anything. If management’s attitude to downtime is to shout at the sysadmins when it’s not their fault, they’re going to get tough with the people whose fault it was. If their attitude focuses on diagnosing the cause of the problem and preventing it occurring again, then the person responsible might find themselves better educated as to the consequences of their careless actions.

  5. Varg 23 February 2010 at 1:55 pm #

    Forgot to add, 10 code drops a day? OMFG, the Change Management Group at my last place would have either simultaneously had heart attacks or just quit…

  6. No' 23 February 2010 at 2:21 pm #

    What I can’t stand is that when (as a dev.) I’m asking for a specific library to be installed on a server, the only answer from a regular sysadmin is another question: “what is it for?”. Even when I did explain in my first request.

    As far as I can tell, I’ve assumed (humbly) the position of sysadmin on a dedicated server for some time, as well as being a developer. I would never ask for a specific thing on a server if it wasn’t necessary. Why would sysadmin always be suspicious when a dev is asking for something?

    (but apart from that, I think I understand quite clearly when a sysadmin says: “not stable enough” or “there is no package, we won’t install it” or “fuck off, I ain’t got time”. And my hosting provider is no nice I wish I could buy him a beer IRL)

    • No' 23 February 2010 at 3:20 pm #

      mmmm… “so nice”, not “no nice”

    • mibus 24 February 2010 at 10:56 am #

      “Why would sysadmin always be suspicious when a dev is asking for something?”

      Because otherwise, we’d be inundated with “could we please”s for every new shiny extension/library/language :)

      I’m an ex-developer (>5yrs) turned sysadmin, and it’s given me a great basis for getting on well with the softdev types at my company. I try to be accommodating, and usually it works out quite well.

      Unfortunately, it’s often up to the sysops to act as guardians and architects of the entire suite of servers – not just of mechanics to keep them with working disks. Perhaps your sysop thinks there might be another library/extension that could do the job, that’s already in use. Perhaps they want to understand in a bit more detail what it is and why, to see any potential security or performance implications. There are dozens of reasons that I’ll go back and ask a developer for more information like that.

      Especially when it’s a production system, and I’m going to be up past midnight to make the change (in case it goes bad, we don’t usually want a production server offline during business hours! :)

  7. mikedanko 23 February 2010 at 4:23 pm #

    The whole concept of what this conversation is about is flawed. There is no disconnect between devs and admins, there’s a disconnect between qa and management.

    I’m more than happy to deploy patches should they have gone through QA properly. I’m more than happy to provide environments to devs and qa people for testing.

    However, if a manager tells me “WE NEED X FEATURE NOW”, there’s the disconnect. Devs get the shaft if something gets missed in testing more than a sysadmin does, and this is where it happens.

    There’s no lack of awesome developers, there is a slight lack of professional sysadmins these days, but there’s a black hole where proper management needs to be because they’re mostly spending their time on non-engineering issues.

  8. nbennette 23 February 2010 at 4:32 pm #

    I think the quickest way for the two groups to understand each other’s world is to switch jobs for an extended period of time. That’s the “dreamer” side of me. I’ve worked in ops & run small ops teams and it has truly given me an appreciation of the job they do and the lack of thanks that is sometimes shown by non-ops teams, both development and management.

    The other thing that has worked well for most of my career, which has been on the development side of the equation, is engaging them on a personal level and letting them know that I will do everything in my power to make their jobs as easy as possible. If someone from ops walks into my office needing support with something that me or my team has built, they are almost always given the highest priority. This level of respect given to their concerns goes a long way with most of the sysadmins I’ve worked with and goes a long way when we have an errant deployment.

    The other thing that really seems to work well for me is taking time to explain/train them on what me and my teams are deploying into our collective environment. I’ve architected solutions in both large and small organizations as a contractor and an employee, and the smoothest deployments/transactions come when I spend time with the sysadmins explaining what I’m deploying, why I’m deploying it and what the expected impact to the environment will be. And doing it in such a way that they understand and know that I’m going to be here if something goes wrong.

  9. Amar 23 February 2010 at 4:36 pm #

    Surprised there hasn’t been a reference to the recent xkcd [http://xkcd.com/705/]

    One company that I had links with had all developers and sysadmins down as ‘opps’, worked in the same office and everyone got the phone call when things broke in the middle of the night.

    Not sure how well that would scale.

    • Paul Hammond 24 February 2010 at 3:55 am #

      From what I hear, both Amazon and Facebook have developers carrying pagers for production issues in their code.

      Flickr’s development team don’t carry pagers, but are the first to be called if the ops team can’t debug and resolve an issue by themselves. This gives us a pretty good incentive to document our code and make sure ops know how to support it.

      • sil 24 February 2010 at 11:24 am #

        Heh. Yeah. Motivation by 3am-wakeups…

  10. jono 23 February 2010 at 4:57 pm #

    By the way, here is the original link to the Yahoo! Flickr presentation:

    http://www.scribd.com/doc/16877392/10-Deploys-Per-Day-Dev-and-Ops-Cooperation-at-Flickr

    Oh, and confused about how to move the slides on? So were we: click to the right and left side of the slide image. Bloody scribd…

  11. DanL 23 February 2010 at 5:17 pm #

    Great job gents. I think the philosophy of Developers being more comfortable with “failure” where Ops is not is a bit broad.

    $software_development == $process_evolution;

    Sys Ops needs to be the constant in the equation. Things have to break in the development process, to change/evolve business processes, even if it is not known if it is for the better (Darwinism meets Random Evolution at its core).

    That is the Developer way to push the envelope. Ops needs to be the gatekeeper of the fundamentals (like the bio-chem mechanics of the genome i.e. the rules, and not the programming of the genome, i.e. random/non-random mutation). Sys Ops make the rules, developers try to bend and break them. That is what makes things progress and fork.

  12. spuklo 23 February 2010 at 9:18 pm #

    I think that a bit under-emphasised group here are QA testers. They should be a bridge between developers and sysadmins. If they say “it’s good, it meets the criteria”, sysadmins should have no problem deploying the change. Obvious question is how QA gets to the point where they say “GO!” – I think it’s sysadmins’ input to testing procedures that QA systems should be as obscure, as small and secured as production systems are. They know all the tricks of operating systems, all the small details in security settings that may cause disaster if an application is poorly written. On the other hand they want to keep the set of software on servers as small as possible. It’s called “reducing attack surface” and I think it’s a more than fair approach. On the other hand sysadmins don’t have “a big picture” (it’s a bad word but I don’t know better), which would allow them to look at things differently than just “it’s another piece of crap on my machine”. And this is a part where developers may help testers to better understand why they need a particular library, etc.

    I really see QA teams as bridges, connecting those 2 worlds. I don’t think that switching roles is a good idea – sysadmins are not developers and developers are not sysadmins. What has worked for me, when I was a team leader of a QA team, was to have a mix of developers and sysadmins working as testers. Not helping testers but working as testers. They had knowledge from both domains, argued hugely but the result was astonishing. They provided the dev team with incredibly well argued defect reports and on the other hand, they provided real sysadmins with well justified needs to have something installed.

  13. Martin 23 February 2010 at 11:04 pm #

    Thanks for this Shot Jono and Aq, I appreciated it very much, as a Sys Admin I can understand where you’re coming from but as a Part Time (or wanna be Developer) I can also appreciate the over view point, I found this shot extremely interesting. Take care both of you, Martin.

  14. b1ackcr0w 24 February 2010 at 12:06 pm #

    I’d like to chip in and mention how helpful ITIL can be in helping this issue. If you haven’t encountered it yet, ITIL (IT Infrastructure Library) is a comprehensive catalogue and system of IT industry best practice. Although it was originally aimed at improving UK government IT service delivery, it has evolved into a more general industry tool. It does have weaknesses and pitfalls. But if approached with the sincere aim of improving how IT is Planned, Delivered and Maintained, it’s a very powerful tool for change and benefit. The reason I mention it here, is that it does have a lot to say about how to systematically include all the members of the development and delivery chain. Well worth looking into for anybody who has to deliver any kind of IT service.

  15. John C 24 February 2010 at 8:40 pm #

    Is there really a widespread animosity / distrust / frustration between devs and sysadmins? I’m a developer and I’ve not encountered this problem – everywhere I’ve worked, the devs and admins mix quite happily and work together without a problem. I guess I’ve been really lucky.

  16. richc 24 February 2010 at 10:17 pm #

    Testing, peer reviews and rapid, small releases can help increase confidence that a developer’s code will not cause problems. But, there will come a time when a major change, either in code or in infrastructure, needs to be made and this is where group and personal responsibility kicks in. If both sysadmins and developers feel that the other can be trusted to take responsibility for problems then a level of understanding of problems is possible. I’ve most often found this in small companies and almost never in larger ones.

  17. Shane Fagan 26 February 2010 at 2:35 am #

    Well, I’m a developer and I’ve never had the pleasure of dealing with sysadmins, but I’d say when I get out of college that will change quickly. I like the idea of small rollouts like Flickr myself. It’s interesting to me but I really don’t have a clue yet :D

  18. [...] the topic of developer/system administrator relations, inspired by the recent Shot of Jaq episode, Developing the Devop. If you haven’t heard of Jono and Aq’s latest Internet media adventure yet, then you [...]

  19. Dave 9 March 2010 at 11:44 pm #

    We covered this a bit in our podunk Irish college tech radio show. We found a Devop (specifically a guy who works as a liaison between the Dev team and the Admin team) for a computer game middleware firm, Demonware.

    He pointed the finger at Devs writing poor reporting tools and being out of touch with an admin workflow.

    The show can be found here http://ian.ie/741

  20. Tony Whitmore 6 April 2010 at 3:48 pm #

    This is more of a management issue, as Jono intimated, than a technical one. If developers were held as responsible for the availability of the relevant services as the sysops, it would help ensure everyone was working towards the same end. Similarly, if sysops were measured against the time taken to implement new features.

    It’s fairly classic “siloed” working and is a big problem in lots of companies. For example, would most of the performance or security issues be prevented if the sysops were working with the developers prior to releasing the code? I’d hope so.

