Central and Unique Platform teams
DevOps is a mindset and a skill set, not a department. Still, it is very common in the technology industry to see a single, centralized DevOps/Platform team. IMHO we are still not there. Most companies are still catching up on AWS and DevOps engineering skills (networking, Terraform, observability, OS/Linux, etc.). So centralized DevOps teams tend to be “gatekeepers” for good practices and hygiene, and often cost zealots as well. That is fine and has value, don’t get me wrong. However, a single centralized team is often just a pure form of bottleneck. AWS has great infrastructure and tons of purpose-built services, and I would argue that most of the time you don’t need to build anything in front of it. Using AWS directly does create coupling, but as long as you use open APIs you should be fine. Several years ago I used to be bothered by coupling solutions to Oracle, and with AWS the coupling is much bigger. I’m not sure any company would ever leave AWS; most of the companies I know never left Oracle :-). Having said that, in order to maximize value you need to use the services you have available and not build poor abstractions in front of existing prime solutions. Kubernetes is promising but is not quite there when we consider all the infrastructure components out there. Plus, if you do “Kubernetes all the things”, I’m totally sure bad things will happen.
Multiple Teams to the Rescue
A single Platform/DevOps team can easily become a bottleneck. How can we fix that? Multiple teams are the answer. If you have multiple teams, there are several advantages, such as:
* Reduced communications blast radius (Team Topologies): you partition your communications.
* Purpose Built Solutions and better services via specialized teams
* More parallelism but with a hidden orchestration cost
* Easier to experiment and move fast
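To make the "partition your communications" point concrete, here is a minimal sketch. It assumes the usual counting rule that a fully connected group of n people has n(n-1)/2 possible person-to-person channels, and it assumes, as a simplification, that partitioned teams talk freely inside each team and keep one channel per pair of teams. The function names and the team sizes (24 people, four teams of 6) are made up for illustration.

```python
def pairwise_channels(people: int) -> int:
    """Distinct person-to-person channels in a fully connected group."""
    return people * (people - 1) // 2


def partitioned_channels(team_sizes: list[int]) -> int:
    """Channels when people talk freely inside their own team and
    teams keep a single channel per pair of teams (a simplification)."""
    within = sum(pairwise_channels(size) for size in team_sizes)
    between = pairwise_channels(len(team_sizes))
    return within + between


if __name__ == "__main__":
    # Hypothetical numbers: everyone talking to everyone vs. four teams of 6.
    print(pairwise_channels(24))               # 276
    print(partitioned_channels([6, 6, 6, 6]))  # 66
```

The exact numbers matter less than the shape: channels grow roughly with the square of the group size, so splitting the group shrinks the communication surface a lot, while the team-to-team channels that remain are the orchestration cost mentioned in the list above.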
Communication is complicated with a single centralized DevOps/Platform team, since all tickets literally go to the same queue. When you have limited resources and a huge queue, you end up doing the following:
* You standardize on generic solutions (e.g., EC2 for all the things).
* You say NO a lot, since you have neither the skills nor the time to handle the demand.
* You might be blocking innovation and becoming a bottleneck for other teams.
Purpose-built solutions can be achieved by simply having more teams, which allows you to rapidly experiment with new use cases and debottleneck your system. Purpose-built solutions are better handled by different, smaller platform teams, since you partition your backlog and can also rely on contractors to temporarily have a bigger pool.
More Parallelism: as you have more teams, you are able to execute more projects at the same time. However, this creates a prioritization/orchestration issue: which teams should you have? What happens if one team depends on another team? The solution could lie in other principles, such as:
* Self-Service Model: if you have services (UIs, Jenkins jobs, an inner-sourcing model) where anyone can open PRs or use a Jenkins job to create what they need, this debottlenecks lots of the operations and reduces tickets (manual work).
* Isolation and Autonomy: another thing you can do is allow teams to make their own decisions, in the fashion of You Build It, You Run It (the Amazon/Netflix philosophy). This means you don’t depend on other teams, because you take care of all aspects of the solution for a particular domain problem.
Experimentation is a nice outcome of this whole approach. Most companies don’t understand the cloud and don’t realize that cloud == software, and that we have a huge engineering pool. The issue is that engineers, and sometimes even engineering managers, are sensitive about their roadmaps and deadlines and end up being afraid of taking the step forward, but it’s not as hard as it looks. Experimentation means different things, such as:
* Experiment with embracing the DevOps model and having engineers code in Terraform.
* Experiment with new delivery models such as More teams, Self-Service, and Isolation.
* Experiment with new technologies, new components and make things move faster.
Experimentation has limits
You can experiment and easily introduce solutions for new services or new applications. However, often the heavy lifting is tackling technical debt and migrating old applications. The inertia of old systems at scale is a hard problem to fix. Experimentation will not save you from that unless you find ways to have backward-compatible APIs/data solutions.
Unfortunately, not all solutions should be backward compatible, since you could end up being backward compatible with technical debt and design limitations. Sometimes what you need is a big breaking change. The hard part is that, depending on how big your solutions are, you might not be able to apply that logic to all your software, so prioritization will be needed.
Engineering is hard and it will always be hard, since you have new systems, old systems, technology changes, people coming and going, deadlines, security, and many, many other challenges. Team structure and layout are critical to whether that evolution goes better or worse.
Originally published at http://diego-pacheco.blogspot.com on October 28, 2020.