On paper, almost all companies have an SDLC (Software Development Lifecycle), which nowadays is usually some form of Scrum or SAFe. I know governance is often a bad word, attached to bad practices and dated methods. However, at scale, software needs to be managed, and complexity needs to be managed down. Lifecycle matters because it allows you to execute the right strategy and get the most out of your software assets. When you have hundreds or thousands of services, things get out of control, and it takes a lot of work to keep track of what’s going on. I’m a big fan of explicit communication about the state of affairs. For instance, Netflix keeps an OSSMETADATA file in its GitHub repositories that states the lifecycle stage of the service or library. This might sound silly, but it is actually pretty powerful, because it lets you communicate the company’s intentions toward that piece of software and execute the right strategy. I believe that having a clear map of your services is a great tool for prioritizing and supporting the future initiatives that your company might have.
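To make the idea concrete, here is a minimal sketch of reading a lifecycle marker from a metadata file. It assumes the file holds simple key=value lines such as osslifecycle=active; the function name read_lifecycle and the comment handling are my own choices for illustration, not any official tooling.

```python
from typing import Optional


def read_lifecycle(text: str, key: str = "osslifecycle") -> Optional[str]:
    """Return the lowercased value for `key` from key=value lines, or None.

    Assumption: one key=value pair per line; blank lines and lines
    starting with '#' are ignored.
    """
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        # Only accept lines that really contain '=' and match the key.
        if sep and name.strip() == key:
            return value.strip().lower()
    return None


if __name__ == "__main__":
    sample = "# repo metadata\nosslifecycle=active\n"
    print(read_lifecycle(sample))
```

A script like this could run across every repository in an organization to build the "clear map" of services, flagging the ones where the marker is missing.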
Deliberation on Reduced Scope
At scale, it’s really important to find ways to reduce scope. You just can’t fix it all at once, all the time. There must be reasonable criteria to reduce the scope of projects and actions. However, if you don’t know the state of affairs, how can you reduce the scope? You can’t, and you probably won’t. Lifecycle metadata works well together with evaluations via code analysis; both are necessary in order to avoid waste.
How many times have you performed migrations on services or internal shared libs that were dead? This is common because library owners need to ensure the consumers are up-to-date, especially when old software is being decommissioned. Missing a consumer can sometimes be fatal, leading to outages, data loss, or even financial loss. Because of that risk, teams tend to migrate everything, whether it is used or not.
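As a rough sketch of how lifecycle status and code analysis could narrow a migration, consider the snippet below. The Consumer type, the uses_library flag (standing in for the output of some code analysis), and the status strings are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Consumer:
    name: str
    lifecycle: str      # e.g. "active", "maintenance-mode", "decommissioned"
    uses_library: bool  # hypothetical result of a code-analysis scan


# Statuses for which a non-essential migration is likely wasted effort.
WINDING_DOWN = {"to-be-decommissioned", "decommissioned"}


def migration_scope(consumers: Iterable[Consumer]) -> Tuple[List[str], List[str]]:
    """Split consumers into (migrate, skipped) lists of names.

    Skipped names are returned rather than dropped, so someone can
    review them and no consumer is silently missed.
    """
    migrate, skipped = [], []
    for c in consumers:
        if c.uses_library and c.lifecycle.lower() not in WINDING_DOWN:
            migrate.append(c.name)
        else:
            skipped.append(c.name)
    return migrate, skipped


if __name__ == "__main__":
    fleet = [
        Consumer("billing-api", "active", True),
        Consumer("legacy-report", "decommissioned", True),
        Consumer("search", "active", False),
    ]
    print(migration_scope(fleet))
```

Returning the skipped list is deliberate: given how costly a missed consumer can be, the reduced scope should stay auditable.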
Proper Lifecycle Classification
Having a proper lifecycle classification helps a lot to understand what’s going on with a specific piece of software and allows you to plan what to do next. For instance, consider if you had the following metadata in every single GitHub repository:
- Domain: The part of the business domain that your software belongs to.
- Owners: List of Slack handles of everyone who answers for the piece of software.
- Type: Type of the resource, e.g. Frontend, Backend Service, Customer Facing API, Serverless Function, Batch Job, etc…
- Lifecycle status: Where the software is in your SDLC, e.g. Active, Maintenance-Mode, To-Be-Decommissioned, Decommissioned.
- Links to other resources: Links to useful components related to this software, e.g. Jenkins jobs to release software, Prometheus dashboards for observability, PagerDuty or Opsgenie alerts, AWS links like S3 buckets, Lambdas, SQS/SNS queues, Kafka/Kinesis topics, etc…
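The metadata fields listed above could be modeled roughly like this. It is only a sketch: the class names, the field names, and the problems() check are assumptions of mine, and a real setup would probably load these values from a file in each repository.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Lifecycle(Enum):
    ACTIVE = "Active"
    MAINTENANCE_MODE = "Maintenance-Mode"
    TO_BE_DECOMMISSIONED = "To-Be-Decommissioned"
    DECOMMISSIONED = "Decommissioned"


@dataclass
class ServiceMetadata:
    domain: str                 # part of the business domain
    owners: List[str]           # Slack handles of the owners
    type: str                   # e.g. "Backend Service", "Batch Job"
    lifecycle: Lifecycle
    links: Dict[str, str] = field(default_factory=dict)  # label -> URL

    def problems(self) -> List[str]:
        """Return human-readable issues; an empty list means it looks complete."""
        issues = []
        if not self.domain.strip():
            issues.append("domain is empty")
        if not self.owners:
            issues.append("no owners listed")
        if not self.type.strip():
            issues.append("type is empty")
        return issues


if __name__ == "__main__":
    meta = ServiceMetadata(
        domain="payments",
        owners=["@alice", "@bob"],
        type="Backend Service",
        lifecycle=Lifecycle("Active"),
    )
    print(meta.lifecycle.name, meta.problems())
```

Using an enum for the status means a typo in a repository's metadata fails loudly (Lifecycle("Actve") raises ValueError) instead of quietly creating a fifth lifecycle state.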
If you have all this information in hand, it is much easier to plan the next steps for the asset. For components that are decommissioned or about to be decommissioned:
- Avoid doing feature PRs
- Avoid non-essential migrations
- Avoid Big Refactorings
Knowing the domain, and knowing that the component is active, opens up other possibilities like:
- Conscious reuse (within the same domain boundaries)
- Deliberation of Split/Merge with other components
- Feature PRs and Improvements, knowing the software will not die soon
- Migrations with higher benefits and impact
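The two lists above can be read as a small policy keyed on lifecycle status. Here is a minimal sketch of that reading; the action labels, the advise function, and the "review" fallback for statuses the lists do not cover (such as Maintenance-Mode) are my own assumptions.

```python
# Statuses for components that are gone or on their way out.
WINDING_DOWN = {"to-be-decommissioned", "decommissioned"}

# Work to avoid on components that are winding down.
DISCOURAGED = {"feature-pr", "non-essential-migration", "big-refactoring"}


def advise(lifecycle: str, action: str) -> str:
    """Return "avoid", "ok", or "review" for an action on a component.

    "review" is an assumed fallback for statuses with no explicit
    guidance, meaning a human should decide.
    """
    status = lifecycle.strip().lower()
    if status in WINDING_DOWN:
        return "avoid" if action in DISCOURAGED else "ok"
    if status == "active":
        return "ok"
    return "review"


if __name__ == "__main__":
    for status, action in [
        ("Decommissioned", "feature-pr"),
        ("Active", "feature-pr"),
        ("Maintenance-Mode", "big-refactoring"),
    ]:
        print(status, action, "->", advise(status, action))
```

A check like this could sit in a PR bot or a planning script, giving teams a quick nudge before they invest in software that is about to die.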
This is very important because it’s common to have migrations happening all the time and cross-domain features that require collaboration across multiple teams, with PRs that need to be made in many different teams’ repositories. Lifecycle metadata allows you to make better sense of your services and make better-informed decisions.
Originally published at http://diego-pacheco.blogspot.com on December 31, 2022.