How AWS SDM Dive Deep (4) - Deployment

The Continuous Integration and Continuous Deployment (CI/CD) movement might be the best thing that has happened to the software industry in the last two decades - it changed our mental model of how a code change becomes value to customers. In the old days a code change could take months or years to reach customers; nowadays it takes days, hours or even minutes. The short feedback loop makes it possible to iterate fast, fail fast, collect feedback, learn the lesson and try again. But applying the CI/CD concept to AWS services can be tremendously complicated - with 30+ regions and 96 Availability Zones to cover, and tens of thousands or even millions of hosts involved, no software deployment is trivial at this scale. An AWS SDM needs to deeply appreciate how a code change in their service gets deployed to all of its supported regions.

1. Code itself has no value; only code that is deployed to the product that serves customers has value. An SDM should pay close attention to the process that takes source code, builds it, packages it and deploys it. This process can also be the No.1 bottleneck of a software team’s productivity.
2. CI/CD is a continuous investment. It takes strong discipline to keep the CI/CD pipeline flowing. If the SDM of a service is not a believer in and advocate of CI/CD, the process will quickly fall apart, and team productivity and morale will suffer. Having integration tests that keep failing is no better than having no integration tests.
3. Deployment is the Quality Assurance (QA) process. Believe it or not, most AWS services don't have dedicated QA engineers. AWS services rely heavily on automated unit tests, integration tests, canary tests and runtime metrics, alarms and dashboards to keep a high bar on software quality. A code change needs to pass layers of tests before it can be promoted to the next stage of the development pipeline. Alarms during deployment might trigger an automatic rollback that reverts the code change, so the software in production goes back to the last known healthy state. An SDM should know the state of the union of their software quality by observing the deployment process.
4. Deployment is an operational safety hazard by nature. Every change to the existing system is a risk. But avoiding risk by not making changes is not an option. Just like Red’s famous line: “...it comes down to a simple choice, you know? Get busy living, or get busy dying.” An SDM needs to be actively involved in managing the risks of deployments.

Deployment is one of those things: it is a pain, but what matters is how you treat the pain. You can avoid the pain and deploy as little as possible, but then it grows on you and pain becomes suffering. Or you can become really good at it and use the pain to make your process better; that is the CI/CD mentality. An SDM in AWS needs to be willing to face the pain head on.
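
To make the promotion-and-rollback idea from point 3 concrete, here is a minimal Python sketch. It is not how any AWS pipeline is actually implemented; the `Stage` class, the `promote` function, the stage names and the version strings are all hypothetical, and the tests and alarms are stand-in functions. It only shows the shape of the flow: tests gate entry into each stage, and an alarm after deployment reverts that stage to its last known healthy version and stops the promotion.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    """One stage of a hypothetical pipeline (e.g. beta, one region, all regions)."""
    name: str
    tests: List[Callable[[str], bool]]    # each test returns True on pass
    alarm_firing: Callable[[str], bool]   # True if an alarm fires after deploy
    deployed: str = "v1"                  # version currently running in this stage


def promote(stages: List[Stage], version: str) -> List[str]:
    """Push `version` through the stages in order and return an event log."""
    log = []
    for stage in stages:
        # Gate: every test must pass before the change enters this stage.
        if not all(test(version) for test in stage.tests):
            log.append(f"{stage.name}: tests failed, promotion blocked")
            return log
        last_healthy = stage.deployed
        stage.deployed = version
        # An alarm after deployment reverts the stage and halts the rollout.
        if stage.alarm_firing(version):
            stage.deployed = last_healthy
            log.append(f"{stage.name}: alarm fired, rolled back to {last_healthy}")
            return log
        log.append(f"{stage.name}: deployed {version}")
    return log


# Hypothetical three-stage pipeline where the first production stage alarms on v2.
stages = [
    Stage("beta", [lambda v: True], lambda v: False),
    Stage("prod-region-1", [lambda v: True], lambda v: v == "v2"),
    Stage("prod-all-regions", [lambda v: True], lambda v: False),
]
for event in promote(stages, "v2"):
    print(event)
```

In this sketch the bad change never reaches the last stage: the rollout stops at the first stage whose alarm fires, which is why deploying to a small blast radius first limits the risk described in point 4.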


