Systems Recipes - Issue #5
Welcome to Issue #5. We hope everyone is keeping safe and sound!
We’d like to dedicate this issue to the theme of testing. A big part of architecting complex systems is being able to assert that your system is correct (for a given definition of correctness). Practices for productive and reliable automated testing are as hotly debated as programming languages and code editors.
If you have read or watched anything interesting lately that you think will be a good fit for future issues or have any feedback, simply hit reply. You can also reach out on Twitter or send an email! 📧
Cindy Sridharan has an excellent 4-part series (well, 3 parts for now, with a 4th due towards the end of 2020) on testing microservices. Every system eventually grows beyond the point where full end-to-end testing is achievable.
We particularly enjoyed the Testing In Production portion. Observability of complex issues only really happens when software is out in the wild. Bugs can occur simply due to different running versions of software interacting in a bad way. At a certain scale, you want to test for resiliency in the face of an unknown class of bug being surfaced.
There’s no dearth of information or best-practices or books about how best to test software. This post, however, focuses solely on testing backend services and not desktop software or safety critical…
Building an Effective Test Pipeline in a Service Oriented World
Airbnb has been moving their monolith application to something more oriented around services. They had a CI job that ran all of the integration tests simulating user and guest flows.
Airbnb suffered the typical issues associated with a long-running and brittle test suite. These tests were mandatory checks for engineers, which meant that the feedback loop was slow and error-prone.
Airbnb split their tests into a test pyramid, with cheap and effective tests such as lint checks and unit tests at the bottom. Further up the pyramid sit the more brittle integration tests, which run only once an engineer is ready to deploy to production canaries.
Learn about how we built an integration test pipeline for the testing of critical business flows spanning across multiple services in Airbnb. Over the past 2 years, Airbnb engineering has been…
Controlled Chaos with Fault Injection Testing
Most folks think about testing as something that is done strictly prior to release of a piece of software. In the days of pre-packaged software installed by end-users, ideally software was reliable and as bug-free as possible before it got to the customer.
With software running on infrastructure, we can get more creative. This article by Riot Games highlights a field which is not talked about too often: injecting faults and breaking things on purpose to build more reliable systems.
Fault Injection Testing can be used to find subtle bugs and assumptions in software before you actually hit them at an inopportune time. The article has a few examples. One of the first is a bug around IP address re-use which their Redis client library was not handling correctly.
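To make the idea concrete, here is a minimal sketch of fault injection in Python. It is not taken from the Riot Games article: `FaultInjector`, `call_with_retry` and `fetch_profile` are hypothetical names made up for illustration. The wrapper makes a chosen fraction of calls to a dependency fail, so you can check whether the calling code copes.

```python
import random


class InjectedFault(ConnectionError):
    """Raised by the injector in place of a real network error."""


class FaultInjector:
    """Wraps a callable and makes a configurable fraction of calls fail."""

    def __init__(self, target, failure_rate, seed=0):
        self.target = target
        self.failure_rate = failure_rate
        self.rng = random.Random(seed)  # seeded so a run can be repeated
        self.injected = 0  # number of faults injected so far

    def __call__(self, *args, **kwargs):
        if self.rng.random() < self.failure_rate:
            self.injected += 1
            raise InjectedFault("injected fault")
        return self.target(*args, **kwargs)


def call_with_retry(func, attempts=5):
    """Call func, retrying on ConnectionError up to `attempts` times."""
    last_error = None
    for _ in range(attempts):
        try:
            return func()
        except ConnectionError as err:
            last_error = err
    raise last_error


def fetch_profile():
    """Hypothetical dependency call; stands in for a real client."""
    return {"name": "example"}


# Inject a 30% failure rate and see whether the retry logic absorbs it.
flaky_fetch = FaultInjector(fetch_profile, failure_rate=0.3)
try:
    print(call_with_retry(flaky_fetch), "faults injected:", flaky_fetch.injected)
except InjectedFault:
    print("retries exhausted after", flaky_fetch.injected, "injected faults")
```

In a real system the faults would more likely be injected at the network or infrastructure layer than inside the process, but the question being asked is the same: does the caller survive when a dependency misbehaves?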
In this article, I’ll talk about how we used Fault Injection Testing (FIT) to launch the Riot API inside of rCluster
Going Faster with Continuous Delivery
This article from the Amazon Builders' Library talks about how AWS approaches automating the delivery of software, including how they test at each stage and how they control the rate of rollouts to customers (to reduce blast radius).
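As a rough illustration of limiting blast radius, the sketch below deploys to progressively larger stages and stops at the first one that looks unhealthy. It is an assumption-laden toy, not Amazon's pipeline: the stage names, the `deploy` and `error_rate` callbacks, and the 1% threshold are all hypothetical.

```python
def staged_rollout(stages, deploy, error_rate, max_error_rate=0.01):
    """Deploy stage by stage, halting at the first unhealthy stage.

    Returns (completed_stages, halted_stage), where halted_stage is None
    if every stage stayed under the error threshold.
    """
    completed = []
    for stage in stages:
        deploy(stage)
        # Check health before widening the rollout any further.
        if error_rate(stage) > max_error_rate:
            return completed, stage
        completed.append(stage)
    return completed, None


# Hypothetical stages, ordered from smallest to largest audience.
stages = ["one-box", "one-region", "all-regions"]
# Hypothetical observed error rates after deploying to each stage.
observed = {"one-box": 0.001, "one-region": 0.05, "all-regions": 0.0}
deployed = []

done, halted = staged_rollout(stages, deployed.append, observed.get)
print("completed:", done, "halted at:", halted)
```

Here the bad build is caught at the second stage, so the largest stage is never touched.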
How Amazon automates software delivery including automated testing and deployments.