Planning, designing, and running automated chaos experiments
Chaos engineering is all about exploring and overcoming weaknesses in your system. Although many think that chaos engineering must be applied to production, you can use chaos engineering methods to find weaknesses in the entire sociotechnical system of development and delivery.
As your systems change and evolve, chaos engineering naturally becomes a continuous practice, just like continuous integration and continuous delivery—continuous chaos, if you will. The problem is that continuous, manual game days are too expensive and time consuming. Enter automated chaos experiments and tests.
Using practical examples and hands-on exercises, we will take you beyond manual game days to demonstrate how to construct automated chaos experiments to continuously and collaboratively explore, surface, and overcome weaknesses in your infrastructure, platforms, and applications. By the time you're through, you'll be able to explain the value of chaos engineering to your company and get started with continuous, automated chaos tests to keep an eye on current weaknesses in your system and potentially surface new weaknesses in the future.
What you will learn
By the end of this live online course, you will understand:
- Why you can't prove system reliability in advance
- The purpose and limitations of chaos engineering
- How to explain the value of chaos engineering to your company
- How to construct careful chaos experiments that can be applied in production without affecting the customer experience
And you will be able to:
- Design, implement, execute, and share carefully automated chaos engineering experiments to surface technical system weaknesses at the infrastructure, platform, and application levels
- Communicate and share the findings from automated chaos experiments to enable prioritized system improvement
- Use chaos experiments as continuous, automated chaos tests to ensure weaknesses do not regress and potentially surface new weaknesses in the future
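To make the idea of an automated chaos experiment concrete, here is a minimal sketch in plain Python. It does not use the Chaos Toolkit API; `ToyService`, `Experiment`, `run`, `kill_replica`, and `restore` are hypothetical names invented for illustration, and the "system" is an in-memory stand-in. It only mirrors the general shape of an experiment: check a steady-state hypothesis, apply a method that injects turbulent conditions, check the hypothesis again, and always run rollbacks.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToyService:
    """Hypothetical stand-in for a real system: a pool of replicas."""
    replicas: int = 3
    min_healthy: int = 2

    def is_available(self) -> bool:
        return self.replicas >= self.min_healthy


@dataclass
class Experiment:
    title: str
    steady_state: Callable[[], bool]   # probe: is the system behaving normally?
    method: List[Callable[[], None]]   # actions that inject turbulent conditions
    rollbacks: List[Callable[[], None]] = field(default_factory=list)


def run(experiment: Experiment) -> dict:
    """Run one experiment and report whether the steady state deviated."""
    # Never inject failure into a system that is already unhealthy.
    if not experiment.steady_state():
        return {"title": experiment.title, "status": "aborted", "deviated": False}
    try:
        for action in experiment.method:
            action()
        # A deviation after the method is evidence of a weakness.
        deviated = not experiment.steady_state()
    finally:
        # Rollbacks run even if an action raised.
        for undo in experiment.rollbacks:
            undo()
    return {"title": experiment.title, "status": "completed", "deviated": deviated}


service = ToyService()


def kill_replica() -> None:
    service.replicas -= 1


def restore() -> None:
    service.replicas = 3


result = run(Experiment("Lose one replica", service.is_available,
                        [kill_replica], [restore]))
print(result)
```

Because `run` returns a plain result rather than relying on someone watching a dashboard, the same experiment can be scheduled repeatedly and treated as a test: a run that reports a deviation signals a weakness that has appeared or regressed.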
This course is for you because
- You're a software developer who needs to start taking responsibility for your code in production.
- You're a site reliability engineer (SRE) with a little experience managing production, and you want to be proactive about finding system weaknesses before your customers do.
- You're a system administrator who is responsible for the availability of production, and you need a proactive technique for surfacing system weaknesses before your customers experience them.
- You're a product owner who is responsible for delivering a business-critical product or service, and you want to learn how to gain trust and confidence in your system’s reliability.
- You're a DevSecOps engineer who needs a technique and tools to support discovering, capturing, sharing, and collaborating on security weaknesses.
Prerequisites
- A general understanding of Kubernetes as a platform and of Java
- A GitHub account
- A machine with the Chaos Toolkit installed
Recommended resources
- Principles of chaos engineering (article)
- Chaos Engineering (book, 65 pages)
- Why We Need More Chaos (video, 13 min)
- Harnessing Chaos (video, 48 min)
- Chaos Engineering (video, 47 min)
- Getting started with the Chaos Toolkit (tutorial)
- Site Reliability Engineering (book)