Planning, designing, and running automated chaos experiments

Chaos engineering is all about exploring and overcoming weaknesses in your system. Although many think that chaos engineering must be applied to production, you can use chaos engineering methods to find weaknesses in the entire sociotechnical system of development and delivery.

Course details
  • Instructor-led
  • Online only
  • Duration: 3 hours
Private sessions
  • Online only
  • Duration: 3 hours

Overview

As your systems change and evolve, chaos engineering naturally becomes a continuous practice, just like continuous integration and continuous delivery: continuous chaos, if you will. The problem is that continuous, manual game days are too expensive and time-consuming. Enter automated chaos experiments and tests.

Using practical examples and hands-on exercises, we will take you beyond manual game days and demonstrate how to construct automated chaos experiments that continuously and collaboratively explore, surface, and overcome weaknesses in your infrastructure, platforms, and applications. By the time you're through, you'll be able to explain the value of chaos engineering to your company and get started with continuous, automated chaos tests that keep an eye on current weaknesses in your system and can surface new weaknesses in the future.

What you will learn

By the end of this live online course, you will understand:

  • Why you can't prove system reliability in advance
  • The purpose and limitations of chaos engineering
  • How to explain the value of chaos engineering to your company
  • How to carefully construct chaos experiments for production so that they avoid affecting the customer experience

And you will be able to:

  • Design, implement, execute, and share carefully automated chaos engineering experiments to surface technical system weaknesses at the infrastructure, platform, and application levels
  • Communicate and share the findings from automated chaos experiments to enable prioritized system improvement
  • Use chaos experiments as continuous, automated chaos tests to ensure that weaknesses you have overcome do not return and to potentially surface new weaknesses in the future

Audience

  • You're a software developer who needs to start taking responsibility for your code in production.
  • You're a site reliability engineer (SRE) with a little experience managing production, and you want to be proactive about finding system weaknesses before your customers do.
  • You're a system administrator who is responsible for the availability of production, and you need a proactive technique for surfacing system weaknesses before your customers experience them.
  • You're a product owner who is responsible for delivering a business-critical product or service, and you want to learn how to gain trust and confidence in your system’s reliability.
  • You're a DevSecOps engineer who needs a technique and tools to support discovering, capturing, sharing, and collaborating on security weaknesses.

Prerequisites

  • A general understanding of Java and of Kubernetes as a platform
  • A GitHub account
  • A machine with the Chaos Toolkit installed

Upcoming sessions

There are currently no upcoming sessions, but you can contact us to book a private session.