Injecting Chaos in AWS EC2 with Auto-heal using OpsWorks
September 20th, 2019
AWS OpsWorks is a configuration management service that helps you configure and operate applications in a cloud enterprise using Puppet or Chef. The Getting Started Guide gets you up and running with a simple single-node service with auto-healing. ChaosIQ experiments are run through Chaos Toolkit. This tutorial takes you through setting up an environment using OpsWorks and running a chaos experiment against it to demonstrate the auto-healing capability.
Getting Started with AWS OpsWorks
To get up and running quickly with a single-instance service using AWS and OpsWorks, there is a great walkthrough that you can follow: Getting Started with Linux Stacks. This will give you an EC2 instance running a simple Node service. I deviated from the walkthrough in that I made my instance a T2 micro, which kept me within the AWS free tier. Having completed the walkthrough, my OpsWorks console ended up with a Layers view as follows:
Note that the layer is configured to auto-heal. This means that if my EC2 instance dies for some reason, it will be restarted by the AWS infrastructure within 5 minutes.
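Before injecting any failure, it can be worth confirming that auto-healing really is switched on for every layer. The following is a minimal sketch, assuming a response in the shape returned by the OpsWorks DescribeLayers API, where each layer carries an `EnableAutoHealing` flag. The helper name `layers_without_auto_healing` and the sample layer names are hypothetical and only for illustration.

```python
from typing import Any, Dict, List


def layers_without_auto_healing(response: Dict[str, Any]) -> List[str]:
    """Return the names of layers whose auto-healing flag is off or missing.

    `response` is assumed to follow the DescribeLayers response shape:
    {"Layers": [{"LayerId": ..., "Name": ..., "EnableAutoHealing": ...}]}
    """
    unprotected = []
    for layer in response.get("Layers", []):
        if not layer.get("EnableAutoHealing", False):
            # Fall back to the layer ID when a layer has no name.
            unprotected.append(layer.get("Name", layer.get("LayerId", "<unnamed>")))
    return unprotected


# Hand-written sample in the DescribeLayers shape (not real AWS output).
sample = {
    "Layers": [
        {"LayerId": "layer-1", "Name": "web", "EnableAutoHealing": True},
        {"LayerId": "layer-2", "Name": "scratch", "EnableAutoHealing": False},
    ]
}
print(layers_without_auto_healing(sample))
```

Against a real stack, the response could come from `boto3.client("opsworks").describe_layers(StackId=...)`, using the stack ID shown in your own OpsWorks console.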
The walkthrough also left me with a service I could reach at a URL and external IP address, which took me to a page as shown below:
Chaos Toolkit, Open Chaos and the Catalog
Having got my auto-healing service up and running, I thought I should verify that the auto-healing works as expected by injecting some turbulent behaviour into my environment. This is where the open source Chaos Toolkit comes in. The Chaos Toolkit is extended by adding extensions, and there is already a large number of extensions on GitHub, many developed by the Chaos Toolkit community. Fortunately, there is already an extensive AWS extension that allows me to terminate an EC2 instance by ID. Given this, I was able to develop an experiment that killed off my EC2 instance and checked that it restarted.
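The "kill it, then check that it came back" step boils down to polling a health check until it succeeds or a deadline passes. The sketch below shows that pattern in plain Python. It is not the code from the catalog experiment: `wait_for_recovery` and its parameters are hypothetical names, and the 600-second default timeout is an assumption that simply leaves some margin over the 5-minute auto-heal window mentioned earlier.

```python
import time
from typing import Callable


def wait_for_recovery(
    is_healthy: Callable[[], bool],
    timeout_s: float = 600.0,   # assumption: margin over the 5-minute auto-heal
    interval_s: float = 15.0,   # assumption: how often to re-check
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `is_healthy` until it returns True or `timeout_s` elapses.

    Returns True if the service recovered in time, False otherwise.
    `sleep` and `clock` are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout_s
    while True:
        if is_healthy():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Simulated health check: unhealthy twice, then healthy again.
answers = iter([False, False, True])
recovered = wait_for_recovery(lambda: next(answers), sleep=lambda _seconds: None)
print(recovered)
```

In a real run, `is_healthy` would be something like an HTTP request to the service URL or a query of the instance state, which is the role probes play in a Chaos Toolkit experiment.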
Open Chaos Catalog
As part of the Open Chaos Initiative, we are developing and contributing to the Chaos Catalog on GitHub. Within the catalog I have added the experiment to kill off the EC2 instance and wait for the auto-heal to complete. The experiment README contains the full details of how to run the experiment against the OpsWorks environment. In a nutshell, with the native `chaos` command, this can be as simple as:
(chaostk) export EC2_INSTANCE_ID=my_ec2_instance_id; \
    export AVAILABILITY_ZONE=us-west-2; \
    export AWS_REGION=us-west-2; \
    export AWS_ACCESS_KEY_ID=ABCD*****; \
    export AWS_SECRET_ACCESS_KEY=ABCD**********xx; \
    chaos run https://raw.githubusercontent.com/open-chaos/experiment-catalog/master/aws/OpsWorks/ow_ec2_auto_heal_single_instance_service.json
This runs a locally installed Chaos Toolkit with the experiment defined in the GitHub catalog. Details of the environment variables and their meaning can be found in the README.
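Since the experiment depends on several environment variables, a small pre-flight check can catch a missing one before anything is terminated. This is a minimal sketch: the variable names are the ones exported in the command above, while the helper `missing_vars` is a hypothetical name and not part of Chaos Toolkit.

```python
import os
from typing import List, Mapping

# Variable names taken from the export commands above.
REQUIRED_VARS = (
    "EC2_INSTANCE_ID",
    "AVAILABILITY_ZONE",
    "AWS_REGION",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
)


def missing_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables that are unset or empty in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example with a partial, made-up environment.
partial_env = {"EC2_INSTANCE_ID": "my_ec2_instance_id", "AWS_REGION": "us-west-2"}
print(missing_vars(partial_env))
```

Calling `missing_vars()` with no arguments checks the real process environment, so an empty list means it should be safe to go ahead with `chaos run`.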
The above is great when I want to run my experiments in isolation and have full control over my environment. But what if I am part of a team or working in an enterprise? If I randomly kill off production services at will, it could raise a few eyebrows and possibly have a negative impact on my career progression. This is where the ChaosIQ console comes in. As a user, I can view and control my experiments from the console but, more importantly, it provides visibility and control to my team. My dashboard view can be seen below:
The console environment allows me to work with other members of my team, so I can share experiments and the results of executions. I can also set up Safeguards. These safeguards can, for example:
- prevent me from running my experiment at a bad time operationally
- prevent my experiment from clashing with other team members' experiments
- let me set up a policy to stop all experiments immediately (the big red button)
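To make the three safeguards above concrete, here is a purely illustrative sketch of the kind of decision they represent. It is not the ChaosIQ API and does not reflect how the console implements safeguards. The function `may_run`, its parameters, and the blackout window are all hypothetical.

```python
import datetime as dt
from typing import Iterable, Tuple

# A blackout window is a (start, end) pair of times on the same day.
Window = Tuple[dt.time, dt.time]


def may_run(
    now: dt.datetime,
    blackouts: Iterable[Window],
    other_experiment_running: bool = False,
    stop_all: bool = False,
) -> bool:
    """Decide whether an experiment is allowed to start right now."""
    if stop_all:                  # the "big red button"
        return False
    if other_experiment_running:  # avoid clashing with a teammate's run
        return False
    current = now.time()
    # Refuse to run inside any operationally bad time window.
    return not any(start <= current < end for start, end in blackouts)


# Hypothetical policy: no experiments during business hours.
business_hours = [(dt.time(9, 0), dt.time(17, 0))]
print(may_run(dt.datetime(2019, 9, 20, 10, 0), business_hours))
print(may_run(dt.datetime(2019, 9, 20, 20, 0), business_hours))
```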
We are currently inviting people to join the ChaosIQ early access program. If you feel the features of ChaosIQ could benefit you or your team, or we can assist in any way on your journey to Chaos Engineering, please see our Early Access Request Page.
Chaos Engineering Resources
- Learning Chaos Engineering: takes you through all the steps required to get you on the journey to Chaos
- Chaos Engineering Observability: how to bring your chaos experiments into the world of system observability
- This Week in Chaos Newsletter: keep up to date with current events and stories in Chaos Engineering
- Chaos Toolkit: open source chaos toolkit to build and run your own experiments
- Open Chaos Initiative: open community to embrace free and open standards to enable everyone to share, collaborate on and learn from chaos engineering
- Open Chaos GitHub: open chaos resources on GitHub
- Chaos Experiment Catalog: experiments across a number of different platforms and services such as Kubernetes, Google Cloud, AWS and Azure