Injecting Chaos in AWS EC2 with Auto-heal using Opsworks

AWS OpsWorks is a configuration management service that helps you configure and operate applications in a cloud enterprise by using Puppet or Chef. Its Getting Started guide gets you up and running with a simple single-node service with auto-healing. ChaosIQ experiments are run through the Chaos Toolkit. This tutorial takes you through setting up an environment with OpsWorks and running a chaos experiment against it to demonstrate the auto-healing capability.

Getting Started with AWS OpsWorks

To get up and running quickly with a single-instance service using AWS and OpsWorks, there is a great walkthrough that you can follow: Getting Started with Linux Stacks. This will give you an EC2 instance running a simple Node service. I deviated from the walkthrough in that I made my instance a t2.micro, which kept me within the AWS free tier. Having completed the walkthrough, my OpsWorks console ended up with a layers view as follows:

Ops Works Layers

Note that the layer is configured to auto-heal. This means that if my EC2 instance dies for some reason, it will be restarted by the AWS infrastructure within 5 minutes.

The walkthrough also left me with a service I could reach at a URL and external IP address, which took me to the page shown below:

Ops Works Layers
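
Because the layer auto-heals and the service is reachable at a URL, recovery can be observed from outside by polling that URL until it answers again. The following is a minimal sketch of such a polling helper, not part of the walkthrough or of OpsWorks itself; the function name wait_until and the SERVICE_URL variable are hypothetical names chosen for illustration.

```shell
# Hypothetical helper: retry a command until it succeeds or a timeout expires.
# Usage: wait_until <timeout_seconds> <interval_seconds> <command> [args...]
# Returns 0 as soon as the command succeeds, 1 if the deadline passes first.
wait_until() {
    timeout_s=$1
    interval_s=$2
    shift 2
    deadline=$(( $(date +%s) + timeout_s ))
    until "$@"; do
        # Give up once the deadline has passed.
        if [ "$(date +%s)" -ge "$deadline" ]; then
            return 1
        fi
        sleep "$interval_s"
    done
    return 0
}

# Example (commented out; SERVICE_URL is an assumed variable holding the service address):
# wait_until 300 10 curl -fsS -o /dev/null "$SERVICE_URL" && echo "service is back"
```

The 300 second timeout in the example simply mirrors the 5 minute auto-heal window mentioned above; a real check may need a more generous margin.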

Chaos Toolkit, Open Chaos and the Catalog

Having got my auto-healing service up and running, I thought I should verify that auto-healing works as expected by injecting some turbulent behaviour into my environment. This is where the open source Chaos Toolkit comes in. The Chaos Toolkit is extended by adding extensions, and there are already a large number of extensions on GitHub, many developed by the Chaos Toolkit community. Fortunately, there is already an extensive AWS extension that allows me to terminate an EC2 instance by ID. Given this, I was able to develop an experiment that kills off my EC2 instance and checks that it restarts.
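
To give a feel for what such an experiment can look like, here is a rough sketch that writes a small Chaos Toolkit experiment file. It is an illustration only and is not the experiment published in the catalog. It assumes the AWS extension exposes a terminate_instance action in the chaosaws.ec2.actions module, that the service address is supplied in a hypothetical SERVICE_URL variable, and that a 300 second pause is long enough for auto-heal to finish.

```shell
# Illustrative sketch only; this is NOT the catalog experiment.
# Assumptions: chaosaws.ec2.actions.terminate_instance is available,
# SERVICE_URL is a hypothetical variable holding the service address,
# and the file name below is arbitrary.
EXPERIMENT_FILE="${EXPERIMENT_FILE:-ec2_auto_heal_sketch.json}"

# The quoted 'EOF' keeps ${...} literal so the Chaos Toolkit, not the shell,
# substitutes the configuration values when the experiment runs.
cat > "$EXPERIMENT_FILE" <<'EOF'
{
  "version": "1.0.0",
  "title": "A single-instance service recovers after its EC2 instance is terminated",
  "description": "Illustrative sketch of an auto-heal experiment.",
  "configuration": {
    "aws_region": {"type": "env", "key": "AWS_REGION"},
    "instance_id": {"type": "env", "key": "EC2_INSTANCE_ID"},
    "service_url": {"type": "env", "key": "SERVICE_URL"}
  },
  "steady-state-hypothesis": {
    "title": "The service answers over HTTP",
    "probes": [
      {
        "type": "probe",
        "name": "service-responds-with-200",
        "tolerance": 200,
        "provider": {"type": "http", "url": "${service_url}", "timeout": 5}
      }
    ]
  },
  "method": [
    {
      "type": "action",
      "name": "terminate-ec2-instance",
      "provider": {
        "type": "python",
        "module": "chaosaws.ec2.actions",
        "func": "terminate_instance",
        "arguments": {"instance_id": "${instance_id}"}
      },
      "pauses": {"after": 300}
    }
  ],
  "rollbacks": []
}
EOF
```

The steady-state hypothesis is checked before the method and again after it, so the pause following the termination is what gives auto-heal time to bring the service back before the second check. A file like this can be checked with chaos validate before it is run.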

Open Chaos Catalog

As part of the Open Chaos Initiative, we are developing and contributing to the chaos catalog on GitHub. Within the catalog I have added the experiment to kill off the EC2 instance and wait for the auto-heal to complete. The experiment README contains the full details of how to run the experiment against the OpsWorks environment. In a nutshell, with the native `chaos` command, it can be as simple as:

(chaostk) export EC2_INSTANCE_ID=my_ec2_instance_id; \
          export AVAILABILITY_ZONE=us-west-2; \
          export AWS_REGION=us-west-2; \
          export AWS_ACCESS_KEY_ID=ABCD*****; \
          export AWS_SECRET_ACCESS_KEY=ABCD**********xx; \
          chaos run https://raw.githubusercontent.com/open-chaos/experiment-catalog/master/aws/OpsWorks/ow_ec2_auto_heal_single_instance_service.json

This runs a locally installed Chaos Toolkit with the experiment defined within the GitHub catalog. Details of the environment variables and their meaning can be found in the README.
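
Since the run depends on several environment variables, it can help to confirm they are all set before anything is terminated. The helper below is a small sketch of such a pre-flight check; check_env is a hypothetical name and is not part of the Chaos Toolkit, while the variable names in the usage comment are the ones exported above.

```shell
# Hypothetical pre-flight helper: report every named variable that is
# unset or empty, and return non-zero if any were missing.
check_env() {
    missing=0
    for name in "$@"; do
        # printenv prints nothing for variables that are unset or not exported.
        if [ -z "$(printenv "$name")" ]; then
            echo "missing environment variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example (commented out so nothing runs by accident):
# check_env EC2_INSTANCE_ID AVAILABILITY_ZONE AWS_REGION \
#           AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY \
#   && chaos run https://raw.githubusercontent.com/open-chaos/experiment-catalog/master/aws/OpsWorks/ow_ec2_auto_heal_single_instance_service.json
```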

Chaos Console

The above is great when I want to run my experiments in isolation and I have full control over my environment. But what if I am part of a team or working in an enterprise? If I randomly kill off my production services at will, it could raise a few eyebrows and possibly have a negative impact on my career progression. This is where the ChaosIQ console comes in. As a user of the console, I can view and control my experiments from one place and, more importantly, give my team visibility and control over them. My dashboard view can be seen below:

Chaos Console Dashboard

The console environment allows me to work with other members of my team, so I can share experiments and the results of executions. I can also set up Safeguards. These safeguards can, for example:

  • protect me from running my experiment at a bad time operationally
  • protect me from running an experiment that clashes with other team members' experiments
  • allow me to set up a policy to stop all experiments now (the big red button)
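
To make the first kind of safeguard concrete, here is a tiny local sketch of a time-window guard. It is not how the ChaosIQ console implements Safeguards; within_window and the 09:00 to 17:00 window are hypothetical choices made purely for illustration.

```shell
# Hypothetical local guard: succeed only when the given hour falls inside
# the half-open window [start, end).
# Usage: within_window <hour 00-23> <start_hour> <end_hour>
within_window() {
    hour=${1#0}   # strip one leading zero, e.g. "09" becomes "9"
    [ "$hour" -ge "$2" ] && [ "$hour" -lt "$3" ]
}

# Example (commented out): only run the experiment between 09:00 and 17:00 local time.
# within_window "$(date +%H)" 9 17 && chaos run experiment.json
```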

We are currently inviting people to join the ChaosIQ early access program. If you feel the features of ChaosIQ could benefit you or your team, or if we can assist in any way on your journey to Chaos Engineering, please see our Early Access Request Page.
