
Measuring Defenses and Controls Through Chaos

Written by: Alex Martirosyan, OSCP, GPEN

It may feel daunting to engage an offensive security team that will perform a live-fire exercise against your production environment. This is true for any scenario, whether the team is testing technical controls internally or externally, or testing your employees. Critical business operations depend on these systems and people. If they fail, that can mean lost investments, relationships, and money.

As security testers, we treat this trust with the utmost seriousness and recognize that there may be a degree of uncertainty going into the overall planning of these assessments. We commonly hear phrases from clients such as, “we are not ready, that system is just being implemented – come back when it is hardened,” or “we are still in the process of implementing this great tool, let’s do the testing then instead.”

This raises the question: when is the optimal time to test an environment? We know that new threats are continuously emerging and that there is no end state in security, so why do we treat many offensive assessments as if there were one?

This is where we can take a lesson from chaos testing, a concept formalized by Netflix that is employed to test the resiliency of information technology (IT) infrastructure. In short, the end goal is to continuously experiment on production systems as they are being modified, to build confidence in their redundancy and quality of service. Imagine a monkey were in the server room pulling out cables at random or shutting down arbitrary systems. Chaos Monkey is an open-source tool developed by Netflix to apply this thought experiment to production environments.
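To make the thought experiment concrete, here is a minimal sketch in Python. It is a self-contained simulation, not how Chaos Monkey itself is implemented: the `Service` class, the instance IDs, and the rule that a service is "available" while at least one instance remains are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Service:
    """Hypothetical service backed by a set of redundant instances."""
    name: str
    instances: Set[str] = field(default_factory=set)

    def is_available(self) -> bool:
        # Assumption: the service stays up while at least one instance runs.
        return len(self.instances) > 0


def terminate_random_instance(service: Service, rng: random.Random) -> Optional[str]:
    """Remove one running instance chosen at random and return its ID."""
    if not service.instances:
        return None
    # Sort first so the choice is reproducible for a given seed.
    victim = rng.choice(sorted(service.instances))
    service.instances.remove(victim)
    return victim


def run_experiment(service: Service, rounds: int, seed: int = 0) -> List[Dict]:
    """Terminate one instance per round and record whether the service survived."""
    rng = random.Random(seed)
    results = []
    for round_number in range(1, rounds + 1):
        victim = terminate_random_instance(service, rng)
        results.append({
            "round": round_number,
            "terminated": victim,
            "available": service.is_available(),
        })
    return results


if __name__ == "__main__":
    web = Service("web", {"i-1", "i-2", "i-3"})
    for result in run_experiment(web, rounds=3):
        print(result)
```

With three instances and three rounds, the service survives the first two terminations and fails on the third, which is the kind of redundancy limit a chaos experiment is meant to surface before a real outage does.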

This chaos can be embraced internally from a development perspective, and several top engineering teams have employed these techniques successfully, including Facebook, AWS, and Microsoft. It is critical to fully embrace failures to learn how systems will operate under realistic scenarios. Finding gaps during the early stages of a life cycle through repeatable and thoughtfully planned exercises provides valuable experience to the business. The real world is chaotic and unpredictable, so testing with chaos in mind also helps prepare for the worst outcomes.

This is not to say that the end goal of a security assessment is to break things or deny service. However, any tester who tells you there is no possible risk of downtime or of impeding operations is either lying or not doing an adequate job of testing your environment. As with all risk, it can be well mitigated through proper planning and communication.

This is one of the reasons our team joins pre-sales calls and scoping meetings and provides daily status updates throughout the life cycle of an exercise. We also rigorously test any technique, tool, or attack in lab environments to fully understand potential impacts. Ensuring management is aligned with the overall testing methodology of your designated security team leads to successful engagements.

Chaos Security Testing Principles:

  • Planning and over-communication help build successful engagements
  • Measure controls continuously, validate any assumptions, and remediate as necessary
  • There is rarely a “perfect” or “best” time to measure defenses; it must be done continuously
  • Define core objectives and desirable outcomes, and ensure security teams produce actionable results
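
As a rough illustration of measuring controls continuously and re-testing after remediation, the sketch below compares the results of two measurement runs. Everything in it is hypothetical: the control names, the check functions, and the status labels are assumptions for the example and do not describe a specific tool or methodology.

```python
from typing import Callable, Dict, Optional

# Hypothetical control check: returns True when the control held up.
ControlCheck = Callable[[], bool]


def measure_controls(checks: Dict[str, ControlCheck]) -> Dict[str, bool]:
    """Run every check once and record whether each control held."""
    return {name: check() for name, check in checks.items()}


def compare_runs(previous: Dict[str, bool], current: Dict[str, bool]) -> Dict[str, str]:
    """Classify each control by how its result changed between two runs."""
    status = {}
    for name, passed in current.items():
        before: Optional[bool] = previous.get(name)
        if before is None:
            # Control was not measured in the previous run.
            status[name] = "new-pass" if passed else "new-gap"
        elif passed and not before:
            status[name] = "remediated"
        elif before and not passed:
            status[name] = "regressed"
        else:
            status[name] = "unchanged"
    return status


if __name__ == "__main__":
    # Results recorded during an earlier (hypothetical) assessment.
    last_run = {"edr_blocks_payload": False, "mfa_enforced": True}
    # Stand-in checks; a real check would exercise the control itself.
    this_run = measure_controls({
        "edr_blocks_payload": lambda: True,
        "mfa_enforced": lambda: False,
        "egress_filtering": lambda: False,
    })
    for control, state in compare_runs(last_run, this_run).items():
        print(control, state)
```

Keeping the previous results around is what turns a one-off assessment into a repeatable measurement: each run validates earlier assumptions, confirms remediation, and flags regressions.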

Chaos testing helps identify problems quickly and early on, when new systems are being designed. In offensive security testing, this is also realistically one of our end goals. We want to build testable, repeatable, and accurate processes to examine attack life cycles or paths. This way we can fully measure defenses and proactively identify gaps before a threat actor does. We can then re-test remediation efforts and continuously examine new threat scenarios. If you ever find yourself questioning when to perform a test, consider implementing some chaos.