Complex systems require resilience, and tools such as chaos engineering must be used to ensure that resilience. Learn about Azure Chaos Studio.
Cloud-native applications fit well into the client-server or three-tier category rather than the old monolith. These are now a collection of services designed to combine code and platform tools to manage and control errors and scale around the world.
This is great for users. Users get applications that are fast, responsive, and accessible from anywhere, on any device.But that makes it difficult Developer An operations team with a complex service web that is difficult to test on a large scale. You can also design for failures and incorporate redundancy into your system, but this complicates the architecture and adds new servers and additional service instances.
look: Quick Glossary: DevOps (TechRepublic Premium)
Fail and test complex systems
As complexity increases, more testing is required. This can be a problem when testing what happens when a service fails under load. How does a transaction fail if the shopping cart backend needs to switch databases in the middle of a purchase? What will the restaurant delivery tracker do if the main messaging platform goes down?
We need a test model that allows us to inspect the running system before initiating element failures and tracking system behavior. The idea is to inject a small amount of failure into a running system and monitor how the system responds to a set of target conditions.It’s a technology known as Chaos engineering,Pioneer Inside Netflix with its Chaos Monkey Tool Randomly influence operations with the aim of revealing unconsidered failure modes, DevOps The team wasn’t ready.
Intent Chaos engineering The technique is not to investigate how the system fails, but it can be a beneficial side effect. Instead, it aims to show how resilient they are. Netflix needed to always provide a solid customer experience to ensure that users could watch movies and shows no matter what happened in the background.
It’s not surprising that these techniques are being adopted by other platforms, especially hyperscale cloud providers such as: Microsoft Azure.. If your application is running in Azure, you need to make sure that your application continues to run in the event of a Microsoft server failure. Microsoft’s own chaos engineering team regularly investigates the impact of failures on the platform to ensure that the services that the application depends on handle the failures properly.
Build your own chaos
Must-read developer content
But can you use the same technique in your own application to make sure it’s as resilient as the service your code uses? There is no reason not to do so. Microsoft may have its own team of site reliability engineers to keep Azure up and running, but when the code runs on a large scale, it has its own knowledge of both software and the services it uses. SRE is required.
If you are running on a large scale, you need to implement some form of chaos engineering to ensure the restoring force of your application. Microsoft provides guidance Many of the ideas about how to think about using these techniques as part of your Azure documentation are derived from Netflix’s experience. Chaos says it’s a process.
That’s not surprising. Chaos may be considered random, but if you use chaos to test resilience, you should plan to treat it like security. Microsoft’s model is being discussed from the perspective of attackers and defenders. The attacker is one side of the equation, injecting a failure into the system with the goal of destroying it. Defenders, on the other hand, assess the impact of the attack, analyze the outcome and plan mitigation measures.
The test should be treated like a scientific experiment. You should start with a hypothesis such as “The application will continue to work even if you lose a single back-end database instance.” Next, define the injected failure. This will shut down the database of the running application. Finally, you get the results you expect. The application will continue to run. The chaos engineering platform must manage all three steps, start and stop testing, and provide a way to access test results.
One of the important aspects of the chaos test is to remember that the test has a blast radius. It should be noted that they are intentionally destructive and may not work. This means that you can always unplug your test and get back to normal operation as soon as possible. Chaos injection requires a rollback method. If possible, you should use one button to automate the entire process.
Azure DevOps third-party tools show that they are interested in using these techniques as part of testing their application. Proofdock tools It combines the chaos of chaos engineering with the latest development concepts and works with observable tools to provide what is called “continuous validation” and do everything within a familiar portal.
Introducing Azure Chaos Studio
Microsoft is currently previewing a set of chaos engineering tools for Azure applications with selected customers based on its own internal tools. Azure CTO Mark Russinovich Demonstrates at Microsoft’s Spring Virtual Ignite, A combination of the Azure test management portal and a JSON-based test scripting language.
Testing in Azure Chaos Studio has two components: agents running on virtual servers or embedded in code, and direct access to Azure-specific services. These are controlled by the JSON experiment description. For example, test a failover of your application’s Cosmos DB backend by simulating a failure in one of your application’s regions. Alternatively, in your experiment, you can use the agent to shut down a service host on a
server running a node.js application or some .NET code to test the restoring force of your own application.
The experiment consists of a series of steps, each step having an action. Microsoft has developed a domain-specific declarative language for working with application infrastructure. It shares some similarities with the Bisp resource description language. You can build experiments in Visual Studio Code, save them to Azure, and list them in the Chaos Studio portal. From the portal, select an experiment to run using other elements of Azure developer tools, using application monitoring built into your code or Azure-specific service tools, and monitoring the operation of your application. start.
If you’re using another continuous integration / continuous development tool such as Azure DevOps or GitHub Actions, Azure Chaos Studio provides a REST API that you can use as part of a set of integration tests when building a new version of your code. increase. Running Chaos Studio early in the application lifecycle makes sense because you can incorporate restoring force testing into your release process.
As cloud-native development matures, the way applications are built is becoming more and more the way large cloud platforms and services build code. Techniques that were previously only needed within Netflix and Azure are now needed by everyone, and with the advent of Chaos Studio in Azure, what used to be a custom tool is now a platform that everyone can use. It helps a lot to change. Fulfill the promise of a resilient system.