The ultimate goal of microservices development is to achieve daily releases to production. To do this safely, each microservice must pass through a series of reliable, fully automated quality gates and undergo rigorous testing.
To achieve reliable results and catch problems before your users find them, automated testing should be comprehensive and performed at several levels. Below, I describe how to approach microservices testing in a structured way, so that no important factor is missed.
Microservices should be tested at three distinct levels:
Component or class level - unit testing microservice internals. External dependencies, such as persistent storage, infrastructure services, and other microservices, can be mocked.
Container or microservice level - testing a microservice's integration with its external dependencies. If microservices are packaged as containers (e.g. Docker), the microservice and its dependencies can be started as a group of connected containers, which simplifies the testing process.
System level - testing the entire system. All system components are instantiated, and tests execute transactions that go across the entire system, passing through multiple components along the way.
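For the container level, a common approach is to describe the microservice and its dependencies in a Docker Compose file and start the whole group before running integration tests. The sketch below is purely illustrative: the service name, image names, and environment variables are assumptions, not a real configuration.

```yaml
# Hypothetical docker-compose file for container-level tests:
# the microservice is started together with a real database instance.
services:
  orders-service:
    image: example/orders-service:latest   # image name is illustrative
    depends_on:
      - orders-db
    environment:
      DB_URL: postgres://orders-db:5432/orders
  orders-db:
    image: postgres:16
    environment:
      POSTGRES_DB: orders
```

The test suite then runs against the exposed service endpoint and the environment is torn down afterwards.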
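To make the component level concrete, here is a minimal sketch of a unit test that mocks external dependencies. The `OrderService` class and its dependency names are made up for illustration; the point is that the persistence layer and the downstream service are replaced by mocks, so the test exercises only the microservice's internal logic.

```python
# Hypothetical component-level test: the service's external dependencies
# (a repository and a payment client) are replaced by mocks.
from unittest.mock import Mock

class OrderService:
    """Internal component under test; depends on a repository and a payment client."""
    def __init__(self, repo, payments):
        self.repo = repo
        self.payments = payments

    def place_order(self, order_id, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        self.payments.charge(order_id, amount)   # call to another microservice, mocked here
        self.repo.save(order_id, amount)         # persistence, mocked here
        return "confirmed"

# Both dependencies are mocks, so no database or network is needed.
repo, payments = Mock(), Mock()
service = OrderService(repo, payments)
assert service.place_order("o-1", 100) == "confirmed"
repo.save.assert_called_once_with("o-1", 100)
payments.charge.assert_called_once_with("o-1", 100)
```

In a real project these checks would live in a test framework such as pytest or JUnit; the mocking idea stays the same.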
All tests can be categorized into one of the following two groups:
Functional tests - check implementation according to functional specifications. Typically, tests of this type make calls with predefined input parameters and examine the output for correctness.
Non-functional tests - check system capabilities, such as performance, capacity, scalability, reliability, and security. These tests usually make multiple calls, collect metrics, and compare them to preset targets.
Functional tests can be subdivided into:
Smoke tests - quick tests that exercise only the “happy paths” of critical business transactions. Smoke tests do not deliver reliable results; they can only determine whether the tested component or system is completely broken. Since they are fast, teams usually run them first to check for serious errors: if they fail, the testing process stops early, saving the time and resources that more comprehensive testing would otherwise consume.
Positive tests - performed by calling a component or system in a valid state with a correct set of parameters. Checks “happy path” executions.
Negative tests - performed by calling a component or system in an invalid state and/or with an incorrect set of parameters. Checks for clear error responses.
Boundary tests - performed by calling a component or system with boundary values. Checks that the logic is capable of correctly handling such values.
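The three functional subtypes above can be sketched with a single hypothetical validation function; the quantity limits below are assumptions made for the example.

```python
# Positive, negative, and boundary tests for a made-up validation rule:
# quantities must be integers from 1 to MAX_QUANTITY inclusive.
MAX_QUANTITY = 100

def validate_quantity(qty):
    """Accepts integer quantities in [1, MAX_QUANTITY]; rejects everything else."""
    if not isinstance(qty, int) or qty < 1 or qty > MAX_QUANTITY:
        raise ValueError(f"quantity must be an integer in [1, {MAX_QUANTITY}]")
    return qty

# Positive test: a valid value on the happy path.
assert validate_quantity(10) == 10

# Negative tests: invalid input must produce a clear error.
for bad in (-5, 0, 101, "ten"):
    try:
        validate_quantity(bad)
        raise AssertionError(f"{bad!r} should have been rejected")
    except ValueError:
        pass  # clear, expected error response

# Boundary tests: the edges of the valid range are still accepted.
assert validate_quantity(1) == 1
assert validate_quantity(MAX_QUANTITY) == MAX_QUANTITY
```

Note how the boundary tests pin down exactly where valid input ends, which is where off-by-one bugs typically hide.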
Non-functional tests, on the other hand, exist in a large variety. Here are a few that are often found in microservices development:
Performance/load tests - measure the execution time of individual transactions while a component or system runs under realistic load. Tests are considered passed when the collected measurements stay below predefined performance targets.
Stress tests - test a system or component under an increasing volume of transactions to find its breaking point. If scalability requirements are not met, additional resources can be added to the system and the tests repeated.
Reliability tests - purposely fail individual components of the system and check whether the system recovers successfully. Data integrity must also be verified, to make sure nothing was corrupted by the failure. Netflix is known for its “Chaos Monkey” technique, which randomly fails microservices in production to ensure high reliability of the entire system.
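The measure-then-compare pattern behind performance/load tests can be sketched in a few lines: call an operation repeatedly, collect latencies, and check a percentile against a predefined target. The operation and the 50 ms target below are illustrative assumptions, not real requirements.

```python
# Simplified load-test sketch: collect per-call latencies and compare
# a percentile against a preset performance target.
import time

def handle_request():
    # Stand-in for a real transaction (e.g. an HTTP call to the service).
    time.sleep(0.001)

def run_load_test(operation, calls=200, percentile=0.95, target_seconds=0.05):
    latencies = []
    for _ in range(calls):
        start = time.perf_counter()
        operation()
        latencies.append(time.perf_counter() - start)
    latencies.sort()
    # Pick the requested percentile from the sorted latencies.
    measured = latencies[int(percentile * (len(latencies) - 1))]
    return measured, measured <= target_seconds

p95, passed = run_load_test(handle_request)
print(f"p95 latency: {p95 * 1000:.2f} ms, passed: {passed}")
```

Real load tests would drive concurrent traffic with a dedicated tool (e.g. JMeter, Gatling, or Locust), but the pass/fail logic is the same comparison against targets.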
Having defined these concepts, we can now put them together and define what types of tests should be implemented at each quality gate.
Functional: positive, negative, boundary
Non-functional: performance benchmarks for code optimization
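A performance benchmark at this gate can be as simple as timing one function with the standard `timeit` module and comparing the result to a stored budget. The function and the budget below are illustrative assumptions.

```python
# Minimal code-optimization benchmark using the stdlib timeit module.
import timeit

def build_index(items):
    # Function under benchmark: builds a lookup table from (key, value) pairs.
    return {k: v for k, v in items}

data = [(str(i), i) for i in range(1000)]
per_call = timeit.timeit(lambda: build_index(data), number=100) / 100
print(f"build_index: {per_call * 1e6:.1f} µs per call")

# In a CI quality gate this would be compared against a recorded target,
# failing the build on a regression. The budget here is an assumption.
BUDGET_SECONDS = 0.01
assert per_call < BUDGET_SECONDS
```

Keeping such budgets in version control alongside the code makes performance regressions visible in the same pipeline as functional failures.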
System-level non-functional tests may require a lot of expensive resources and run for a long time. So, instead of using the Stage environment, some teams prefer to run them in a separate Performance environment, created just before the tests start and torn down right after they complete.
One last note: since system-wide functional tests may be run in the Production environment, they must be implemented carefully, to avoid polluting production data. There are a couple of ways to achieve this:
Run all tests under a specially created test account that is excluded from production results.
Create some test data, run the tests, and then clean up the created test data right after test execution.
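The second approach is easiest to get right when cleanup is guaranteed to run even if a test fails. Here is a sketch using a Python context manager; the in-memory `store` dictionary stands in for production data storage, and all names are illustrative.

```python
# Sketch: create test data, run the checks, and always clean up afterwards.
from contextlib import contextmanager

store = {}  # stand-in for production data storage

@contextmanager
def temporary_test_data(records):
    created = []
    try:
        for record_id, value in records:
            store[record_id] = value
            created.append(record_id)
        yield created
    finally:
        # Cleanup runs even if the test body raises, so no test data
        # is left behind to pollute production.
        for record_id in created:
            store.pop(record_id, None)

with temporary_test_data([("test-order-1", 42)]) as ids:
    # The system-level checks would run here against the created data.
    assert store["test-order-1"] == 42

assert "test-order-1" not in store  # cleaned up after the test
```

The same pattern maps onto test-framework fixtures (e.g. pytest fixtures or JUnit `@AfterEach` methods) when running against a real database.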