So, you’ve run your test suite three times: the first time everything passes, the second time a test fails, the third time everything passes again. Yet you’ve changed nothing in the interim.
Welcome to the realm of flaky tests, a place you’re significantly more likely to find yourself if your product sends emails or SMS than if it doesn’t.
What is a flaky test?
Simply put, flaky tests sometimes pass and sometimes fail, with no change to the code or environment in between to explain the difference. It’s not because your product is broken; it’s because the test itself is unreliable.
Why do flaky tests happen?
There’s a range of reasons for flaky tests, but some of the most common include:
| Cause | Why it leads to flaky tests |
| --- | --- |
| Third-party dependencies | Email and SMS providers sit outside your control, so you can’t resolve the outages, rate limiting or network latency that can impact your tests. |
| Asynchronicity | Messages aren’t delivered instantly, so tests that check for delivery too quickly will pass or fail sporadically depending on service speed. |
| Data and state management | Poorly managed test data causes conflicts; for example, if two tests use the same email address, their results can collide. |
| Hardcoded wait times | Fixed sleep intervals are a common cause of flakiness because network and delivery speeds vary by day and time of day. |
| Imprecise filtering | If your system sends several kinds of email, e.g. verifications, order confirmations, and magic links, a test that says “give me the latest email” will inevitably grab the wrong one sooner or later. |
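The last two causes are the easiest to address in test code itself. Here’s a minimal sketch in TypeScript (the `EmailCriteria` shape and domain are our own illustration, not any particular provider’s API): each test generates a unique recipient to avoid data collisions, and builds search criteria that match on recipient and subject rather than “the latest email”.

```typescript
import { randomUUID } from 'node:crypto';

// Hypothetical search criteria for an email-testing helper; adapt the
// shape to whatever provider or library your suite actually uses.
interface EmailCriteria {
  sentTo: string;
  subjectContains: string;
}

// Each test gets its own recipient, so parallel tests never share state.
function uniqueAddress(domain: string): string {
  return `test-${randomUUID()}@${domain}`;
}

const recipient = uniqueAddress('example.test');

// Precise filtering: match on recipient AND subject, never "latest email".
const criteria: EmailCriteria = {
  sentTo: recipient,
  subjectContains: 'Confirm your order',
};
```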
What impacts do flaky tests have on real teams?
Flaky tests can waste a lot of time, especially when they encourage your team to re-run pipelines rather than fix issues, in the hope that everything will simply pass next time. It’s a precarious situation when your team second-guesses or ignores bugs on the assumption that a flaky test caused them, not least because it increases the likelihood that real bugs will reach production.
Equally, spurious failures may lead your team to waste time investigating non-existent issues, causing release delays. These problems also drive up maintenance costs and hurt team morale, thanks to the frustration of an unreliable test infrastructure.
How to detect flaky tests
Detecting flaky tests requires a mix of consistent checks and, where possible, tools to help. Some methods you can adopt include:
- Clearly defining ‘flaky’: make sure the whole team agrees on what counts as a flaky test, for example a message that doesn’t arrive within the expected timeframe, message content that differs unexpectedly, or a false negative.
- Implementing CI tools: CI tools such as GitHub Actions and Jenkins can surface flaky tests using retry logic, where failed tests rerun automatically and inconsistent results are flagged for your attention. There’s also a choice of plugins designed to monitor instability, and many test runners support retries natively (see the config sketch after this list).
- Analysing your test history: check your test logs for issues such as delivery delays, rate limiting and timeouts, in line with your agreed definition of a flaky test. It’s best practice to review logs regularly: daily for high-volume pipelines, say, or weekly for standard QA. Export the run history, filter for your criteria, and tag suspected flaky tests, leaving you with a list to fix, flag or rewrite.
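As one concrete example of runner-level retry logic, here’s a sketch of a Playwright configuration; Playwright reruns failed tests and reports any test that passes on retry as “flaky” rather than silently green:

```typescript
// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Rerun each failed test up to twice; a test that fails and then
  // passes on retry is reported as "flaky" in the results.
  retries: 2,
});
```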
Then it’s time to evaluate how often flaky tests occur, as well as the impact they have. Automation metrics will show you the pass rate of your test suite, and some publications suggest that even a 0.5% flaky failure rate shouldn’t be tolerated. The sketch below shows one way to compute per-test pass rates from exported run history.
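As a hedged sketch (the input format here is an assumption; adapt it to whatever your CI actually exports), this script aggregates results across runs and flags any test with a mixed pass/fail record:

```typescript
// flake-report.ts - aggregate exported run history into per-test pass rates.
interface RunRecord {
  testName: string;
  passed: boolean;
}

function flakeReport(history: RunRecord[]): void {
  const stats = new Map<string, { passes: number; runs: number }>();

  for (const { testName, passed } of history) {
    const s = stats.get(testName) ?? { passes: 0, runs: 0 };
    s.runs += 1;
    if (passed) s.passes += 1;
    stats.set(testName, s);
  }

  for (const [name, { passes, runs }] of stats) {
    // A test that both passed and failed with no code change is a flake suspect.
    if (passes > 0 && passes < runs) {
      const rate = ((passes / runs) * 100).toFixed(1);
      console.log(`SUSPECT: ${name} passed ${passes}/${runs} runs (${rate}%)`);
    }
  }
}

// Example usage with two runs of the same suite:
flakeReport([
  { testName: 'sends verification email', passed: true },
  { testName: 'sends verification email', passed: false },
  { testName: 'sends order confirmation', passed: true },
  { testName: 'sends order confirmation', passed: true },
]);
```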
How to prevent flaky tests
Prevention is key when it comes to flaky tests: by setting a clear precedent for how tests should be written and configured, you reduce the risk of flakiness as far as possible.
- More flaky tests lead to fewer flaky tests – developers often find that flaky tests show up most in CI environments, so by using a CI server from the start of the project, you can set up a dedicated branch to hunt for flaky tests and schedule frequent builds on it. It may sound counterintuitive to aim to surface more flaky tests in order to prevent them, but the extra data will allow your team to spot patterns and avoid them in the future.
- Use ‘wait’ commands instead of ‘sleep’ – this pauses tests until a condition is actually met, rather than failing because a fixed sleep turned out to be too short (see the polling sketch after this list).
- Create robust tests – equip your tests with timeouts, retries and waits, so transient network or performance issues are handled gracefully rather than reported as failures (a retry sketch also follows below).
- Maintain your test suite – keeping in-depth logs, removing duplicate tests, and establishing continuous monitoring will stand you in good stead, helping you spot patterns, reduce noise, and flag issues before they escalate.
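To make the ‘wait, don’t sleep’ advice concrete, here’s a minimal polling helper in TypeScript (the names are ours, not from any particular framework): instead of sleeping for a fixed period, it re-checks a condition until it holds or an overall timeout expires.

```typescript
// Poll a condition until it yields a value, instead of a fixed sleep.
async function waitFor<T>(
  check: () => Promise<T | undefined>,
  timeoutMs = 30_000,
  intervalMs = 500,
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const result = await check();
    if (result !== undefined) return result; // condition met: stop waiting
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  throw new Error(`Condition not met within ${timeoutMs}ms`);
}
```

Unlike a hardcoded sleep, this succeeds as soon as the message arrives and only fails once the full timeout is exhausted.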
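And for the timeouts-and-retries point, a similarly hedged sketch of a retry wrapper with exponential backoff, useful around calls to third-party email or SMS APIs that occasionally rate-limit or time out:

```typescript
// Retry a flaky async operation with exponential backoff.
async function withRetries<T>(
  action: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 1_000,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await action();
    } catch (err) {
      lastError = err;
      // Back off: 1s, 2s, 4s... to ride out transient provider issues.
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
  throw lastError;
}
```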
Overall, while detecting and preventing flaky tests is challenging, it’s also crucial to maintaining your test suite, optimising your release times, and keeping your team motivated.
How Mailosaur can help
Mailosaur empowers your team to expand its testing and eradicate flaky tests by providing virtual inboxes and SMS numbers that integrate directly into your existing stack. With the ability to test your flows end-to-end (including crucial workflows like multi-factor authentication, which span both email and SMS) and robust assertion capabilities, Mailosaur makes it easy to optimise your test suite and get results you can depend on.
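As a brief sketch using Mailosaur’s Node.js SDK (the server ID and address below are placeholders; check the official docs for the current API surface), a test can search a virtual inbox by recipient and subject instead of sleeping and polling by hand:

```typescript
import MailosaurClient from 'mailosaur';

const mailosaur = new MailosaurClient(process.env.MAILOSAUR_API_KEY!);
const serverId = 'SERVER_ID'; // placeholder: your Mailosaur server ID

// The search waits for a matching message, so no hardcoded sleeps are needed.
const email = await mailosaur.messages.get(serverId, {
  sentTo: `user@${serverId}.mailosaur.net`,
  subject: 'Verify your account',
});

console.log(email.subject); // assert on content, links or codes here
```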
