Preface
ERPNext is almost 20 years old. It is a complex piece of business software with extreme customizability. As such, we have written a lot of tests to prevent regressions. Over time, the reliability of the tests themselves degraded: a test would succeed locally but fail on CI, or the reverse. This lack of locally deterministic tests became a bottleneck for engineers, so we decided to fix them.
Problem - Test Runner
The test runner is ridiculously overengineered. It loads test records by recursively traversing link fields, handles legacy setup mechanisms that are more than 15 years old, and has no well-defined commit/rollback mechanism.
- Implicit Behavior - Even though you have flags like --skip-before-tests and --skip-test-records, some implicit setup behavior cannot be skipped. You have no control over which test_records.json will be loaded first or in what order. You have no control over how it deduplicates individual test data. You have no control over when it does a commit or a rollback. Everything is implicit.
- No Locality - While writing or debugging a test, you had to jump across apps to understand how the test runner or its utilities work. So, if you are writing a test in HRMS and using some setup method in ERPNext, you'll have to look at both ERPNext and Framework code. Nothing saps your working memory like context-switching.
Put all of this together and you have a black box that nobody fully understands.
Ex:

- Passing dependency requirements to the test runner with no control over the order or deduplication.
- Setting some global state through tearDown()
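To make the second example concrete, here is a minimal sketch of how state set in tearDown() can make a test pass or fail depending on what ran before it. The names FLAGS, LeakyCase, DependentCase and run_in_order are hypothetical and only illustrate the pattern; they are not taken from the ERPNext or Framework code.

```python
import io
import unittest

# Hypothetical module-level state, standing in for any shared global.
FLAGS = {}


class LeakyCase(unittest.TestCase):
    def test_something(self):
        self.assertTrue(True)

    def tearDown(self):
        # Global state is set during cleanup and outlives this test.
        FLAGS["company"] = "Test Company"


class DependentCase(unittest.TestCase):
    def test_reads_flag(self):
        # Passes only if LeakyCase happened to run first.
        self.assertEqual(FLAGS.get("company"), "Test Company")


def run_in_order(*cases):
    """Run the given test classes in the given order from a clean state."""
    FLAGS.clear()
    suite = unittest.TestSuite()
    for case in cases:
        suite.addTests(unittest.defaultTestLoader.loadTestsFromTestCase(case))
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
    return runner.run(suite).wasSuccessful()
```

Here run_in_order(LeakyCase, DependentCase) succeeds while run_in_order(DependentCase) fails, which is the same shape as a test that passes in one environment and fails in another.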
A Better Test Suite
Framework's test runner should not be expected to do the heavy lifting of test setup for every possible type of Frappe app. It should only initialize the connection to the database. Individual apps should set up the necessary prerequisites on their own. With this expectation, we started the refactor.
- Locality & Explicit - We wanted more precise control over test data setup. So, everything was made explicit: execution flow, data loading sequence, deduplication. We introduced a new bootstrap class (BootStrapTestData) to set up all master data within ERPNext (Locality). We pass an explicit key to deduplicate. All of the setup is done at the Python level. This gives control over what is created and in what order. Bootstrap is designed to be a singleton and to create data in a persistent manner (commit). This combination gives us an efficient and stable master data setup.

- Purge - Commit/rollback inside test cases, any framework API that does an implicit commit (frappe.db.truncate), incorrect use of frappe.flags: all such flakiness-inducing behaviour was removed.
- Well Defined DB Transaction - Next, we established a structure with predictable commit and rollback behavior. Frappe uses the unittest testing framework, which has setup and teardown entry points at each level of the hierarchy (module, class, and individual test). We define a base class (ERPNextTestSuite) that does a rollback after each test case. Then we enforce its usage on all test classes.
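The bootstrap idea from the first item can be sketched roughly as follows. This is not the actual BootStrapTestData code: BootstrapSketch, its ensure() helper, the master table and the sample records are hypothetical, and an in-memory SQLite database stands in for the real one. It only shows the three properties described above: a singleton, an explicit creation order with an explicit deduplication key, and a single commit.

```python
import sqlite3


class BootstrapSketch:
    """Hypothetical stand-in for a bootstrap class; not ERPNext's actual code."""

    _instance = None

    def __init__(self, conn):
        self.conn = conn
        self.created = []
        self.conn.execute("CREATE TABLE IF NOT EXISTS master (doctype TEXT, name TEXT)")
        # The order of creation is spelled out here, line by line.
        self.ensure("Company", "Test Company")
        self.ensure("Warehouse", "Stores")
        self.ensure("Company", "Test Company")  # same key again, skipped
        # Master data is persisted once so later rollbacks do not remove it.
        self.conn.commit()

    @classmethod
    def get(cls, conn):
        # Singleton: master data is built on the first call only.
        if cls._instance is None:
            cls._instance = cls(conn)
        return cls._instance

    def ensure(self, doctype, name):
        """Insert a record unless the explicit key (doctype, name) already exists."""
        exists = self.conn.execute(
            "SELECT 1 FROM master WHERE doctype = ? AND name = ?", (doctype, name)
        ).fetchone()
        if exists:
            return False
        self.conn.execute(
            "INSERT INTO master (doctype, name) VALUES (?, ?)", (doctype, name)
        )
        self.created.append((doctype, name))
        return True
```

Calling BootstrapSketch.get(conn) twice returns the same object, and the duplicate "Test Company" record is created only once because the caller, not the runner, decides what the deduplication key is.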
This allowed us to systematically remove flakiness from the entire test suite of ERPNext and HRMS.
After the refactor:
All master setup, deduplication and DB transactions are explicitly handled by the base class ERPNextTestSuite. Simply inherit the class and write tests!
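As a rough illustration of the pattern (not the real ERPNextTestSuite), the sketch below commits shared data once per class and rolls back after every test, using only unittest and an in-memory SQLite database. RollbackTestCase, ItemCase, the item table and run_sketch are hypothetical names chosen for this example.

```python
import sqlite3
import unittest


class RollbackTestCase(unittest.TestCase):
    """Hypothetical base class: commit shared data once, roll back after each test."""

    @classmethod
    def setUpClass(cls):
        cls.conn = sqlite3.connect(":memory:")
        cls.conn.execute("CREATE TABLE item (name TEXT)")
        cls.conn.execute("INSERT INTO item VALUES ('Master Item')")
        cls.conn.commit()  # persistent master data, visible to every test

    @classmethod
    def tearDownClass(cls):
        cls.conn.close()

    def tearDown(self):
        # Discard whatever the test wrote, so the next test starts clean.
        self.conn.rollback()

    def count(self):
        return self.conn.execute("SELECT COUNT(*) FROM item").fetchone()[0]


class ItemCase(RollbackTestCase):
    def test_a_insert(self):
        self.conn.execute("INSERT INTO item VALUES ('Temp Item')")
        self.assertEqual(self.count(), 2)

    def test_b_sees_clean_state(self):
        # The row inserted by test_a_insert was rolled back.
        self.assertEqual(self.count(), 1)


def run_sketch():
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ItemCase)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

A test class written this way contains only the test logic; the transaction handling lives entirely in the base class, which is what makes the behavior predictable across local and CI runs.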
Anecdote
Rollback after each test isn't something new; webnotes did it almost 15 years ago. We've come full circle.




