"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." -- Brian Kernighan
Most of the code we write is imperfect. While we should strive to remove as many errors as possible, there will always be a few that escape and reach end users. An L2 support engineer's job is to diagnose these errors, provide a fix, and take preventive action. This article should equip you with information about all three stages.
You will often be unfamiliar with the feature you're trying to debug. In such cases, first consult the product documentation and understand how the feature works. This takes some time upfront but makes the whole process much simpler.
Finding the error
The Frappe + ERPNext codebase is 1 million+ lines, so you need to be smart about finding the source of an error. There are multiple ways you can receive an error report. These are the recommended strategies for dealing with each type.
1. Error Tracebacks
These are the most convenient error reports. You get the document that caused the error, the request data, the full call stack and the final error message. What more could you ask for? The best strategy for such errors is to read the traceback and "think your way out of it". What input would have caused this combination of call stack and error? Start by reading the code of the last called function in the call stack. Once you have some idea of the cause, you can reproduce the error.
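When a traceback is long, it can help to list its frames with the last called function first, since that is where reading should start. The snippet below is a minimal sketch that assumes the standard CPython traceback layout; the helper name and the sample traceback are made up for illustration.

```python
import re

# Matches the standard CPython frame line: File "path", line N, in func
FRAME_RE = re.compile(r'File "(?P<path>[^"]+)", line (?P<line>\d+), in (?P<func>\S+)')


def frames_innermost_first(traceback_text):
    """Return (path, line, function) tuples, last called function first."""
    frames = [
        (m.group("path"), int(m.group("line")), m.group("func"))
        for m in FRAME_RE.finditer(traceback_text)
    ]
    return list(reversed(frames))


# Hypothetical traceback, invented for this example only.
SAMPLE = '''Traceback (most recent call last):
  File "apps/example/api.py", line 12, in submit_order
    validate(order)
  File "apps/example/validation.py", line 40, in validate
    check_qty(order)
  File "apps/example/validation.py", line 57, in check_qty
    raise ValueError("Quantity cannot be negative")
ValueError: Quantity cannot be negative
'''

for path, line, func in frames_innermost_first(SAMPLE):
    print(f"{func} ({path}:{line})")
```

The first line printed is the function to open first; the remaining lines show how execution got there.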
2. Textual error reports and screenshots
If the user has shared a description of the error they are facing, the best course of action is to reproduce the behaviour yourself and see whether you hit the same error.
If you do hit the error, you will likely have a traceback and can follow the strategy described in #1 above.
If all you have is a screenshot or an error message, do a project-wide text search to find where the message is raised. Remember that error messages can be dynamically generated, so you will have to use regular expressions or partial messages to find where the message lives.
Don't hesitate to ask the end user for more information and the steps to reproduce the error. It will speed up the whole process.
E.g. Try to find where this error is coming from:
ERROR: Last Stock Transaction for item new-item under warehouse Mumbai - WH was on 21-02-2012. You are not authorized to make/edit Stock Transactions for Item new-item under warehouse Mumbai - WH before this time.
What will you search for?
I'd go for "Last Stock Transaction for" or a case-insensitive regular expression such as Last Stock Transaction for.*under warehouse
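Any editor or grep-like tool can run this search. As a rough sketch, the same case-insensitive search can also be scripted in Python. The function name, the file suffixes and the demo file below are assumptions made for illustration, not part of the codebase.

```python
import re
import tempfile
from pathlib import Path


def find_message(root, pattern, suffixes=(".py", ".js")):
    """Yield (path, line_number, line) for each source line matching pattern."""
    regex = re.compile(pattern, re.IGNORECASE)
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                yield path, number, line.strip()


PATTERN = r"Last Stock Transaction for.*under warehouse"

# Demo on a throwaway directory; in practice point root at your apps folder.
with tempfile.TemporaryDirectory() as root:
    sample = Path(root) / "stock_checks.py"  # hypothetical file
    sample.write_text(
        'msg = "Last Stock Transaction for item {0} under warehouse {1} was on {2}."\n'
    )
    hits = list(find_message(root, PATTERN))

for path, number, line in hits:
    print(f"{path.name}:{number}: {line}")
```

Because the pattern uses .* between the fixed parts of the message, it still matches when the item name is filled in through a placeholder such as {0}.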
3. "Business Logic" errors
These are errors in the way some business logic is defined, e.g. a value mismatch between a document and the accounting entries created from it. To debug them, you will have to dive deep and understand how the "business logic" is defined. These are usually the most difficult bugs to resolve, so feel free to ask for help from your peers or subject matter experts.
The product documentation is the most important resource for debugging business logic errors. Ideally, first read the docs, then the code surrounding the business logic. If you still aren't making progress, it's best to discuss it with a peer.
General tips to debug a problem
- Read the Frappe Framework debugging page.
- print() and console.log() are the two easiest ways to examine program state at a given line. Use them sparingly to understand what's going on.
- If you are logged in to the user's site, you can use the JavaScript console to read hidden attributes of docs through the cur_frm.doc API, which gives you an object identical to the one returned by get_doc.
- If you've narrowed the problem down to particular functions or lines of code, put a breakpoint there.
- Add a line with breakpoint() in Python or debugger in JavaScript to halt execution and debug the issue manually.
- If you are not familiar with debuggers, it's a good idea to spend 1-2 hours learning pdb (the Python debugger) and the Chrome/Firefox debugger tools. The investment will pay off in the long run.
- If reproducing a bug takes a long time and you need multiple iterations to narrow down the problem, just write a unit test for it!
- DocType and custom JavaScript are accessible through the Sources tab in developer tools. This can be used to inspect and debug JS source code.
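To illustrate the unit test and breakpoint tips together, here is a minimal sketch using Python's built-in unittest module. The function line_total and the input values are hypothetical stand-ins for the code and data from a real bug report. In an actual app the test would live in the module's test file and be run with the project's test runner.

```python
import unittest


def line_total(qty, rate, discount_percent):
    """Hypothetical function under investigation."""
    return round(qty * rate * (1 - discount_percent / 100), 2)


class TestLineTotal(unittest.TestCase):
    def test_total_for_reported_input(self):
        # Input copied from the (imaginary) bug report, so each iteration
        # takes seconds instead of a long manual reproduction.
        # Uncomment the next line to pause here and inspect state in pdb.
        # breakpoint()
        self.assertEqual(line_total(2, 100, 10), 180.0)


# Run just this test case and keep the result object for inspection.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestLineTotal)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Once the bug is fixed, the same test can stay in the codebase as a regression test.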
Making use of "logging" doctypes
When trying to figure out why an anomaly happened, it helps to know what else was happening around that time. The framework has several logs for this.
- Error Log
- Error snapshot (tracebacks with locals info!)
- Version log - filter these by time to find "what was changing" around the time the anomaly happened.
- Route history - who was doing what around the time the anomaly happened.
- Scheduled Job Log - which background jobs were running along with failure messages if any.
- Background Jobs page (look at failed jobs)
- Deleted documents list
- Repost Item valuation doctype (specific to stock reposting issues)
Inspecting bin logs (absolute last resort)
Note: On Frappe Cloud just use "Binary Log Browser" report instead of going through the ordeal described below.
MySQL binlogs, or "binary logs", contain ALL write operations (internally they are also used for DB replication). They can tell you about EVERY write that happened on the server, which also means they are ridiculously large, often GBs when decompressed. Reading them requires root access to the server, so this should absolutely be the last resort. Also note that only the last ~10 days of data are available.
- Access the server and find the bin logs. They are usually stored in /var/lib/mysql. List the directory contents and find the file that was modified after the anomaly. These files are usually capped by size, so one file can contain data for multiple days. ls -lah usually helps in finding the right file.
- Find the site's database name. This is present in the site config.
- Copy the file to the /tmp directory (cp binlog.01 /tmp/) and change the file's ownership to the frappe user (chown frappe:frappe filename).
- Log out as root and log back in as a normal user.
- Extract the bin logs and save them to a temp file, e.g. mysqlbinlog binlog.01 --database=dbhashname > /tmp/mysqlbinlog.txt. Be careful while doing this; don't accidentally write to something else.
- This file will still be in GBs (be mindful of server space). You can't open it in an editor without the editor crashing, so use a shell tool to grep for the right parts. E.g. if you're looking for something related to an item, grep for the item code and all queries related to that item code will show up. You can write the output to a separate file for review, e.g. grep 'unique_identifier' /tmp/mysqlbinlog.txt > /tmp/filtered_binlog.txt. Open this in Vim or download it for further analysis. While grepping you can also add line numbers with the -n flag and extra context with the -C flag.
- Good luck, you will need it!
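If grep is not convenient, the same filtering can be sketched in Python. The example below streams lines one at a time, so the multi-GB file is never loaded into memory, and it mimics the line numbers and context of grep -n -C. The function name, the sample statements and the item code ITEM-0001 are made up for illustration.

```python
from collections import deque


def grep_with_context(lines, needle, context=2):
    """Yield (line_number, text) for matches plus `context` lines on either side."""
    before = deque(maxlen=context)  # most recent lines not yet emitted
    after_remaining = 0
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if needle in text:
            yield from before
            before.clear()
            yield number, text
            after_remaining = context
        elif after_remaining > 0:
            yield number, text
            after_remaining -= 1
        else:
            before.append((number, text))


# Invented sample standing in for the extracted binlog text.
SAMPLE_LINES = [
    "SET TIMESTAMP=1700000000;",
    "UPDATE tabBin SET actual_qty = 5 WHERE item_code = 'OTHER-ITEM';",
    "SET TIMESTAMP=1700000050;",
    "UPDATE tabBin SET actual_qty = 0 WHERE item_code = 'ITEM-0001';",
    "COMMIT;",
    "SET TIMESTAMP=1700000100;",
]

matches = list(grep_with_context(SAMPLE_LINES, "ITEM-0001", context=1))
for number, text in matches:
    print(f"{number}: {text}")
```

For a real file, pass an open file object instead of the list, for example open("/tmp/mysqlbinlog.txt", errors="replace"), and iterate over the results as they are produced.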
Fixing the error
Once you've identified a bug, it's time to prepare and send a fix. Follow the git branching guidelines.
- Try to ensure that your change doesn't introduce a new error. At the very least, run the unit tests for the modules you're modifying before submitting a PR. The entire test suite will run in CI.
- Does the bug affect existing users, and can existing records be corrected? Write a patch.
Preventive Action
After fixing a bug, ask yourself these questions:
- Are there similar bugs yet to be discovered? Can you search the project and fix them too?
- Is there any dependent code that needs to be fixed?
- Can you write a regression test to prevent this error from recurring?
- Is this a common mistake? Write a Semgrep rule to find all existing instances and to catch them in CI.
- Was the bug way too difficult to trace? Make it easier!
Conclusion
Debugging is a _creative_ process, and no amount of tips will help you tame all the bugs. You'll develop the required skills if you are persistent at solving them. Good luck on the journey!
Fun fact: Do you know the origin of the term "debugging"?
