How MySQL's BinLogs saved the day

Enable mysql binary log. It might save someone's data and also, your reputation.

 · 4 min read

The takeaway:

Enable mysql binary log. It might save someone's data and also, your reputation.

On the 4th of May 2012, we were attempting to restore a few records for a customer. The basic strategy was to restore a mysql dump into a temporary database and copy those records to the production database.

There was a lot of pressure. It did not help that the customer was annoyed. Not an ideal setting when you are trying to restore something from a database. So, while loading the backup, instead of specifying a temporary database name, I typed in the customer's database! and pressed Enter. It took me about 5 seconds to realize my mistake. I hit Ctrl+C and stopped the process.

Sh**!

To assess the damage, I did a cursory check of the customer's database and thankfully, it seemed I had hit Ctrl+C just in time. Disaster averted. Phew!

After about two weeks, the customer informed us that some of the records in the system were missing. By this time, we had forgotten our misadventure. So we started investigating this issue by searching in the bin-logs. We searched for any delete queries which could've been the cause of the missing records, but we couldn't find any such entry for the customer's database. That was when my mistake came to haunt me.

We checked for drop table queries and we found about 20 such queries being executed on the 4th of May, 2012. Basically, in the 5 seconds before I hit Ctrl+C, data of 20 tables got replaced with the old data from the backup.

In essence, the database now contained entries prior to 30th April, 2012 in those 20 tables and the rest of the database contained all the entries, i.e. the database was partially damaged.

Backups come in handy when you want to restore some records, tables or the whole database. It was useless in our situation, where part of the data was old, part of it was missing and the rest of it was perfectly intact. It would be a monstrous task to sift through backups and identify the ones which were missing or the ones which got replaced with their older versions.

After a lot of discussions, we hit on the idea of using bin-logs to restore the database. bin-logs or binary logs sequentially record DML and DDL statements of all committed transactions, like a tape recorder. We were in for a long day and I was determined to clean up my mess.

The plan was to restore the 30th April backup in a temporary database, apply the statements from the sql file, which we extracted from the bin-logs. Then after 6 PM India Time, when the customer was to stop using the system, apply the remaining latest statements recorded in the bin-logs and then replace this temporary database as the customer's database.

  1. The first step was to recognize the query statement in the bin log, which occurred before the 4th of May and after the timestamp of the 30th April backup. We used this timestamp to locate the statement which was executed after the backup was taken. This step took about an hour.

  2. We then extracted all the statements after and including the located statement, from the bin-logs and redirected the output to a .sql file. The sql file had about three weeks worth of sql statements.

  3. The next step was to remove the statements which caused the damage. Only one problem. The size of the file was 1.4 GB, and contained about 43.7 million lines. The process of removing those sql statements, including any statements containing "use database", was extremely slow. We were using vim over an ssh connection. One of the cpu cores of our server kept maxing out. Each time we saved the file, we had to wait about 15-20 minutes for the process to complete. Every search operation on the file took about 5 minutes. It required a lot of patience.

  4. After scrubbing the sql file, applying this file on the backup took several iterations. This was mainly due to some internal mysql commands, which got recorded in the bin-logs and were causing conflicts. We kept resolving these conflicts and successfully applied the extracted statements on the backup. Phew! But it took 6 hours to do all this.

At about 8 PM India Time, we applied the customer's latest changes and replaced their database with the one we restored. It worked perfectly. All the missing records were there.

Lessons learned:

  • When in pressure, take a break.
  • Do not meddle with databases on a friday evening. The consequences can be haunting.
  • Make sure that bin-logs and backups are working, everyday.
  • and lastly, prevention wastes a lot less time than a cure.

Anand Doshi

Anand is the Chief Technology Officer at ERPNext. He reads fiction, dabbles in photography and is always on the watch for the best ToDo app.

No comments yet

No comments yet. Start a new discussion.

Add Comment