Debugging Network Connectivity Issues

We often forget that the Internet is a utility that runs on cables, pipes, power and a lot more, and that it can sometimes go down too.


The Email

I woke up to an email alert on my phone. It was from Prakash: "Most Urgent - ERPNext not working!" Okay, panic mode activated. I booted my laptop and opened two tabs in Chrome, one for erpnext.com and one for google.com. Google was working; ERPNext was not.

Finding The Problem

I tried logging in to our server with no luck. First thought: the server might have crashed. But I also couldn't log in to our backup server. Further, I was unable to access the website of Hetzner, our server host. So the issue might be with Hetzner's network. To confirm this, I used my phone's internet connection to access erpnext.com and hetzner.de, and it worked! So the problem was not with Hetzner after all, but with my Internet Service Provider (ISP), MTNL. Nothing to worry about, customers weren't affected. Or so I thought.

Support Ticket

I logged into our ERPNext account and found that three of our customers were unable to access their accounts. They were from Mumbai, Delhi and Sri Lanka. So the problem was not just with MTNL. Most probably, an undersea cable or a router was malfunctioning.

Traceroute

A few days back, I learnt a Unix command called "traceroute". When you type erpnext.com in your browser's address bar and click Go, your request to see the home page of erpnext.com is sent in the form of a packet with the IP address of erpnext.com (46.4.50.84) stamped on it. This packet is passed on from one router to another. Routers are electronic networking devices that act as couriers: they try to deliver the packet to the address mentioned on it. "traceroute" helps you detect which couriers are handling your packet and maps the path taken by the packet.
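For the curious, here is roughly how traceroute discovers the couriers. Every packet carries a time-to-live (TTL) counter that each router decrements; the router that brings it down to zero discards the packet and reports back to the sender. traceroute sends probes with TTL 1, 2, 3 and so on, so each probe expires one router further along the path. The Python sketch below is only a toy model of that idea: it sends no real packets, the function name simulate_traceroute is hypothetical, and the four-address path is a shortened, illustrative version of the real route.

```python
def simulate_traceroute(path, silent=frozenset(), max_hops=64):
    """Toy model of traceroute's TTL trick (no real packets are sent).

    path   -- router addresses in order; the last entry is the destination
    silent -- routers that never report back (printed as "*" by traceroute)
    """
    report = []
    for ttl in range(1, max_hops + 1):
        remaining = ttl
        expired_at = None
        for router in path:
            remaining -= 1              # every router decrements the TTL
            if remaining == 0:          # TTL used up: this router answers
                expired_at = router
                break
        if expired_at is None:
            break                       # path is shorter than this TTL
        report.append((ttl, "*" if expired_at in silent else expired_at))
        if expired_at == path[-1]:
            break                       # the destination itself replied
    return report


if __name__ == "__main__":
    # Shortened, illustrative path: real routes have many more hops.
    demo_path = ["192.168.2.1", "192.168.1.1", "59.184.191.254", "46.4.50.84"]
    for hop, address in simulate_traceroute(demo_path, silent={"192.168.1.1"}):
        print(hop, address)
```

In this model the destination is treated like any other hop. The real command distinguishes the destination's reply from a router's "time exceeded" report, but the hop-by-hop discovery works the same way.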

Here's how it looks when erpnext.com is working:


Anands-MacBook-Pro:~ anandpdoshi$ traceroute erpnext.com
traceroute to erpnext.com (46.4.50.84), 64 hops max, 52 byte packets
 1  192.168.2.1 (192.168.2.1)  13.686 ms  2.120 ms  1.169 ms
 2  192.168.1.1 (192.168.1.1)  2.706 ms  2.440 ms  2.435 ms
 3  triband-mum-59.184.191.254.mtnl.net.in (59.184.191.254)  36.411 ms  42.400 ms  36.267 ms
 4  static-mum-59.185.4.41.mtnl.net.in (59.185.4.41)  37.901 ms  36.800 ms  57.553 ms
 5  aes-static-177.105.144.59.airtel.in (59.144.105.177)  186.944 ms  184.137 ms  183.051 ms
 6  125.62.187.193 (125.62.187.193)  186.071 ms  184.546 ms  183.587 ms
 7  linx-1.init7.net (195.66.224.175)  183.628 ms  184.900 ms  184.228 ms
 8  r1nue1.core.init7.net (77.109.140.254)  198.058 ms  197.872 ms  198.276 ms
 9  gw-hetzner.init7.net (77.109.135.102)  200.627 ms  201.439 ms  199.985 ms
10  hos-bb2.juniper1.rz14.hetzner.de (213.239.240.152)  202.934 ms  326.391 ms  201.765 ms
11  hos-tr2.ex3k1.rz14.hetzner.de (213.239.224.162)  205.560 ms  306.970 ms  307.137 ms
12  static.84.50.4.46.clients.your-server.de (46.4.50.84)  201.694 ms  202.293 ms  309.299 ms


Here's how it looked at 9:00 AM today, when using MTNL's internet connection:


Anands-MacBook-Pro:~ anandpdoshi$ traceroute erpnext.com
traceroute to erpnext.com (46.4.50.84), 64 hops max, 52 byte packets
 1 192.168.2.1 (192.168.2.1)  1.463 ms  1.350 ms  1.408 ms
 2 192.168.1.1 (192.168.1.1)  2.586 ms  2.302 ms  2.282 ms
 3 triband-mum-59.184.191.254.mtnl.net.in (59.184.191.254)  37.350 ms  45.194 ms  37.013 ms
 4 static-mum-59.185.4.41.mtnl.net.in (59.185.4.41)  36.141 ms  36.944 ms  37.395 ms
 5 aes-static-177.105.144.59.airtel.in (59.144.105.177)  37.437 ms  41.538 ms  37.605 ms
 6 203.101.100.49 (203.101.100.49)  63.176 ms  63.945 ms  62.497 ms
 7 182.79.220.202 (182.79.220.202)  64.798 ms  63.947 ms  63.756 ms
 8 * * *
 9 203.101.100.210 (203.101.100.210)  66.929 ms  62.303 ms
   182.79.220.202 (182.79.220.202)  63.521 ms
10 * * *
11 182.79.220.202 (182.79.220.202)  67.956 ms
   203.101.100.210 (203.101.100.210)  65.184 ms
   182.79.220.202 (182.79.220.202)  63.194 ms
12 * * *


The packet was unable to progress beyond the router with IP address 182.79.220.202. "whois", a command that reveals the ownership of an IP address, showed that it belonged to Bharti Airtel Limited. So a router belonging to Airtel was unable to send the packet further along its path.
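Spotting where a trace stalls can also be done with a small script. The Python sketch below is one possible approach, not something used in the original investigation: it parses traceroute output, finds the highest-numbered hop that still answered, and checks whether the destination ever replied. The function names are hypothetical, and SAMPLE is an excerpt (hops 7 to 12) of the 9:00 AM trace.

```python
import re

# A hop line starts with the hop number; continuation lines do not.
HOP_LINE = re.compile(r"^\s*(\d+)\s+(\S.*)$")
# traceroute prints each responder's IPv4 address in parentheses.
ADDRESS = re.compile(r"\((\d{1,3}(?:\.\d{1,3}){3})\)")

SAMPLE = """\
traceroute to erpnext.com (46.4.50.84), 64 hops max, 52 byte packets
 7 182.79.220.202 (182.79.220.202)  64.798 ms  63.947 ms  63.756 ms
 8 * * *
 9 203.101.100.210 (203.101.100.210)  66.929 ms  62.303 ms
   182.79.220.202 (182.79.220.202)  63.521 ms
10 * * *
11 182.79.220.202 (182.79.220.202)  67.956 ms
   203.101.100.210 (203.101.100.210)  65.184 ms
   182.79.220.202 (182.79.220.202)  63.194 ms
12 * * *
"""


def parse_hops(transcript):
    """Map each hop number to the addresses that answered at that hop."""
    hops = {}
    current = None
    for line in transcript.splitlines():
        match = HOP_LINE.match(line)
        if match:
            current = int(match.group(1))
            hops[current] = ADDRESS.findall(match.group(2))
        elif current is not None:
            # Extra responders for the same hop appear on their own lines.
            hops[current].extend(ADDRESS.findall(line))
    return hops


def last_responding_hop(hops):
    """Return (hop_number, addresses) for the last hop that answered."""
    answered = [number for number, addresses in hops.items() if addresses]
    if not answered:
        return None
    number = max(answered)
    return number, hops[number]


def reached(hops, destination):
    """True if the destination address replied at any hop."""
    return any(destination in addresses for addresses in hops.values())


if __name__ == "__main__":
    hops = parse_hops(SAMPLE)
    print(last_responding_hop(hops))
    print(reached(hops, "46.4.50.84"))
```

For this excerpt, the last hop that answers is hop 11, its final reply comes from 182.79.220.202, and the destination 46.4.50.84 never replies. A hop that prints only asterisks is not proof of a fault on its own, since some routers simply do not answer probes; the telling sign here is that no later hop ever reaches the destination.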

What Now?

I am not a customer of Airtel, so I dialed MTNL's customer support. After a few phone calls and emails, they helped me escalate the issue to the person responsible for their network. However, before I could send him the traceroute, the issue was resolved. Airtel had re-routed their traffic via a router with IP address 125.62.187.193.

Conclusion

ERPNext was inaccessible to some users in the Indian subcontinent for about two hours, owing to a malfunctioning router belonging to an ISP in India. I learnt that I could help by registering a complaint with the right people. Also, MTNL has a very co-operative staff. I appreciate their effort and time.


Anand Doshi

Anand is the Chief Technology Officer at ERPNext. He reads fiction, dabbles in photography and is always on the watch for the best ToDo app.

4 comments
Anand Doshi July 2, 2013

@Anand - You are right. Our setup isn't big enough for such complications. But it is a good sugge

@Anand July 1, 2013

I would recommend setting up nagios from a local office system. Check out remote nrpe plugins. Nag

Aditya Duggal April 9, 2013

Hi Anand,

It was indeed a problem of Airtel and all those ISPs using the undersea cable o

Joseph John April 8, 2013

Good blog. Why not use a network monitoring tool like nagios or zabbix to notify you of the connectivity
