Advanced Network Troubleshooting: Using traceroute


This article continues our series on network troubleshooting. Here, we focus on troubleshooting connectivity problems by examining the output produced by the traceroute command.

The traceroute command lists all the router hops between your server and the target server. Checking this list helps you verify whether the routing over the networks in between is correct. Most operating systems include some form of path-tracing utility. Linux distributions, for example, also have tracepath and traceroute6 (for IPv6; equivalent to traceroute -6), while Windows has tracert and PathPing (Windows NT).

This is how traceroute works:

It sends an ICMP or UDP packet with a time-to-live (TTL) of "1" to the target server.
The first router on the path decrements the TTL to 0, recognizes that the TTL has been exceeded, and drops the packet. At the same time, this router sends an Internet Control Message Protocol (ICMP) time-exceeded message back to the source.
traceroute then records the IP address of the router that sent the ICMP message, as this is the first "hop" on the path to the final destination.
traceroute repeats the action but uses a TTL of "2" this time. The first hop reads this packet, decrements its TTL to 1, and forwards it to the next hop on the path. The second router then decrements the TTL to 0, drops the packet, and returns its own time-exceeded message, which traceroute records as the second hop.
This continues, with the TTL increased by one each round, until the target server is reached.
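
The steps above can be sketched as a small simulation. This is only an illustration of the TTL logic, not how the traceroute binary is implemented: the function name simulate_traceroute and the router names are made up for the example, and no packets are sent.

```python
from typing import List


def simulate_traceroute(path: List[str], max_hops: int = 30) -> List[str]:
    """Return the devices that would answer probes sent with TTL 1, 2, 3, ...

    `path` lists every device between the source and the target, in order,
    with the target itself as the last entry (an assumption of this sketch).
    """
    if not path:
        return []
    responders = []
    last_index = len(path) - 1
    for ttl in range(1, max_hops + 1):
        remaining = ttl
        for index, device in enumerate(path):
            if index == last_index:
                # The probe reached the target, which answers directly.
                responders.append(device)
                break
            remaining -= 1  # every router decrements the TTL by one
            if remaining == 0:
                # TTL exhausted: the router drops the probe and reports back.
                responders.append(device)
                break
        if index == last_index:
            break  # the target answered, so the trace is complete
    return responders


print(simulate_traceroute(["router-a", "router-b", "target"]))
```

Each round reveals exactly one more device, which is why the hop list grows one line at a time.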

You will, of course, only receive responses from functioning machines. Simply put, a device that responds during your trace is unlikely to be the source of the connectivity problem.

Use the following syntax to generate traceroute reports:

# traceroute [destination_host]

Below is an example of traceroute output for a query on google.com. Note that all the hop times are less than 50 milliseconds (ms), which is generally an acceptable response time.

# traceroute google.com
Resolving Address: google.com
traceroute to google.com (74.125.196.138), 30 hops max, 60 byte packets
 1  example.lan (X.X.X.X)  0.649 ms  0.644 ms  0.674 ms
 2  67.23.161.132 (67.23.161.132)  0.212 ms  0.414 ms  0.412 ms
 3  67.23.161.142 (67.23.161.142)  6.494 ms  6.510 ms  6.673 ms
 4  aix.pr1.atl.google.com (198.32.132.41)  6.593 ms  6.600 ms  6.600 ms
 5  72.14.233.54 (72.14.233.54)  6.925 ms  14.785 ms 72.14.233.56 (72.14.233.56)  6.811 ms
 6  66.249.94.22 (66.249.94.22)  7.310 ms 66.249.94.24 (66.249.94.24)  7.372 ms 66.249.94.20 (66.249.94.20)  7.345 ms
 7  209.85.248.31 (209.85.248.31)  7.357 ms 216.239.46.186 (216.239.46.186)  7.392 ms 209.85.243.26 (209.85.243.26)  7.265 ms
 8  * * *
 9  yk-in-f138.1e100.net (74.125.196.138)  7.291 ms  7.457 ms  7.264 ms
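
Output in this layout can also be checked by a script. The sketch below is a hypothetical helper, not part of traceroute: it assumes the Linux-style line layout shown in this article, and the names parse_hop and hops_over are invented for the example. The 50 ms default mirrors the guideline mentioned above.

```python
import re
from typing import Dict, List, Optional

HOP_LINE = re.compile(r"^\s*(\d+)\s+(.*)$")   # hop number, then the rest
RTT = re.compile(r"(\d+(?:\.\d+)?)\s+ms\b")    # e.g. "6.925 ms"


def parse_hop(line: str) -> Optional[Dict]:
    """Parse one hop line; return None for headers and other lines."""
    match = HOP_LINE.match(line)
    if match is None:
        return None
    rest = match.group(2)
    return {
        "hop": int(match.group(1)),
        "rtts_ms": [float(value) for value in RTT.findall(rest)],
        "timeouts": rest.split().count("*"),  # each "*" is one lost probe
    }


def hops_over(lines: List[str], limit_ms: float = 50.0) -> List[int]:
    """Return the hop numbers that have at least one reply above limit_ms."""
    slow = []
    for line in lines:
        hop = parse_hop(line)
        if hop and any(rtt > limit_ms for rtt in hop["rtts_ms"]):
            slow.append(hop["hop"])
    return slow


SAMPLE = [
    "traceroute to google.com (74.125.196.138), 30 hops max, 60 byte packets",
    " 5  72.14.233.54 (72.14.233.54)  6.925 ms  14.785 ms 72.14.233.56 (72.14.233.56)  6.811 ms",
    " 8  * * *",
    " 9  yk-in-f138.1e100.net (74.125.196.138)  7.291 ms  7.457 ms  7.264 ms",
]
print(hops_over(SAMPLE))
```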

The table below defines the code symbols that traceroute can return:

Returned Code	Description
* * *	No response arrived within the expected 5-second wait. The delay could be caused by one of the following: a router on the path is not sending back ICMP time-exceeded messages; a router or firewall in the path is blocking the ICMP time-exceeded messages; or the target IP address is not responding.
!H, !N, or !P	The host, network, or protocol is not reachable.
!X or !A	An administrator-imposed setting is blocking the traffic, which means that either a router Access Control List (ACL) or a firewall is in the way.
!S	The source route has failed as traceroute attempts to use a certain path. A router security setting might be causing this failure.
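
As a rough sketch, the table can be turned into a lookup that annotates a line of output. The helper name explain_line and the sample line are invented for illustration, and only the codes listed in the table are covered; real traceroute builds may print other annotations.

```python
import re
from typing import List, Tuple

# Meanings taken from the table above.
ANNOTATIONS = {
    "!H": "host unreachable",
    "!N": "network unreachable",
    "!P": "protocol unreachable",
    "!X": "blocked by an administrative setting (ACL or firewall)",
    "!A": "blocked by an administrative setting (ACL or firewall)",
    "!S": "source route failed",
}
CODE = re.compile(r"![HNPXAS]\b")


def explain_line(line: str) -> List[Tuple[str, str]]:
    """Return (code, meaning) pairs for the annotations found in one line."""
    notes = []
    if "*" in line.split():
        notes.append(("*", "no reply within the timeout"))
    # dict.fromkeys removes repeated codes while keeping their order.
    for code in dict.fromkeys(CODE.findall(line)):
        notes.append((code, ANNOTATIONS[code]))
    return notes


# Made-up line, not taken from a real trace.
print(explain_line(" 4  10.0.0.1 (10.0.0.1)  1.204 ms !H  *  1.310 ms !H"))
```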

Performing bidirectional traces

Always trace from both directions: from the source IP to the target IP, and from the target IP to the source IP. Routes are often asymmetric, which means they take one path in one direction and a different path in the return direction. Trace the route both ways to pinpoint a problem more accurately.

Tracing via looking glass

Many Internet service providers (ISPs) let you run a traceroute from dedicated servers called looking glasses. Because these looking glasses are in various locations, you can determine whether the connectivity issue you are experiencing stems from your web server or from the ISP being used.

You can do a quick web search for the term "Internet looking glass" to get a long list of alternatives. You can also go to traceroute.org, which lists looking glasses by country.

Time-exceeded false alarm

If traceroute does not get a response within a 5-second timeout interval, three asterisks (see table above) appear beside that hop:

# traceroute arin.com

Resolving Address: arin.com
traceroute to arin.com (192.149.252.124), 30 hops max, 60 byte packets

 1  208.69.X.X (208.69.X.X)  0.485 ms  0.467 ms  0.497 ms
 2  67.23.161.132 (67.23.161.132)  0.308 ms  0.324 ms  0.466 ms
 3  67.23.161.142 (67.23.161.142)  6.474 ms  6.524 ms  6.688 ms
 4  xe-9-1-3.edge5.Atlanta2.Level3.net (4.71.254.77)  6.590 ms  6.612 ms  6.613 ms
 5  ae-4-90.edge2.Washington4.Level3.net (4.69.149.208)  19.102 ms ae-1-60.edge2.Washington4.Level3.net (4.69.149.16)  19.013 ms ae-3-80.edge2.Washington4.Level3.net (4.69.149.144)  19.252 ms
 6  ae-3-80.edge2.Washington4.Level3.net (4.69.149.144)  19.040 ms ae-4-90.edge2.Washington4.Level3.net (4.69.149.208)  19.033 ms  19.219 ms
 7  COX-COMMUNI.edge2.Washington4.Level3.net (4.53.114.34)  83.252 ms  83.047 ms COX-COMMUNI.edge2.Washington4.Level3.net (4.53.114.58)  35.844 ms
 8  mrfddsrj01-ae0.0.rd.dc.cox.net (68.1.1.5)  21.366 ms  21.369 ms  21.654 ms
 9  * * *
10  * * *
11  wsip-98-172-152-14.dc.dc.cox.net (98.172.152.14)  77.031 ms  23.110 ms  23.114 ms
12  * * *
13  * * *
14  * * *
15  * * *
16  * * *
17  * * *
18  * * *
19  * * *
20  * * *
21  * * *
22  * * *
23  * * *
24  * * *
25  * * *
26  * * *
27  * * *
28  * * *
29  * * *
30  * * *

Note that some devices block the UDP probe packets that traceroute sends by default but allow ICMP packets. To get around this, add the -I flag to the traceroute command so that it sends ICMP echo packets instead. The output below shows the result after the -I flag was added:

# traceroute -I arin.com
traceroute to arin.com (192.149.252.125), 30 hops max, 60 byte packets
 1  208.69.X.X (208.69.X.X)  0.504 ms  0.508 ms  0.556 ms
 2  67.23.161.132 (67.23.161.132)  0.290 ms  0.315 ms  0.348 ms
 3  67.23.161.142 (67.23.161.142)  6.595 ms  6.603 ms  6.772 ms
 4  xe-9-1-3.edge5.Atlanta2.Level3.net (4.71.254.77)  7.612 ms  7.617 ms  7.618 ms
 5  ae-1-60.edge2.Washington4.Level3.net (4.69.149.16)  19.107 ms  19.111 ms  19.111 ms
 6  ae-1-60.edge2.Washington4.Level3.net (4.69.149.16)  19.109 ms  19.034 ms  19.188 ms
 7  COX-COMMUNI.edge2.Washington4.Level3.net (4.53.114.34)  60.466 ms  69.478 ms  69.467 ms
 8  mrfddsrj01-ae0.0.rd.dc.cox.net (68.1.1.5)  63.591 ms  57.189 ms  57.176 ms
 9  * * *
10  * * *
11  wsip-98-172-152-14.dc.dc.cox.net (98.172.152.14)  88.691 ms  88.191 ms  88.179 ms
12  host-252-131.arin.net (192.149.252.131)  86.348 ms  86.030 ms  86.018 ms
13  www.arin.net (192.149.252.125)  87.442 ms  86.854 ms  53.690 ms
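
The retry with -I can be scripted. The following is a minimal sketch, assuming a Unix-like traceroute that accepts the -m (maximum hops), -w (wait time in seconds), and -I (ICMP) options. The function names are hypothetical, the "last hop replied" check is only a heuristic, and ICMP mode may need elevated privileges on some systems.

```python
import subprocess
from typing import List


def build_traceroute_command(host: str, use_icmp: bool = False,
                             max_hops: int = 30, wait_seconds: int = 5) -> List[str]:
    """Build the argument list for a traceroute run."""
    command = ["traceroute", "-m", str(max_hops), "-w", str(wait_seconds)]
    if use_icmp:
        command.append("-I")  # send ICMP echo probes instead of UDP
    command.append(host)
    return command


def last_hop_replied(output: str) -> bool:
    """Heuristic: the trace finished if the final line contains a timing."""
    lines = [line for line in output.splitlines() if line.strip()]
    return bool(lines) and "ms" in lines[-1].split()


def trace_with_fallback(host: str) -> str:
    """Run a default trace first and retry with -I if the last hop is silent."""
    for use_icmp in (False, True):
        result = subprocess.run(
            build_traceroute_command(host, use_icmp=use_icmp),
            capture_output=True, text=True, check=False,
        )
        if last_hop_replied(result.stdout):
            break
    return result.stdout


print(build_traceroute_command("arin.com", use_icmp=True))
```

Calling trace_with_fallback("arin.com") would run the real command, so it is left out of the example output.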

Slow internet false alarm

The tracert output below seems to show that a website with the IP 80.40.X.X is loading slowly because of congestion at hops 6 and 7, where the response time is over 200 ms:

C:\>tracert 80.40.X.X

  1     1 ms     2 ms     1 ms  66.134.200.97
  2    43 ms    15 ms    44 ms  172.31.255.253
  3    15 ms    16 ms     8 ms  192.168.21.65
  4    26 ms    13 ms    16 ms  64.200.150.193
  5    38 ms    12 ms    14 ms  64.200.151.229
  6   239 ms   255 ms   253 ms  64.200.149.14
  7   254 ms   252 ms   252 ms  64.200.150.110
  8    24 ms    20 ms    20 ms  192.174.250.34
  9    91 ms    89 ms    60 ms  192.174.47.6
 10    17 ms    20 ms    20 ms  80.40.96.12
 11    30 ms    16 ms    23 ms  80.40.X.X

Trace complete.

C:\>

This is not an outright indication of congestion, latency, or packet loss. If those issues were really happening, all the hops past 7 would have shown similar delays as well. What the trace result above actually says is that the devices on hops 6 and 7 were just slow to respond with ICMP TTL-exceeded messages. Remember that many routers give very low priority to generating replies for trace utilities so that they can devote more resources to forwarding regular traffic.
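
The reasoning above can be expressed as a simple check: a slow hop only matters if every later hop is slow too. The sketch below uses the timings from the tracert output in this section. The function name persistent_slow_hops and the choice of the best (lowest) reply per hop are assumptions made for illustration.

```python
from typing import Dict, List


def persistent_slow_hops(hops: List[List[float]], threshold_ms: float) -> Dict[int, bool]:
    """Map each slow hop number to True if the delay carries on to the end.

    `hops[0]` holds the round-trip times of hop 1, `hops[1]` those of hop 2,
    and so on. A hop counts as slow when even its best reply exceeds the
    threshold. Hops without replies are ignored.
    """
    best = [min(rtts) if rtts else None for rtts in hops]
    result = {}
    for index, rtt in enumerate(best):
        if rtt is None or rtt <= threshold_ms:
            continue
        later = [value for value in best[index + 1:] if value is not None]
        # With no later hops (the target itself is slow), all() is True,
        # which marks the end-to-end delay as real.
        result[index + 1] = all(value > threshold_ms for value in later)
    return result


TRACE = [
    [1, 2, 1], [43, 15, 44], [15, 16, 8], [26, 13, 16], [38, 12, 14],
    [239, 255, 253], [254, 252, 252], [24, 20, 20], [91, 89, 60],
    [17, 20, 20], [30, 16, 23],
]
print(persistent_slow_hops(TRACE, threshold_ms=200))
```

For this trace, hops 6 and 7 are reported as slow but not persistent, which matches the false-alarm reading.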

Request timeout before reaching target server

If the trace times out before the target server is reached, the cause may be one of the following:

A server has a bad default gateway.
The server is running a firewall that blocks traceroute.
The server is either shut down, disconnected from the network, or has an incorrectly configured network interface controller (NIC).

In the example below, the last device that responded to traceroute is a router that acts as the default gateway of the server. Remember that the problem, in this instance, is not with the router but with the server as traceroute only receives responses from functioning devices.

C:\>tracert 82.40.X.X

Tracing route to 82.40.X.X over a maximum of 30 hops

  1    33 ms    49 ms    28 ms  192.168.1.1
  2    33 ms    49 ms    28 ms  65.14.65.14
  3    33 ms    32 ms    32 ms  81.25.69.252
  4    47 ms    32 ms    31 ms  82.40.57.1
  5    29 ms    28 ms    32 ms  82.40.97.114
  6     *        *        *     Request timed out.
  7  ^C

C:\>
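
To find the last device that answered in output like this, a short helper is enough. This is a sketch that assumes the Windows tracert layout shown above; the function name last_responding_hop is invented for the example.

```python
from typing import List, Optional, Tuple


def last_responding_hop(lines: List[str]) -> Optional[Tuple[int, str]]:
    """Return (hop number, address) of the last hop that sent a reply."""
    last = None
    for line in lines:
        tokens = line.split()
        if len(tokens) < 2 or not tokens[0].isdigit():
            continue  # skip headers, prompts, and blank lines
        if "ms" in tokens:
            # The address is the final field; tracert wraps it in brackets
            # when a hostname is printed before it.
            last = (int(tokens[0]), tokens[-1].strip("[]"))
    return last


OUTPUT = [
    "Tracing route to 82.40.X.X over a maximum of 30 hops",
    "  1    33 ms    49 ms    28 ms  192.168.1.1",
    "  2    33 ms    49 ms    28 ms  65.14.65.14",
    "  3    33 ms    32 ms    32 ms  81.25.69.252",
    "  4    47 ms    32 ms    31 ms  82.40.57.1",
    "  5    29 ms    28 ms    32 ms  82.40.97.114",
    "  6     *        *        *     Request timed out.",
    "  7  ^C",
]
print(last_responding_hop(OUTPUT))
```

The device returned here is the starting point for the checks on the server behind it, not the suspect itself.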

Troubleshooting example

A ping to 162.219.X.X returned a "TTL expired in transit" message. Usually, this only happens if there is a routing loop in which the packet bounces between two routers on the way to the target server. Each hop decreases the TTL by 1 until it reaches 0, at which point the packet is discarded and the ping request fails.

The mentioned routing loop was confirmed when a traceroute was done and the packet was seen bouncing between routers 12.34.56.78 and 12.34.56.79:

C:\>ping 162.219.X.X

Pinging 162.219.X.X with 32 bytes of data:
Reply from 208.69.Y.Y: TTL expired in transit.
Reply from 208.69.Y.Y: TTL expired in transit.
Reply from 208.69.Y.Y: TTL expired in transit.
Reply from 208.69.Y.Y: TTL expired in transit.

Ping statistics for 162.219.X.X:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),

C:\>tracert 162.219.X.X

Tracing route to myserver.example.net [162.219.X.X]
over a maximum of 30 hops:

  1    <1 ms    <1 ms    <1 ms  192.168.1.1
  2    60 ms    70 ms    60 ms  router-2.example.net [12.34.56.79]
  3    70 ms    71 ms    70 ms  router-1.example.net [12.34.56.78]
  4    60 ms    70 ms    60 ms  router-2.example.net [12.34.56.79]
  5    70 ms    70 ms    70 ms  router-1.example.net [12.34.56.78]
  6    60 ms    70 ms    61 ms  router-2.example.net [12.34.56.79]
  7    70 ms    70 ms    70 ms  router-1.example.net [12.34.56.78]
  8    60 ms    70 ms    60 ms  router-2.example.net [12.34.56.79]
  9    70 ms    70 ms    70 ms  router-1.example.net [12.34.56.78]
...
...
...
Trace complete.
C:\>
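
A loop like this can be spotted automatically by looking for an address that comes back after a different address has appeared in between. The sketch below is one possible check, with the hypothetical name find_loop_addresses. It deliberately ignores the same address on consecutive hops, since that can show up in healthy traces as well.

```python
from typing import List, Optional


def find_loop_addresses(hop_addresses: List[Optional[str]]) -> List[str]:
    """Return the addresses that reappear later in the path.

    `hop_addresses` holds one address per hop; use None for hops that
    timed out.
    """
    collapsed = []
    for address in hop_addresses:
        if address is None:
            continue
        # Drop immediate repeats so only genuine returns are counted.
        if not collapsed or collapsed[-1] != address:
            collapsed.append(address)
    seen = set()
    looping = []
    for address in collapsed:
        if address in seen and address not in looping:
            looping.append(address)
        seen.add(address)
    return looping


PATH = ["192.168.1.1"] + ["12.34.56.79", "12.34.56.78"] * 4
print(find_loop_addresses(PATH))
```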

The routers with IPs 12.34.56.78 and 12.34.56.79 had their routing processes reset to solve the problem. Further investigation showed that the issue was set off by an unstable network link that caused frequent routing recalculations. The constant activity eventually corrupted the routing tables of one of the routers.

Reasons for failed traceroutes

There are several possible reasons a traceroute fails to reach the target server:

The traceroute packets are blocked or rejected by a router in the path. Usually, the router immediately after the last visible hop is the one causing the blockage. Check the routing table and the status of this device.
The target server does not exist on the network, which means it is either disconnected or turned off. Note that !H or !N messages are likely to appear.
The network where you expect the target host to be does not exist in the routing table of one of the routers in the path. Note that !H or !N messages are likely to appear.
The wrong IP address is used for the target server.
There is a routing loop where packets bounce between two routers and never reach the target destination.
The packets do not have a proper return path to your server. The router immediately after the last visible hop is where the routing changes. If this occurs, do the following steps:

Log on to the last visible router.
Look at the routing table to know where the next hop should be.
Log on to this next hop router.
Do a traceroute from this router to your target server.

If the trace completes - The routing to the target server is working fine. Trace back to your source server, and traceroute will probably fail at the bad router on the return path.
If the trace fails - Check the routing table and the status of all the hops between this router and your target destination.

Essentially, if nothing is blocking your traceroute packets, then the last visible router of an incomplete trace is either the last good router on the path or the last router with a valid return path to the server that issued the traceroute.

The traceroute command is a very handy tool when troubleshooting network connectivity problems. Understanding it is crucial for every network administrator.
