traceroute is evil
Created: 20.06.2009 18:32
One of the nicest tools to help trace problems on the Intra-
or Internet is traceroute. However, it can
be horribly misleading. Some of its widely known limitations
include:
What happened was that a contractor noticed some strange effects in traceroute and considered the network to have a problem:

    traceroute to 192.168.81.166 (192.168.81.166), 30 hops max, 40 byte packets
     1  10.188.3.254 (10.188.3.254)  8.269 ms  8.255 ms  8.248 ms
     2  192.168.81.166 (192.168.81.166)  0.177 ms  0.178 ms  0.172 ms

That one was easy to explain - the router is a big one, and though it routes in hardware, the answers to the traceroute have to be generated on its rather slow CPU, leading to delays if the CPU is busy with other things. Packets just passing through it are handled by special hardware, so they are not delayed even if the CPU is busy with maintenance work. The next hop therefore appears reachable in roughly 1/50th of the time it seems to take to reach the router, even though the packets pass through that exact same router.

However, there was more strangeness. Some traces looked like this:

    traceroute to 192.168.81.166 (192.168.81.166), 30 hops max, 40 byte packets
     1  10.188.3.254 (10.188.3.254)  1.264 ms  1.256 ms  1.251 ms
     2  192.168.81.166 (192.168.81.166)  0.135 ms  0.133 ms *

    traceroute to 192.168.81.166 (192.168.81.166), 30 hops max, 40 byte packets
     1  10.188.3.254 (10.188.3.254)  1.224 ms  1.217 ms  1.212 ms
     2  * * *
     3  * * *
     4  * * *
     5  * * *
     6  * * *
     7  192.168.81.166 (192.168.81.166)  0.139 ms  0.134 ms  0.125 ms

That seemed very weird at first sight. After all, it was not the router that lost replies, it was the target host - and the target host was an idle Linux machine with no firewalling configured, so there shouldn't have been any drops there. Was there really a network problem? Did the network lose packets?

As it turns out: No. With the help of tcpdump, the problem became apparent pretty fast. The Linux box got the packet, but just did not reply. And once that was clear, the explanation was apparent.

Traceroutes can be done in three different ways: with ICMP, UDP or TCP requests.
In any case, the basic trick is to set the time-to-live (TTL) of the packet sent to the number of the hop you're interested in, and then check from which host the "TTL expired in transit" error message for that packet arrives - voila, that's your hop.

However, the last hop (the target host) in the traceroute is different. When the trace reaches the target host, the time-to-live has not expired, so naturally the host will not generate a "TTL expired" error message. Instead, it will reply with a "destination port unreachable" error message (for UDP traceroute) or an "ICMP echo reply" (for ICMP traceroute).

And there lies the problem: the Linux traceroute utility by default does a UDP trace. However, even with no firewalling set up, Linux limits the number of "destination port unreachable" error messages it generates. That limit is set rather low by default - too low for traceroute to work reliably on fast networks. The traceroute packets can arrive faster than the rate limiting allows the Linux host to reply.

So there are a few ways to get reliable (w.r.t. the last hop) traceroute results:
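
To make the effect described above a bit more concrete, here is a small Python simulation. It is only a sketch, not how traceroute or the Linux kernel are actually implemented: the names `TokenBucket` and `trace` are made up for this example, the limiter is modelled as a simple token bucket, and the numbers (6 replies in a burst, then one per second, 0.3 s probe timeout) are illustrative assumptions rather than measured kernel defaults. It also assumes probes are sent one after another, which real traceroute implementations do not necessarily do.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Toy limiter: up to `burst` replies at once, then one per `interval` seconds."""
    burst: int
    interval: float
    tokens: float          # replies currently available
    last: float = 0.0      # time of the previous request

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the previous request, capped at `burst`.
        refill = (now - self.last) / self.interval
        self.tokens = min(float(self.burst), self.tokens + refill)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def trace(path, limiter, start=0.0, rtt=0.001, timeout=0.3, probes=3, max_hops=30):
    """Simulate a sequential UDP traceroute along `path` (routers, then the target).

    Routers always answer with "TTL expired" (a simplification). The target
    answers with "port unreachable" only while `limiter` allows it; a probe
    that gets no answer is shown as "*" and costs `timeout` seconds of waiting.
    """
    now, rows = start, []
    target = path[-1]
    for ttl in range(1, max_hops + 1):
        row = []
        for _ in range(probes):
            if ttl < len(path):
                reply = path[ttl - 1]   # TTL runs out at this router
            elif limiter.allow(now):
                reply = target          # probe arrives, target sends port unreachable
            else:
                reply = "*"             # probe arrives, but the target stays silent
            row.append(reply)
            now += rtt if reply != "*" else timeout
        rows.append((ttl, row))
        if target in row:               # stop once the target itself has answered
            break
    return rows


path = ["10.188.3.254", "192.168.81.166"]
for tokens in (6.0, 0.0):               # full bucket vs. bucket drained by earlier traces
    print(f"bucket starts with {tokens} tokens")
    for ttl, row in trace(path, TokenBucket(burst=6, interval=1.0, tokens=tokens)):
        print(f" {ttl}  " + "  ".join(row))
```

With a full bucket the target answers all three probes at hop 2. With a drained bucket (as it would be right after a few earlier traces) hop 2 shows only stars and the target turns up at a later hop number, although the network did not lose a single packet. That is the same kind of pattern as in the traces above.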