PoempelFox Blog

Sat, 20. Jun 2009

traceroute is evil
Created: 20.06.2009 18:32

One of the nicest tools to help trace problems on the Intra- or Internet is traceroute. However, it can be horribly misleading. Some of its widely known limitations include:

You can only trace the hops in one direction. It gives you no information about the route in the other direction, back to you, and the two routes can be completely different.
Routers usually generate the traceroute replies in software on their rather slow general-purpose CPUs, unlike the routing itself, which they do in hardware. That leads to the strange effect that some hops in a route reply really slowly and even drop packets, while hops further down the road reply fast.
If the path to a target changes during the trace, traceroute will display crap.

I had to learn of another potential problem last week.
What happened was that a contractor noticed some strange effects in traceroute and considered the network to have a problem:

traceroute to 192.168.81.166 (192.168.81.166), 30 hops max, 40 byte packets
 1  10.188.3.254 (10.188.3.254)  8.269 ms  8.255 ms  8.248 ms
 2  192.168.81.166 (192.168.81.166)  0.177 ms  0.178 ms  0.172 ms

That one was easy to explain: the router is a big one, and though it routes in hardware, the replies to the traceroute have to be generated on its rather slow CPU, which leads to delays if the CPU is busy with other things. Packets just passing through it are handled by special hardware, so they are not delayed even if the CPU is busy with maintenance work. As a result, the next hop appears reachable in roughly 1/50th of the time it seems to take to reach the router, even though the packets pass through that exact same router.
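To make that reasoning concrete, here is a small sketch that parses classic traceroute text output and flags hops whose best round-trip time is worse than that of a later hop. Such a hop cannot be explained by path latency, since the later probes went through the same device. The helper names `parse_hops` and `hops_slower_than_later` are made up for this example, and the parser only assumes the plain "N  host (ip)  x ms  y ms  z ms" layout shown above.

```python
import re

# Hypothetical helpers for illustration; not part of any traceroute tool.
HOP_LINE = re.compile(r"\s*(\d+)\s+(.*)")   # "<hop number> <rest of line>"
RTT = re.compile(r"(\d+(?:\.\d+)?) ms")     # "<number> ms"

SAMPLE = """traceroute to 192.168.81.166 (192.168.81.166), 30 hops max, 40 byte packets
 1  10.188.3.254 (10.188.3.254)  8.269 ms  8.255 ms  8.248 ms
 2  192.168.81.166 (192.168.81.166)  0.177 ms  0.178 ms  0.172 ms"""


def parse_hops(output):
    """Return [(hop_number, [rtt_ms, ...]), ...]; lost probes ('*') yield no RTT."""
    hops = []
    for line in output.splitlines():
        match = HOP_LINE.match(line)
        if match:  # the "traceroute to ..." header does not start with a number
            rtts = [float(value) for value in RTT.findall(match.group(2))]
            hops.append((int(match.group(1)), rtts))
    return hops


def hops_slower_than_later(hops):
    """Hop numbers whose best RTT exceeds the best RTT of any later hop."""
    flagged = []
    for index, (number, rtts) in enumerate(hops):
        later_best = [min(r) for _, r in hops[index + 1:] if r]
        if rtts and later_best and min(rtts) > min(later_best):
            flagged.append(number)
    return flagged


print(hops_slower_than_later(parse_hops(SAMPLE)))  # → [1]
```

A flagged hop is a hint that the reply was delayed on the router's CPU, not that the link to it is slow.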
However, there was more strangeness. Some traces looked like this:

traceroute to 192.168.81.166 (192.168.81.166), 30 hops max, 40 byte packets
 1  10.188.3.254 (10.188.3.254)  1.264 ms  1.256 ms  1.251 ms
 2  192.168.81.166 (192.168.81.166)  0.135 ms  0.133 ms *

traceroute to 192.168.81.166 (192.168.81.166), 30 hops max, 40 byte packets
 1  10.188.3.254 (10.188.3.254)  1.224 ms  1.217 ms  1.212 ms
 2  * * *
 3  * * *
 4  * * *
 5  * * *
 6  * * *
 7  192.168.81.166 (192.168.81.166)  0.139 ms  0.134 ms  0.125 ms

That seemed very weird at first sight. After all it was not the router that lost replies, it was the target host - and the target host was an idle linux machine with no firewalling configured, so there shouldn't have been any drops there. Was there really a network problem? Did the network lose packets?
As it turns out: No. With the help of tcpdump, the problem became apparent pretty fast. The linux box got the packet, but just did not reply. And once that was clear, the explanation was apparent.
Traceroutes can be done in three different ways: with ICMP, UDP or TCP requests. In any case, the basic trick is to set the time-to-live (TTL) of the packet sent to the number of the hop you're interested in, and then check from which host the "TTL expired in transit" error message (ICMP "time exceeded") for that packet arrives - voila, that's your hop.
However, the last hop (the target host) in the traceroute is different. When the trace reaches the target host, the time-to-live has not expired, so naturally the host will not generate a "TTL expired" error message. Instead, it will reply with a "destination port unreachable" error message (for UDP traceroute) or an "ICMP echo reply" (for ICMP traceroute).
And there lies the problem: the Linux traceroute utility by default does a UDP trace. However, even with no firewalling set up, Linux limits the rate at which it generates "destination port unreachable" error messages. And that limit is by default set rather low - too low for traceroute to work reliably on fast networks. The traceroute packets can arrive faster than the rate limiting allows the Linux host to reply.
So there are a few ways to get reliable (w.r.t. the last hop) traceroute results:

Use a Windows machine for tracerouting. That works because the Windows traceroute utility uses ICMP by default, not UDP, and ICMP echo replies do not fall under the default rate limiting...
Use the traceroute parameter -I to make it use ICMP instead of UDP. Unfortunately, that parameter is only available to root.
Turn the rate limiting off at the target host: echo 0 > /proc/sys/net/ipv4/icmp_ratemask - as that limit usually makes sense, this is not recommended.
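Loosen the rate limit in /proc/sys/net/ipv4/icmp_ratelimit. Mind the direction: the kernel's ip-sysctl documentation describes this value as the minimum spacing between replies (in milliseconds on current kernels), so a smaller number allows more replies and 0 disables the limit. The default I saw was 250, the Red Hat Enterprise default is 1000; since the value is a spacing rather than a rate, check the units on your kernel before assuming the larger number is the more generous one.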
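To see why a burst of probes loses replies, here is a toy model of such a limiter. It is a simplified token bucket, not kernel code. The assumptions are mine: the limit is a minimum spacing in milliseconds, a small burst of 6 replies is allowed before the limit bites, and the 0.2 ms probe spacing is a made-up number for a fast LAN. The names `IcmpRateLimiter` and `simulate` are hypothetical.

```python
class IcmpRateLimiter:
    """Toy token bucket for ICMP error replies (illustrative model only)."""

    def __init__(self, interval_ms, burst=6):
        self.interval_ms = interval_ms          # credit one reply costs, in ms
        self.max_credit = interval_ms * burst   # assumed burst allowance
        self.credit_ms = self.max_credit        # start with a full bucket
        self.last_ms = 0.0

    def allow(self, now_ms):
        """Return True if a reply may be sent at time now_ms."""
        if self.interval_ms == 0:
            return True  # 0 means "no limiting" in this model
        # Credit grows with elapsed time, capped at the burst allowance.
        elapsed = now_ms - self.last_ms
        self.credit_ms = min(self.credit_ms + elapsed, self.max_credit)
        self.last_ms = now_ms
        if self.credit_ms >= self.interval_ms:
            self.credit_ms -= self.interval_ms
            return True
        return False


def simulate(interval_ms, arrival_times_ms):
    """Mark each probe as answered ('reply') or silently dropped ('*')."""
    limiter = IcmpRateLimiter(interval_ms)
    return ["reply" if limiter.allow(t) else "*" for t in arrival_times_ms]


# 16 probes reaching the target 0.2 ms apart (assumed spacing).
arrivals = [i * 0.2 for i in range(16)]
print(simulate(1000, arrivals))  # burst is answered, the rest show up as '*'
print(simulate(0, arrivals))     # limiting off: every probe is answered
```

In this model the first few probes are answered from the burst allowance and everything after that is dropped until enough time has passed, which is the same pattern as the stars in the traces above.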

Of course, you could just use ping instead if you're interested in the last hop, but that would be boring, wouldn't it?
