Posted on Monday, 16th January 2006 by sean
An interesting thing happened last week that illustrates how understanding the behaviour of a packet travelling through a network explains the cause of some rather odd traffic patterns.
It was noticed that all active ports on a certain switch would simultaneously see traffic spikes in excess of 8Mbit/s, even though these ports were normally dormant. I don’t have the pictures anymore, but it was quite obvious that something was up. What follows is a slightly modified version of what happened, edited to make it clearer what was really going on.
Luckily the event happened again while we were able to watch it. Using Ethereal, we saw one side of a conversation being flooded out all the switch ports.
The behaviour of a switch is to capture the source MAC of a frame and associate it with the port it came in on within the Content Addressable Memory (CAM) table. Whenever a frame is to be sent, the CAM table is consulted for the destination MAC, and if found, the frame is only sent out that port. If the MAC isn’t found, the frame is sent out all active ports in the VLAN, except the port on which it was received.
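The learn-then-forward-or-flood behaviour described above can be sketched in a few lines of Python. This is only a toy model for a single VLAN, not how any real switch is implemented; the class name, port numbers and the MACs "aa" and "bb" are made up for illustration.

```python
class LearningSwitch:
    """Toy model of a switch's CAM table for a single VLAN (illustrative only)."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.cam = {}  # MAC address -> port the MAC was last seen on

    def receive(self, in_port, src_mac, dst_mac):
        """Learn the source MAC, then return the set of ports the frame leaves on."""
        self.cam[src_mac] = in_port
        out_port = self.cam.get(dst_mac)
        if out_port is not None:
            # Known destination: send out that one port only
            # (nothing is sent if that port is the one the frame arrived on).
            return {out_port} - {in_port}
        # Unknown destination: flood out every port except the ingress port.
        return self.ports - {in_port}


sw = LearningSwitch(ports=[1, 2, 3, 4])
flooded = sw.receive(in_port=1, src_mac="aa", dst_mac="bb")  # "bb" not learned yet
reply = sw.receive(in_port=2, src_mac="bb", dst_mac="aa")    # "aa" was learned above
direct = sw.receive(in_port=1, src_mac="aa", dst_mac="bb")   # "bb" is known now
print(flooded, reply, direct)
```

The first frame is flooded because the switch has never seen "bb" as a source; once "bb" replies, later frames to it go out a single port.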
Consistent with that behaviour, the MAC of LocalHost could not be found on the flooding switch with the show mac-address-table command. (Hint: show mac-address-table | include XXXX, where XXXX is the last 4 nybbles of the MAC, is a fast way to search.)
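If the MAC you have is written in the usual colon or dash notation, a small helper can turn it into the dotted form IOS prints and pull out the last 4 nybbles for the search. This is just a convenience sketch; the function names and the example MAC are made up.

```python
def cisco_mac(mac: str) -> str:
    """Normalise a MAC such as 00:1A:2B:3C:4D:5E to the dotted form 001a.2b3c.4d5e."""
    digits = "".join(c for c in mac.lower() if c in "0123456789abcdef")
    if len(digits) != 12:
        raise ValueError(f"expected 12 hex digits, got {len(digits)} in {mac!r}")
    return ".".join(digits[i:i + 4] for i in (0, 4, 8))


def search_command(mac: str) -> str:
    """Build the search from the hint above using the last 4 nybbles of the MAC."""
    return f"show mac-address-table | include {cisco_mac(mac)[-4:]}"


example = "00:1A:2B:3C:4D:5E"  # hypothetical MAC, for illustration only
dotted = cisco_mac(example)
command = search_command(example)
print(dotted)
print(command)
```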
In this situation, LocalHost was talking to RemoteHost over the WAN. The port flooding was being seen on the switch connected to the WAN router, while LocalHost was on another switch, but on the same VLAN as the WAN router.
From layer 3, everything seemed to be fine with the RemoteHost->LocalHost communication, but at layer 2 it was being flooded.
At this point a diagram would help. Please pardon the ASCII art until I can get a Visio done up! Here is a cut down version of the network:
RemoteHost --- WAN RTR ----[sw1]-----[sw2]-----Core RTR
                                       |
                                   LocalHost
It was sw1 where the port flooding was occurring.
Turning our attention to layer 2, we looked at the MAC addresses of the packets. LocalHost was sending its packets to Core RTR, while the packets hitting the WAN came from Core RTR. Looking closely at the configuration of Core RTR, we found that no ip redirects was configured on the VLAN. Turning redirects back on fixed the problem.
So what was happening?
LocalHost’s gateway was Core RTR. To send a packet, LocalHost consults its routing table, determines the destination is reachable by the gateway, and sends out the packet as follows:
L2: LocalHost->Core RTR
L3: LocalHost->RemoteHost
Keeping track of the CAM tables will be helpful:
SW1: empty
SW2: LocalHost
The Core RTR then picks up the packet, looks at the destination IP address, and sends it along.
L2: Core RTR->WAN RTR
L3: LocalHost->RemoteHost
SW1: Core RTR
SW2: Core RTR, LocalHost
The normal behaviour in this case would have been to send the packet, but also issue an ICMP Redirect to LocalHost, telling it that it’s better off talking to the WAN RTR directly for that particular destination. This is suppressed with no ip redirects.
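Roughly speaking, a router considers a redirect when it forwards a packet back out the interface it arrived on and the next hop sits on the same subnet as the sender. The sketch below captures only that simplified rule; real routers check more conditions, and the interface name and the 192.0.2.0/24 addresses are hypothetical.

```python
import ipaddress


def should_send_redirect(in_if, out_if, src_ip, next_hop, if_network,
                         redirects_enabled=True):
    """Simplified test for whether a router would issue an ICMP Redirect."""
    net = ipaddress.ip_network(if_network)
    same_interface = in_if == out_if  # packet leaves the way it came in
    same_subnet = (ipaddress.ip_address(src_ip) in net
                   and ipaddress.ip_address(next_hop) in net)
    return redirects_enabled and same_interface and same_subnet


# Hypothetical addressing: LocalHost is .10 and WAN RTR is .2 on the same VLAN.
args = dict(in_if="Vlan10", out_if="Vlan10", src_ip="192.0.2.10",
            next_hop="192.0.2.2", if_network="192.0.2.0/24")
with_redirects = should_send_redirect(**args)
without_redirects = should_send_redirect(**args, redirects_enabled=False)  # no ip redirects
print(with_redirects, without_redirects)
```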
The WAN router gets the packet and sends it out the WAN and to the destination. The response comes back to the WAN router. However, since the WAN router is on the same subnet as LocalHost, it can send the packet directly:
L2: WAN RTR->LocalHost
L3: RemoteHost->LocalHost
Looking at the CAM tables in the previous step, LocalHost isn’t known to SW1, so it floods the packet out all the switchports. As long as this asymmetric path exists, the flooding will happen. If LocalHost were to do something as simple as ping WAN RTR, SW1 would learn LocalHost’s MAC and temporarily (5 minutes by default) eliminate the port flooding.
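The whole walk-through can be replayed with two toy switches. This is a sketch under a few assumptions that are not in the story above: the port names are invented, both routers’ MACs are treated as already learned on both switches, and CAM aging is not modelled.

```python
class Switch:
    """Toy learning switch: learn the source MAC, forward or flood on the destination."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.cam = {}  # MAC -> port

    def receive(self, in_port, src, dst):
        self.cam[src] = in_port
        known = self.cam.get(dst)
        if known is not None:
            return {known} - {in_port}
        return self.ports - {in_port}  # unknown destination: flood


# Hypothetical port names; p1 and p2 stand in for the normally dormant ports on sw1.
sw1 = Switch(["wan", "p1", "p2", "to_sw2"])
sw2 = Switch(["to_sw1", "host", "core"])

# Assumption: the routers' MACs are already in both CAM tables from earlier traffic.
sw1.cam.update({"WAN_RTR": "wan", "CORE_RTR": "to_sw2"})
sw2.cam.update({"WAN_RTR": "to_sw1", "CORE_RTR": "core"})


def round_trip():
    """One request/response; returns the sw1 ports that carry the reply."""
    sw2.receive("host", "LocalHost", "CORE_RTR")       # LocalHost -> Core RTR, stays on sw2
    sw2.receive("core", "CORE_RTR", "WAN_RTR")         # Core RTR -> WAN RTR, via sw2...
    sw1.receive("to_sw2", "CORE_RTR", "WAN_RTR")       # ...and then sw1
    return sw1.receive("wan", "WAN_RTR", "LocalHost")  # reply goes straight to LocalHost


asymmetric = round_trip()

# A ping from LocalHost to WAN RTR crosses sw1, so sw1 finally learns LocalHost.
sw2.receive("host", "LocalHost", "WAN_RTR")
sw1.receive("to_sw2", "LocalHost", "WAN_RTR")
after_ping = round_trip()

print(sorted(asymmetric), sorted(after_ping))
```

Before the ping, the reply leaves sw1 on every port except the one facing the WAN router; afterwards it only goes out the link towards sw2. On a real switch the learned entry would age out again and the flooding would return.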
In conclusion, solving this problem required understanding the behaviour of a router beyond simply “routing packets between interfaces”. It also required viewing layer 2 information in the sniffer, something that most sniffers don’t show by default (-e adds this information in tcpdump).
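For anyone working from raw captures, the layer 2 addresses are simply the first 14 bytes of an Ethernet II frame. Here is a minimal sketch of pulling them out; the frame bytes are hand-made for illustration and the function name is my own.

```python
import struct


def ethernet_header(frame: bytes):
    """Return (source MAC, destination MAC, EtherType) from an Ethernet II frame."""
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])

    def fmt(raw):
        return ":".join(f"{b:02x}" for b in raw)

    return fmt(src), fmt(dst), ethertype


# Hand-made frame: destination MAC, source MAC, EtherType 0x0800 (IPv4), dummy payload.
frame = bytes.fromhex("00005e005301" "00005e005302" "0800") + b"payload"
src, dst, ethertype = ethernet_header(frame)
print(src, "->", dst, hex(ethertype))
```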