Posted on Tuesday, 23rd December 2008 by sean
Yesterday, we had a web site crash. I was curious whether it was caused by load or whether something else was going on. This is a great opportunity to show how to analyze NetFlow data.
First, I should mention that there may be easier ways of doing this. The flow-tools package includes a lot of tools, and I feel like I learn something new each time I use them. I also tend to use a lot of shell scripting, so I may do this a different way each time.
Going back to the idea of a flow, I should be able to figure out the connection rate by looking at the number of inbound flows per second. A flow is half of a conversation (see the end of this article for an exception).
flow-cat can take the name of a file (or files), or the name of a directory, in which case it reads every flow file in that directory. On my Internet collector, I pass "-N -1" to flow-capture so that the flows are stored in a separate directory per day (on my internal collector I don’t, go figure).
The first thing to do is filter my flows so that only incoming connections to the web server are caught. The flow-nfilter command can do this.
/etc/flow-tools/cfg/filter.cfg contains the filters. Some predefined ones are there for you, most notably, a “filter by destination address”:
filter-definition ip-dst-addr
match ip-destination-address VAR_ADDR
This defines a filter that matches a destination address of VAR_ADDR. But what's VAR_ADDR? Earlier on in the file you'll see:
filter-primitive VAR_ADDR
type ip-address
permit @{ADDR:-0.0.0.0}
This primitive is an IP address, and takes either the value of ADDR or, failing that, 0.0.0.0.
According to the flow-nfilter man page, you can set variables on the command line with -v. So, to see only the incoming flows to the web server, run:
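The @{ADDR:-0.0.0.0} syntax appears to mirror the default-value expansion found in the shell. As a minimal sketch of that fallback behaviour (the function name resolve_addr and the 192.0.2.10 address are made up for illustration, and this is plain shell, not flow-tools itself):

```shell
# Mimic the primitive's fallback with ordinary shell parameter expansion.
# resolve_addr is a hypothetical helper, not part of flow-tools.
resolve_addr() {
    # Take the first argument if one was given, otherwise leave ADDR empty.
    ADDR=${1:-}
    # Use ADDR if it is set and non-empty, else fall back to 0.0.0.0.
    echo "${ADDR:-0.0.0.0}"
}

resolve_addr              # no value supplied, prints 0.0.0.0
resolve_addr 192.0.2.10   # value supplied, prints 192.0.2.10
```

So if you forget to pass -v ADDR=..., the filter would match against 0.0.0.0 rather than fail outright.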
flow-cat /var/flow-tools/2008-12-22/ | flow-nfilter -F ip-dst-addr -v ADDR=x.x.x.x | flow-print
This calls the ip-dst-addr filter and assigns x.x.x.x (the server address) to the ADDR variable.
With a bit of shell magic, you can iterate through all the files in the directory and use the filename to write out the time, and then count the number of flows in the file:
for i in /var/flow-tools/2008-12-22/ft*; do echo -n "$i" | sed 's/.*\.\(..\)\(..\).*/\1:\2/'; echo -n " "; flow-cat "$i" | flow-nfilter -F ip-dst-addr -v ADDR=x.x.x.x | flow-print | wc -l; done > data
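The sed expression in that loop is the least obvious part, so here it is on its own. This is a sketch that assumes the capture files are named in the ft-v05.YYYY-MM-DD.HHMMSS-ZZZZ style; the example path and the helper name label_from_path are illustrative only:

```shell
# Turn a flow file path into an HH:MM label.
# The greedy ".*\." runs up to the last dot in the path, and the two
# two-character groups that follow capture the hour and the minute.
label_from_path() {
    printf '%s\n' "$1" | sed 's/.*\.\(..\)\(..\).*/\1:\2/'
}

# Hypothetical file name; prints 00:05
label_from_path '/var/flow-tools/2008-12-22/ft-v05.2008-12-22.000500-0500'
```

If your file names contain a dot after the time stamp, the expression would latch on to the wrong field and need adjusting.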
I've redirected the output to a file called "data", which looks like this:
[root@netflow ~]# head data
00:00 9388
00:05 17850
00:10 15280
00:15 11759
00:20 14840
00:25 11018
...
The last step is to use Gnuplot to plot the data. Start by typing "gnuplot", then enter:
set timefmt "%H:%M"
set xdata time
set terminal png large color picsize 1200 480
set output '/var/www/html/stats.png'
plot 'data' using 1:($2/300) with linespoints
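The ($2/300) in the plot command converts a count of flows per file into flows per second, since each file covers 5 minutes (300 seconds). As a rough sketch of the same arithmetic outside Gnuplot (per_second is a made-up helper name, and the 300 assumes 5 minute files like mine):

```shell
# Convert "HH:MM count" lines into "HH:MM flows-per-second".
# 300 is the number of seconds covered by each 5 minute flow file.
per_second() {
    awk '{ printf "%s %.1f\n", $1, $2 / 300 }'
}

# First three lines of the data file shown earlier.
printf '00:00 9388\n00:05 17850\n00:10 15280\n' | per_second
# prints:
# 00:00 31.3
# 00:05 59.5
# 00:10 50.9
```

If your collector rotates files on a different interval, the divisor would need to change to match.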
I then look at stats.png through my web browser. In this case, I went back and edited the data file to cut down on the number of data points around the outage, and then plotted the result.

From the graph I can see a few places where the connection rate drops, which is indicative of a problem. After that, however, the website is able to keep up.
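I spotted the drops by eye, but the same idea can be sketched in awk: flag any interval whose count falls well below the previous one. The helper name find_drops and the 0.8 ratio are arbitrary choices for illustration, not a tested outage threshold:

```shell
# Print intervals whose flow count is below RATIO times the previous
# interval's count. RATIO defaults to 0.8, an arbitrary starting point.
find_drops() {
    awk -v ratio="${1:-0.8}" '
        NR > 1 && $2 < prev * ratio { print $1, $2 }
        { prev = $2 }
    '
}

# The six lines of the data file shown earlier.
printf '00:00 9388\n00:05 17850\n00:10 15280\n00:15 11759\n00:20 14840\n00:25 11018\n' | find_drops 0.8
# prints:
# 00:15 11759
# 00:25 11018
```

Normal traffic is bursty, so a ratio this loose flags ordinary dips too. You would want to tune it against a day you know was healthy.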
When is a flow not a flow?
I had the command
ip flow-cache timeout active 2
in the configuration that I use. This sets the timeout of an active flow to 2 minutes. If a file transfer goes for longer than 2 minutes, the router will stop that flow and create a new one.
Normally, if you're using 5 minute data and you have a transfer that takes 6 minutes, the flow record will be written when the flow expires. The whole transfer will then look like it happened in the 6th minute, which really skews your stats. Breaking the flow up into 3 smaller flows, each about 2 minutes long, makes the effect less noticeable.
The flow start/stop times are always written to the flow, but often we're doing simpler analysis of the flows and don't take the time to re-sort the data (there's going to be a lot of it, after all!).
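The "3 smaller flows" figure is just the transfer length divided by the active timeout, rounded up. Here is a simplified sketch of that arithmetic (records_for is a made-up name, and it ignores the other reasons a router may expire a flow, such as inactivity or a full cache):

```shell
# records_for DURATION_SECONDS ACTIVE_TIMEOUT_SECONDS
# Estimate how many flow records one long-lived transfer is split into,
# using integer ceiling division.
records_for() {
    duration=$1
    timeout=$2
    echo $(( (duration + timeout - 1) / timeout ))
}

records_for 360 120    # 6 minute transfer, 2 minute timeout: prints 3
records_for 360 1800   # same transfer, 30 minute timeout: prints 1
```

With a single record, all of the bytes land in whichever 5 minute file is open when the flow expires. With three records, they are spread across the files the transfer actually spanned.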
