On a couple of occasions recently, we’ve noticed swap use getting out of hand on a server or two. There’s been no common cause so far, but the troubleshooting approach has been the same in each case.
To try and tell the difference between a VM which is generally “just a bit tight on resources” and a situation where process has run away – it can sometimes be handy to work out what processes are hitting swap.
The approach I’ve been using isn’t particularly elegant, but it has proved useful so I’m documenting it here:
grep VmSwap /proc/*/status 2>&1 | perl -ne '/\/(\d+)\/[^\d]*(\d+) (.B)$/g;if($2>0){$name=`ps -p $1 -o comm=`;chomp($name);print "$name ($1) $2$3\n"}'
Lets pick it apart a component at a time.
grep VmSwap /proc/*/status 2>&1
The first step is to pull out the VmSwap line from the PID status files held in /proc. There’s one of these files for each process on the system and it tracks all sorts of stuff. VmSwap is how much swap is currently being used by this process. The grep gives output like this:
...
/proc/869/status:VmSwap: 232 kB
/proc/897/status:VmSwap: 136 kB
/proc/9039/status:VmSwap: 5368 kB
/proc/9654/status:VmSwap: 312 kB
...
That’s got a lot of useful info in it (eg the PID is there, as is the amount of swap in use), but it’s not particularly friendly. The PID is part of the filename, and it would be more useful if we could have the name of the process as well as the PID.
Time for some perl…
perl -ne '/\/(\d+)\/[^\d]*(\d+) (.B)$/g;if($2>0){$name=`ps -p $1 -o comm=`;chomp($name);print "$name ($1) $2$3\n"}'
Dealing with shell side of things first (before we dive into the perl code) “-ne” says to perl “I want you to run the following code against every line of input I pipe your way”.
The first thing we do in perl itself is run a regular expression across the line of input looking for three things; the PID, the amount of swap used and the units reported. When the regex matches, this info gets stored in $1, $2 and $3 respectively.
I’m pretty sure the units are always kB but matching the units as well seemed safer than assuming!
The if statement allows us to ignore processes which are using 0kB of swap because we don’t care about them, and they can cause problems for the next stage:
$name=`ps -p $1 -o comm=`;chomp($name)
To get the process name, we run a “ps” command in backticks, which allows us to capture the output. “-p $1” tells ps that we want information about a specific PID (which we matched earlier and stored in $1), and “-o comm=” specifies a custom output format which is just the process name.
chomp is there to strip the ‘\n’ off the end of the ps output.
print "$name ($1) $2$3\n"
Lastly we print out the $name of the process, it’s PID and the amount of swap it’s using.
So now, you get output like this:
...
automount (869) 232kB
cron (897) 136kB
munin-node (9039) 5364kB
exim4 (9654) 312kB
...
The output is a little untidy, and there is almost certainly a more elegant way to get the same information. If you have an improvement, let me know in the comments!