What’s using all my swap?

On a couple of occasions recently, we’ve noticed swap use getting out of hand on a server or two. There’s been no common cause so far, but the troubleshooting approach has been the same in each case.

To try and tell the difference between a VM which is generally “just a bit tight on resources” and a situation where process has run away – it can sometimes be handy to work out what processes are hitting swap.

The approach I’ve been using isn’t particularly elegant, but it has proved useful so I’m documenting it here:

grep VmSwap /proc/*/status 2>&1 | perl -ne '/\/(\d+)\/[^\d]*(\d+) (.B)$/g;if($2>0){$name=`ps -p $1 -o comm=`;chomp($name);print "$name ($1) $2$3\n"}'

Lets pick it apart a component at a time.

grep VmSwap /proc/*/status 2>&1

The first step is to pull out the VmSwap line from the PID status files held in /proc. There’s one of these files for each process on the system and it tracks all sorts of stuff. VmSwap is how much swap is currently being used by this process. The grep gives output like this:

/proc/869/status:VmSwap:	     232 kB
/proc/897/status:VmSwap:	     136 kB
/proc/9039/status:VmSwap:	    5368 kB
/proc/9654/status:VmSwap:	     312 kB

That’s got a lot of useful info in it (eg the PID is there, as is the amount of swap in use), but it’s not particularly friendly. The PID is part of the filename, and it would be more useful if we could have the name of the process as well as the PID.

Time for some perl…

perl -ne '/\/(\d+)\/[^\d]*(\d+) (.B)$/g;if($2>0){$name=`ps -p $1 -o comm=`;chomp($name);print "$name ($1) $2$3\n"}'

Dealing with shell side of things first (before we dive into the perl code) “-ne” says to perl “I want you to run the following code against every line of input I pipe your way”.

The first thing we do in perl itself is run a regular expression across the line of input looking for three things; the PID, the amount of swap used and the units reported. When the regex matches, this info gets stored in $1, $2 and $3 respectively.

I’m pretty sure the units are always kB but matching the units as well seemed safer than assuming!

The if statement allows us to ignore processes which are using 0kB of swap because we don’t care about them, and they can cause problems for the next stage:

$name=`ps -p $1 -o comm=`;chomp($name)

To get the process name, we run a “ps” command in backticks, which allows us to capture the output. “-p $1” tells ps that we want information about a specific PID (which we matched earlier and stored in $1), and “-o comm=” specifies a custom output format which is just the process name.

chomp is there to strip the ‘\n’ off the end of the ps output.

print "$name ($1) $2$3\n"

Lastly we print out the $name of the process, it’s PID and the amount of swap it’s using.

So now, you get output like this:

automount (869) 232kB
cron (897) 136kB
munin-node (9039) 5364kB
exim4 (9654) 312kB

The output is a little untidy, and there is almost certainly a more elegant way to get the same information. If you have an improvement, let me know in the comments!

Molly-guard for CentOS 7?

Since I was looking at this already and had a few things to investigate and fix in our systemd-using hosts, I checked how plausible it is to insert a molly-guard-like password prompt as part of the reboot/shutdown process on CentOS 7 (i.e. using systemd).

Problems encountered include:

  • Asking for a password from a service/unit in systemd — Use systemd-ask-password and needs some agent setup to reply to this correctly?
  • The reboot command always walls a message to all logged in users before it even runs the new reboot-molly unit, as it expects a reboot to happen. The argument --no-wall stops this but that requires a change to the reboot command. Hence back to the original problem of replacing packaged files/symlinks with RPM
  • The reboot.target unit is a “systemd.special” unit, which means that it has some special behaviour and cannot be renamed. We can modify it, of course, by editing the reboot.target file.
  • How do we get a systemd unit to run first and block anything later from running until it is complete? (In fact to abort the reboot but just for this time rather than being set as permanently failed. Reboot failing is a bit of a strange situation for it to be in…) The dependencies appear to work but the reboot target is quite keen on running other items from the dependency list — I’m more than likely doing something wrong here!

So for now this is shelved. It would be nice to have a solution though, so any hints from systemd experts are greatfully received!

(Note that CentOS 7 uses systemd 208, so new features in later versions which help won’t be available to us)

Molly-guard for RHEL/CentOS – protect your hosts from accidental reboots!

Molly-guard is a very useful package which replaces the default halt and reboot (and other related) commands with a version which prompts you to type the hostname of the host you intended to halt/reboot before it continues to do so. For example:

root@testhost:~# reboot
I: molly-guard: reboot is always molly-guarded on this system.
Please type in hostname of the machine to reboot: [type incorrect hostname]
Good thing I asked; I won't reboot testhost ...
W: aborting reboot due to 30-query-hostname exiting with code 1.

This is invaluable if you use a lot of different systems and they are often in use by other users whom you don’t want to anger with accidental reboots…

For Debian-based distros (including Ubuntu), it’s available via a simple apt-get install molly-guard. On RHEL-based distros, unfortunately, it’s not in the base repositories and I was unable to find a suitably-trustworthy repository which contains it.

So this leads to asking some questions:

What does it do?

Simply put, it copies the existing /sbin/halt and related commands to a separate directory (by default /lib/molly-guard), and replaces them with symlinks to /lib/molly-guard/molly-guard to ensure that the new executable is used.

By default it only requires hostname confirmation when you are logged in via ssh, but this can be changed to always ask for the hostname by setting the ALWAYS_QUERY_HOSTNAME variable in the /etc/molly-guard/rc configuration file. Further customisations are possible by adding scripts to run to the /etc/molly-guard/run.d directory, and if any of these exit with a non-zero exit code then the reboot is aborted. (This is how the hostname check is done, but you can add whatever logic you want via this method)

How can we make this work on RHEL / Why are there no packages for RHEL?

Someone kindly ported a version of molly-guard from Debian to RHEL, and a github repo of this is available here. Unfortunately this doesn’t quite solve the problem, as creating a package from this (having updated it for molly-guard 0.6.2) creates an RPM which gives us errors when we try to install it:

Running Transaction Test

Transaction Check Error:
  file /sbin/halt from install of molly-guard-0.6.2-1.1.noarch conflicts with file from package upstart-0.6.5-13.el6_5.3.x86_64
  file /sbin/poweroff from install of molly-guard-0.6.2-1.1.noarch conflicts with file from package upstart-0.6.5-13.el6_5.3.x86_64
  file /sbin/reboot from install of molly-guard-0.6.2-1.1.noarch conflicts with file from package upstart-0.6.5-13.el6_5.3.x86_64
  file /sbin/shutdown from install of molly-guard-0.6.2-1.1.noarch conflicts with file from package upstart-0.6.5-13.el6_5.3.x86_64

RPM really doesn’t like replacing files which are owned by another package, so an alternative (you’ll see what I did there in a minute) strategy is required.

Handily there’s a tool called alternatives which can handle selecting which of a set of binaries to use, via managing a directory of symlinks. (See!)

If we re-create the RPM without the explicit symlinks and instead use a post-install script snippet which copies the halt/reboot binaries to the molly-guard directory and sets up alternatives to point at them then this might just work!

(There are a bunch more reboot/halt commands which are in /usr/bin/... on RHEL, so we need to turn those into links as well:

alternatives --install /sbin/halt halt              /lib/molly-guard/molly-guard 999 \
  --slave /sbin/powreoff          poweroff          /lib/molly-guard/molly-guard \
  --slave /sbin/reboot            reboot            /lib/molly-guard/molly-guard \
  --slave /sbin/shutdown          shutdown          /lib/molly-guard/molly-guard \
  --slave /sbin/coldreboot        coldreboot        /lib/molly-guard/molly-guard \
  --slave /sbin/pm-hibernate      pm-hibernate      /lib/molly-guard/molly-guard \
  --slave /sbin/pm-suspend        pm-suspend        /lib/molly-guard/molly-guard \
  --slave /sbin/pm-suspend-hybrid pm-suspend-hybrid /lib/molly-guard/molly-guard \
  --slave /usr/bin/reboot         usrbinreboot      /lib/molly-guard/molly-guard \
  --slave /usr/bin/halt           usrbinhalt        /lib/molly-guard/molly-guard \
  --slave /usr/bin/poweroff       usrbinpoweroff    /lib/molly-guard/molly-guard

# Ensure we're using this by default:
alternatives --set halt /lib/molly-guard/molly-guard

At this point we’ve got a set of alternatives and a post-install RPM scriptlet which copies the required commands into the /lib/molly-guard directory, but there is still a problem that RPM will clobber these when the clashing packages are updated! So we definitely need to ensure that our alternatives get re-added after any updates. To do this we can do one of the following:

  • Have a cron job which runs regularly and enforces the alternatives setting for “halt” (the “slave” entries trigger all the others to update when this is done)
  • Use Puppet or some other configuration management to enforce the alternatives setting (will have a small window of the binaries being the new version until the alternatives are reset by Puppet on the next run)
  • Scrap the alternatives method and use something else like modifying the PATH to ensure our reboot/halt versions are before the system versions (will break if scripts use full paths to the commands, which is pretty common)
  • Drop this approach and create a PAM module which does the same as molly-guard, then use consolehelper and PAM instead (untested)

I’ve decided to go for the second of these and have configuration management ensure that the “alternatives” settings are enforced.

Will this have any side-effects?

One side-effect is that on RHEL6 the reboot-related commands in /usr/bin use “consolehelper” to control access to these commands through PAM. Without some additional jiggery-pokery this functionality will be broken by this update.

Updates to the packages will need to be handled somehow, perhaps by detecting that the binaries have been put back in place and updating the molly-guard-controlled versions. (Eek!)

I’ve not looked at updating the package for RHEL7, where the reboot and related commands all link to systemctl for systemd control. This is likely to need some different shenanigans to get it to work…

Where can I get the packages?

These are a work in progress but the updated git repoistory is available here. RPMs will be available from http://packages.bris.ac.uk/centos/6/zone_d/ (UoB internal only access) once finalised. It it works well these should be published more widely 🙂

No RHEL7 packages yet I’m afraid — I need to investigate how to get this to work with systemd!

(Bonus Q: Why is it called Molly-guard? See this definition for an explanation)

Rethinking the wireless database architecture

The eduroam wireless network has a reliance on a database for the authorization and accounting parts of AAA (authentication, authorization and accounting – are you who you say you are, what access are you allowed, and what did you do while connected).

When we started dabbling with database-backed AAA in 2007 or so, we used a centrally-provided Oracle database. The volume of AAA traffic was low and high performance was not necessary. However (spoiler alert) demand for wireless connectivity grew and before many months, we were placing more demand on Oracle than it could handle. The latency of our queries was taking sufficiently long that some wireless authentication requests would time out and fail.

First gen – MySQL (2007)

It was clear that we needed a dedicated database platform, and at the time that we asked, the DBAs were not able to provide a suitable platform. We went down the route of implementing our own. We decided to use MySQL as a low-complexity open source database server with a large community. The first iteration of the eduroam database hardware was a single second-hand server that was going spare. It had no resilience but was suitably snappy for our needs.

First gen database

First gen database

Second gen – MySQL MMM (2011)

Demand continued to grow but more crucially eduroam went from being a beta service that was “not to be relied upon” to being a core service that users routinely used for their teaching, learning and research. Clearly a cobbled-together solution was no longer fit for purpose, so we went about designing a new database platform.

The two key requirements were high query capacity, and high availability, i.e. resilience against the failure of an individual node. At the time, none of the open source database servers had proper clustering – only master-slave replication. We installed a clustering wrapper for MySQL, called MMM (MySQL Multi Master). This gave us a resilient two-node cluster whether either node could be queried for reads and one node was designated the “writer” at any one time. In the event of a node failure, the writer role would be automatically moved around by the supervisor.

Second gen database

Second gen database

Not only did this buy us resilience against hardware faults, for the first time it also allowed us to drop either node out of the cluster for patching and maintenance during the working day without affecting service for users.

The two-node MMM system served us well for several years, until the hardware came to its natural end of life. The size of the dataset had grown and exceeded the size of the servers’ memory (the 8GB that seemed generous in 2011 didn’t really go so far in 2015) meaning that some queries were quite slow. By this time, MMM had been discontinued so we set out to investigate other forms of clustering.

Third gen – MariaDB Galera (2015)

MySQL had been forked into MariaDB which was becoming the default open source database, replacing MySQL while retaining full compatibility. MariaDB came with an integrated clustering driver called Galera which was getting lots of attention online. Even the developer of MMM recommended using MariaDB Galera.

MariaDB Galera has no concept of “master” or “slave” – all the nodes are masters and are considered equal. Read and write queries can be sent to any of the nodes at will. For this reason, it is strongly recommended to have an odd number of nodes, so if a cluster has a conflict or goes split-brain, the nodes will vote on who is the “odd one out”. This node will then be forced to resync.

This approach lends itself naturally to load-balancing. After talking to Netcomms about the options, we placed all three MariaDB Galera nodes behind the F5 load balancer. This allows us to use one single IP address for the whole cluster, and the F5 will direct queries to the most appropriate backend node. We configured a probe so the F5 is aware of the state of the nodes, and will not direct queries to a node that is too busy, out of sync, or offline.

Third gen database

Third gen database

Having three nodes that can be simultaneously queried gives us an unprecedented capacity which allows us to easily meet the demands of eduroam AAA today, with plenty of spare capacity for tomorrow. We are receiving more queries per second than ever before (240 per second, and we are currently in the summer vacation!).

We are required to keep eduroam accounting data for between 3 and 6 months – this means a large dataset. While disk is cheap these days and you can store an awful lot of data, you also need a lot of memory to hold the dataset twice over, for UPDATE operations which require duplicating a table in memory, making changes, merging the two copies back and syncing to disk. The new MariaDB Galera nodes have 192GB memory each while the size of the dataset is about 30GB. That should keep us going… for now.

Rsync between two hosts using sudo and a password prompt

Using rsync normally is nice and straightforward. e.g.:

# rsync -av -e 'ssh' /some/local/path user@remote:/some/remote/path

This works fine and prompts for the ssh password to log into the remote machine if required.

But what if the remote end needs root (or a different user) rights to write into the destination directory? Just whack in an --rsync-path option to add sudo to the rsync command, right?:

# rsync -av -e 'ssh' --rsync-path='sudo rsync' /some/local/path user@remote:/some/remote/path
user@remote's password: 
sudo: no tty present and no askpass program specified
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(632) [sender=3.0.4]

Oh, that didn’t work — sudo couldn’t ask us for a password. Adding the -t option to the ssh used by rsync doesn’t work either, as it can’t allocate a tty:

# rsync -av -e 'ssh -t' --rsync-path='sudo rsync' /some/local/path user@remote:/some/remote/path
Pseudo-terminal will not be allocated because stdin is not a terminal.
user@remote's password:

So we need to tell sudo to ask for the password some other way. Fortunately there is a -A option for sudo which tells it to use an “askpass” program, but we also need to tell it what askpass program to use (and it’s not in the default path on most machines). We can find this with locate askpass on the remote machine:

[user@remote]$ locate askpass

We’ll use /usr/libexec/openssh/ssh-askpass as that should pick an appropriate version according to what is available to sudo. Again sudo won’t have a tty to ask for the password on, so how about we use an X11 askpass program and enable X forwarding for ssh:

# rsync -av -e 'ssh -X' --rsync-path='SUDO_ASKPASS=/usr/libexec/openssh/ssh-askpass sudo -A rsync' /some/local/path user@remote:/some/remote/path
user@remote's password: [type user password]
[Then a dialog pops up for the sudo password]

Hooray, this works! Bit of a faff but it could be scripted or made into a shell function to save having to remember it 🙂

Yes, you could enable remote root login, but it would definitely be preferable to avoid that.

SysAdmin Appreciation Day

Blimey, the “last Friday in July” has rolled round again quickly, and today marks International SysAdmin Appreciation Day.

So, I’d like to say a heartfelt thank you to all the sysadmins I interact with in IT Services!

Not just the linux/unix guys I work directly with, but the Windows team who provide me with useful things like authentication and RDP services, the network team who keep us talking to each other, the storage and backups team who provide me with somewhere to keep my data and anyone else who provides me with a service I consume!

You’re all doing great work which no one notices until it breaks, and then you unfailingly go above and beyond to restore service as soon as possible.

Thankyou my fellow sysadmins! I couldn’t do my job without you all doing yours.

RPM addsign fail on vendor-provided package (and a workaround)

We’ve been signing RPM packages in local repos for a while now, and this has been working nicely (see previous posts about rpm signing)… until today.

The Intel Fortran 2015 installer provides RPMs which are already signed by Intel and which install and work fine, so we push these out with Puppet from our local (private) repo. However, even though I’d signed them myself they were failing to verify the signature…

Original RPM before re-signing locally:

  => rpm --checksig intel-fcompxe-187-15.0-3.noarch.rpm
  intel-fcompxe-187-15.0-3.noarch.rpm: RSA sha1 ((MD5) PGP) md5 NOT OK (MISSING KEYS: (MD5) PGP#7a5a985f) 

  => rpm -qpi intel-fcompxe-187-15.0-3.noarch.rpm
  warning: intel-fcompxe-187-15.0-3.noarch.rpm: Header V3 RSA/SHA1 Signature, key ID 7a5a985f: NOKEY
  Name        : intel-fcompxe-187            
  Signature   : RSA/SHA1, Fri 10 Apr 2015 13:23:59 BST, Key ID 27fbcd8d7a5a985f

So I go ahead and sign the package:

  => rpm --addsign intel-fcompxe-187-15.0-3.noarch.rpm 
  Enter pass phrase: [type passphrase]
  Pass phrase is good.

All looks fine, so lets check the signature again:

  => rpm --checksig intel-fcompxe-187-15.0-3.noarch.rpm
  intel-fcompxe-187-15.0-3.noarch.rpm: RSA RSA sha1 sha1 ((MD5) PGP) ((MD5) PGP) md5 md5 NOT OK (MISSING KEYS: (MD5) PGP#7a5a985f (MD5) PGP#262a742e) 

  => rpm -qpi intel-fcompxe-187-15.0-3.noarch.rpm
warning: intel-fcompxe-187-15.0-3.noarch.rpm: Header V3 RSA/SHA1 Signature, key ID 7a5a985f: NOKEY
  Name        : intel-fcompxe-187            
  Signature   : RSA/SHA1, Fri 10 Apr 2015 13:23:59 BST, Key ID 27fbcd8d7a5a985f

Odd, it hasn’t changed… Let’s try removing the signature instead:

=> rpm --delsign intel-fcompxe-187-15.0-3.noarch.rpm 

=> rpm --checksig intel-fcompxe-187-15.0-3.noarch.rpm
intel-fcompxe-187-15.0-3.noarch.rpm: RSA RSA sha1 sha1 sha1 ((MD5) PGP) ((MD5) PGP) md5 md5 md5 NOT OK (MISSING KEYS: (MD5) PGP#7a5a985f (MD5) PGP#262a742e) 

That’s very odd, it’s added tags to the signature header. And if you try a few more times (just to be sure, right? :), it adds more tags to the header:

=> rpm --delsign intel-fcompxe-187-15.0-3.noarch.rpm 
Packager    : http://www.intel.com/software/products/support
Summary     : Intel(R) Fortran Compiler XE 15.0 Update 3 for Linux*
Description :
Intel(R) Fortran Compiler XE 15.0 Update 3 for Linux*

=> rpm --checksig intel-fcompxe-187-15.0-3.noarch.rpm
intel-fcompxe-187-15.0-3.noarch.rpm: RSA RSA sha1 sha1 sha1 sha1 sha1 sha1 ((MD5) PGP) ((MD5) PGP) md5 md5 md5 md5 md5 md5 NOT OK (MISSING KEYS: (MD5) PGP#7a5a985f (MD5) PGP#262a742e) 

If you do this a few more times then rpm can’t read the package at all anymore!

  => rpm --checksig intel-fcompxe-187-15.0-3.noarch.rpm
  error: intel-fcompxe-187-15.0-3.noarch.rpm: rpmReadSignature failed: sigh tags: BAD, no. of tags(33) out of range

  => rpm -ql -v -p intel-fcompxe-187-15.0-3.noarch.rpm 
  error: intel-fcompxe-187-15.0-3.noarch.rpm: rpmReadSignature failed: sigh tags: BAD, no. of tags(33) out of range
  error: intel-fcompxe-187-15.0-3.noarch.rpm: not an rpm package (or package manifest)


Workaround: Add the vendor keys as you should do, rather than re-signing. Appending the public key to the required RPM-GPG_KEY-* file is all that’s required, and then you can install the packages just fine.

Future work: Submit bug report about this to the rpm-sign developers…

How old is this solaris box?

Sometimes it’s useful to know how old a solaris server is, without having to dig out its serial number or documentation.

Turns out it’s really easy. “prtfru -c” will give you the build date of various bits of hardware in the system. For example, here’s a server that we’ve just retired (Which is long overdue!)

oldserver:$ sudo prtfru -c | grep UNIX_Timestamp
      /ManR/UNIX_Timestamp32: Mon Aug 22 02:52:32 BST 2005
      /ManR/UNIX_Timestamp32: Fri Jun  3 19:48:16 BST 2005
      /ManR/UNIX_Timestamp32: Wed Aug  3 11:39:47 BST 2005
      /ManR/UNIX_Timestamp32: Fri Jun  3 19:46:50 BST 2005

Capacity Planning for DNS

I’ve spent the last 6 months working on our DNS infrastructure, wrangling it into a more modern shape.

This is the first in a series of articles talking about some of the process we’ve been through and outlining some of the improvements we’ve made.

One of the exercises we try to go through when designing any new production infrastructure is capacity planning. There are four questions you need to be able to ask when you’re doing this:

  1. How much traffic do we need to handle today?
  2. How are we expecting traffic to grow?
  3. How much traffic can the infrastructure handle?
  4. How much headroom have we got?

We aim to be in a position where we can ask those four questions on a regular basis, and preferably get useful answers to them!

When it comes to DNS, the most useful metric would appear to be “queries/second” (which I’ll refer to as qps from here on in to save a load of typing!) and bind can give us that information fairly readily with it’s built in statistics gathering features.

With that in mind, lets look at those 4 questions.

1. How much traffic do we need to handle today?
The best way to get hold of that information is to collect the qps metrics from our DNS infrastructure and graph them.

This is quite a popular thing to do and most monitoring tools (eg nagios, munin or ganglia) have well worn solutions available, and for everything else there’s google

Unfortunately we weren’t able to collate these stats from the core of the legacy DNS infrastructure in a meaningful way (due to differences in bind version, lack of a sensible aggregation point etc)

Instead, we had to infer it from other sources that we can/do monitor, for example the caching resolvers we use for eduroam.

Our eduroam wireless network is used by over 30,000 client devices a week. We think this around 60% of the total devices on the network, so it’s a fairly good proxy for the whole university network.

We looked at what the eduroam resolvers were handling at peak time (revision season), doubled it and added a bit. Not a particularly scientific approach, but it’s likely to be over-generous which is no bad thing in this case!

That gave us a ballpark figure of “we need to handle around 4000qps”

2. How are we expecting traffic to grow?
We don’t really have long term trend information for the central DNS service due to the historical lack of monitoring.

Again inferring generalities from eduroam, the number of clients on the network goes up by 20-30% year on year (and has done since 2011) Taking 30% growth year on year as our growth rate, and expanding that over 5 years it looks like this:

dns growth

Or in 5 years time we think we’ll need around 15,000qps.

All estimates in this process being on the generous side, and due to the compound nature of the year-on-year growth calculations, that should be a significant overestimate.

It will certainly be an interesting figure to revisit in 5 years time!

3. How much traffic can the infrastructure handle?
To answer this one, we need some benchmarking tools. After a bit of research I settled on dnsperf. The mechanics of how to run dnsperf (and how to gather a realistic sample dataset) are best left for another time.

All tests were done against the pre-production infrastructure so as not to interfere with live traffic.

Lets look at the graphs we get out at the end.

The new infrastructure:

Interpreting this graph isn’t immediately obvious. The way dnsperf works is that it linearly scales the number of queries/second that it’s sending to your DNS server, and tracks how many responses it gets back per second.

So the red line is how many queries/second we’re testing against, and the green line is how the server is responding. Where the two lines diverge shows you where your infrastructure starts to struggle.

In this case, the new infrastructure appears to cope quite well with around 30,000qps – or about twice what we’re expecting to need in 5 years time. That’s with all (or rather, both!) the servers in the pool available, so do we still have n+1 redundancy?

A single node in the new infrastructure:

From this graph you can see we’re good up to around 14000qps, so we’re n+1 redundant for at least the next 3-4 years (the lifetime of the harware we’re using)

At present, we have 2 nodes in the pool, the implication from the two graphs is that it does indeed scale approximately linearly with the number of servers in the pool.

4. How much headroom have we got?
At this point, the answer to that looks like “plenty” and with the new infrastructure we should be able to scale out almost linearly by adding more servers to the pool.

Now that we know how much we can expect our infrastructure to handle, and how much it’s actually experiencing, we can make informed decisions about when we need add more resources in order to maintain at least n+1 redundancy.

What about the legacy infrastructure?
Well, the reason I’m writing this post today (rather than any other day) is that we retired the oldest of the servers in the legacy infrastructure today, and I wanted to fire dnsperf at it, after it’s stopped handling live traffic but before we switch it off completely!

So how many queries/second can a 2005 vintage Sun Fire V240 server cope with?


It seems the answer to that is “not really enough for 2015!”

No wonder its response times were atrocious…

F5 Big-IP, Apache logs and client IP

Due to the fact that the F5 Big-IP make use of SNAT to load balance traffic, your back-end node will see the traffic coming from the IP of the load balancer and not the true client.
To overcome this (for web traffic at least), the F5 injects the X-Forwarded-For header in to HTTP steams with the true clients IP.

In apache you may want to log this IP instead of the remote host if it has been set. Using the SetEnvIf, we can produce a suitable LogFormat line based on if the X-Forwarded-For header is set or not:

CustomLog "/path/o/log/dir/example.com_access.log" combined env=!forwarded
LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-Agent}i\"" proxy
SetEnvIf X-Forwarded-For "^.*\..*\..*\..*" forwarded
CustomLog "/path/to/log/dir/example.com_access.log" proxy env=forwarded

The above assumes that the “combined” LogFormat has already been defined.

If you use the ::apache::vhost puppet class from the puppetlabs/apache module, you can achieve the same result with the following parameters:

::apache::vhost { 'example.com':
  logroot => "/path/to/log/dir/",
  access_log_env_var => "!forwarded",
  custom_fragment => "LogFormat \"%{X-Forwarded-For}i %l %u %t \\\"%r\\\" %s %b \\\"%{Referer}i\\\" \\\"%{User-Agent}i\\\"\"    proxy
  SetEnvIf X-Forwarded-For \"^.*\..*\..*\..*\" forwarded
  CustomLog \"/path/to/log/dir/${title}_access.log\" proxy env=forwarded"