RPM addsign fail on vendor-provided package (and a workaround)

We’ve been signing RPM packages in local repos for a while now, and this has been working nicely (see previous posts about rpm signing)… until today.

The Intel Fortran 2015 installer provides RPMs which are already signed by Intel and which install and work fine, so we push these out with Puppet from our local (private) repo. However, even after signing them myself, they were still failing signature verification…

Original RPM before re-signing locally:

  => rpm --checksig intel-fcompxe-187-15.0-3.noarch.rpm
  intel-fcompxe-187-15.0-3.noarch.rpm: RSA sha1 ((MD5) PGP) md5 NOT OK (MISSING KEYS: (MD5) PGP#7a5a985f) 

  => rpm -qpi intel-fcompxe-187-15.0-3.noarch.rpm
  warning: intel-fcompxe-187-15.0-3.noarch.rpm: Header V3 RSA/SHA1 Signature, key ID 7a5a985f: NOKEY
  Name        : intel-fcompxe-187            
  ...
  Signature   : RSA/SHA1, Fri 10 Apr 2015 13:23:59 BST, Key ID 27fbcd8d7a5a985f

So I go ahead and sign the package:

  => rpm --addsign intel-fcompxe-187-15.0-3.noarch.rpm 
  Enter pass phrase: [type passphrase]
  Pass phrase is good.
  intel-fcompxe-187-15.0-3.noarch.rpm:

All looks fine, so let’s check the signature again:

  => rpm --checksig intel-fcompxe-187-15.0-3.noarch.rpm
  intel-fcompxe-187-15.0-3.noarch.rpm: RSA RSA sha1 sha1 ((MD5) PGP) ((MD5) PGP) md5 md5 NOT OK (MISSING KEYS: (MD5) PGP#7a5a985f (MD5) PGP#262a742e) 

  => rpm -qpi intel-fcompxe-187-15.0-3.noarch.rpm
  warning: intel-fcompxe-187-15.0-3.noarch.rpm: Header V3 RSA/SHA1 Signature, key ID 7a5a985f: NOKEY
  Name        : intel-fcompxe-187            
  ...
  Signature   : RSA/SHA1, Fri 10 Apr 2015 13:23:59 BST, Key ID 27fbcd8d7a5a985f

Odd, it hasn’t changed… Let’s try removing the signature instead:

  => rpm --delsign intel-fcompxe-187-15.0-3.noarch.rpm
  intel-fcompxe-187-15.0-3.noarch.rpm:

  => rpm --checksig intel-fcompxe-187-15.0-3.noarch.rpm
  intel-fcompxe-187-15.0-3.noarch.rpm: RSA RSA sha1 sha1 sha1 ((MD5) PGP) ((MD5) PGP) md5 md5 md5 NOT OK (MISSING KEYS: (MD5) PGP#7a5a985f (MD5) PGP#262a742e)

That’s very odd, it’s added tags to the signature header. And if you try a few more times (just to be sure, right? :), it adds more tags to the header:

  => rpm --delsign intel-fcompxe-187-15.0-3.noarch.rpm
  intel-fcompxe-187-15.0-3.noarch.rpm:
  ...
  Packager    : http://www.intel.com/software/products/support
  Summary     : Intel(R) Fortran Compiler XE 15.0 Update 3 for Linux*
  Description :
  Intel(R) Fortran Compiler XE 15.0 Update 3 for Linux*

  => rpm --checksig intel-fcompxe-187-15.0-3.noarch.rpm
  intel-fcompxe-187-15.0-3.noarch.rpm: RSA RSA sha1 sha1 sha1 sha1 sha1 sha1 ((MD5) PGP) ((MD5) PGP) md5 md5 md5 md5 md5 md5 NOT OK (MISSING KEYS: (MD5) PGP#7a5a985f (MD5) PGP#262a742e)

If you do this a few more times then rpm can’t read the package at all anymore!

  => rpm --checksig intel-fcompxe-187-15.0-3.noarch.rpm
  error: intel-fcompxe-187-15.0-3.noarch.rpm: rpmReadSignature failed: sigh tags: BAD, no. of tags(33) out of range

  => rpm -ql -v -p intel-fcompxe-187-15.0-3.noarch.rpm 
  error: intel-fcompxe-187-15.0-3.noarch.rpm: rpmReadSignature failed: sigh tags: BAD, no. of tags(33) out of range
  error: intel-fcompxe-187-15.0-3.noarch.rpm: not an rpm package (or package manifest)

Ooops!

Workaround: Add the vendor keys as you should do, rather than re-signing. Appending the vendor’s public key to the relevant RPM-GPG-KEY-* file is all that’s required, and then you can install the packages just fine.
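
For example, something along these lines (the key file name and path here are purely illustrative, and depend on how your repo and distro are set up):

  # Append the vendor's public key to the keyring file your repo config points at,
  # then import it into the RPM database so signature checks can find it
  cat intel-vendor-key.asc >> /etc/pki/rpm-gpg/RPM-GPG-KEY-local
  rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-local
  rpm --checksig intel-fcompxe-187-15.0-3.noarch.rpm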

Future work: Submit bug report about this to the rpm-sign developers…

How old is this Solaris box?

Sometimes it’s useful to know how old a Solaris server is, without having to dig out its serial number or documentation.

Turns out it’s really easy. “prtfru -c” will give you the build date of various bits of hardware in the system. For example, here’s a server that we’ve just retired (which is long overdue!):

oldserver:$ sudo prtfru -c | grep UNIX_Timestamp
Password:
      /ManR/UNIX_Timestamp32: Mon Aug 22 02:52:32 BST 2005
      /ManR/UNIX_Timestamp32: Fri Jun  3 19:48:16 BST 2005
      /ManR/UNIX_Timestamp32: Wed Aug  3 11:39:47 BST 2005
      /ManR/UNIX_Timestamp32: Fri Jun  3 19:46:50 BST 2005
oldserver:$ 

Capacity Planning for DNS

I’ve spent the last 6 months working on our DNS infrastructure, wrangling it into a more modern shape.

This is the first in a series of articles talking about some of the process we’ve been through and outlining some of the improvements we’ve made.

One of the exercises we try to go through when designing any new production infrastructure is capacity planning. There are four questions you need to be able to ask when you’re doing this:

  1. How much traffic do we need to handle today?
  2. How are we expecting traffic to grow?
  3. How much traffic can the infrastructure handle?
  4. How much headroom have we got?

We aim to be in a position where we can ask those four questions on a regular basis, and preferably get useful answers to them!

When it comes to DNS, the most useful metric would appear to be “queries/second” (which I’ll refer to as qps from here on in to save a load of typing!) and bind can give us that information fairly readily with its built-in statistics-gathering features.
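
For example, a minimal way of getting at the raw counters (assuming BIND 9 with a statistics-file configured; the exact path varies by distro) looks something like this:

# "rndc stats" appends a counter dump to BIND's statistics-file;
# sampling the query counters at a known interval gives you queries/second
rndc stats
grep 'queries resulted in' /var/named/data/named_stats.txt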

With that in mind, let’s look at those four questions.

1. How much traffic do we need to handle today?
The best way to get hold of that information is to collect the qps metrics from our DNS infrastructure and graph them.

This is quite a popular thing to do and most monitoring tools (eg nagios, munin or ganglia) have well-worn solutions available, and for everything else there’s Google.

Unfortunately we weren’t able to collate these stats from the core of the legacy DNS infrastructure in a meaningful way (due to differences in bind version, lack of a sensible aggregation point, etc).

Instead, we had to infer it from other sources that we can/do monitor, for example the caching resolvers we use for eduroam.

Our eduroam wireless network is used by over 30,000 client devices a week. We think this is around 60% of the total devices on the network, so it’s a fairly good proxy for the whole university network.

We looked at what the eduroam resolvers were handling at peak time (revision season), doubled it and added a bit. Not a particularly scientific approach, but it’s likely to be over-generous which is no bad thing in this case!

That gave us a ballpark figure of “we need to handle around 4000qps”

2. How are we expecting traffic to grow?
We don’t really have long term trend information for the central DNS service due to the historical lack of monitoring.

Again inferring generalities from eduroam, the number of clients on the network goes up by 20-30% year on year (and has done since 2011). Taking 30% year-on-year growth as our growth rate and expanding that over 5 years, it looks like this:

[Graph: dns growth – projected query growth over the next 5 years]

Or in 5 years time we think we’ll need around 15,000qps.
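
The arithmetic behind that figure is just compound growth applied to the 4000qps estimate:

# 4000 qps today, growing at 30% per year, compounded over 5 years
echo '4000 * 1.3^5' | bc -l
# 14851.72...  i.e. roughly 15,000 qps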

With all the estimates in this process being on the generous side, and given the compound nature of the year-on-year growth calculation, that should be a significant overestimate.

It will certainly be an interesting figure to revisit in 5 years time!

3. How much traffic can the infrastructure handle?
To answer this one, we need some benchmarking tools. After a bit of research I settled on dnsperf. The mechanics of how to run dnsperf (and how to gather a realistic sample dataset) are best left for another time.
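
For a flavour of it though, a bare-bones run looks something like this (the server address and query file are purely illustrative):

# queries.txt holds one "name type" pair per line, e.g. "www.bris.ac.uk A"
# -s is the server under test, -d the query data file, -l the run time in seconds
dnsperf -s 10.0.0.53 -d queries.txt -l 60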

All tests were done against the pre-production infrastructure so as not to interfere with live traffic.

Let’s look at the graphs we get out at the end.

The new infrastructure:
[Graph: dnsperf results for the new infrastructure (20150624-1225.rate)]

How to interpret this graph isn’t immediately obvious. The way dnsperf works is that it linearly scales the number of queries/second that it’s sending to your DNS server, and tracks how many responses it gets back per second.

So the red line is how many queries/second we’re testing against, and the green line is how the server is responding. Where the two lines diverge shows you where your infrastructure starts to struggle.

In this case, the new infrastructure appears to cope quite well with around 30,000qps – or about twice what we’re expecting to need in 5 years time. That’s with all (or rather, both!) the servers in the pool available, so do we still have n+1 redundancy?

A single node in the new infrastructure:
[Graph: dnsperf results for a single node in the new infrastructure (20150622-1438.rate)]

From this graph you can see we’re good up to around 14,000qps, so we’re n+1 redundant for at least the next 3-4 years (the lifetime of the hardware we’re using).

At present we have 2 nodes in the pool, and the implication from the two graphs is that capacity does indeed scale approximately linearly with the number of servers in the pool.

4. How much headroom have we got?
At this point, the answer to that looks like “plenty” and with the new infrastructure we should be able to scale out almost linearly by adding more servers to the pool.

Now that we know how much we can expect our infrastructure to handle, and how much it’s actually experiencing, we can make informed decisions about when we need to add more resources in order to maintain at least n+1 redundancy.

What about the legacy infrastructure?
Well, the reason I’m writing this post today (rather than any other day) is that we retired the oldest of the servers in the legacy infrastructure today, and I wanted to fire dnsperf at it after it had stopped handling live traffic but before we switch it off completely!

So how many queries/second can a 2005 vintage Sun Fire V240 server cope with?

[Graph: dnsperf results for the 2005-vintage Sun Fire V240 (20150727-0938.rate)]

It seems the answer to that is “not really enough for 2015!”

No wonder its response times were atrocious…

F5 Big-IP, Apache logs and client IP

Because the F5 Big-IP makes use of SNAT to load balance traffic, your back-end node will see the traffic coming from the IP of the load balancer and not the true client.
To overcome this (for web traffic at least), the F5 injects the X-Forwarded-For header into HTTP streams with the true client’s IP.

In Apache you may want to log this IP instead of the remote host when it has been set. Using SetEnvIf, we can select a suitable LogFormat line based on whether the X-Forwarded-For header is set or not:

CustomLog "/path/to/log/dir/example.com_access.log" combined env=!forwarded
LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-Agent}i\"" proxy
SetEnvIf X-Forwarded-For "^.*\..*\..*\..*" forwarded
CustomLog "/path/to/log/dir/example.com_access.log" proxy env=forwarded

The above assumes that the “combined” LogFormat has already been defined.

If you use the ::apache::vhost puppet class from the puppetlabs/apache module, you can achieve the same result with the following parameters:

::apache::vhost { 'example.com':
  logroot => "/path/to/log/dir/",
  access_log_env_var => "!forwarded",
  custom_fragment => "LogFormat \"%{X-Forwarded-For}i %l %u %t \\\"%r\\\" %s %b \\\"%{Referer}i\\\" \\\"%{User-Agent}i\\\"\"    proxy
  SetEnvIf X-Forwarded-For \"^.*\..*\..*\..*\" forwarded
  CustomLog \"/path/to/log/dir/${title}_access.log\" proxy env=forwarded"
}

SELinux quicktip

A while ago, Jonathan wrote a really useful post about how to use SELinux – I tend to refer to it every time I need to build an SELinux policy to get something working.

However, yesterday I hit a wrinkle not covered in that post. I was working on a nagios plugin which didn’t work when run by nrpe. It worked from the command line, and worked via nrpe with SELinux disabled (which pointed the finger neatly at SELinux) but it didn’t leave any traces in the audit log, which makes building a policy difficult!

It seems that the default policies in CentOS include a list of “don’t audit” rules, which silently block some types of behaviour. The intention is to keep a lot of common noise out of the audit log, but that doesn’t help you much when you’re trying to build a policy!

Luckily you can turn that behaviour on and off.

# Turn it off:
sudo semodule --disable_dontaudit --build
sudo setenforce 0

# Turn it back on:
sudo semodule --build
sudo setenforce 1

With dontaudit disabled, I got the information I needed in the audit log and was able to successfully build a policy that made my nagios check work.
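
For reference, turning those denials into a policy typically looks something like this (the module name is arbitrary; see Jonathan’s post for the full detail):

# turn the relevant AVC denials into a local policy module and load it
sudo grep nrpe /var/log/audit/audit.log | audit2allow -M nrpe_local
sudo semodule -i nrpe_local.pp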

What’s in your history?

A little bit of Friday frivolity for you. A friend of mine recently discovered zsh_history, which tells you what commands you run most often from your shell. Obviously zsh_history is pretty zsh specific, but a bit of rummaging in the code shows it pretty much does this:

history | awk '{CMD[$2]++;count++;}END { for (a in CMD)print CMD[a] " " CMD[a]/count*100 "% " a;}' | grep -v "./" | column -c3 -s " " -t | sort -nr | nl |  head -n10

In my case, on my work desktop my top 10 commands are:

     1	321  32.1%  git
     2	214  21.4%  ls
     3	151  15.1%  cd
     4	105  10.5%  vi
     5	29   2.9%   ssh
     6	29   2.9%   exit
     7	21   2.1%   grep
     8	14   1.4%   less
     9	13   1.3%   nslookup
    10	11   1.1%   whois

Given that git is a key component of the Team ResNet puppet workflow, it’s probably not surprising that it’s top of my list.

If you want to join in, hit us up in the comments and tell us what’s top of your list. Are there any typos which show up more often than you were expecting?

Integrating Puppet and Git

Introduction

One of the main benefits of working with Git is the ability to split code into branches and work on those branches in parallel. By integrating your Git branches into your Puppet workflow, you can present each Git branch as a Puppet environment to separate your dev, test and prod environments in different code branches.

This guide assumes you already have a working Puppet master which uses directory environments, and a working Gitlab install. It assumes that your Puppet and Gitlab instances are on different servers. It’s recommended that you install Gitlab with Puppet as I described in a previous blog post.

Overview

The basic principle is to sync all Puppet code from Gitlab to Puppet, but it’s a little more complex than this. When code is pushed to any branch in Git, a post-receive hook is executed on the Gitlab server. This connects via ssh to the Puppet master and executes another script which checks out the code from the specific Git branch into a directory environment of the same name.

Deploy keys

First you need to create a deploy key for your Puppet Git repo. This grants the Puppet master read-only access to your code. Follow these instructions to create a deploy key. You’ll need to put the deploy key in the home directory of the puppet user on your Puppet master, which is usually /var/lib/puppet/.ssh/id_rsa.pub.
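
As a rough sketch (assuming the puppet user’s home is /var/lib/puppet and it doesn’t already have a key pair):

# generate a key pair for the puppet user; the public half (id_rsa.pub)
# is what gets registered as the deploy key on the Gitlab repository
sudo -u puppet ssh-keygen -t rsa -N '' -f /var/lib/puppet/.ssh/id_rsa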

Puppet-sync

The puppet-sync script is the component on the Puppet master. This is available from Github and can be installed verbatim into any path on your system. I chose /usr/local/bin.

SSH keys

You’ll need to create a pair of SSH public/private keys to allow your Gitlab server to ssh to your Puppet server. This is a pretty standard thing to do, but instructions are available on the README for puppet-sync.
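
Roughly speaking (assuming the Gitlab git user’s home is under /var/opt/gitlab, to match the SSH_ARGS path used in the hook config below):

# generate the key the hook will use to ssh from the Gitlab server to the Puppet master,
# then append git_id_rsa.pub to the puppet user's ~/.ssh/authorized_keys on the Puppet master
sudo -u git ssh-keygen -t rsa -N '' -f /var/opt/gitlab/.ssh/git_id_rsa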

Post-receive hook

The post-receive hook must be installed on the Gitlab server. The source code is available from Github. You will need to modify the hook to point at the right locations. The following values are sensible defaults.

REPO="git@gitlab.yoursite.com:namespace/puppet.git"
DEPLOY="/etc/puppet/environments"
SSH_ARGS="-i /var/opt/gitlab/.ssh/git_id_rsa"
PUPPETMASTER="puppet@puppet.yoursite.com"
SYNC_COMMAND="/usr/local/bin/puppet-sync"
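
Conceptually, for each pushed branch the hook then boils down to something like this (a simplified sketch; the real script derives the branch name from the pushed ref, and the exact puppet-sync arguments live in the hook itself, so treat these as placeholders):

BRANCH="newfeature"   # derived from the pushed ref in the real hook
ssh $SSH_ARGS $PUPPETMASTER \
  "$SYNC_COMMAND --branch $BRANCH --repository $REPO --deploy $DEPLOY"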

If you are using the spuder/gitlab Puppet module to manage your Gitlab instance, installing a custom hook is as simple as doing:

gitlab::custom_hook { 'puppet-post-receive':
  namespace => 'namespace',
  project   => 'puppet',
  type      => 'post-receive',
  source    => 'puppet:///modules/site_gitlab/puppet-post-receive',
}

If you installed Gitlab by hand, you’ll need to manually install the hook into the hooks subdirectory in your raw Git repo. On our Gitlab installation, this is in /var/opt/gitlab/git-data/repositories/group/puppet.git/hooks

Workflow

Now that all the components are in place, you can start to use it (once you’ve tested it). Gitlab’s default branch is called master, whereas Puppet’s default environment is production. You need to create a new branch in Gitlab called production, and set this to be the default branch. Delete master.
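
Creating the production branch is just a normal git operation (setting it as the default branch and deleting master are then done through the Gitlab UI):

# from a clone of the puppet repo
git checkout -b production master
git push origin production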

  1. Make new feature branches on the Gitlab server using the UI
  2. Use git pull and git checkout to fetch these branches onto your working copy
  3. Make changes, commit them, and when you push them, those changes are synced to a Puppet environment with the same name as the branch
  4. Move a dev VM into the new environment by doing puppet agent -t --environment newfeature
  5. When you’re happy, send a merge request to merge your branch back into production. When the merge request is accepted, the code will automatically be pushed to the Puppet server.

Read about this in more detail: Gitlab Flow.

Linux VMs on Hyper-V – be sure to install and run hyperv-daemons!

A short post, just to say that if you are running Linux VMs on Hyper-V hypervisors you really should install and run the hyperv daemons.

On RHEL7-based distros this is as simple as:

yum install hyperv-daemons
systemctl enable hypervvssd
systemctl enable hypervkvpd

Ubuntu users should install the packages specified on this Microsoft TechNet page (I’ve not tested this myself, as I don’t yet have any Ubuntu VMs on Hyper-V).

Once you’ve done this a whole host of important features will work, including:

  • Live migration of VMs
  • IP injection (?)
  • Dynamic memory sizing
  • etc..

This will avoid any surprise reboots when hypervisor nodes are taken down for maintenance (which is what happened to me before I installed these..). Obviously the complete failure of a hypervisor will still cause VM downtime, though.

I leave it as an exercise to the reader to use configuration management to add these to all their Hyper-V VMs automatically 🙂

Further reading : Best Practices for running Linux on Hyper-V

soc::puppet – A puppet themed social event for UoB (Thursday 19th March)

What: soc::puppet is a puppet-themed meet-up for University of Bristol staff using, or interested in, puppet configuration management (rather than actual marionettes or glove puppets)
Where: Brambles in The Hawthorns (see the link for details)
When: 5pm-7pm(ish) Thursday 19th March 2015

There’s a growing community of people around the University of Bristol using (or interested in using) puppet configuration management (http://puppetlabs.com). Some of those people are talking to each other, and some just don’t know who to talk to yet!

Experience, use case and scale of implementation varies widely, but we’ve all got something to share! 🙂

With that in mind, there seems to be interest in an informal gathering of interested people, where we can get together, share ideas and build a local puppet community, bringing together all those informal corridor/tearoom chats and spreading the exciting ideas/knowledge around in a loose, informal manner.

As a first pass, we’ve booked “Brambles”, which is the new name for the Staff Club space in The Hawthorns, for a couple of hours after work on Thursday 19th March. If it goes well, it will hopefully turn into a regular event.

Our initial aim is to make it as informal as possible (hence doing it outside work hours, no pressure to minute it, assign actions, instigate formalised project teams etc) and treat it mostly as an exercise in putting people in touch with other people who are playing with similar toys.

That said, there are a few “bits of business” to take care of at the first meeting, so I’m suggesting the following as a vague agenda.

  1. Welcome!  What’s this about? (about 5 minutes)
  2. Introductions, very quick “round table” to introduce everyone, and say what level of exposure they’ve had to puppet so far (about 10 minutes)
  3. Everything beyond this point will be decided on the day.  If you’ve got something you’d like to talk about or present on, bring it with you!
  4. We’ll close the session with a very quick “should we do this again?” and “call for volunteers”

If people are interested, we can move on to a pub afterwards to continue the discussion.

The facilities available are a bit limited, and apparently the projector isn’t available at the moment, but we’ll see what direction it takes – and as they say in Open Space circles, “Whatever happens is the only thing that could have, be prepared to be surprised!”