Automatic patching

Services breaking because the package manager automatically installed a bad update is a problem sysadmins have been mulling over for years. The knee-jerk reaction is to disable automatic updates, but that doesn’t avoid the problem: you still have to apply the updates sooner or later, and doing them manually gets ever more tedious as admins look after larger and larger fleets of virtual machines. Not patching at all isn’t an option either, for obvious security reasons and for compliance with ISP-11, which mandates that security updates be applied within 5 days.

The “gold standard” solution that everyone dreams of is to use a gated repo – i.e. you have a local “dirty” repo which syncs from upstream every night, and all of your dev/test machines update against this repo. Once “someone” has done “some” testing, they can allow the known-good updates into the clean repo, which the production machines update from. The trouble is, this is actually quite a lot of work to set up, and requires ongoing maintenance to test updates. It’s still very human-intensive, and sysadmins hate that.
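
One way to build the “dirty” half of that is simply a nightly reposync into a local repo that dev/test boxes point at. This is only a minimal sketch, assuming a RHEL/CentOS-style mirror host with yum-utils and createrepo installed – the repo ID and paths are examples, not anything we actually run:

# Sketch only: nightly sync of upstream packages into a local "dirty" repo
class profile::dirtyrepo {
  cron { 'sync-dirty-repo':
    command => '/usr/bin/reposync --repoid=base --download_path=/srv/repos/dirty && /usr/bin/createrepo /srv/repos/dirty/base',
    user    => 'root',
    hour    => 2,
    minute  => 0,
  }
}

Promoting known-good packages into the “clean” repo that production points at is the bit that still needs a human (or a lot of test automation), which is exactly the maintenance burden described above.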

So what else can be done? People often talk about a system where your dev servers patch nightly and your prod servers patch weekly, but this doesn’t help you if the broken update comes out on prod patching day.

We’ve recently started using a similar regime where dev/test servers patch nightly, but with a tweak for prod. Every server is assigned a class (dev, test, prod, etc.), and prod servers are also split into one of three groups according to the numbers in their structured hostname. We classify nodes using a Puppet module called uob_classifier. Here’s an example:

[ispms@db-mariadb-p0 ~]$ sudo facter -p uob_service_class
prod
[ispms@db-mariadb-p0 ~]$ sudo facter -p uob_service_tier
1

Anything that for some reason slips through the automatic classification isn’t forgotten – the classifier assumes it is production group 1.
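
The fallback itself lives inside uob_classifier, but in Puppet terms it boils down to something like this (pick() comes from puppetlabs-stdlib; the variable names here are purely illustrative):

# Illustrative only - the real defaults are applied inside uob_classifier
$service_class = pick($::uob_service_class, 'prod')
$service_tier  = pick($::uob_service_tier, '1')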

We then use these classes and groups to decide when to patch each type of box. Here’s the Puppet profile we use to manage yum updates with a Forge module, jgazeley/yumupdate.

# Profile for yum updates
class profile::yumupdate {

  # If it's prod, work out what day we should patch
  # to keep clusters patched on different days
  if ($::uob_service_class == 'prod') {
    $prodday = $::uob_service_tier ? {
      '1'     => ['1'],
      '2'     => ['2'],
      '3'     => ['3'],
      default => ['4'],
    }
  }

  # Work out what day(s) to patch: prod uses the day chosen above,
  # everything else patches every weekday
  $weekday = $::uob_service_class ? {
    'prod'  => $prodday,
    'dev'   => ['1-5'],
    'test'  => ['1-5'],
    'beta'  => ['1-5'],
    'demo'  => ['1-5'],
    default => ['1-5'],
  }

  # jgazeley/yumupdate
  class { '::yumupdate':
    weekday => $weekday,
  }
}
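
Rolling this out is then just a case of pulling the profile into whatever role or node definition each box already has, for example (the node name here is illustrative):

# e.g. in site.pp or a role class
node 'db-mariadb-p0.example.org' {
  include profile::yumupdate
}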

In our environment, production groups 1-3 patch on Mondays, Tuesdays or Wednesdays respectively. Anything that is designated production but somehow didn’t get a group number patches on Thursdays. Dev/test/etc servers patch every weekday.

We don’t know what day of the week patches will be released upstream, so this is still no guarantee that dev servers will install a particular patch before production. However, it does mean that a bad update won’t break every production server on the same day – if one server breaks, we have time to halt further updates before the rest of the fleet patches.
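
“Halting further updates” doesn’t have to mean logging into every box, either. One approach would be to wrap the yumupdate declaration in the profile above behind a kill-switch looked up from Hiera – a rough sketch, with a made-up key name that isn’t part of jgazeley/yumupdate:

# Sketch: skip declaring the yumupdate class while patching is paused.
# The 'profile::yumupdate::paused' Hiera key is our own invention.
$paused = hiera('profile::yumupdate::paused', false)

unless $paused {
  class { '::yumupdate':
    weekday => $weekday,
  }
}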

It’s certainly not a perfect solution, but it avoids most of the risk surrounding automatic patching and is also very quick and simple to implement.