Rocks Clusters – the httpd update that breaks your cluster and how to fix it

I’ve had a cluster running Rocks 6.2 (SideWinder) for a few months and it has been working well. I recently had a request to add a new user, so I created the account with a minimal useradd command specifying only the comment, the uid, the group and the username. I then ran the ‘rocks sync users’ command, which copies various files, including /etc/passwd, to the nodes and restarts some daemons.
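For the record, the account creation and sync amounted to something like this (the name, uid and group here are made-up placeholders rather than the real values):

    useradd -c "New User" -u 1234 -g users newuser   # hypothetical comment/uid/group/name
    rocks sync users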

A few hours later the user got back to me to say his jobs were queued but not running. I used the checkjob command to see what the problem was, and found that his uid was unknown on the node. Indeed, looking at the password file on the node, I saw that his account was not there. I rebooted the node and ran rocks sync users again, with no joy. I then set the node to rebuild on boot and rebooted it, and it came up with no user accounts at all.
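The checking went roughly like this (the job id is an example and the user name is a placeholder):

    checkjob 12345                            # example job id; reported an unknown uid
    ssh compute-0-8 'getent passwd newuser'   # placeholder user; returned nothing on the node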

There were errors like this in the log:

Jul 27 17:39:43 compute-0-8 411-alert-handler[13333]: Error: http://10.1.1.1:372/411.d/etc.auto..masterupdating Could not get file 'http://10.1.1.1:372/411.d/etc.auto..master': 400 Bad

The nodes get the password files, amongst other things, from the head node using the 411 service, so running the command below on a node should fetch all the files.

411get --all

However, all I got was:

Error: Could not get file 'http://10.1.1.1:372/411.d//': 400 Bad

I could ssh to a node and use wget to fetch the files successfully, which only added to my confusion.
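Roughly, the manual test which did work was a plain wget against one of the URLs that 411get was failing on, for example:

    wget 'http://10.1.1.1:372/411.d/etc.auto..master'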

I had updated the head node recently, and this turned out to be my problem. I asked on the Rocks mailing list, and the answer I got was:

The latest CentOS 6 httpd update breaks 411.  To fix, add this to the
end of /etc/httpd/conf/httpd.conf and reload httpd:

HttpProtocolOptions Unsafe

So I did that, and now rocks sync users is working again. The version of httpd which caused the problem was httpd-2.2.15-60.el6.centos.4.x86_64.
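For anyone wanting the one-liner version, the fix boils down to something like this as root on the head node (the service command assumes the stock CentOS 6 init scripts):

    echo "HttpProtocolOptions Unsafe" >> /etc/httpd/conf/httpd.conf
    service httpd reload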

I’m putting this here in case anyone else gets hit by this.

mcollective with SSL — the short(ish) version

Prerequisites

  • A working puppet master and some clients to test with
  • Hiera in use, with some “common” section which is applied to all nodes. This isn’t strictly required, but it makes the set-up a lot simpler.

What you’ll end up with

  • SSL’d connections between all involved daemons
  • An admin user who can run “mco ping” and get responses from all the mcollective-enabled nodes

Basic setup (without SSL)

First you’ll want to get mcollective up and running *without* SSL, to be sure that the module is working, before leaping on to SSL and certificates.

Since this is a very basic setup, all the examples will use the puppet master as the “broker”, where activemq queues all the requests and replies centrally, rather than a separate machine. Replace the puppet.example.org and node[12].example.org references with your host names as appropriate 🙂

  • Install the puppetlabs mcollective module and its dependencies with:
    puppet module install puppetlabs-mcollective
    This installs many other modules to help it along.
  • Add the mcollective class to your included class list for all nodes. e.g.:
    classes:
      ... (probably plenty of others)
      - mcollective
    
  • Add these variables to your hiera configs for all hosts:
    mcollective::middleware_hosts: puppet.example.org
    
  • Add these variables to apply only to your puppet master:
    mcollective::middleware: true
    mcollective::server    : true
    mcollective::client    : true
    
  • If you have firewalling on your test puppet master (and you should!), open port 61613. For me this required adding another hiera entry:
     site::firewall:
       '040 accept all stomp/activemq connections':
         state : ['NEW']
         proto : 'tcp'
         dport : 61613
         action: accept
    
  • Now run puppet (or wait for a run to occur automatically, if you have that set up) on the puppet master and a couple of nodes, and you should get the mcollective agent installed and configured on all nodes, plus the activemq daemon set up and running on the broker.

At this point you should be able to run mco ping and get some ping time results back from the nodes which are running the mcollective agent. You can also look at the output of netstat on the activemq server and see a number of ESTABLISHED connections to the 61613 port from the mcollective nodes.
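A quick check from the broker might look something like this (the netstat flags assume the usual net-tools version):

    mco ping                   # should list each node with a response time
    netstat -tn | grep 61613   # should show ESTABLISHED connections from the nodes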

Adding SSL support

Now you have a basic set-up working, the steps required to get SSL working are:

  • Allow firewall connections to port 61614 on the broker (same host as the puppet master in this simple case)
  • Generate a strong random password for the mcollective user — I use pwgen -s -y 20 to generate good long random passwords
  • Generate a certificate (public/private) pair for the new admin user, using puppet cert generate admin-user-name on the puppet master
  • Generate a certificate (public/private) pair for the mcollective and activemq daemons to use, with puppet cert generate mcollective-servers (both generate commands are collected in the sketch after this list)
  • Create a site-local mcollective certificates set of directories under your module directories. For example, we have:
      /etc/puppet/modules/project_zoned/files/mcollective/
      /etc/puppet/modules/project_zoned/files/mcollective/client_certs
      /etc/puppet/modules/project_zoned/files/mcollective/certs
      /etc/puppet/modules/project_zoned/files/mcollective/private_keys
    
  • Copy the certificates to the appropriate places:
      cp /var/lib/puppet/ssl/certs/ca.pem /etc/puppet/modules/project_zoned/files/mcollective/certs/
      cp /var/lib/puppet/ssl/certs/mcollective-servers.pem /etc/puppet/modules/project_zoned/files/mcollective/certs/
      cp /var/lib/puppet/ssl/private_keys/mcollective-servers.pem /etc/puppet/modules/project_zoned/files/mcollective/private_keys/
      cp /var/lib/puppet/ssl/certs/admin-user-name.pem /etc/puppet/modules/project_zoned/files/mcollective/client_certs/
      cp /var/lib/puppet/ssl/private_keys/admin-user-name.pem /etc/puppet/modules/project_zoned/files/mcollective/private_keys/
      cp /var/lib/puppet/ssl/certs/admin-user-name.pem /etc/puppet/modules/project_zoned/files/mcollective/client_certs/
    
  • Add a declaration of an mcollective::user. We have various site-local classes which get auto-expanded into multiple calls to puppet defines, so we add a list like this:
    site::mcollective::users:
      - 'admin-user-name':
        group : 'admin-user-group'
        certificate: 'puppet:///modules/project_zoned/mcollective/client_certs/admin-user-name.pem'
        private_key: 'puppet:///modules/project_zoned/mcollective/private_keys/admin-user-name.pem'
    
  • Re-run puppet on the puppet master and you’ll have an install of mcollective which is set up to use SSL, and an admin user who has a ~/.mcollective config file, along with keys in their ~/.mcollective.d directory. This user should be able to run mco ping and get results from any nodes which have had the mcollective setup installed on them too (which is any that you run puppet on after adding the mcollective class and settings into a common yaml file for all nodes).
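For reference, the password and certificate generation steps above come down to a few commands on the puppet master (where the generated password ends up depends on your module version’s hiera parameters, so check the puppetlabs-mcollective docs):

    pwgen -s -y 20                             # strong random password for the middleware user
    puppet cert generate admin-user-name       # client certificate pair for the admin user
    puppet cert generate mcollective-servers   # shared certificate pair for the mcollective/activemq daemons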

Now you should have a basic SSL-encrypted setup which works, and you can start adding mcollective plugins to do useful stuff like manage services and the puppet agent runs etc.

Adding more admin users just requires generating certificates for the user, copying them to the mcollective files repository in your “project” module, and adding them to the site::mcollective::users hash. Then they get the config added to their homedir automatically.
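As a sketch, adding a hypothetical second admin called jbloggs would look something like this, plus a matching entry in the site::mcollective::users hash:

    puppet cert generate jbloggs
    cp /var/lib/puppet/ssl/certs/jbloggs.pem        /etc/puppet/modules/project_zoned/files/mcollective/client_certs/
    cp /var/lib/puppet/ssl/private_keys/jbloggs.pem /etc/puppet/modules/project_zoned/files/mcollective/private_keys/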

I can see this being rather useful as the number of machines we are managing continues to grow 😀

Docs which were useful / further reading

Moving a Rocks 6 cluster and changing the IP address of the head node

We have a Rocks 6 cluster which was definitely out-growing the server room/air-conditioned broom-cupboard in which it was housed, in terms of cooling, power consumption and a growing lack of physical space.
There were also many new nodes which wanted to be added but couldn’t without considerable upgrades to the room or a move to a more suitable space.

Moving it to a better home meant a change of upstream router and hence a change of IP address. All signs pointed to this being a terrible idea and a potential disaster, and all replies to related questions on the Rocks mailing list said that reinstalling the head node was the only viable solution. (Possibly using a “restore roll”)

We weren’t convinced that it would be less effort (or time taken) to reinstall than hack the config…

Internally, others had moved the IP and changed the hostname of a Rocks 5 cluster, and had needed a long, long list of changes. This was clearly going to be error-prone and looked like an even worse idea than it had initially seemed.

But then we found a page which claimed that, on Rocks 6, only a few “rocks set” commands were needed!
( see http://davidmnoriega.wordpress.com/2012/06/22/rocks-cluster-changing-the-external-ip-address/ )

It did mention that there would be a couple of files in /etc to change too, but didn’t explicitly list everything.

Testing on VMs before the move…
We set about installing a VM on VirtualBox as a test head node, just with a default Rocks 6 config from the same install image which was used for the cluster (Rocks 6.0 “Mamba”). We then made a couple of compute node VMs for it to install, and set about testing whether changing the IP using those instructions worked fine.

First snag: VirtualBox Open-Source Edition doesn’t allow PXE boot from Intel ethernet adapters, because there is some non-GPL-compatible code which cannot be bundled in with it (ref: https://forums.virtualbox.org/viewtopic.php?f=9&t=34681 ).

Just pick a non-Intel network adapter for the compute nodes and all is fine with the PXE boot, once the right boot order is set (PXE then HDD).

To ensure that we caught all references to the IP address, we used this:

find /etc -type f -print0 | xargs -0 grep a.b.c

…where a.b.c is the first three octets of the IP address, as we’re part of a /24 subnet.
(This might be a bit more complicated if your subnet isn’t a /8, /16 or /24.)
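For an awkward prefix length you could match the relevant octet values explicitly instead; for example, a hypothetical 10.20.30.0/23 spans third octets 30 and 31:

    find /etc -type f -print0 | xargs -0 grep -E '10\.20\.(30|31)\.'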

This finds:
/etc/sysconfig/network
/etc/sysconfig/network-scripts/ifcfg-em2
/etc/sysconfig/static-routes
/etc/yum.repos.d/Rocks-6.0.repo
/etc/hosts

Likewise we searched /opt as well as /etc, as the maui config and a couple of other bits and bobs are stored there; we found nothing which needed changing in /opt.

Then simply:

  • change all these IPs to the updated one (including changing static gateways)
  • run the few “rocks set” commands to update the ethernet interface and kickstart settings (a worked example with dummy values follows this list):


    rocks set host interface ip xxxxxxx ethX x.x.x.x
    rocks set attr Kickstart_PublicAddress x.x.x.x
    rocks set attr Kickstart_PublicNetwork x.x.x.x
    rocks set attr Kickstart_PublicBroadcast x.x.x.x
    rocks set attr Kickstart_PublicGateway x.x.x.x
    rocks set attr Kickstart_PublicNetmask x.x.x.x

  • physically move everything
  • (re)start the head node
  • rebuild all the nodes with:
    rocks set host boot compute action=install and (re)boot the nodes.
    They need this to point at the newly-configured head node IP, as it uses the external IP for static routing. (No idea why it doesn’t use the internal IP, but that’s for another post…)
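As a concrete illustration, with a made-up head node name and a new address of 192.0.2.10/24 on em2, the commands above would look like:

    rocks set host interface ip cluster-head em2 192.0.2.10
    rocks set attr Kickstart_PublicAddress   192.0.2.10
    rocks set attr Kickstart_PublicNetwork   192.0.2.0
    rocks set attr Kickstart_PublicBroadcast 192.0.2.255
    rocks set attr Kickstart_PublicGateway   192.0.2.1
    rocks set attr Kickstart_PublicNetmask   255.255.255.0
    rocks set host boot compute action=install   # after the move: reinstall nodes on next boot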

And everything worked rather nicely!
This was then repeated on the physical machines with renewed confidence that it would work just fine.

Remember to check that everything is working at this point — submit jobs to the queues, ensure NFS is happy and that the nodes can talk to the required external services such as licence servers.
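The sort of checks meant here, very roughly (the queue commands depend on which scheduler roll you run; with maui/torque, qstat and showq apply):

    rocks list attr | grep Kickstart_Public   # confirm the attributes hold the new values
    qstat                                     # or showq: check that queued jobs actually start
    showmount -e localhost                    # confirm NFS exports are being served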

Problems encountered and glossed over by this:

  • Remember to make the configuration changes before you move the head node, otherwise it can take an age to boot whilst it times out on
    remote NFS mounts and other network operations
  • Ensure you have all sockets enabled (network and power) for the racks you are moving to, otherwise this could add another unwanted delay
  • Physically moving machines requires a fair bit of downtime, a van and some careful moving to avoid damage to disks (especially the head node)
  • Networking in the destination room was not terribly structured, and required us to thread a long cable between the head node and internal
    switch, as the head node is in a UPS’d rack which is the opposite side of the (quite large) room to the compute nodes. This will be more of a pain if any compute nodes need to also be on UPS (e.g. they
    have additional storage).
  • Don’t try to be optimistic on the time it will take — users will be happier if you allocate 3 days and it takes 2 than if you say it will take 1 and it takes 2.
  • Don’t have just broken your arm before attempting to move a cluster — it makes moving nodes or doing cabling rather more awkward/impossible! (Unlikely to be a problem for most people 🙂)