Rsync between two hosts using sudo and a password prompt

Using rsync normally is nice and straightforward, e.g.:

# rsync -av -e 'ssh' /some/local/path user@remote:/some/remote/path

This works fine and prompts for the ssh password to log into the remote machine if required.

But what if the remote end needs root (or a different user) rights to write into the destination directory? Just whack in an --rsync-path option to add sudo to the rsync command, right?:

# rsync -av -e 'ssh' --rsync-path='sudo rsync' /some/local/path user@remote:/some/remote/path
user@remote's password: 
sudo: no tty present and no askpass program specified
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(632) [sender=3.0.4]

Oh, that didn’t work — sudo couldn’t ask us for a password. Adding the -t option to the ssh used by rsync doesn’t work either, as it can’t allocate a tty:

# rsync -av -e 'ssh -t' --rsync-path='sudo rsync' /some/local/path user@remote:/some/remote/path
Pseudo-terminal will not be allocated because stdin is not a terminal.
user@remote's password:

So we need to tell sudo to ask for the password some other way. Fortunately there is a -A option for sudo which tells it to use an “askpass” program, but we also need to tell it what askpass program to use (and it’s not in the default path on most machines). We can find this with locate askpass on the remote machine:

[user@remote]$ locate askpass
/etc/profile.d/gnome-ssh-askpass.csh
/etc/profile.d/gnome-ssh-askpass.sh
/usr/libexec/openssh/gnome-ssh-askpass
/usr/libexec/openssh/ssh-askpass

We’ll use /usr/libexec/openssh/ssh-askpass as that should pick an appropriate version according to what is available to sudo. Again sudo won’t have a tty to ask for the password on, so how about we use an X11 askpass program and enable X forwarding for ssh:

# rsync -av -e 'ssh -X' --rsync-path='SUDO_ASKPASS=/usr/libexec/openssh/ssh-askpass sudo -A rsync' /some/local/path user@remote:/some/remote/path
user@remote's password: [type user password]
[Then a dialog pops up for the sudo password]

Hooray, this works! Bit of a faff but it could be scripted or made into a shell function to save having to remember it 🙂
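
For example, a minimal shell function along these lines (an untested sketch; the askpass path and option choices are assumptions to adapt for your own hosts) saves having to remember the incantation:

  # Rsync a local path to a remote path which needs sudo on the far end.
  # Usage: sudo_rsync /some/local/path user@remote:/some/remote/path
  sudo_rsync() {
      rsync -av -e 'ssh -X' \
          --rsync-path='SUDO_ASKPASS=/usr/libexec/openssh/ssh-askpass sudo -A rsync' \
          "$1" "$2"
  }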

Yes, you could enable remote root login, but it would definitely be preferable to avoid that.

RPM addsign fail on vendor-provided package (and a workaround)

We’ve been signing RPM packages in local repos for a while now, and this has been working nicely (see previous posts about rpm signing)… until today.

The Intel Fortran 2015 installer provides RPMs which are already signed by Intel and which install and work fine, so we push these out with Puppet from our local (private) repo. However, even after I’d signed them myself they were still failing signature verification…

Original RPM before re-signing locally:

  => rpm --checksig intel-fcompxe-187-15.0-3.noarch.rpm
  intel-fcompxe-187-15.0-3.noarch.rpm: RSA sha1 ((MD5) PGP) md5 NOT OK (MISSING KEYS: (MD5) PGP#7a5a985f) 

  => rpm -qpi intel-fcompxe-187-15.0-3.noarch.rpm
  warning: intel-fcompxe-187-15.0-3.noarch.rpm: Header V3 RSA/SHA1 Signature, key ID 7a5a985f: NOKEY
  Name        : intel-fcompxe-187            
  ...
  Signature   : RSA/SHA1, Fri 10 Apr 2015 13:23:59 BST, Key ID 27fbcd8d7a5a985f

So I go ahead and sign the package:

  => rpm --addsign intel-fcompxe-187-15.0-3.noarch.rpm 
  Enter pass phrase: [type passphrase]
  Pass phrase is good.
  intel-fcompxe-187-15.0-3.noarch.rpm:

All looks fine, so let’s check the signature again:

  => rpm --checksig intel-fcompxe-187-15.0-3.noarch.rpm
  intel-fcompxe-187-15.0-3.noarch.rpm: RSA RSA sha1 sha1 ((MD5) PGP) ((MD5) PGP) md5 md5 NOT OK (MISSING KEYS: (MD5) PGP#7a5a985f (MD5) PGP#262a742e) 

  => rpm -qpi intel-fcompxe-187-15.0-3.noarch.rpm
warning: intel-fcompxe-187-15.0-3.noarch.rpm: Header V3 RSA/SHA1 Signature, key ID 7a5a985f: NOKEY
  Name        : intel-fcompxe-187            
  ...
  Signature   : RSA/SHA1, Fri 10 Apr 2015 13:23:59 BST, Key ID 27fbcd8d7a5a985f

Odd, it hasn’t changed… Let’s try removing the signature instead:

=> rpm --delsign intel-fcompxe-187-15.0-3.noarch.rpm 
intel-fcompxe-187-15.0-3.noarch.rpm:

=> rpm --checksig intel-fcompxe-187-15.0-3.noarch.rpm
intel-fcompxe-187-15.0-3.noarch.rpm: RSA RSA sha1 sha1 sha1 ((MD5) PGP) ((MD5) PGP) md5 md5 md5 NOT OK (MISSING KEYS: (MD5) PGP#7a5a985f (MD5) PGP#262a742e) 

That’s very odd: it’s added tags to the signature header instead of removing them. And if you try a few more times (just to be sure, right? :), it adds yet more tags to the header:

=> rpm --delsign intel-fcompxe-187-15.0-3.noarch.rpm 
intel-fcompxe-187-15.0-3.noarch.rpm:
...
Packager    : http://www.intel.com/software/products/support
Summary     : Intel(R) Fortran Compiler XE 15.0 Update 3 for Linux*
Description :
Intel(R) Fortran Compiler XE 15.0 Update 3 for Linux*

=> rpm --checksig intel-fcompxe-187-15.0-3.noarch.rpm
intel-fcompxe-187-15.0-3.noarch.rpm: RSA RSA sha1 sha1 sha1 sha1 sha1 sha1 ((MD5) PGP) ((MD5) PGP) md5 md5 md5 md5 md5 md5 NOT OK (MISSING KEYS: (MD5) PGP#7a5a985f (MD5) PGP#262a742e) 

If you do this a few more times then rpm can’t read the package at all anymore!

  => rpm --checksig intel-fcompxe-187-15.0-3.noarch.rpm
  error: intel-fcompxe-187-15.0-3.noarch.rpm: rpmReadSignature failed: sigh tags: BAD, no. of tags(33) out of range

  => rpm -ql -v -p intel-fcompxe-187-15.0-3.noarch.rpm 
  error: intel-fcompxe-187-15.0-3.noarch.rpm: rpmReadSignature failed: sigh tags: BAD, no. of tags(33) out of range
  error: intel-fcompxe-187-15.0-3.noarch.rpm: not an rpm package (or package manifest)

Oops!

Workaround: Add the vendor keys (as you should do anyway) rather than re-signing. Appending the public key to the relevant RPM-GPG-KEY-* file is all that’s required, and then you can install the packages just fine.
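
For example (file names and paths here are illustrative for our setup), append the vendor’s public key to the key file which the repo’s gpgkey= line points at, or simply import it into the rpm keyring directly:

  # Append the vendor key to the key file referenced by the repo definition:
  cat intel-public-key.asc >> /etc/pki/rpm-gpg/RPM-GPG-KEY-localrepo

  # ...or import it straight into the rpm database:
  rpm --import intel-public-key.asc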

Future work: Submit a bug report about this to the rpm-sign developers…

Linux VMs on Hyper-V – be sure to install and run hyperv-daemons!

A short post, just to say that if you are running Linux VMs on Hyper-V hypervisors, you really should install and run the Hyper-V daemons.

On RHEL7-based distros this means:

yum install hyperv-daemons
systemctl enable hypervvssd
systemctl enable hypervkvpd

Ubuntu users should install the packages specified on this Microsoft TechNet page (I’ve not tested this myself, as I don’t yet have any Ubuntu VMs on Hyper-V).

Once you’ve done this, a whole host of important features will work, including:

  • Live migration of VMs
  • IP injection (?)
  • Dynamic memory sizing
  • etc..

This will avoid any surprise reboots when hypervisor nodes are taken down for maintenance (which is what happened to me before I installed these…). Obviously the complete failure of a hypervisor will still cause VM downtime, but at least planned maintenance won’t.

I leave it as an exercise to the reader to use configuration management to add these to all their Hyper-V VMs automatically 🙂
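
For example, a rough Puppet sketch (class and resource names are illustrative, not a tested profile from our setup):

  # Minimal profile for Linux guests running on Hyper-V
  class profiles::hyperv_guest {
    package { 'hyperv-daemons':
      ensure => installed,
    }

    service { ['hypervkvpd', 'hypervvssd']:
      ensure  => running,
      enable  => true,
      require => Package['hyperv-daemons'],
    }
  }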

Further reading: Best Practices for running Linux on Hyper-V

Puppet future parser — what to expect that you’ll have to update in your manifests…

The Puppet Future Parser is the new implementation of the manifest parser which will become the default in 4.0, so I thought I’d take a look to see what I’d need to update.

There are also some fancy new features, such as iteration, and the ability to use [1,2] array notation or {a=>b} hash notation anywhere that you’d previously have needed a variable containing an array or hash.

The iteration and lambda features are intended to replace create_resources calls, as they are more flexible and can loop round repeatedly to create individual definitions.

For example, here’s a dumb “sudo” profile which uses the each construct to iterate over an array:

class profiles::sudo {
  # This is a particularly dumb version of use of sudo, to allow any commands:
  $admin_users = hiera_array('admin_users')
  # Additional users with special sudo rights, but no ssh access (e.g. root):
  $sudo_users  = hiera_array('sudo_users')

  class { '::sudo': }

  $all_sudo_users = concat($sudo_users, $admin_users)

  # Create a resource for each entry in the array:
  each($all_sudo_users) |$u| {
    sudo::entry { $u:
      comment  => "Allow ${u} to run anything as any user",
      username => $u,
      host     => 'ALL',
      as_user  => 'ALL',
      as_group => 'ALL',
      nopasswd => false,
      cmd      => 'ALL',
    }
  }
}

Making this work with create_resources, by trying to splice the username for each user in the list into a hash, looked like it would be messy and would require at least an additional layer of define; this method is much neater.

This makes it much easier to create data abstractions over existing modules — you can programmatically massage the data you read from your hiera files and call definitions using that data in a much more flexible way than when passing hashes to create_resources. This “glue” can be separated into your roles and profiles (which could be the subject of another post, but are described well in this blog post), creating a layer which neatly separates the use of the module from the data which drives it.
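
For illustration, the hiera data driving the sudo profile above could be as simple as this (user names are made up):

  # common.yaml (or wherever your common-tier hiera data lives)
  admin_users:
    - 'alice'
    - 'bob'
  sudo_users:
    - 'root'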

So this all sounds pretty great, but there are a few changes you’ll possibly encounter when switching to the future parser:

  • Similar to the switch from puppet master to puppet server, the future parser is somewhat more strict about data formats. For example, I found that my hiera data definitely needed to be properly quoted when I started using puppet server, so entries like mode : 644 in a file hash wouldn’t give the number you were expecting… (it needs mode : 0644 or mode : '644' to avoid conversion from octal to decimal…). The future parser extends this strictness to your manifests, so a similarly-incorrect file { ... mode => 644 } declaration needs quoting or a leading zero (see the example just after this list). If you use puppet-lint you’ll catch this anyway — so use it! 🙂
  • It’s necessary to use {} instead of undef when setting default values for hiera_hash (and likewise [] instead of undef for hiera_array), to allow conditional expressions of the form if $var { ... } to work as intended. It seems that, in terms of truthiness for arrays and hashes, undef is in fact true… (which could be a bug, as this page in the docs says: “When used as a boolean, undef is false”)
  • Dynamically-scoped variables (which are pretty mad and difficult to follow anyway, which is why most languages avoid them like the plague…) don’t pass between a class and any sub-classes which it creates. This is in the docs here, but it’s such a common pattern that it could well have made it through from your old (pre-Puppet 2.7) manifests and still have been working OK until the switch to the future parser. e.g.:
    class foo {
      $var = "x"
    }
    
    class bar {
      include foo
      # $var isn't defined here, as dynamic scope rules don't allow it in Puppet >2.7
    }
    

    Instead you need to explicitly qualify your variables to pull them out of the correct scope — $foo::var in this case. In your erb templates (a common place where dynamically-scoped variables might have ended up being used) you can now write scope['::foo::var'] as a shorthand for the previously-longer scope.lookupvar('::foo::var') to explicitly qualify the lookup of variables. The actual scope rules for Puppet < 2.7 are somewhat more complicated and often led to confusing situations if you unintentionally used dynamic scoping, especially when combined with overriding variables from the parent scope…

  • I’m not sure that expressions of the form if "foo" in $arrayvar { ... } work how they should, but I’ve not had a chance to investigate this properly yet.
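
As a concrete example of the first point above, this is the sort of declaration which needs attention under the future parser (the resource shown is illustrative):

    file { '/etc/motd':
      ensure => file,
      # Unquoted 644 would be treated as the decimal integer 644 rather than
      # the octal permission you meant; quote it (or give it a leading zero).
      mode   => '0644',
    }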

Most of these are technically just the parser adhering more strictly to the specification, but it’s easy for them to have crept into your manifests accidentally if you’re not being good and using puppet-lint and other tools to check them.

In conclusion : Start using the Future Parser soon! It adds excellent features for iteration which make abstracting data a whole lot easier than using the non-future (past?) parser allows. Suddenly the combination of roles, profiles and the iteration facilities in the future parser mean that abstraction using Puppet and hiera makes an awful lot more sense!

Puppet — making array expansions in resource calls unique

Let’s say that you have an array of NFS clients to allow access to an export. You want to expand this into a list of resources, which you’d normally do via something like:

$allow = hiera_array('some_variable', undef)
nfs::export { $allow:
  ...
}

This works fine until you want to have multiple exports to the same name/IP, at which point you have two nfs::export resources with the same name. But you need the expansion of the array to generate multiple calls to nfs::export, so you need to prefix each string in the array:

$allow = hiera_array('some_variable', undef)
$hacked_allow = prefix($allow, "${fqdn} export of ${path} to client ")
nfs::export { $hacked_allow:
  ...
}

…then in the called define you extract the client fqdn like so:

define nfs::export ( ... ) {
  $client = regsubst($name, "^.* export of .* to client (.*)$", '\1')
  validate_string($client)

  ... [use $client as fqdn in here]
}

This seems like a terrible hack, but it’s a reasonable workaround for the need to expand arrays and keep the names unique to avoid duplicate resources.

mcollective with SSL — the short(ish) version

Prerequisites

  • A working puppet master and some clients to test with
  • Hiera in use, with some “common” section which is applied to all nodes. This isn’t strictly required, but it makes the set-up a lot simpler.

What you’ll end up with

  • SSL’d connections between all involved daemons
  • An admin user who can run “mco ping” and get responses from all the mcollective-enabled nodes

Basic setup (without SSL)

First you’ll want to get mcollective up and running *without* SSL to be sure that the module is working, before leaping on to use SSL and certificates.

Since this is a very basic setup, all the examples will use the puppet master as the “broker”, where activemq queues all the requests and replies centrally, rather than a separate machine. Replace the puppet.example.org and node[12].example.org references with your host names as appropriate 🙂

  • Install the puppetlabs mcollective module and its dependencies with:
    puppet module install puppetlabs-mcollective
    This installs many other modules to help it along.
  • Add the mcollective class to your included class list for all nodes. e.g.:
    classes:
      ... (probably plenty of others)
      - mcollective
    
  • Add these variables to your hiera configs for all hosts:
    mcollective::middleware_hosts: puppet.example.org
    
  • Add these variables to apply only to your puppet master:
    mcollective::middleware: true
    mcollective::server    : true
    mcollective::client    : true
    
  • If you have firewalling on your test puppet master (and you should!), open port 61613. For me this required adding another hiera entry:
     site::firewall:
       '040 accept all stomp/activemq connections':
         state : ['NEW']
         proto : 'tcp'
         dport : 61613
         action: accept
    
  • Now run puppet (or wait for a run to occur automatically, if you have that set up) on the puppet master and a couple of nodes, and you should get the mcollective agent installed and configured on all nodes, plus the activemq daemon set up and running on the broker.

At this point you should be able to run mco ping and get some ping time results back from the nodes which are running the mcollective agent. You can also look at the output of netstat on the activemq server and see a number of ESTABLISHED connections to the 61613 port from the mcollective nodes.
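
For example, something along these lines (exact netstat flags vary a little by distro):

  # From the admin user on the puppet master:
  mco ping

  # On the broker, check for established connections from the nodes:
  netstat -tn | grep 61613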

Adding SSL support

Now that you have a basic set-up working, the steps required to get SSL working are:

  • Allow firewall connections to port 61614 on the broker (same host as the puppet master in this simple case)
  • Generate a strong random password for the mcollective user — I use pwgen -s -y 20 to generate good long random passwords
  • Generate a certificate (public/private) pair for the new admin user, using puppet cert generate admin-user-name on the puppet master
  • Generate a certificate (public/private) pair for the mcollective and activemq daemons to use, with puppet cert generate mcollective-servers
  • Create a site-local set of mcollective certificate directories under your module directories. For example, we have:
      /etc/puppet/modules/project_zoned/files/mcollective/
      /etc/puppet/modules/project_zoned/files/mcollective/client_certs
      /etc/puppet/modules/project_zoned/files/mcollective/certs
      /etc/puppet/modules/project_zoned/files/mcollective/private_keys
    
  • Copy the certificates to the appropriate places:
      cp /var/lib/puppet/ssl/certs/ca.pem /etc/puppet/modules/project_zoned/files/mcollective/certs/
      cp /var/lib/puppet/ssl/certs/mcollective-servers.pem /etc/puppet/modules/project_zoned/files/mcollective/certs/
      cp /var/lib/puppet/ssl/private_keys/mcollective-servers.pem /etc/puppet/modules/project_zoned/files/mcollective/private_keys/
      cp /var/lib/puppet/ssl/certs/admin-user-name.pem /etc/puppet/modules/project_zoned/files/mcollective/client_certs/
      cp /var/lib/puppet/ssl/private_keys/admin-user-name.pem /etc/puppet/modules/project_zoned/files/mcollective/private_keys/
      cp /var/lib/puppet/ssl/certs/admin-user-name.pem /etc/puppet/modules/project_zoned/files/mcollective/client_certs/
    
  • Add a declaration of an mcollective::user. We have various site-local classes which get auto-expanded into multiple calls to puppet defines, so we add a list like this:
    site::mcollective::users:
      - 'admin-user-name':
        group : 'admin-user-group'
        certificate: 'puppet:///modules/project_zoned/mcollective/client_certs/admin-user-name.pem'
        private_key: 'puppet:///modules/project_zoned/mcollective/private_keys/admin-user-name.pem'
    
  • Re-run puppet on the puppet master and you’ll have an install of mcollective which is set up to use SSL, and an admin user who has a ~/.mcollective config file, along with keys in their ~/.mcollective.d directory. This user should be able to run mco ping and get results from any nodes which have had the mcollective setup installed on them too (which is any that you run puppet on after adding the mcollective class and settings into a common yaml file for all nodes).

Now you should have a basic SSL-encrypted setup which works, and you can start adding mcollective plugins to do useful stuff like manage services and the puppet agent runs etc.

Adding more admin users just requires generating certificates for the user, copying them to the mcollective files repository in your “project” module, and adding them to the site::mcollective::users hash. Then they get the config added to their homedir automatically.
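
For example, the hiera list from earlier with a second (made-up) user added:

  site::mcollective::users:
    - 'admin-user-name':
      group : 'admin-user-group'
      certificate: 'puppet:///modules/project_zoned/mcollective/client_certs/admin-user-name.pem'
      private_key: 'puppet:///modules/project_zoned/mcollective/private_keys/admin-user-name.pem'
    - 'another-admin':
      group : 'admin-user-group'
      certificate: 'puppet:///modules/project_zoned/mcollective/client_certs/another-admin.pem'
      private_key: 'puppet:///modules/project_zoned/mcollective/private_keys/another-admin.pem'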

I can see this being rather useful as the number of machines we are managing continues to grow 😀

Docs which were useful / further reading

Moving a Rocks 6 cluster and changing the IP address of the head node

We have a Rocks 6 cluster which was definitely outgrowing the server room/air-conditioned broom-cupboard in which it was housed, in terms of cooling, power consumption and a growing lack of physical space.
There were also many new nodes waiting to be added, which couldn’t happen without considerable upgrades to the room or a move to a more suitable space.

Moving it to a better home meant a change of upstream router and hence a change of IP address. All signs pointed to this being a terrible idea and a potential disaster, and all replies to related questions on the Rocks mailing list said that reinstalling the head node was the only viable solution. (Possibly using a “restore roll”)

We weren’t convinced that it would be less effort (or time taken) to reinstall than hack the config…

Internally, others had moved the IP and changed the hostname of a Rocks 5 cluster, and ended up with a long, long list of changes required. This was clearly going to be error-prone and made the whole exercise look like an even worse idea than it had initially.

But then we found a page which claimed it was only a few “rocks set” commands and that’s all that was needed on Rocks 6!
( see http://davidmnoriega.wordpress.com/2012/06/22/rocks-cluster-changing-the-external-ip-address/ )

It did mention that there would be a couple of files in /etc to change too, but didn’t explicitly list everything.

Testing on VMs before the move…
We set about installing a VM on Virtualbox as a test head node, just with a default Rocks 6 config from the same install image which was used for the cluster (Rocks 6.0 “Mamba”). We then made a couple of compute node VMs for it to install and set about testing whether changing the IP using those instructions works fine.

First snag: VirtualBox Open-Source Edition doesn’t allow PXE boot from Intel ethernet adapters, because there is some non-GPL-compatible code which cannot be bundled in with it (ref: https://forums.virtualbox.org/viewtopic.php?f=9&t=34681 ).

Just pick a non-Intel network adapter for the compute nodes and all is fine with the PXE boot, once the right boot order is set (PXE then HDD).

To ensure that we caught all references to the IP address, we used this:

find /etc -type f -print0 | xargs -0 grep a.b.c

…where a.b.c is the first three octets of the IP address, since we’re part of a /24 subnet.
(This might be a bit more complicated if your subnet isn’t a /8, /16 or /24; see the example a little further down.)

This finds:
/etc/sysconfig/network
/etc/sysconfig/network-scripts/ifcfg-em2
/etc/sysconfig/static-routes
/etc/yum.repos.d/Rocks-6.0.repo
/etc/hosts

Likewise we searched /opt as well as /etc, as the maui config and a couple of other bits and bobs are stored there; we found nothing which needed changing in /opt.
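
If your subnet doesn’t fall neatly on an octet boundary, a slightly more careful pattern is needed; something like this (addresses are made up, for a hypothetical 10.20.30.32/28) would list matching files in both locations:

  find /etc /opt -type f -print0 | xargs -0 grep -lE '\b10\.20\.30\.(3[2-9]|4[0-7])\b'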

Then simply:

  • change all these IPs to the updated one (including changing static gateways)
  • run the few “rocks set” commands to update the ethernet interface and kickstart settings:


    rocks set host interface ip xxxxxxx ethX x.x.x.x
    rocks set attr Kickstart_PublicAddress x.x.x.x
    rocks set attr Kickstart_PublicNetwork x.x.x.x
    rocks set attr Kickstart_PublicBroadcast x.x.x.x
    rocks set attr Kickstart_PublicGateway x.x.x.x
    rocks set attr Kickstart_PublicNetmask x.x.x.x

  • physically move everything
  • (re)start the head node
  • rebuild all the nodes with:
    rocks set host boot compute action=install and (re)boot the nodes.
    They need this to point at the newly-configured head node IP, as it uses the external IP for static routing. (No idea why it doesn’t use the internal IP, but that’s for another post…)

And everything worked rather nicely!
This was then repeated on the physical machines with renewed confidence that it would work just fine.

Remember to check that everything is working at this point — submit jobs to the queues, ensure NFS is happy and that the nodes can talk to the required external services such as licence servers.

Problems encountered (and glossed over above):

  • Remember to make the configuration changes before you move the head node, otherwise it can take an age to boot whilst it times out on remote NFS mounts and other network operations
  • Ensure you have all sockets enabled (network and power) for the racks you are moving to, otherwise this could add another unwanted delay
  • Physically moving machines requires a fair bit of downtime, a van and some careful moving to avoid damage to disks (especially the head node)
  • Networking in the destination room was not terribly structured, and required us to thread a long cable between the head node and internal switch, as the head node is in a UPS’d rack which is the opposite side of the (quite large) room to the compute nodes. This will be more of a pain if any compute nodes need to also be on UPS (e.g. they have additional storage).
  • Don’t try to be optimistic on the time it will take — users will be happier if you allocate 3 days and it takes 2 than if you say it will take 1 and it takes 2.
  • Don’t have just broken your arm before attempting to move a cluster — it makes moving nodes or doing cabling rather more awkward/impossible! (Unlikely to be a problem for most people 🙂)