Publish a Module on Puppet Forge

I’ve started publishing as many of my Puppet modules as possible on Puppet Forge. It isn’t hard to do but there are a few things to know. This guide is largely based on Puppetlabs’ own guide Publishing Modules on the Puppet Forge.

  1. For home-grown modules that have grown organically, you are likely to have at least some site-specific data mixed in with the code. Before publishing, you’ll need to abstract this out. I recommend using parametrised classes with sane defaults for your inputs. If necessary, you can have a local wrapper class to pass site-specific values into your module.
  2. The vast majority of Puppet modules are on GitHub, but this isn’t actually a requirement. GitHub offers public collaboration and issue tracking, but you can keep your code wherever you like.
  3. Before you can publish, you need to include some metadata with your module. Look at the output of puppet module generate. If you’re starting from scratch, this command is an excellent place to start. If you’re patching up an old module for publication, run it in a different location and selectively copy the useful files into your module. The mandatory files are metadata.json and README.md.
  4. When you’re ready to publish, run puppet module build. This creates a tarball of your module and metadata which is ready to upload to Puppet Forge.
  5. Create an account on Puppet Forge and upload your tarball. It will automatically fill in the metadata.
  6. Install your module on your Puppetmaster by doing puppet module install myname/mymodule

Building a Gitlab server with Puppet

GitHub is an excellent tool for code-sharing, but it has the major disadvantage of being fully public. You probably don’t want to put your confidential stuff and shared secrets in there! You can pay for private repositories, but the issue still stands that we shouldn’t be putting confidential UoB things in a non-approved cloud provider.

I briefly investigated several self-hosted pointy-clicky Git interfaces, including Gitorious, Gitolite, GitLab, Phabricator and Stash. They all have their relative merits but they all seem to be a total pain to install and run in a production environment, often requiring that we randomly git clone something into the webroot and then not providing a sane upgrade mechanism. Many of them have dependencies on modules not included with the enterprise Linux distributions

In the end, the easiest-to-deploy option seemed to be to use the GitLab Omnibus installer. This bundles the GitLab application with all its dependencies in a single RPM for ease of deployment. There’s also a Puppet Forge module called spuder/gitlab which makes it nice and easy to install on a Puppet-managed node.

After fiddling, my final solution invokes the Forge module like this:

class { 'gitlab' : 
  puppet_manage_config          => true,
  puppet_manage_backups         => true,
  puppet_manage_packages        => false,
  gitlab_branch                 => '7.4.3',
  external_url                  => "https://${::fqdn}",
  ssl_certificate               => '/etc/gitlab/ssl/gitlab.crt',
  ssl_certificate_key           => '/etc/gitlab/ssl/gitlab.key',
  redirect_http_to_https        => true,
  backup_keep_time              => 5184000, # 5184000 = 60 days
  gitlab_default_projects_limit => 100,
  gitlab_download_link          => 'https://downloads-packages.s3.amazonaws.com/centos-6.5/gitlab-7.4.3_omnibus.5.1.0.ci-1.el6.x86_64.rpm',
  gitlab_email_from             => 'gitlab@example.com',
  ldap_enabled                  => true,
  ldap_host                     => 'ldap.example.com',
  ldap_base                     => 'CN=Users,DC=example,DC=com',
  ldap_port                     => '636',
  ldap_uid                      => 'uid',
  ldap_method                   => 'ssl',
  ldap_bind_dn                  => 'uid=ldapuser,ou=system,dc=example,dc=com',
  ldap_password                 => '*********',
}

I also added a couple of resources to install the certificates and create a firewall exception, to make a complete working deployment.

The upgrade path requires manual intervention, but is mostly automatic. You just need to change gitlab_download_link to point to a newer RPM and change gitlab_branch to match.

If anyone is interested, I’d be happy to write something about the experience of using GitLab after a while, when I’ve found out some of the quirks.

Update by DaveG! (in lieu of comments currently on this site)

Gitlab have changed their install process to require use of their repo, so this module doesn’t like it very much. They’ve also changed the package name to ‘gitlab-ce’ rather than just ‘gitlab’.

To work around this I needed to:

  • Add name => 'gitlab-ce' to the package { 'gitlab': ... } params in gitlab/manifests/install.pp
  • Find the package RPM for a new shiny version of Gitlab. 7.11.4 in this case, via https://packages.gitlab.com/gitlab/gitlab-ce?filter=rpms
  • Copy the RPM to a local web-accessible location as a mirror, and use this as the location for the gitlab_download_link class parameter

This seems to have allowed it to work fine!
(Caveat: I had some strange behaviour with whether it would run the gitlab instance correctly, but I’m not sure if that’s because of left-overs from a previous install attempt. Needs more testing!)

DHCP fingerprinting

We wanted to find out what sort of devices are active on the wireless network, and the vendor tools we’ve got don’t quite give us the level of detail we were after.

However, everything which hits our wireless network gets a DHCP lease from our dhcp servers.  With a bit of dhcpd.conf magic, you can make it profile each client when it requests or renews a lease and record a fingerprint in the logs.

dhcpd.conf – collecting fingerprints

# put the dhcp options request fingerprint in the leases file
set dhcp-op-req-string = binary-to-ascii(10,8,":",option dhcp-parameter-request-list);

# log the fingerprint in the format:
# Jul 17 14:36:06 dhcp2 dhcpd: FINGERPRINT 1,3,6,12,15,28 for 00:10:20:30:40:50

log(info,
concat("FINGERPRINT ",
binary-to-ascii(10,8,",",option dhcp-parameter-request-list),
" for ",
concat (  # MAC
        suffix (concat ("0", binary-to-ascii (16, 8, "",
          substring (hardware, 1, 1))),2), ":",
        suffix (concat ("0", binary-to-ascii (16, 8, "",
          substring (hardware, 2, 1))),2), ":",
        suffix (concat ("0", binary-to-ascii (16, 8, "",
          substring (hardware, 3, 1))),2), ":",
        suffix (concat ("0", binary-to-ascii (16, 8, "",
          substring (hardware, 4, 1))),2), ":",
        suffix (concat ("0", binary-to-ascii (16, 8, "",
          substring (hardware, 5, 1))),2), ":",
        suffix (concat ("0", binary-to-ascii (16, 8, "",
          substring (hardware, 6, 1))),2)
       )        # End MAC
));
# End DHCP fingerprinting

Now every time a device interacts with our DHCP server, we get a FINGERPRINT line appearing in our logs along with the mac address which requested the lease.

So far, so good. Now we need to process those logs into something anonymous, but meaningful.

Data Prep
The easiest approach is to cat our logfile, strip out just the fields we’re interested in (mac address and fingerprint) then sort them to remove duplicates (we only want to count each machine once!) and then finally throw away the mac addresses (because all we really want are the fingerprints)

We can do that easily enough with a lovely long pipeline

cat /var/log/dhcpd.log | grep FINGERPRINT | awk '{ print $9 " " $7 }' | sort -u | awk '{ print $2 }'

There are probably more elegant ways to do it, but the above isn’t really the interesting bit. All you get out of it is a list of fingerprints. The magic is in converting those into something meaningful.

Chewing on your fingerprints
To process, identify and count these fingerprints, we need the help of the fingerbank project who have collected DHCP fingerprints from all over the place.

I’m grabbing the fingerprint list as a config file from their github repo: https://github.com/inverse-inc/fingerbank/blob/master/dhcp_fingerprints.conf although since I first started playing with this about 6 months ago, it seems they’ve made their fingerprint database available as an Sqlite DB – which would have been much easier to wrangle than parsing the config file.

So here’s a slightly shonky perl script to parse the config file and produce a CSV summary of the output. This is probably not as elegantly done as it could be, please don’t judge too harshly! I’ve tried to make it readable, but some of the datastructures are a bit on the deep side. If you want to see what’s going on, make plenty of use of “Data::Dumper” – I know I had to when writing it.

It assumes dhcp_fingerprints.conf is in the same folder as the script, and expects to be fed fingerprints over STDIN one line at a time – so you can stick it on the end of the pipeline I mentioned earlier.

#!/usr/bin/perl -wT

use strict;

use Config::IniFiles;
use Data::Dumper;

my %dhcp_fingerprints; # tied version of the config file
my ($fprint_db, $fprint_class, $os_counter); # DStructs which we query later

# Tie fingerprint config file from fingerbank to a DS so we can parse it
tie %dhcp_fingerprints, 'Config::IniFiles', ( -file => "dhcp_fingerprints.conf" );

# Build $fprint_class (maps OS name to "class")
foreach my $class (tied(%dhcp_fingerprints)->GroupMembers("class") ) {
  my ($min,$max) = split /\D/, $dhcp_fingerprints{$class}{"members"};
  $$fprint_class{ $dhcp_fingerprints{$class}{"description"} }{min}=$min;
  $$fprint_class{ $dhcp_fingerprints{$class}{"description"} }{max}=$max;
}

# Build $fprint_db (maps fingerprint to OS name)
foreach my $os ( tied(%dhcp_fingerprints)->GroupMembers("os") ) {
  $os =~ m/os (.*)$/gi;
  my $os_id = $1;

  if ( exists( $dhcp_fingerprints{$os}{"fingerprints"} ) ) {
    if ( ref( $dhcp_fingerprints{$os}{"fingerprints"} ) eq "ARRAY" ) {
      foreach my $dhcp_fingerprint ( @{ $dhcp_fingerprints{$os}{"fingerprints"} } ) {
        $$fprint_db{$dhcp_fingerprint}{"description"}=$dhcp_fingerprints{$os}{"description"};
        $$fprint_db{$dhcp_fingerprint}{"os"}=$os_id;   
      }
    } else {
      if (defined $dhcp_fingerprints{$os}{"fingerprints"}) {
        foreach my $dhcp_fingerprint (split(/\n/, $dhcp_fingerprints{$os}{"fingerprints"})) {
        $$fprint_db{$dhcp_fingerprint}{"description"}=$dhcp_fingerprints{$os}{"description"};
        $$fprint_db{$dhcp_fingerprint}{"os"}=$os_id;
        }
      }
    }
  }
}

# now we loop through all the fingerprints we've been given on STDIN and try to ID them
while () {
  chomp;
  my $fingerprint = $_;

  # See if it appears in $fprint_db...
  if(defined $$fprint_db{$fingerprint}) {
    # Count it
    $$os_counter{$$fprint_db{$fingerprint}{"description"}}{"count"}++;

    # Try to identify the type of OS
    foreach my $class (keys $fprint_class) {
      if ($$fprint_db{$fingerprint}{"os"} >= $$fprint_class{$class}{"min"} && $$fprint_db{$fingerprint}{"os"} <= $$fprint_class{$class}{"max"}) {
        $$os_counter{$$fprint_db{$fingerprint}{"description"}}{"class"}=$class;
      }
    }
    
    # If we haven't yet set the OS class, set it to "unknown"
    $$os_counter{$$fprint_db{$fingerprint}{"description"}}{"class"}="unknown" unless (defined $$os_counter{$$fprint_db{$fingerprint}{"description"}}{"class"});

    } else {
      # No idea what it was, so add it to the unknown count
      $$os_counter{"unknown"}{"count"}++;
      $$os_counter{"unknown"}{"class"}="unknown";
    }
  }

# Print summary output as a CSV
print "\n\nClass,OS,Count\n";
foreach my $os(keys %$os_counter) {
  print qq["$$os_counter{$os}{class}","$os","$$os_counter{$os}{count}"\n];
}

If I let that chew on a decent chunk of todays logs (from about 7am to 2pm) it spits out the following:

Class OS Count
Smartphones/PDAs/Tablets Samsung Galaxy Tab 3 7.0 SM-T210R 39
Home Audio/Video Equipment Slingbox 49
Dead OSes OS/2 Warp 1
Gaming Consoles Xbox 360 6
Windows Microsoft Windows Vista/7 or Server 2008 1694
Printers Lexmark Printer 1
Network Boot Agents Novell Netware Client 1
Macintosh Mac OS X Lion 2783
Misc Eye-Fi Wireless Memory Card 1
Printers Kyocera Printer 1
unknown unknown 40
Smartphones/PDAs/Tablets LG Nexus 5 & 7 1797
Printers HP Printer 54
CD-Based OSes PHLAK 1
Smartphones/PDAs/Tablets Nokia 13
Smartphones/PDAs/Tablets Motorola Android 2
Macintosh Mac OS X 145
Smartphones/PDAs/Tablets Generic Android 2989
Gaming Consoles Playstation 2 1
Linux Chrome OS 39
Linux Ubuntu/Debian 5/Knoppix 6 5
Routers and APs Cisco Wireless Access Point 69
Linux Generic Linux 7
Linux Ubuntu 11.04 21
Windows Microsoft Windows 8 1792
Routers and APs Apple Airport 2
Routers and APs DD-WRT Router 3
Smartphones/PDAs/Tablets Sony Ericsson Android 1
Linux Debian-based Linux 51
Smartphones/PDAs/Tablets Symbian OS 2
Storage Devices LaCie NAS 27
Windows Microsoft Windows XP 30
Smartphones/PDAs/Tablets Android Tablet 24
Monitoring Devices Tripplite UPS 1
Smartphones/PDAs/Tablets Apple iPod, iPhone or iPad 12289
Smartphones/PDAs/Tablets Samsung S5260 Star II 2
Smartphones/PDAs/Tablets RIM BlackBerry 63

I’m not sure I 100% believe the above (OS/2 Warp? Really?) but the bits I disbelieve are largely in the noise.

Chewing on the above stats a bit, shows us that the wireless network is roughly 27% laptops and 72% mobile devices (tablets etc). Amongst the laptops, Windows is just about in the lead with 53%, and OSX is close behind at 44% (which is probably higher than a lot of people think) Linux laptops are trailing behind at only 2%.

The mobile device landscape is less evenly split, with 71% iOS and 28% Android.

Although I wouldn’t read too much into the above analysis, as it represents a comparatively small time slice (and only 23775 of the 37000 devices we see on the wireless each week)

Who knows, perhaps we’ve got 13K windows phones owned by people who just don’t come onto campus on a Monday…

Update 2015-05-11: I’ve been asked under what license I’ve released the perl script in this post. I didn’t put any thought into licenses at the time (I was just trying to solve a problem and answer a question I’d been asked!) but I’ll put my hand up, part of the script is based on prior-art.

The section which parses the fingerprint database is taken from the process_fingerprints() function in https://github.com/inverse-inc/fingerbank/blob/master/obsolete/tools/fingerprint-find-candidate-matches.pl – a script which seems to be covered by the GPLv2 licence.

As I understand it, under the terms of the GPLv2 license, that means that the script above should also be distributed under the GPLv2 license (which I’m OK with) and that under the terms of that license it should be distributed along with a copy of the GPLv2 license… which can be found here: https://www.gnu.org/licenses/gpl-2.0.html