How to use SELinux

SELinux is one of the least well understood components of modern Linux distributions. Search any forum or mailing list and you are likely to find recommendations to switch it off because it “breaks things”. When we decided to migrate the ResNet and eduroam servers from CentOS 5 to 6, we took the decision to move from “SELinux off by default” to “SELinux on by default, and only off where necessary”. It turns out it’s not that hard to configure 🙂

Introduction

Explaining exactly how SELinux works is beyond the scope of this blog post – but suffice it to say that it works by labelling parts of the filesystem, users and processes with a context to say what they are for. It will then block actions it thinks are unsafe – for example, even if your httpd has filesystem permissions to write to /etc/ssh/, by default SELinux would block this action because it isn’t usual. To learn more, have a look at the many web pages about SELinux.

Configuring SELinux

Configuring SELinux to work nicely on your system is best described as “training” it, and is a lot like training a spam filter. You have to look at the SELinux audit log to see what actions were blocked, review them, and then add them to a whitelist by loading a new policy. You can load as many supplementary policies as you need.

Your SELinux installation should always be left in enforcing mode by default. Edit /etc/selinux/config to make sure it is enforcing, but be aware that this needs a reboot to take effect.

# /etc/selinux/config
SELINUX=enforcing
SELINUXTYPE=targeted

When you want to temporarily enable permissive mode, issue the command sudo setenforce 0. This takes effect immediately. Don’t forget to run sudo setenforce 1 to re-enable enforcing mode after you’ve finished debugging.
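A typical debugging session therefore looks like this – getenforce reports which mode you’re currently in:

sudo setenforce 0   # switch to permissive mode
getenforce          # prints "Permissive"
# ...reproduce the problem, collect the audit log...
sudo setenforce 1   # back to enforcing mode
getenforce          # prints "Enforcing"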

When you start out configuring SELinux, it’s important to run it in permissive mode, rather than enforcing mode. Let’s say you want to debug an application that wants to perform operations A, B and C, which would all be blocked by SELinux. In permissive mode, the application would be allowed to run, and SELinux logs what it would have blocked had it been in enforcing mode. Operations A, B and C are all logged and can then be added to the policy. In enforcing mode, the application tries operation A, is blocked and often doesn’t even bother trying operations B and C – so they are never logged, and cannot be debugged.

Capturing SELinux audit logs and generating a policy

All SELinux operations are stashed in the audit log, which is in /var/log/audit/audit.log on CentOS by default. The audit log is not hugely easy to read by eye, but you can install the package policycoreutils-python which provides some handy analysis tools.
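For reference, a single denial in audit.log looks something like the line below. The PID, inode and timestamp here are made up for illustration – the interesting parts are the denied action (append), the source context (nagios_t) and the target context (tmp_t):

type=AVC msg=audit(1374567890.123:456): avc:  denied  { append } for  pid=2650 comm="nagios" name="nagios.tmp" dev=dm-0 ino=131074 scontext=system_u:system_r:nagios_t:s0 tcontext=system_u:object_r:tmp_t:s0 tclass=file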

Assuming you’ve already dropped SELinux into permissive mode, try executing the operations you wish to debug: this might be testing a Nagios plugin, running a new application, or something else. It should succeed, because SELinux is permissive, but SELinux will log everything it would otherwise have blocked.

Run the following command, grepping for the process you’re interested in, to generate a policy file that grants all those accesses. Be aware of namespacing issues: SELinux ships with a bunch of bundled policies with names like nagios and httpd. If you are loading supplementary policies for these services, it’s best to add a prefix like resnet-nagios or sysops-nagios. The default file extension for a text-mode policy is .te.

sudo cat /var/log/audit/audit.log | grep nagios | audit2allow -m resnet-nagios > resnet-nagios.te

Your .te file is more-or-less human readable and you should inspect it to make sure your new policy isn’t going to allow anything bad. Here’s the .te file I generated by running the above command on my Nagios server:

module resnet-nagios 1.0;

require {
  type nagios_system_plugin_t;
  type nagios_t;
  type tmp_t;
  type initrc_tmp_t;
  type nagios_services_plugin_t;
  class file { read ioctl write getattr open append };
}

#============= nagios_services_plugin_t ==============
allow nagios_services_plugin_t tmp_t:file append;

#============= nagios_system_plugin_t ==============
allow nagios_system_plugin_t tmp_t:file append;

#============= nagios_t ==============
allow nagios_t initrc_tmp_t:file { read write getattr open ioctl };
#!!!! The source type 'nagios_t' can write to a 'file' of the following types:
# nagios_var_run_t, nagios_log_t, nagios_tmp_t, root_t

allow nagios_t tmp_t:file { write ioctl read open getattr append };

Loading a custom SELinux policy by hand

Now that we’ve come up with a text-based SELinux policy, it needs to be compiled into a binary policy that can be loaded. The command is very similar, but note the capital M rather than lower case, which makes audit2allow write out a binary policy with a .pp extension (not to be confused with Puppet manifests ;))

sudo cat /var/log/audit/audit.log | grep nagios | audit2allow -M resnet-nagios

Once you’ve got your binary SELinux module, loading it by hand is easy:

sudo semodule -i resnet-nagios.pp
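The module persists across reboots once loaded. If you later want to check that it’s loaded, or remove it, semodule can do that too (-l lists installed modules, -r removes one):

sudo semodule -l | grep resnet-nagios
sudo semodule -r resnet-nagios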

The CentOS wiki page on SELinux is handy for manipulating policies manually.

Loading a custom SELinux policy with Puppet

There are Puppet modules available which handle the compiling and loading of modules automatically – you just need to provide the .te file and it will handle the rest. For the ResNet and eduroam servers, we are using James Fryman’s puppet-selinux module. It’s not necessarily the best, but it was the most appropriate for us when we took the decision over a year ago, and it has worked solidly – other modules are also available. Here’s how we’re using it:

include selinux
if $::osfamily == 'RedHat' {
  selinux::module { 'resnet-nagios':
    ensure => 'present',
    source => 'puppet:///modules/nagios/resnet-nagios.te',
  }
}

Summary

That’s more or less it! It’s easy to set SELinux in permissive mode, capture output and create your own policies. There really is no excuse not to be using it 😉

I hope this article has been useful. If you’re a member of UoB and you want to talk about SELinux or Puppet, grab me on the #uob-unix IRC channel. I’m dj_judas21.

Forcing cssh to use IPv4

The majority of the servers I look after are managed by puppet[1], but we have a suite of 10 older servers which fall outside of that management environment.

We’re slowly replacing them with Shiny! New! Managed! Servers!™ but until we’ve completed that work we’re stuck with limited management tools, doing things manually at the command line, like it’s still the 20th century[2].

Occasionally we need to do the same thing on every server (eg kick off a “yum update” or whatever) and using ssh to connect to them all one at a time is tedious.

So, we use cssh, which is a scary… dangerous… “powerful” tool that spawns a load of ssh sessions and allows you to send the same keypresses to all the servers at once. I don’t like using it, it feels like a really dirty way to admin a bunch of hosts, but sometimes it’s a necessary evil.

As long as you’re careful what you type, and don’t do anything daft like “sudo bash” then you can keep a lid on the fear.

One of the “features” of this bundle of 10 servers is that they’re dual stack IPv4 and IPv6, but ssh is only accepting connections on the IPv4 address.

If you’re connecting to these one at a time, “ssh -4 server.name.bris.ac.uk” will sort that out for you, but it took a little more rummaging in man pages to come up with the cssh alternative, and that’s the real purpose of this post.

Today’s Tip
To force cssh to use IPv4 when connecting to a bunch of hosts, use:

cssh --options="-o AddressFamily=inet" <host list>
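If you use this a lot, you can avoid typing the option every time by putting it in ClusterSSH’s per-user config file instead. Something like this in ~/.clusterssh/config should do the trick – ssh_args is the key ClusterSSH uses to pass extra arguments to each ssh session (check man cssh for your version, as the exact key names have shifted over the years):

ssh_args = -o AddressFamily=inet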

[1] Other config management systems are available 🙂
[2] To be fair, sometimes it’s also useful to kick off a puppet run everywhere

Government Digital Service (GDS) Tidbits

I’ve recently started reading the blog run by the Government Digital Service (the team behind .gov.uk) and there have been a handful of particularly interesting posts over the last week or so.

In a lot of ways, the GDS team are a similar organisation to some parts of IT Services, and rather like how we *could* be working in terms of agile/devops methodologies and workflows. They’re doing a lot of cool stuff, and in interesting ways.

Most of it isn’t directly unix related (it’s mostly web application development) but when I spot something interesting I’ll try and flag it up here. Starting with a couple of examples…

“Three cities, one alpha, one day”
http://digital.cabinetoffice.gov.uk/2013/07/23/three-cities-one-alpha-one-day/

This is a video they put up in a blog post about rapidly developing an alpha copy of the Land Registry property register service. This rapid, iterative, user-led development approach is an exciting way to build a service, and it’s interesting to watch another team go at something full pelt to see how far they can get in a day!

I’d have embedded it, but for some reason wordpress is stripping out the embed code…

FAQs: why we don’t have them
http://digital.cabinetoffice.gov.uk/2013/07/25/faqs-why-we-dont-have-them/

Another article which jumped out at me was this one about why FAQs are probably not the great idea we all thought they were.

Like many teams, the team I work in produces web based documentation about how to use our services, and yes, we’ve got our fair share of FAQs! Since I read this article, I’ve been thinking about working towards getting rid of them.

Instead of thinking about “what questions do we get asked a lot?” perhaps we really should be asking “why do people ask that question a lot?” and either eliminate the need for them to ask by making the service more intuitive, or make it easier for them to find the answer themselves by changing the flow of information we feed them.

I doubt we can eliminate our FAQs entirely; they’re useful as a way of storing canned answers to problems outside our domain of control – eg for things like “how do I find out my wireless mac address?” However, if we can fix the root cause where the problem is within our domain, we reduce the list of items on our FAQ, which makes it clearer and easier to use if people do stumble across it – and still gives us a place to store our canned answers.

Ideally I think those canned answers would live better in a knowledge base, or some kind of “smart answers” system. Which brings me on to my last example…

The Smart Answer Bug
http://digital.cabinetoffice.gov.uk/2013/07/31/the-smart-answer-bug/

“Smart Answers” is an example of an expert system, which guides customers through a series of questions in order to narrow down their problem and offer them helpful advice quickly and easily.

The gov.uk smart answers system had a fault recently, which was only noticed because their analytics setup showed up some anomalous behaviour.

It turned out to be a browser compatibility fault with the analytics code, but the article really shows up the power of gathering performance and usage data about your services.

Although the example in the post is about web analytics, we can gather a lot of similar information about servers and infer similar information about service faults by analysing the results.

If we do that analysis well (and in an automated way) we can pick up faults before our users even notice a problem.