Dell servers, warranty facts and refresh-mcollective-metadata

On our physical Dell servers we install the dell-omsa packages which give us the ability to monitor our underlying hardware.

With that in place, you can use facter to report on all sorts of useful things about the hardware, including the state of the warranty.

The fact which checks warranty information uses dell-omsa to pull the service tag of the server and submits it to Dell’s API, which then returns info about the status of your warranty.

You can then use mcollective to report on this. This can be really useful if you can’t remember what you bought when!
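
For example, something like this gives a quick summary of those facts across our hosts (this assumes the standard mco “facts” application; the fact names below are the ones from our setup):

$ mco facts warranty_end
$ mco facts warranty_days_left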

Unfortunately, from time to time it breaks and we start getting cronjob output which looks like this:

/usr/libexec/mcollective/refresh-mcollective-metadata
Could not retrieve fact='warranty_end', resolution='': undefined method `[]' for nil:NilClass
Could not retrieve fact='warranty_days_left', resolution='': can't dup NilClass
Could not retrieve fact='warranty_start', resolution='': undefined method `[]' for nil:NilClass
Could not retrieve fact='warranty_end', resolution='': undefined method `[]' for nil:NilClass

This happens just frequently enough to be a familiar problem for us, but not frequently enough for the fix to stick in my mind!

Googling for the error messages yields a couple of mailing list threads asking about this error and how to work around it, both of which were started by my colleague Jonathan Gazeley the first time we hit the problem. [1]

There are no actual fixes in those threads, although one post did hint at the root cause being mcollective caching the result of the Dell API call – without actually stating where it gets cached.

So, it’s strace time!

sudo strace -e open /usr/libexec/mcollective/refresh-mcollective-metadata 2>&1 | less

Skip to the end, and page back until you get to the bit where it starts complaining about the warranty fact, and you find that it’s trying to open /var/tmp/dell-warranty-XXXXXXX.json where XXXXXXX is the service tag of the hardware.

...
open("/var/tmp/dell-warranty-XXXXXXX.json", O_RDONLY) = 3
Could not retrieve fact='warranty_end', resolution='': undefined method `[]' for nil:NilClass
Could not retrieve fact='warranty_days_left', resolution='': can't dup NilClass
...

In our most recent case, the contents of that file looked like this:

$ cat /var/tmp/dell-warranty-XXXXXXX.json 
{
  "GetAssetWarrantyResponse": {
    "GetAssetWarrantyResult": {
      "Response": null,
      "Faults": {
        "FaultException": {
          "Message": "The tag you sent is not present. Check your separator character and ensure it is |.",
          "Code": 4001
        }
      }
    }
  }
}

That looks a lot to me like the API call failed for some reason.

The fix is to remove that stale cache file and re-run the refresh-mcollective-metadata script.

$ sudo rm /var/tmp/dell-warranty-*.json
$ sudo /usr/libexec/mcollective/refresh-mcollective-metadata

Then inspect the cached file again. It should now contain a lot of warranty info.
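
A quick way to eyeball it is to pretty-print the JSON (assuming a Python with the json module is available):

$ python -m json.tool /var/tmp/dell-warranty-*.json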

If it doesn’t, well… then you need to start working out why, and that’s an exercise left for the reader!

-Paul
[1] https://groups.google.com/forum/#!msg/puppet-users/LsK3HbEBMGc/-DSIOMNCDzIJ

I freely admit that the intent behind this post is mostly about getting the “fix” into those google search results – so I don’t have to resort to strace next time it happens!

Striving to be agile in a non-agile environment

I recently read a post over at the Government Digital Service (GDS) blog which struck a chord with me. It was titled “How to be agile in a non agile environment”.

I work in the Wireless/DNS/ResNet team. We’re a small team, and as such we’re keen to adopt technologies or working practices which enable us to deliver as stable a service as possible while still being able to respond to users’ requirements quickly.

Over the last few years, we’ve converged on an approach which could be described as “agile with a small a”.

We’re not software developers; we’re probably better described as an infrastructure operations team. Some of the concepts in Agile need a little translation, or don’t quite fit our small team, but we’ve cherry-picked our way into something which seems to give a fair amount of bang per buck.

At its heart, the core beliefs of our approach are that shipping many small changes is less disruptive than shipping one big change, that our production environment should always be in a deployable state (even if that means it’s missing features), and that we collect metrics about the use of our services and use them to inform the direction we move in.

We’ve been using Puppet to manage our servers for almost 5 years now, so we’re used to “infrastructure as code”. We’ve got git (with gitlab) for our source code control, r10k to deploy ephemeral environments for developing and testing in, and a workflow which allows us to push changes through dev/test/production phases, with every change being peer reviewed before it hits production.

However – we’re still a part of IT Services, and the University of Bristol as a whole. We have to work within the frameworks which are available, and play the same game as everyone else.

The University isn’t a particularly agile environment, it’s hard for an institution as big and as long established as a University to be agile! There are governance processes to follow, working groups to involve, stakeholders to inform and engage and standard tools used by the rest of the organisation which don’t tie in to our toolchain particularly nicely…

Using our approach, we regularly push 5-10 production changes a day in a controlled manner (not bad for a team of 2 sysadmins) with very few failures[1]. Every one of those changes is recorded in our systems with a full trail of who made what change, the technical detail of the change implementation and a record of who signed it off.

Obviously it’s not feasible to take every single one of those changes to the weekly Change Advisory Board; if we did, the meeting would take forever!

Instead, we take a pragmatic approach and ask ourselves some questions for every change we make:

  • Will anyone experience disruption if the change goes well?
  • Will anyone experience disruption if the change goes badly?

If the answer to either of those questions is yes, then we ask ourselves an additional question: “Will anyone experience disruption if we *don’t* make the change?”

The answers to those three questions inform our decision to either postpone or deploy the change, and help us to decide when it should be added to the CAB agenda. I think we strike about the right balance.

We’re keen to engage with the rest of the organisation, as there are benefits to us in doing so (and, well, it’s just the right thing to do!). Hopefully, by combining the best of both worlds, we can continue to deliver stable services in a responsive manner and still move those services forward.

I feel the advice in the GDS post pretty much mirrors what we’re already doing, and it’s working well for us.

Hopefully it could work well for others too!

[1] I say “very few failures” and I’m sure that probably scares some people – the notion that any failure of a system or change could be in any way acceptable.

I strongly believe that there is value in failure. Every failure is an opportunity to improve a system or a process, or to design in some missing resilience. Perhaps I’ll write more about that another time, as it’s a bit off-piste from what I intended to write about here!

Xen VM networking and tcpdump — checksum errors?

Whilst searching for reasons that a CentOS 6 samba gateway VM we run in ZD (as a “Fog VM” on a Xen hypervisor) was giving poor performance and seemingly dropping connections during long transfers, I found this sort of output from tcpdump -v host <IP_of_samba_client> on the samba server:

10:44:07.228431 IP (tos 0x0, ttl 64, id 64091, offset 0, flags [DF], proto TCP (6), length 91)
    server.example.org..microsoft-ds > client.example.org.44729: Flags [P.], cksum 0x3e38 (incorrect -> 0x67bc), seq 6920:6959, ack 4130, win 281, options [nop,nop,TS val 4043312256 ecr 1093389661], length 39SMB PACKET: SMBtconX (REPLY)

10:44:07.268589 IP (tos 0x0, ttl 60, id 18217, offset 0, flags [DF], proto TCP (6), length 52)
    client.example.org.44729 > server.example.org..microsoft-ds: Flags [.], cksum 0x1303 (correct), ack 6959, win 219, options [nop,nop,TS val 1093389712 ecr 4043312256], length 0

Note that the cksum field is showing up as incorrect on all sent packets. Apparently this is normal when hardware checksumming is used, and we don’t see incorrect checksums at the other end of the connection (by the time this shows up on the client).

However, testing has shown that the reliability and performance of connections to the samba server on this VM are much better when hardware checksumming is disabled with:

ethtool -K eth0 tx off
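
To make that stick across reboots on CentOS 6, something along these lines in the interface config should work (this assumes an initscripts version whose ETHTOOL_OPTS handling accepts option flags like -K):

# in /etc/sysconfig/network-scripts/ifcfg-eth0
ETHTOOL_OPTS="-K eth0 tx off"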

Perhaps our version of Xen on the hypervisor, or a combination of the Xen version and the driver versions on the hypervisor and/or client VM, is causing networking problems?

My networking-fu is weak, so I couldn’t say more than what I have observed, even though this shouldn’t really make a difference…
(Please comment if you have info on why/what may be going on here!)

Ubuntu bug 31273 shows others with similar problems which were solved by disabling hardware checksum offloading.

“KDC has no support for encryption type” with old ciphers against Active Directory

A single machine somehow managed to have a differently-configured /etc/krb5.conf file, and recently all logins (both via ssh and on the console, except for root) stopped working. The messages in the logs were of the form:

Sep 29 15:04:58 test-host sshd[1433]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= host=host.example.com user=user12345
Sep 29 15:04:58 test-host sshd[1433]: pam_krb5[1433]: authentication fails for 'user12345' (user12345@REALM.EXAMPLE.COM): Authentication failure (KDC has no support for encryption type)
Sep 29 15:05:00 test-host sshd[1433]: Failed password for user12345 from 1.2.3.4 port 50432 ssh2

The reason for this was simple – the Kerberos config in /etc/krb5.conf contained the following lines:

[libdefaults]
        ... (other lines snipped)
        default_tkt_enctypes = des-cbc-crc
        default_tgs_enctypes = des-cbc-crc

These settings force the use of an older DES encryption type which is only 56-bit, and which has been disabled by default since Windows 7/Windows Server 2008 R2. Removing these lines, so that the encryption type is automatically negotiated, allows a stronger encryption type supported by the Active Directory servers to be used, and we can log in once more. Phew!
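
As a quick sanity check (not part of the fix itself), you can grab a ticket and look at the encryption types it was issued with:

$ kinit user12345@REALM.EXAMPLE.COM
$ klist -e

The “Etype” entries in the klist -e output should now show something stronger than des-cbc-crc.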

(This is a legacy CentOS 5 server, all the newer ones have the same Kerberos config on them — thankfully the same config works on CentOS 5/6/7 and Debian/Ubuntu without modifications thus far!)

What’s using all my swap?

On a couple of occasions recently, we’ve noticed swap use getting out of hand on a server or two. There’s been no common cause so far, but the troubleshooting approach has been the same in each case.

To try and tell the difference between a VM which is generally “just a bit tight on resources” and a situation where a process has run away, it can sometimes be handy to work out which processes are hitting swap.

The approach I’ve been using isn’t particularly elegant, but it has proved useful so I’m documenting it here:

grep VmSwap /proc/*/status 2>&1 | perl -ne '/\/(\d+)\/[^\d]*(\d+) (.B)$/g;if($2>0){$name=`ps -p $1 -o comm=`;chomp($name);print "$name ($1) $2$3\n"}'

Let’s pick it apart a component at a time.

grep VmSwap /proc/*/status 2>&1

The first step is to pull out the VmSwap line from the PID status files held in /proc. There’s one of these files for each process on the system and it tracks all sorts of stuff. VmSwap is how much swap is currently being used by this process. The grep gives output like this:

...
/proc/869/status:VmSwap:	     232 kB
/proc/897/status:VmSwap:	     136 kB
/proc/9039/status:VmSwap:	    5368 kB
/proc/9654/status:VmSwap:	     312 kB
...

That’s got a lot of useful info in it (eg the PID is there, as is the amount of swap in use), but it’s not particularly friendly. The PID is part of the filename, and it would be more useful if we could have the name of the process as well as the PID.

Time for some perl…

perl -ne '/\/(\d+)\/[^\d]*(\d+) (.B)$/g;if($2>0){$name=`ps -p $1 -o comm=`;chomp($name);print "$name ($1) $2$3\n"}'

Dealing with the shell side of things first (before we dive into the Perl code), “-ne” tells perl “I want you to run the following code against every line of input I pipe your way”.

The first thing we do in Perl itself is run a regular expression across the line of input, looking for three things: the PID, the amount of swap used, and the units reported. When the regex matches, this info gets stored in $1, $2 and $3 respectively.

I’m pretty sure the units are always kB but matching the units as well seemed safer than assuming!

The if statement allows us to ignore processes which are using 0kB of swap because we don’t care about them, and they can cause problems for the next stage:

$name=`ps -p $1 -o comm=`;chomp($name)

To get the process name, we run a “ps” command in backticks, which allows us to capture the output. “-p $1” tells ps that we want information about a specific PID (which we matched earlier and stored in $1), and “-o comm=” specifies a custom output format which is just the process name.

chomp is there to strip the ‘\n’ off the end of the ps output.

print "$name ($1) $2$3\n"

Lastly, we print out the $name of the process, its PID and the amount of swap it’s using.

So now, you get output like this:

...
automount (869) 232kB
cron (897) 136kB
munin-node (9039) 5364kB
exim4 (9654) 312kB
...

The output is a little untidy, and there is almost certainly a more elegant way to get the same information. If you have an improvement, let me know in the comments!
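
In the meantime, one small tweak which helps is sorting the output by swap use. This relies on GNU sort’s -n being happy to read the leading digits of values like “232kB”:

grep VmSwap /proc/*/status 2>&1 | perl -ne '/\/(\d+)\/[^\d]*(\d+) (.B)$/g;if($2>0){$name=`ps -p $1 -o comm=`;chomp($name);print "$name ($1) $2$3\n"}' | sort -n -k3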

Molly-guard for CentOS 7?

Since I was looking at this already and had a few things to investigate and fix in our systemd-using hosts, I checked how plausible it is to insert a molly-guard-like password prompt as part of the reboot/shutdown process on CentOS 7 (i.e. using systemd).

Problems encountered include:

  • Asking for a password from a service/unit in systemd: systemd-ask-password looks like the right tool, but it needs some agent setup to reply to the prompt correctly?
  • The reboot command always walls a message to all logged-in users before it even runs the new reboot-molly unit, as it expects a reboot to happen. The --no-wall argument stops this, but that requires a change to the reboot command, so we’re back to the original problem of replacing packaged files/symlinks with RPM.
  • The reboot.target unit is a “systemd.special” unit, which means that it has some special behaviour and cannot be renamed. We can modify it, of course, by editing the reboot.target file.
  • How do we get a systemd unit to run first and block anything later from running until it is complete? (In fact, we want to abort the reboot, but just for this occasion rather than having the unit marked as permanently failed; a failed reboot is a bit of a strange state to be in…) The dependencies appear to work, but the reboot target is quite keen on running the other items from its dependency list. I’m more than likely doing something wrong here! (A very rough sketch of the kind of unit I was experimenting with is below.)
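
For the record, that sketch looked something like the following. It’s very much a non-working outline rather than a recipe: the unit name, prompt text and ordering are my own guesses, and the hostname-confirmation logic is omitted entirely.

# /etc/systemd/system/reboot-molly.service (sketch only; see the problems above)
[Unit]
Description=Molly-guard style confirmation before reboot
DefaultDependencies=no
Before=reboot.target

[Service]
Type=oneshot
# systemd-ask-password needs a password agent listening in order to actually show a prompt
ExecStart=/usr/bin/systemd-ask-password "Type the hostname to confirm reboot:"

[Install]
WantedBy=reboot.target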

So for now this is shelved. It would be nice to have a solution though, so any hints from systemd experts are gratefully received!

(Note that CentOS 7 uses systemd 208, so new features in later versions which help won’t be available to us)

Molly-guard for RHEL/CentOS – protect your hosts from accidental reboots!

Molly-guard is a very useful package which replaces the default halt and reboot (and other related) commands with a version which prompts you to type the hostname of the host you intended to halt/reboot before it continues to do so. For example:

root@testhost:~# reboot
I: molly-guard: reboot is always molly-guarded on this system.
Please type in hostname of the machine to reboot: [type incorrect hostname]
Good thing I asked; I won't reboot testhost ...
W: aborting reboot due to 30-query-hostname exiting with code 1.

This is invaluable if you use a lot of different systems and they are often in use by other users whom you don’t want to anger with accidental reboots…

For Debian-based distros (including Ubuntu), it’s available via a simple apt-get install molly-guard. On RHEL-based distros, unfortunately, it’s not in the base repositories and I was unable to find a suitably-trustworthy repository which contains it.

So this leads to asking some questions:

What does it do?

Simply put, it copies the existing /sbin/halt and related commands to a separate directory (by default /lib/molly-guard), and replaces them with symlinks to /lib/molly-guard/molly-guard to ensure that the new executable is used.

By default it only requires hostname confirmation when you are logged in via ssh, but this can be changed to always ask for the hostname by setting the ALWAYS_QUERY_HOSTNAME variable in the /etc/molly-guard/rc configuration file. Further customisations are possible by adding scripts to the /etc/molly-guard/run.d directory; if any of these exit with a non-zero exit code then the reboot is aborted. (This is how the hostname check is done, but you can add whatever logic you want via this method.)
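
As an illustration, a trivial extra check dropped into run.d might look something like this. It’s a made-up example rather than anything molly-guard ships with:

#!/bin/sh
# /etc/molly-guard/run.d/20-no-daytime-reboots (hypothetical example)
# Refuse to continue during the working day unless FORCE is set in the environment.
hour=$(date +%H)
if [ -z "$FORCE" ] && [ "$hour" -ge 9 ] && [ "$hour" -lt 17 ]; then
    echo "Refusing to reboot/halt during working hours (set FORCE=1 to override)."
    exit 1
fi
exit 0

(The numeric prefix follows the same naming convention as 30-query-hostname, and the script needs to be executable.)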

How can we make this work on RHEL / Why are there no packages for RHEL?

Someone kindly ported a version of molly-guard from Debian to RHEL, and a github repo of this is available here. Unfortunately this doesn’t quite solve the problem, as creating a package from this (having updated it for molly-guard 0.6.2) creates an RPM which gives us errors when we try to install it:

Running Transaction Test


Transaction Check Error:
  file /sbin/halt from install of molly-guard-0.6.2-1.1.noarch conflicts with file from package upstart-0.6.5-13.el6_5.3.x86_64
  file /sbin/poweroff from install of molly-guard-0.6.2-1.1.noarch conflicts with file from package upstart-0.6.5-13.el6_5.3.x86_64
  file /sbin/reboot from install of molly-guard-0.6.2-1.1.noarch conflicts with file from package upstart-0.6.5-13.el6_5.3.x86_64
  file /sbin/shutdown from install of molly-guard-0.6.2-1.1.noarch conflicts with file from package upstart-0.6.5-13.el6_5.3.x86_64

RPM really doesn’t like replacing files which are owned by another package, so an alternative (you’ll see what I did there in a minute) strategy is required.

Handily there’s a tool called alternatives which can handle selecting which of a set of binaries to use, via managing a directory of symlinks. (See!)

If we re-create the RPM without the explicit symlinks and instead use a post-install script snippet which copies the halt/reboot binaries to the molly-guard directory and sets up alternatives to point at them then this might just work!

There are a bunch more reboot/halt commands which live in /usr/bin/... on RHEL, so we need to turn those into links as well:

alternatives --install /sbin/halt halt              /lib/molly-guard/molly-guard 999 \
  --slave /sbin/poweroff          poweroff          /lib/molly-guard/molly-guard \
  --slave /sbin/reboot            reboot            /lib/molly-guard/molly-guard \
  --slave /sbin/shutdown          shutdown          /lib/molly-guard/molly-guard \
  --slave /sbin/coldreboot        coldreboot        /lib/molly-guard/molly-guard \
  --slave /sbin/pm-hibernate      pm-hibernate      /lib/molly-guard/molly-guard \
  --slave /sbin/pm-suspend        pm-suspend        /lib/molly-guard/molly-guard \
  --slave /sbin/pm-suspend-hybrid pm-suspend-hybrid /lib/molly-guard/molly-guard \
\
  --slave /usr/bin/reboot         usrbinreboot      /lib/molly-guard/molly-guard \
  --slave /usr/bin/halt           usrbinhalt        /lib/molly-guard/molly-guard \
  --slave /usr/bin/poweroff       usrbinpoweroff    /lib/molly-guard/molly-guard

# Ensure we're using this by default:
alternatives --set halt /lib/molly-guard/molly-guard
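
The copy part of that post-install scriptlet is roughly this sort of thing (a sketch of the approach rather than the exact scriptlet from the repo):

# Keep copies of the real binaries for the molly-guard wrapper to call
mkdir -p /lib/molly-guard
for cmd in halt poweroff reboot shutdown coldreboot pm-hibernate pm-suspend pm-suspend-hybrid; do
    if [ -e "/sbin/$cmd" ]; then
        cp -a "/sbin/$cmd" "/lib/molly-guard/$cmd"
    fi
done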

At this point we’ve got a set of alternatives and a post-install RPM scriptlet which copies the required commands into the /lib/molly-guard directory, but there is still a problem that RPM will clobber these when the clashing packages are updated! So we definitely need to ensure that our alternatives get re-added after any updates. To do this we can do one of the following:

  • Have a cron job which runs regularly and enforces the alternatives setting for “halt” (the “slave” entries trigger all the others to update when this is done)
  • Use Puppet or some other configuration management to enforce the alternatives setting (will have a small window of the binaries being the new version until the alternatives are reset by Puppet on the next run)
  • Scrap the alternatives method and use something else like modifying the PATH to ensure our reboot/halt versions are before the system versions (will break if scripts use full paths to the commands, which is pretty common)
  • Drop this approach and create a PAM module which does the same as molly-guard, then use consolehelper and PAM instead (untested)

I’ve decided to go for the second of these and have configuration management ensure that the “alternatives” settings are enforced.
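
The check itself boils down to something like the following; the grep is matching against the output format of alternatives --display as I understand it, so treat that bit as an assumption:

# Re-assert the molly-guard alternative if a package update has put the real binary back
alternatives --display halt | grep -q 'currently points to /lib/molly-guard/molly-guard' \
  || alternatives --set halt /lib/molly-guard/molly-guard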

Will this have any side-effects?

One side-effect is that on RHEL6 the reboot-related commands in /usr/bin use “consolehelper” to control access to these commands through PAM. Without some additional jiggery-pokery this functionality will be broken by this update.

Updates to the packages will need to be handled somehow, perhaps by detecting that the binaries have been put back in place and updating the molly-guard-controlled versions. (Eek!)

I’ve not looked at updating the package for RHEL7, where the reboot and related commands all link to systemctl for systemd control. This is likely to need some different shenanigans to get it to work…

Where can I get the packages?

These are a work in progress, but the updated git repository is available here. RPMs will be available from http://packages.bris.ac.uk/centos/6/zone_d/ (UoB internal access only) once finalised. If it works well, these should be published more widely 🙂

No RHEL7 packages yet I’m afraid — I need to investigate how to get this to work with systemd!

(Bonus Q: Why is it called Molly-guard? See this definition for an explanation)

Rethinking the wireless database architecture

The eduroam wireless network has a reliance on a database for the authorization and accounting parts of AAA (authentication, authorization and accounting – are you who you say you are, what access are you allowed, and what did you do while connected).

When we started dabbling with database-backed AAA in 2007 or so, we used a centrally-provided Oracle database. The volume of AAA traffic was low and high performance was not necessary. However (spoiler alert), demand for wireless connectivity grew and, before many months had passed, we were placing more demand on Oracle than it could handle. Our queries were taking long enough that some wireless authentication requests would time out and fail.

First gen – MySQL (2007)

It was clear that we needed a dedicated database platform, and at the time that we asked, the DBAs were not able to provide a suitable platform. We went down the route of implementing our own. We decided to use MySQL as a low-complexity open source database server with a large community. The first iteration of the eduroam database hardware was a single second-hand server that was going spare. It had no resilience but was suitably snappy for our needs.

First gen database

Second gen – MySQL MMM (2011)

Demand continued to grow but more crucially eduroam went from being a beta service that was “not to be relied upon” to being a core service that users routinely used for their teaching, learning and research. Clearly a cobbled-together solution was no longer fit for purpose, so we went about designing a new database platform.

The two key requirements were high query capacity and high availability, i.e. resilience against the failure of an individual node. At the time, none of the open source database servers had proper clustering, only master-slave replication. We installed a clustering wrapper for MySQL called MMM (MySQL Multi-Master). This gave us a resilient two-node cluster where either node could be queried for reads and one node was designated the “writer” at any one time. In the event of a node failure, the writer role would be automatically moved around by the supervisor.

Second gen database

Not only did this buy us resilience against hardware faults, for the first time it also allowed us to drop either node out of the cluster for patching and maintenance during the working day without affecting service for users.

The two-node MMM system served us well for several years, until the hardware came to its natural end of life. The size of the dataset had grown and exceeded the size of the servers’ memory (the 8GB that seemed generous in 2011 didn’t really go so far in 2015) meaning that some queries were quite slow. By this time, MMM had been discontinued so we set out to investigate other forms of clustering.

Third gen – MariaDB Galera (2015)

MySQL had been forked into MariaDB which was becoming the default open source database, replacing MySQL while retaining full compatibility. MariaDB came with an integrated clustering driver called Galera which was getting lots of attention online. Even the developer of MMM recommended using MariaDB Galera.

MariaDB Galera has no concept of “master” or “slave” – all the nodes are masters and are considered equal. Read and write queries can be sent to any of the nodes at will. For this reason, it is strongly recommended to have an odd number of nodes, so if a cluster has a conflict or goes split-brain, the nodes will vote on who is the “odd one out”. This node will then be forced to resync.
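
To give a flavour of the setup, the Galera-specific part of the MariaDB configuration is only a handful of wsrep settings. This is an illustrative snippet rather than our real config; the cluster name and node addresses are made up:

# e.g. /etc/my.cnf.d/galera.cnf (illustrative only)
[galera]
wsrep_on                 = ON
wsrep_provider           = /usr/lib64/galera/libgalera_smm.so
wsrep_cluster_name       = eduroam_db
wsrep_cluster_address    = gcomm://db1.example.org,db2.example.org,db3.example.org
binlog_format            = ROW
default_storage_engine   = InnoDB
innodb_autoinc_lock_mode = 2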

This approach lends itself naturally to load-balancing. After talking to Netcomms about the options, we placed all three MariaDB Galera nodes behind the F5 load balancer. This allows us to use one single IP address for the whole cluster, and the F5 will direct queries to the most appropriate backend node. We configured a probe so the F5 is aware of the state of the nodes, and will not direct queries to a node that is too busy, out of sync, or offline.

Third gen database

Having three nodes that can be simultaneously queried gives us an unprecedented capacity which allows us to easily meet the demands of eduroam AAA today, with plenty of spare capacity for tomorrow. We are receiving more queries per second than ever before (240 per second, and we are currently in the summer vacation!).

We are required to keep eduroam accounting data for between 3 and 6 months – this means a large dataset. While disk is cheap these days and you can store an awful lot of data, you also need a lot of memory to hold the dataset twice over, for UPDATE operations which require duplicating a table in memory, making changes, merging the two copies back and syncing to disk. The new MariaDB Galera nodes have 192GB memory each while the size of the dataset is about 30GB. That should keep us going… for now.

Rsync between two hosts using sudo and a password prompt

Using rsync normally is nice and straightforward. e.g.:

# rsync -av -e 'ssh' /some/local/path user@remote:/some/remote/path

This works fine and prompts for the ssh password to log into the remote machine if required.

But what if the remote end needs root (or a different user) rights to write into the destination directory? Just whack in an --rsync-path option to add sudo to the rsync command, right?:

# rsync -av -e 'ssh' --rsync-path='sudo rsync' /some/local/path user@remote:/some/remote/path
user@remote's password: 
sudo: no tty present and no askpass program specified
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(632) [sender=3.0.4]

Oh, that didn’t work — sudo couldn’t ask us for a password. Adding the -t option to the ssh used by rsync doesn’t work either, as it can’t allocate a tty:

# rsync -av -e 'ssh -t' --rsync-path='sudo rsync' /some/local/path user@remote:/some/remote/path
Pseudo-terminal will not be allocated because stdin is not a terminal.
user@remote's password:

So we need to tell sudo to ask for the password some other way. Fortunately there is a -A option for sudo which tells it to use an “askpass” program, but we also need to tell it what askpass program to use (and it’s not in the default path on most machines). We can find this with locate askpass on the remote machine:

[user@remote]$ locate askpass
/etc/profile.d/gnome-ssh-askpass.csh
/etc/profile.d/gnome-ssh-askpass.sh
/usr/libexec/openssh/gnome-ssh-askpass
/usr/libexec/openssh/ssh-askpass

We’ll use /usr/libexec/openssh/ssh-askpass as that should pick an appropriate version according to what is available to sudo. Again sudo won’t have a tty to ask for the password on, so how about we use an X11 askpass program and enable X forwarding for ssh:

# rsync -av -e 'ssh -X' --rsync-path='SUDO_ASKPASS=/usr/libexec/openssh/ssh-askpass sudo -A rsync' /some/local/path user@remote:/some/remote/path
user@remote's password: [type user password]
[Then a dialog pops up for the sudo password]

Hooray, this works! Bit of a faff but it could be scripted or made into a shell function to save having to remember it 🙂
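
For example, a quick wrapper function (the name is my own invention; adjust the askpass path to whatever locate finds on your remote host) could look like:

# Hypothetical wrapper -- drop into .bashrc or similar
rsudo() {
    rsync -av -e 'ssh -X' \
        --rsync-path='SUDO_ASKPASS=/usr/libexec/openssh/ssh-askpass sudo -A rsync' \
        "$@"
}

# Usage: rsudo /some/local/path user@remote:/some/remote/path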

Yes, you could enable remote root login, but it would definitely be preferable to avoid that.

SysAdmin Appreciation Day

Blimey, the “last Friday in July” has rolled round again quickly, and today marks International SysAdmin Appreciation Day.

So, I’d like to say a heartfelt thank you to all the sysadmins I interact with in IT Services!

Not just the linux/unix guys I work directly with, but the Windows team who provide me with useful things like authentication and RDP services, the network team who keep us talking to each other, the storage and backups team who provide me with somewhere to keep my data and anyone else who provides me with a service I consume!

You’re all doing great work which no one notices until it breaks, and then you unfailingly go above and beyond to restore service as soon as possible.

Thank you, my fellow sysadmins! I couldn’t do my job without you all doing yours.