Dell C6145 (and presumably other Dell Cloud hosts) IPMItool BMC setup commands

Upgrading the BMC firmware on these hosts resets the settings to default (argh!), which includes:

  • Setting the IP source to DHCP
  • Losing the static IP, netmask and default gateway settings
  • Switching to a “shared” NIC rather than dedicated
    • (This doesn’t appear to be “use dedicated, then fall back to shared if it’s unavailable”; it just goes straight to “shared”…)

Unfortunately, the various Dell docs don’t make this clear, nor do they say exactly which ipmitool commands to run on a C6145 to switch the BMC back to using its dedicated network port.

I haven’t tried these on any other Dell Cloud models yet (e.g. C5000, C8000), so I don’t know whether they work there at all! Use them at your own risk!

Resetting the BMC IP setup is fairly straightforward:

# ipmitool lan set 1 ipsrc static
# ipmitool lan set 1 ipaddr 1.2.3.4
# ipmitool lan set 1 netmask 255.255.255.0
# ipmitool lan set 1 defgw ipaddr 1.2.3.250
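
If you have to do this on more than one host, it may be worth wrapping the four commands in a small function. This is only a sketch: bmc_set_static is a made-up name, and it assumes LAN channel 1 (as in the commands above), which may not be the right channel on other models.

```shell
# Sketch only: bmc_set_static is a hypothetical helper, not part of ipmitool.
# Re-applies static BMC LAN settings (e.g. after a firmware upgrade reset them).
# Usage: bmc_set_static CHANNEL IPADDR NETMASK GATEWAY
bmc_set_static() {
    if [ "$#" -ne 4 ]; then
        echo "usage: bmc_set_static CHANNEL IPADDR NETMASK GATEWAY" >&2
        return 2
    fi
    chan=$1 ip=$2 mask=$3 gw=$4
    # Same four ipmitool commands as above; stop at the first failure.
    ipmitool lan set "$chan" ipsrc static       || return 1
    ipmitool lan set "$chan" ipaddr "$ip"       || return 1
    ipmitool lan set "$chan" netmask "$mask"    || return 1
    ipmitool lan set "$chan" defgw ipaddr "$gw" || return 1
}
```

Usage would then be something like:

# bmc_set_static 1 1.2.3.4 255.255.255.0 1.2.3.250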

Printing the current config then shows the expected settings:

# ipmitool lan print 1
Set in Progress         : Set Complete
Auth Type Support       : MD2 MD5 PASSWORD
Auth Type Enable        : Callback : MD2 MD5 PASSWORD
                        : User     : MD2 MD5 PASSWORD
                        : Operator : MD2 MD5 PASSWORD
                        : Admin    : MD2 MD5 PASSWORD
                        : OEM      : MD2 MD5 PASSWORD
IP Address Source       : Static Address
IP Address              : 1.2.3.4
Subnet Mask             : 255.255.255.0
MAC Address             : 00:01:02:03:04:05
SNMP Community String   : public
IP Header               : TTL=0x40 Flags=0x40 Precedence=0x00 TOS=0x08
BMC ARP Control         : ARP Responses Enabled, Gratuitous ARP Disabled
Gratituous ARP Intrvl   : 2.0 seconds
Default Gateway IP      : 1.2.3.250
Default Gateway MAC     : 00:00:00:00:00:00
Backup Gateway IP       : 0.0.0.0
Backup Gateway MAC      : 00:00:00:00:00:00
802.1q VLAN ID          : Disabled
802.1q VLAN Priority    : 0
RMCP+ Cipher Suites     : 0,0,0
Cipher Suite Priv Max   : uaaaXXXXXXXXXXX
                        :     X=Cipher Suite Unused
                        :     c=CALLBACK
                        :     u=USER
                        :     o=OPERATOR
                        :     a=ADMIN
                        :     O=OEM
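
To check those values from a script rather than by eye, you can pull individual fields out of the “lan print” output. A rough sketch, assuming the “Key : Value” layout shown above (lan_field is a made-up helper name):

```shell
# Sketch only: lan_field is a hypothetical helper, not part of ipmitool.
# Reads `ipmitool lan print` output on stdin and prints the value of the
# first line whose key exactly matches $1 (keys are space-padded in the output).
# Returns non-zero if no such key is found.
lan_field() {
    awk -v key="$1" '
    {
        i = index($0, ":")          # split on the FIRST colon only,
        if (i == 0) next            # so MAC addresses survive intact
        k = substr($0, 1, i - 1)
        v = substr($0, i + 1)
        sub(/[ \t]+$/, "", k)       # trim padding after the key
        sub(/^[ \t]+/, "", v)       # trim space before the value
        if (k == key) { print v; found = 1; exit }
    }
    END { exit(found ? 0 : 1) }'
}
```

For example:

# ipmitool lan print 1 | lan_field 'IP Address Source'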

However, these commands don’t cover the shared/dedicated setting for the BMC port, and “ipmitool lan print” doesn’t display it either.

You can find that by running this “raw” ipmitool command:

# ipmitool raw 0x34 0x14
 01

...where 01 means dedicated and 00 means shared. (In this example we’re already set to dedicated, as this output was captured after the fact.)
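
If you want a script to report the mode in words rather than as a bare byte, something like this would do. It’s a sketch based only on the two values described above (bmc_nic_mode is a made-up name); anything else is reported as unknown:

```shell
# Sketch only: bmc_nic_mode is a hypothetical helper.
# Translates the reply of `ipmitool raw 0x34 0x14` (e.g. " 01") into a word,
# using the C6145 meanings from this post: 01 = dedicated, 00 = shared.
bmc_nic_mode() {
    # ipmitool prints the reply byte with a leading space; strip whitespace.
    byte=$(printf '%s' "$1" | tr -d ' \t\n')
    case $byte in
        01) echo dedicated ;;
        00) echo shared ;;
        *)  echo "unknown ($byte)"; return 1 ;;
    esac
}
```

Usage:

# bmc_nic_mode "$(ipmitool raw 0x34 0x14)"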

In our case we want dedicated, which is set with this “raw” command:

# ipmitool raw 0x34 0x13 0x01
 01

Re-running the query command above should then show 01, and the BMC will be set to use its dedicated port.
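
Setting the mode and reading it back can be rolled into one step, so a script fails loudly if the change didn’t stick. Again a sketch: the raw bytes are the C6145 ones from this post and may mean something entirely different on other models, and bmc_set_dedicated is a made-up name:

```shell
# Sketch only: bmc_set_dedicated is a hypothetical helper.
# Raw bytes are the C6145 values from this post (0x34 0x13 sets the mode,
# 0x34 0x14 reads it back; 01 = dedicated). Untested on other models.
bmc_set_dedicated() {
    ipmitool raw 0x34 0x13 0x01 >/dev/null || return 1
    # Read the mode back and strip the leading space ipmitool prints.
    mode=$(ipmitool raw 0x34 0x14 | tr -d ' \t\n')
    if [ "$mode" = "01" ]; then
        echo "BMC NIC mode: dedicated"
    else
        echo "BMC NIC mode is '$mode', expected 01" >&2
        return 1
    fi
}
```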

Then go ahead and reset the BMC with this command:

# ipmitool mc reset cold
Sent cold reset command to MC

It takes a couple of minutes for the BMC to become contactable again, but after that it should be using the dedicated interface rather than the shared one, and you can go about your business again, huzzah!
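
Rather than guessing when “a couple of minutes” is up, you could poll until the BMC answers again. This sketch uses “ipmitool mc info” as the liveness check. Run with no extra options it talks to the BMC in-band from the host itself; to check reachability over the network you’d pass your usual remote options, e.g. “-I lanplus -H 1.2.3.4 -U youruser -E” (the user name is a placeholder; -E reads the password from the IPMI_PASSWORD environment variable). The attempt count and interval are arbitrary assumptions, and bmc_wait_ready is a made-up name:

```shell
# Sketch only: bmc_wait_ready is a hypothetical helper.
# Usage: bmc_wait_ready TRIES INTERVAL_SECONDS [extra ipmitool options...]
# Polls `ipmitool mc info` until the BMC answers or TRIES attempts are used up.
bmc_wait_ready() {
    if [ "$#" -lt 2 ]; then
        echo "usage: bmc_wait_ready TRIES INTERVAL [ipmitool options...]" >&2
        return 2
    fi
    tries=$1 interval=$2
    shift 2
    n=0
    while [ "$n" -lt "$tries" ]; do
        n=$((n + 1))
        if ipmitool "$@" mc info >/dev/null 2>&1; then
            echo "BMC responding (attempt $n)"
            return 0
        fi
        # Don't bother sleeping after the final failed attempt.
        [ "$n" -lt "$tries" ] && sleep "$interval"
    done
    echo "BMC not responding after $tries attempts" >&2
    return 1
}
```

For example, allowing up to five minutes:

# bmc_wait_ready 30 10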

Other possibly-useful ipmitool commands

Dell servers, warranty facts and refresh-mcollective-metadata

On our physical Dell servers we install the dell-omsa packages, which let us monitor the underlying hardware.

With that in place, you can use facter to report on all sorts of useful things about the hardware, including the state of the warranty.

The fact which checks warranty information uses dell-omsa to pull the server’s service tag and submits it to Dell’s API, which then returns information about the status of your warranty.

You can then use mcollective to report on this. This can be really useful if you can’t remember what you bought when!

Unfortunately, it breaks from time to time, and we start getting cron job output which looks like this:

/usr/libexec/mcollective/refresh-mcollective-metadata
Could not retrieve fact='warranty_end', resolution='': undefined method `[]' for nil:NilClass
Could not retrieve fact='warranty_days_left', resolution='': can't dup NilClass
Could not retrieve fact='warranty_start', resolution='': undefined method `[]' for nil:NilClass
Could not retrieve fact='warranty_end', resolution='': undefined method `[]' for nil:NilClass

This happens just frequently enough to be a familiar problem for us, but not frequently enough for the fix to stick in my mind!

Googling for the error messages yields a couple of mailing list threads asking about this error and how to work around it, both of which were started by my colleague Jonathan Gazeley the first time we hit the problem. [1]

There are no actual fixes in those threads, although one post did hint that the root cause was mcollective caching the result of the Dell API call, without actually saying where it gets cached.

So, it’s strace time!

sudo strace -e open /usr/libexec/mcollective/refresh-mcollective-metadata 2>&1 | less

Skip to the end and page back until you reach the point where it starts complaining about the warranty fact. There you’ll find it trying to open /var/tmp/dell-warranty-XXXXXXX.json, where XXXXXXX is the hardware’s service tag.

...
open("/var/tmp/dell-warranty-XXXXXXX.json", O_RDONLY) = 3
Could not retrieve fact='warranty_end', resolution='': undefined method `[]' for nil:NilClass
Could not retrieve fact='warranty_days_left', resolution='': can't dup NilClass
...
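
The same digging can be done non-interactively by filtering the strace output for the cache path. One caveat that may apply on newer systems: recent glibc versions implement open() with the openat system call, so “-e open” on its own may show nothing there; tracing both is safer. warranty_cache_paths is a made-up helper name:

```shell
# Sketch only: warranty_cache_paths is a hypothetical helper.
# Reads strace output on stdin and prints each distinct Dell warranty cache
# path it mentions (open() and openat() lines alike).
warranty_cache_paths() {
    grep -o '/var/tmp/dell-warranty-[A-Za-z0-9]*\.json' | sort -u
}
```

Usage:

$ sudo strace -e trace=open,openat /usr/libexec/mcollective/refresh-mcollective-metadata 2>&1 | warranty_cache_paths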

In our most recent case, the contents of that file looked like this:

$ cat /var/tmp/dell-warranty-XXXXXXX.json 
{
  "GetAssetWarrantyResponse": {
    "GetAssetWarrantyResult": {
      "Response": null,
      "Faults": {
        "FaultException": {
          "Message": "The tag you sent is not present. Check your separator character and ensure it is |.",
          "Code": 4001
        }
      }
    }
  }
}

That looks a lot to me like the API call failed for some reason.
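
If you want cron (or a monitoring check) to spot this state before the facts start failing, a crude test is to look for the fault block in the cached file. This is a heuristic grep based only on the one failure shown above, not a real JSON parse, and warranty_cache_is_bad is a made-up name:

```shell
# Sketch only: warranty_cache_is_bad is a hypothetical helper.
# Succeeds (returns 0) if the given cache file looks like a failed API call:
# missing, empty, or containing a "FaultException" block like the one above.
warranty_cache_is_bad() {
    [ -s "$1" ] || return 0
    grep -q '"FaultException"' "$1"
}
```

For example:

$ for f in /var/tmp/dell-warranty-*.json; do warranty_cache_is_bad "$f" && echo "stale: $f"; done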

The fix is to remove that stale cache file and re-run the refresh-mcollective-metadata script:

$ sudo rm /var/tmp/dell-warranty-*.json
$ sudo /usr/libexec/mcollective/refresh-mcollective-metadata

Then inspect the cached file again. It should now contain a lot of warranty info.
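
A slightly more cautious variant of that fix only deletes cache files that look broken (rather than every dell-warranty-*.json), then re-runs the refresh script and re-checks what it wrote. It’s a sketch: the “broken” check is a crude grep for the FaultException block shown above, and the function name and the WARRANTY_DIR and REFRESH overrides are made up for illustration (handy for testing it away from a real host):

```shell
# Sketch only: refresh_bad_warranty_caches, WARRANTY_DIR and REFRESH are
# hypothetical names. Run as root (or via sudo) on the affected host.
refresh_bad_warranty_caches() {
    dir=${WARRANTY_DIR:-/var/tmp}
    refresh=${REFRESH:-/usr/libexec/mcollective/refresh-mcollective-metadata}
    removed=0
    for f in "$dir"/dell-warranty-*.json; do
        [ -e "$f" ] || continue               # glob matched nothing
        # Crude heuristic: empty file, or a fault reply from Dell's API.
        if [ ! -s "$f" ] || grep -q '"FaultException"' "$f"; then
            echo "removing stale cache: $f"
            rm -f -- "$f" && removed=$((removed + 1))
        fi
    done
    if [ "$removed" -eq 0 ]; then
        echo "no stale warranty caches found"
        return 0
    fi
    "$refresh" || return 1
    # The refresh should have re-created the cache; flag anything still bad.
    for f in "$dir"/dell-warranty-*.json; do
        [ -e "$f" ] || continue
        if [ ! -s "$f" ] || grep -q '"FaultException"' "$f"; then
            echo "still failing: $f" >&2
            return 1
        fi
    done
    return 0
}
```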

If it doesn’t, well… then you need to start working out why, and that’s an exercise left for the reader!

-Paul
[1] https://groups.google.com/forum/#!msg/puppet-users/LsK3HbEBMGc/-DSIOMNCDzIJ

I freely admit that the intent behind this post is mostly to get the “fix” into those Google search results, so I don’t have to resort to strace the next time it happens!