DNS Internals: delegating a subdomain to a server listening on a non-standard port

I’m writing this up because it took me quite some time to get my head around how to do this, and the answers I found around the internet varied from “not possible”, through “try this” (which didn’t work), to “switch off this security feature you really like having” (no).

I found a way to make it happen, but it’s not easy. I’ll walk you through the problem, and how each way I attempted to solve it failed.

All the names below are hypotheticals, and for the sake of argument we’re trying to make “foo.subdomain.local” resolve via the additional server.

Problem:
Suppose you have two DNS servers. One which we’ll call “NS1” and one which we’ll call “NS-NEW”.

  • NS1 is a recursive server running bind, which all your clients point at to get their DNS information. It’s listening on port 53 as standard.
  • NS-NEW is an authoritative server which is listening on a non-standard port (8600) and for these purposes it’s a black box, we can’t change its behaviour.

You want your clients to be able to resolve the names that NS-NEW is authoritative for, but you don’t want to have to reconfigure the clients. So NS1 needs to know to pass those queries on to NS-NEW to get an answer.

Attempt 1 – “slave zone”
My first thought was to configure NS1 to slave the zone from NS-NEW.

zone "subdomain.local" {
        type slave;
        file "/var/named/slave/priv.zone";
        masters { $IP_OF_NS-NEW port 8600; };
};

This didn’t work for me because NS-NEW isn’t capable of doing zone transfers. Pity, as that would have been really neat and easy to manage!
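
If you want to check for yourself whether a server will hand over its zone, a quick AXFR request with dig is enough (the address and port here are the hypothetical ones from this article):

# Ask NS-NEW for a full zone transfer on its non-standard port; a "Transfer failed."
# or a refused connection means no zone transfers, and so no slaving.
dig @$IP_OF_NS-NEW -p 8600 subdomain.local. AXFR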

Attempt 2 – “forward zone”
Then I tried forwarding queries from NS1 to NS-NEW, using bind’s “forward zone” feature.

zone "subdomain.local" {
        type forward;
        forward only;
        forwarders { $IP_OF_NS-NEW port 8600; };
};

This didn’t work because NS1 is configured to validate DNSSEC signatures. The root zone is signed, and bind takes that to mean everything below it must either be signed or have a provably insecure delegation; “local.” has neither, so the unsigned answers forwarded from NS-NEW fail validation.

The software running on NS-NEW isn’t capable of signing its zone information.

It doesn’t appear to be possible to selectively turn off DNSSEC checking on a per-zone basis, and I didn’t want to turn that off for our whole infrastructure as DNSSEC is generally a Good Thing.
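
You can confirm that it’s the validation (rather than the forwarding) that’s breaking by re-running a query with DNSSEC checking disabled; a quick diagnostic sketch, using the hypothetical names from this article:

# With validation in play the forwarded answer gets thrown away (typically SERVFAIL):
dig @NS1.my.domain foo.subdomain.local. A
# With the "checking disabled" bit set, the same answer comes back fine:
dig @NS1.my.domain foo.subdomain.local. A +cd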

Attempt 3 – “delegation”
I did think I could probably work around it by making NS1 authoritative for the “local.” top level domain, then using NS records in the zonefile for “local.” to directly delegate the zone to NS-NEW.

Something like this:

$TTL 86400	; default TTL for this zone
$ORIGIN local.
@       IN  SOA  NS1.my.domain. hostmaster.my.domain. (
                     2016031766 ; serial number
                     28800      ; refresh
                     7200       ; retry
                     604800     ; expire
                     3600       ; minimum
                     )
        IN  NS  NS1.my.domain.

; delegated zones
subdomain  IN  NS NS-NEW.my.domain.

Unfortunately that doesn’t work either, as it’s not possible to specify a port number in an NS record, and NS-NEW isn’t listening on a standard port.

Attempt 4 – “a little of option 2 and a little of option 3”
Hold on to your hats, this gets a little self referential.

I made NS1 authoritative for “local.”

zone "local" {
        type master;
        file "/var/named/data/zone.local";
};

I configured NS records in the “local.” zone file, which point back at NS1

$TTL 86400	; default TTL for this zone
$ORIGIN local.
@       IN  SOA  NS1.my.domain. hostmaster.my.domain. (
                     2016031766 ; serial number
                     28800      ; refresh
                     7200       ; retry
                     604800     ; expire
                     3600       ; minimum
                     )
        IN  NS  NS1.my.domain.

; delegated zones
subdomain  IN  NS NS1.my.domain.

I then configured a “subdomain.local.” forward zone on NS1 which forwards queries on to NS-NEW

zone "subdomain.local" {
        type forward;
        forward only;
        forwarders { $IP_OF_NS-NEW port 8600; };
};

To understand why this works, you need to understand how the recursion process for a query like “foo.subdomain.local.” happens.

When the query comes in NS1 does this:
– do I already know the answer from a previously cached query? Let’s assume no for now.
– do I know which DNS server is responsible for “subdomain.local.” from a previously cached query? Let’s assume no for now.
– do I know which DNS server is responsible for “local.” – ooh! Yes! That’s me!
– now I can look in the zone file for “local.” to see how I resolve “subdomain.local.” – there’s an NS record which says NS1 is the authoritative server to ask.
– now I ask NS1 for an answer to “foo.subdomain.local.”
– NS1 can then forward my query off to NS-NEW and fetch an answer.

Because we haven’t had to go all the way up to the root to get our answer, we avoid encountering the DNSSEC issue for this zone.
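
A quick sanity check from a client ties the whole chain together (again, these are the hypothetical names from this article):

# The answer should come back from NS-NEW, via NS1's delegation-plus-forwarding:
dig @NS1.my.domain foo.subdomain.local. A +short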

Did you really do it like *that*?
Yes and no.

The above is a simplified version of what I actually had to do, as our production equivalent of NS1 isn’t a single server, and I had to take account of our zone file management process; all of that adds complexity which I don’t need to go into.

There are also a few extra hoops to jump through to make sure that the “local.” domain can only be accessed by clients on our network, and to make sure that our authoritative infrastructure doesn’t “leak” the “local.” zone to the outside world.

What would you have liked to have done?
If NS-NEW was able to listen on a standard port, I’d have used a straight delegation to do it.

If NS-NEW was able to sign its zone data with DNSSEC, I’d have used a simple forward zone to do it.

NS-NEW isn’t *quite* the black box I treated it as in this article, but the restriction about not being able to make it listen on port 53 is a real one.

The software running on NS-NEW does have a feature request in its issue tracker for DNSSEC, which I’ll watch with interest – as that would allow me to tidy up our config and might actually enable some other cool stuff further down the line…

Capacity Planning for DNS

I’ve spent the last 6 months working on our DNS infrastructure, wrangling it into a more modern shape.

This is the first in a series of articles talking about some of the process we’ve been through and outlining some of the improvements we’ve made.

One of the exercises we try to go through when designing any new production infrastructure is capacity planning. There are four questions you need to be able to ask when you’re doing this:

  1. How much traffic do we need to handle today?
  2. How are we expecting traffic to grow?
  3. How much traffic can the infrastructure handle?
  4. How much headroom have we got?

We aim to be in a position where we can ask those four questions on a regular basis, and preferably get useful answers to them!

When it comes to DNS, the most useful metric would appear to be “queries/second” (which I’ll refer to as qps from here on in to save a load of typing!), and bind can give us that information fairly readily with its built-in statistics-gathering features.
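
As a rough sketch of what that looks like in practice (assuming rndc is configured, and that named.conf points “statistics-file” at the path below), each call to “rndc stats” appends a block of counters which you can sample and graph:

rndc stats
# Sample a counter from the "Name Server Statistics" section at intervals; the
# difference between two samples divided by the elapsed seconds gives approximate qps.
grep "queries resulted in successful answer" /var/named/data/named_stats.txt | tail -1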

With that in mind, let’s look at those 4 questions.

1. How much traffic do we need to handle today?
The best way to get hold of that information is to collect the qps metrics from our DNS infrastructure and graph them.

This is quite a popular thing to do, and most monitoring tools (e.g. nagios, munin or ganglia) have well-worn solutions available, and for everything else there’s Google.

Unfortunately we weren’t able to collate these stats from the core of the legacy DNS infrastructure in a meaningful way (due to differences in bind version, lack of a sensible aggregation point, etc.).

Instead, we had to infer it from other sources that we can/do monitor, for example the caching resolvers we use for eduroam.

Our eduroam wireless network is used by over 30,000 client devices a week. We think this is around 60% of the total devices on the network, so it’s a fairly good proxy for the whole university network.

We looked at what the eduroam resolvers were handling at peak time (revision season), doubled it and added a bit. Not a particularly scientific approach, but it’s likely to be over-generous which is no bad thing in this case!

That gave us a ballpark figure of “we need to handle around 4000qps”.

2. How are we expecting traffic to grow?
We don’t really have long term trend information for the central DNS service due to the historical lack of monitoring.

Again inferring generalities from eduroam, the number of clients on the network goes up by 20-30% year on year (and has done since 2011). Taking 30% year-on-year as our growth rate and extrapolating over 5 years, it looks like this:

[Graph: projected dns growth in queries/second over the next 5 years]

Or in 5 years time we think we’ll need around 15,000qps.
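
The arithmetic behind that is just compound growth on the ~4000qps ballpark from question 1, which is easy enough to sanity-check:

# 4000qps growing at 30% a year for 5 years lands at roughly 15,000qps:
awk 'BEGIN { qps = 4000; for (y = 1; y <= 5; y++) { qps *= 1.3; printf "year %d: %.0f qps\n", y, qps } }'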

With all the estimates in this process being on the generous side, and given the compound nature of the year-on-year growth calculation, that should be a significant overestimate.

It will certainly be an interesting figure to revisit in 5 years time!

3. How much traffic can the infrastructure handle?
To answer this one, we need some benchmarking tools. After a bit of research I settled on dnsperf. The mechanics of how to run dnsperf (and how to gather a realistic sample dataset) are best left for another time.

All tests were done against the pre-production infrastructure so as not to interfere with live traffic.
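
For the curious, a bare-bones dnsperf invocation looks something like this (the server name and query file are placeholders; the rate-ramping and plotting behind the graphs below are part of what I’m leaving for that other post):

# Replay the queries in sample_queries.txt against the test server for 60 seconds
# and report how many were answered:
dnsperf -s ns-test.my.domain -d sample_queries.txt -l 60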

Let’s look at the graphs we get out at the end.

The new infrastructure:
[Graph: dnsperf results for the new infrastructure – 20150624-1225.rate]

Interpreting this graph isn’t immediately obvious. The way dnsperf works is that it linearly scales the number of queries/second that it’s sending to your DNS server, and tracks how many responses it gets back per second.

So the red line is how many queries/second we’re testing against, and the green line is how the server is responding. Where the two lines diverge shows you where your infrastructure starts to struggle.

In this case, the new infrastructure appears to cope quite well with around 30,000qps – or about twice what we’re expecting to need in 5 years time. That’s with all (or rather, both!) the servers in the pool available, so do we still have n+1 redundancy?

A single node in the new infrastructure:
[Graph: dnsperf results for a single node – 20150622-1438.rate]

From this graph you can see we’re good up to around 14,000qps, so we’re n+1 redundant for at least the next 3-4 years (the lifetime of the hardware we’re using).

At present we have 2 nodes in the pool; the implication from the two graphs is that capacity does indeed scale approximately linearly with the number of servers in the pool.

4. How much headroom have we got?
At this point, the answer to that looks like “plenty” and with the new infrastructure we should be able to scale out almost linearly by adding more servers to the pool.

Now that we know how much we can expect our infrastructure to handle, and how much it’s actually experiencing, we can make informed decisions about when we need to add more resources in order to maintain at least n+1 redundancy.

What about the legacy infrastructure?
Well, the reason I’m writing this post today (rather than any other day) is that we retired the oldest of the servers in the legacy infrastructure today, and I wanted to fire dnsperf at it, after it’s stopped handling live traffic but before we switch it off completely!

So how many queries/second can a 2005 vintage Sun Fire V240 server cope with?

[Graph: dnsperf results for the Sun Fire V240 – 20150727-0938.rate]

It seems the answer to that is “not really enough for 2015!”

No wonder its response times were atrocious…

What’s talking to my DNS server?

I currently look after 6 DNS resolvers, which are used to provide DNS to two large sections of the university network.

While they’re all running BIND, there’s a real mix of versions, so I want to consolidate them down, retire the oldest pair and generally tidy up.

The two oldest are in an IP address range that I can’t easily move onto any of the other servers, so any clients using them need to be pointed at the newer servers before I can retire them.

The problem is, I’m not 100% sure which clients are using them.

So, how do you find out what machines are using a given DNS server?
The temptation is to turn on querylogging, and then look at the log files after a week. There are two problems with that. The first is a technical issue: querylogging appears to block responses while waiting for the logging IO to happen, which slows your DNS server down significantly and sends the load average through the roof. The second (more important) problem is a privacy issue.

Querylogging logs every DNS request made by your clients, as well as the IP the client machine is using. Taken together, those two pieces of information are enough to identify what a given user is looking at on the internet. That’s a pretty serious breach of privacy; it needs signoff from on high and is generally a world of paperwork and hassle which is best avoided.

So we need an approach which is lightweight and only identifies the machines which are talking to us, not what they’re asking for: gathering the minimum of data required to allow us to make the decision we’re interested in.

On Linux, that’s pretty easy to arrange with a bit of tcpdump and some good old awk:

#!/bin/bash
/usr/bin/nohup /usr/sbin/tcpdump -ln port 53 2>&1 | awk '/\.domain:/ { print $3; system("") }' | awk -F'.' '{ print $1"."$2"."$3"."$4; system("") }' > /tmp/dns_clients.log &

What does that do then?
It’s quite a long pipeline, so if we split it into chunks, it looks a bit like this:

/usr/bin/nohup

nohup, combined with the ‘&’ at the end, runs the whole pipeline in such a way that it doesn’t die when you log out, which is useful if you want to set something running and then come back to it later.

/usr/sbin/tcpdump -ln port 53 2>&1

We then use tcpdump to dump all traffic on port 53 (DNS). The -n tells it not to resolve any IP addresses to DNS names, and the -l changes the STDOUT buffering to be line buffered. This is helpful if you want to be able to see clients hitting your box in real time, for example to make sure the script is doing what you’re expecting it to.

“2>&1” merges STDERR and STDOUT into a single STDOUT stream.

awk '/\.domain:/ { print $3; system("") }'

Next we use awk to look for lines which contain the string “.domain:”; these are the ones headed towards our DNS server, rather than outbound requests made by processes running on the server. “print $3” prints the third field on the line (which is the source IP address/port of the request).

system("") is a useful bodge which makes the output from awk line buffered, in the same way as we used -l for tcpdump.

awk -F'.' '{ print $1"."$2"."$3"."$4; system("") }'

A bit more awk to strip off the port number, leaving us with just the IPv4 address. The observant amongst you will notice that this does nothing for IPv6 addresses. I should probably come up with a better solution, but this is sufficient for the boxes I’m currently interested in.

> /tmp/dns_clients.log &

redirects all the output into the /tmp/dns_clients.log file.
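
As for the IPv6 gap mentioned a couple of paragraphs back, one possible tweak (a sketch, not something I’ve deployed) is to strip the trailing “.port” from the source field instead of assuming four dotted quads, which copes with both address families in a single awk stage:

# Drop everything after the last dot (the port), so 1.2.3.4.54321 and
# 2001:db8::1.54321 both reduce to just the client address:
awk '/\.domain:/ { ip = $3; sub(/\.[^.]*$/, "", ip); print ip; system("") }'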

Handling the log file
So set that running on a DNS server, and client IP addresses will be logged to /tmp/dns_clients.log without logging what they’re looking at or slowing down the server too much. However, it logs one line for every request so it can get large quite quickly on a busy server.

I’ve been rotating that logfile out daily and running it through “sort -u” to get a list of unique clients. Once I’ve tracked them all down and pointed them at a more appropriate DNS server I’ll be able to retire the old ones.
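
The daily tidy-up itself is nothing clever; something along these lines (paths are illustrative):

# Boil the day's log down to a list of unique client addresses:
sort -u /tmp/dns_clients.log > /tmp/dns_clients.$(date +%F).uniq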

Win!

Solaris Bonus Round
If you need to do this on Solaris, tcpdump isn’t generally available (unless you’re willing to build it yourself), but /usr/sbin/snoop has roughly equivalent functionality. The output format is a little different, although it’s a bit easier to handle.

#!/usr/bin/bash
/usr/bin/nohup /usr/sbin/snoop -r dst port 53 | awk '{ print $1 }' > /tmp/dns_clients.log &

That does broadly the right thing. It includes traffic originated by the server as well as traffic hitting it from clients, so you could also pipe it through “grep -v $MY_IP” to weed that out if you wanted.
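
For example, filtering the server’s own lookups out at capture time would look something like this (with $MY_IP standing in for the server’s address):

#!/usr/bin/bash
/usr/bin/nohup /usr/sbin/snoop -r dst port 53 | awk '{ print $1 }' | grep -v "$MY_IP" > /tmp/dns_clients.log &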