Debug Queries in CoreDNS with the etcd Middleware

May 22, 2016

Let’s say you have some data in etcd and use CoreDNS for service discovery. The Corefile looks like this:

.:53 {
    etcd skydns.local {
        stubzones
        path /skydns
        endpoint http://localhost:2379
        upstream 8.8.8.8:53 8.8.4.4:53
        debug  # <-- new, purpose of this blog
    }
}

You test with dig, get the result below, and ask yourself why this is happening. If you have direct access to etcd you can inspect the data with etcdctl; if not, you’re basically stuck.

% dig @localhost TXT production.*.skydns.local

; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;production.*.skydns.local.	IN	TXT

;; AUTHORITY SECTION:
skydns.local.		300	IN	SOA	ns.dns.skydns.local. hostmaster.skydns.local. 1463949825 7200 1800 86400 60

But not anymore! The debug directive enables debug queries, which means CoreDNS will include extra information about the data in etcd in the reply. Just prefix the original query name with o-o.debug. In the above example, ask for o-o.debug.production.*.skydns.local. [1]

% dig @localhost TXT o-o.debug.production.*.skydns.local

;; Warning: Message parser reports malformed message packet.
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;o-o.debug.production.*.skydns.local. IN	TXT

;; AUTHORITY SECTION:
skydns.local.		300	IN	SOA	ns.dns.skydns.local. hostmaster.skydns.local. 1463950211 7200 1800 86400 60

;; ADDITIONAL SECTION:
1.rails.production.east.skydns.local 300 CH TXT	"service1.example.com:8080(10,0,,false)[0,]"
2.rails.production.west.skydns.local 300 CH TXT	"service2.example.com:8080(10,0,,false)[0,]"

Those two TXT records in the additional section tell you what data CoreDNS considered before returning a reply. Those records are encoded as follows:

  1. owner name: the complete key in etcd for this Service. The remaining fields are in the TXT record data, with fields 3 to 5 in parentheses and fields 6 and 7 in square brackets:
  2. host name:port, the host name and port as found.
  3. the weight and priority (0 if not set).
  4. if this is (or can be) a TXT record, the first 200 bytes of its text.
  5. can this be an MX record? Usually this is false.
  6. target strip value, usually 0 (not set).
  7. group id, usually empty (not set).

In this case the owner names exist, but the text field (#4) is empty. So the answer is not NXDOMAIN but NODATA, because CoreDNS has no TXT to return for this question.
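To make the encoding concrete, here is a minimal Python sketch that splits the TXT data of a debug record into the fields listed above. The names parse_debug_txt and DebugRecord are made up for this example and are not part of CoreDNS. The order of the two numbers (priority first, then weight) is an assumption, since the list does not say which comes first.

```python
import re
from typing import NamedTuple

# Condensed form as seen in the reply: host:port(n,n,text,mail)[targetstrip,group]
DEBUG_TXT = re.compile(
    r"^(?P<host>.*?):(?P<port>\d+)"
    r"\((?P<priority>\d+),(?P<weight>\d+),(?P<text>.*),(?P<mail>true|false)\)"
    r"\[(?P<targetstrip>\d+),(?P<group>[^\]]*)\]$",
    re.DOTALL,
)


class DebugRecord(NamedTuple):
    host: str
    port: int
    priority: int   # assumption: first number is the priority
    weight: int     # assumption: second number is the weight
    text: str       # empty means there is no TXT data for this Service
    mail: bool
    target_strip: int
    group: str


def parse_debug_txt(txt: str) -> DebugRecord:
    """Parse the TXT data of one debug record (hypothetical helper)."""
    m = DEBUG_TXT.match(txt)
    if m is None:
        raise ValueError(f"not a debug TXT record: {txt!r}")
    return DebugRecord(
        host=m["host"],
        port=int(m["port"]),
        priority=int(m["priority"]),
        weight=int(m["weight"]),
        text=m["text"],
        mail=m["mail"] == "true",
        target_strip=int(m["targetstrip"]),
        group=m["group"],
    )


rec = parse_debug_txt("service1.example.com:8080(10,0,,false)[0,]")
# An empty text field is what turns the TXT question into a NODATA answer.
print(rec.host, rec.port, "has text:", bool(rec.text))
```

An empty rec.text for every record in the additional section matches the NODATA reply seen here.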

Thanks to isomer@ for the initial idea of adding this to CoreDNS (and maybe later SkyDNS).

Update

Malformed entries are of course also a problem. You get an NXDOMAIN or NODATA (the key is there, but there is something wrong with the JSON content). CoreDNS will now return the unmarshal error in reply to a debug query. Here we add a Service where the IP address is not quoted. etcd will happily register this for us, but when querying CoreDNS you’ll get a SERVFAIL:

% curl -XPUT http://127.0.0.1:4001/v2/keys/skydns/local/skydns/west/production/rails \
    -d value='{"host":127.0.0.2,"priority":20}'

% dig @localhost production.west.skydns.local

;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 56804
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

% dig @localhost o-o.debug.production.west.skydns.local

...
;; ADDITIONAL SECTION:
.                       160     CH      TXT
    "/skydns/local/skydns/west/production/rails: invalid character '.' after object key:value pair"

Ah, something is wrong with the JSON located at /skydns/local/skydns/west/production/rails, a.k.a. rails.production.west.skydns.local.
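The step from etcd key to DNS name is easy to do by hand, but a small Python sketch makes the rule explicit: drop the path prefix configured in the Corefile and reverse the remaining components. The function name key_to_name is made up for this example, and the default prefix simply mirrors the path /skydns setting shown earlier.

```python
def key_to_name(key: str, path: str = "/skydns") -> str:
    """Convert an etcd key to the DNS name it represents (hypothetical helper)."""
    prefix = path.rstrip("/") + "/"
    if not key.startswith(prefix):
        raise ValueError(f"key {key!r} is not below {path!r}")
    # The key stores the labels from the root down, so reverse them.
    labels = key[len(prefix):].strip("/").split("/")
    return ".".join(reversed(labels))


print(key_to_name("/skydns/local/skydns/west/production/rails"))
```

For the key in the error message this prints rails.production.west.skydns.local.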

This should make debugging “silent” responses from CoreDNS a lot easier.
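Since etcd accepts any value, one way to avoid this class of problem is to check the JSON before writing it. The following is only a sketch: check_service_json is a made-up helper, and the error text comes from Python's json module, so it will not match the Go unmarshal message that CoreDNS returns.

```python
import json
from typing import Optional


def check_service_json(value: str) -> Optional[str]:
    """Return an error message if value is not a JSON object, else None."""
    try:
        doc = json.loads(value)
    except json.JSONDecodeError as err:
        return str(err)
    if not isinstance(doc, dict):
        return "expected a JSON object"
    return None


# The unquoted IP address from the example above is rejected ...
print(check_service_json('{"host":127.0.0.2,"priority":20}'))
# ... while the quoted form passes.
print(check_service_json('{"host":"127.0.0.2","priority":20}'))
```

This only checks that the value is well-formed JSON; it does not validate the individual Service fields.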


  1. Note that the “malformed message packet” warning is probably due to adding more information to a NODATA reply, which technically is not correct.