
Documentation

Welcome to the dnsmonster documentation!

The documentation is broken down into different sections. Getting Started focuses on installation and post-installation work, such as compiling dnsmonster from source, setting up services, shell completions and more. Configuration goes into the details of how to configure dnsmonster and how to identify and solve potential performance bottlenecks. The majority of your configuration is done inside the Input and Output sections.

You’ll learn where you can put filters on incoming traffic, sample inputs, and mask IP addresses before the packets even reach the processor. After processing, you’ll be able to exclude certain FQDNs from being sent to the output, or only include certain domains to be logged.

All of the above will generate a ton of useful metrics for your DNS infrastructure. dnsmonster has a built-in metrics system that can integrate with your favourite metrics aggregator, such as Prometheus or StatsD.

1 - Getting Started

Getting Started with dnsmonster

A passive DNS monitoring framework built on Golang. dnsmonster implements a packet sniffer for DNS traffic. It can accept traffic from a pcap file, a live interface or a dnstap socket, and it can be used to index and store hundreds of thousands of DNS queries per second; it has been shown to index 200k+ DNS queries per second on a commodity computer. It aims to be scalable, simple and easy to use, and to help security teams understand the details of an enterprise’s DNS traffic. dnsmonster doesn’t try to follow DNS conversations; rather, it aims to index DNS packets as soon as they come in. It also doesn’t aim to breach the privacy of end-users: with the ability to mask Layer 3 IPs (IPv4 and IPv6), teams can perform trend analysis on aggregated data without being able to trace the queries back to an individual. Blogpost

Main features

  • Ability to use Linux’s afpacket and zero-copy packet capture.
  • Supports BPF
  • Ability to mask IP address to enhance privacy
  • Ability to have a pre-processing sampling ratio
  • Ability to have a list of “skip” FQDNs to avoid writing certain domains/suffixes/prefixes to storage
  • Ability to have a list of “allow” domains, used to log access to certain domains
  • Hot-reload of skip and allow domain files/urls
  • Modular output with configurable logic per output stream.
  • Automatic data retention policy using ClickHouse’s TTL attribute
  • Built-in Grafana dashboard for ClickHouse output.
  • Ability to be shipped as a single, statically linked binary
  • Ability to be configured using environment variables, command line options or configuration file
  • Ability to sample outputs using ClickHouse’s SAMPLE capability
  • Ability to send metrics using Prometheus and StatsD
  • High compression ratio thanks to ClickHouse’s built-in LZ4 storage
  • Supports DNS Over TCP, Fragmented DNS (udp/tcp) and IPv6
  • Supports dnstap over Unix socket or TCP
  • Built-in SIEM integration with Splunk and Microsoft Sentinel

1.1 - Installation

Learn how to install dnsmonster on your platform using Docker, prebuilt binaries, or compiling it from the source on any platform Go supports

dnsmonster has been built with minimal dependencies. At runtime, the only optional dependency for dnsmonster is libpcap. If you build dnsmonster without libpcap, you lose the ability to set BPF filters on your live packet captures.

Installation methods

Prebuilt binaries

Each release of dnsmonster ships with two binaries. One is for Linux amd64, built statically against an Alpine-based image, and one is for Windows amd64, which depends on a capture library being installed on the OS. The Windows binary has been tested with the latest version of Wireshark installed on the system, and the executable ran without issues.

Prebuilt packages

For each release, the statically-linked binary mentioned above is also wrapped into deb and rpm packages with no dependencies, making it easy to deploy on Debian- and RHEL-based distributions. Note that the packages don’t generate any service files or configuration templates at installation time.

Run as a container

The container build process only generates a Linux amd64 output. Since dnsmonster uses raw packet capture functionality, the Docker/Podman daemon must grant the capability to the container:

sudo docker run --rm -it --net=host --cap-add NET_RAW --cap-add NET_ADMIN --name dnsmonster ghcr.io/mosajjal/dnsmonster:latest --devName lo --stdoutOutputType=1

Check out the configuration section to understand the provided command line arguments.

Build from the source

  • with libpcap: Make sure you have go, libpcap-devel and linux-headers packages installed. The name of the packages might differ based on your distribution. After this, simply clone the repository and run go build .
git clone https://github.com/mosajjal/dnsmonster --depth 1 /tmp/dnsmonster 
cd /tmp/dnsmonster
go get
go build -o dnsmonster .
  • without libpcap: dnsmonster only uses one function from libpcap: converting tcpdump-style filters into BPF bytecode. If you can live without BPF support, you can build dnsmonster without libpcap. Note that on any other platform (*BSD, Windows, Darwin), packet capture falls back to libpcap, so there it becomes a hard dependency
git clone https://github.com/mosajjal/dnsmonster --depth 1 /tmp/dnsmonster 
cd /tmp/dnsmonster
go get
go build -o dnsmonster -tags nolibpcap .

The above build also works on ARMv7 (RPi4) and AArch64.

Build Statically

If you have a copy of libpcap.a, you can link it statically into dnsmonster and build a fully static binary. In the code below, change /root/libpcap-1.9.1/libpcap.a to the location of your copy.

git clone https://github.com/mosajjal/dnsmonster --depth 1 /tmp/dnsmonster
cd /tmp/dnsmonster/
go get
go build --ldflags "-L /root/libpcap-1.9.1/libpcap.a -linkmode external -extldflags \"-I/usr/include/libnl3 -lnl-genl-3 -lnl-3 -static\"" -a -o dnsmonster

For more information on how the statically linked binary is created, take a look at Dockerfiles in the root of the repository responsible for generating the published binaries.

1.2 - Post-installation

Set up services and shell completions for dnsmonster

Post-install

After you install dnsmonster, you might need to take a few extra steps to build services so dnsmonster runs automatically on system startup. These steps aren’t included in the installation process by default.

Systemd service

If you’re using a modern and popular distro like Debian, Ubuntu, Fedora, Arch or RHEL, you’re probably using systemd as your init system. To run dnsmonster as a service, create a file in /etc/systemd/system/ named dnsmonster.service and define your systemd unit there. The name dnsmonster for the service is entirely optional.

cat > /etc/systemd/system/dnsmonster.service << EOF
[Unit]
Description=Dnsmonster Service
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
Restart=always
RestartSec=3
ExecStart=/sbin/dnsmonster --config /etc/dnsmonster.ini

[Install]
WantedBy=multi-user.target

EOF

The above systemd service looks at /etc/dnsmonster.ini as a configuration file. Check out the configuration section to see how that configuration file is generated.

To start the service and enable it at boot time, run the following:

sudo systemctl enable --now dnsmonster.service

You can also build a systemd service that takes the interface name dynamically and runs the dnsmonster instance per interface. To do so, create a service unit like this:

cat > /etc/systemd/system/[email protected] << EOF
[Unit]
Description=Dnsmonster Service
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
Restart=always
RestartSec=3
ExecStart=/sbin/dnsmonster --devName=%i --config /etc/dnsmonster.ini

[Install]
WantedBy=multi-user.target

EOF

The above unit creates a templated systemd service that can be enabled for multiple interfaces. For example, to run the service for the loopback interface on Linux (lo), run the following:

sudo systemctl enable --now [email protected]

Note that the above example only works if you’re not specifying a dnstap or a local pcap file as an input inside the configuration file.

init.d service

bash and fish completion

2 - Configuration

Learn about the command line arguments and what they mean

To run dnsmonster, one input and at least one output must be defined. The input can be any of devName for live packet capture, pcapFile to read from a pcap file, or a dnstapSocket address to listen on. Currently, running dnsmonster with more than one input stream at a time isn’t supported. For outputs, however, more than one channel is supported. It’s also possible to have multiple instances of the same output (for example Splunk) to provide load balancing and high availability.

Note that when you specify multiple output streams, the output data is copied to all of them. For example, if you set --stdoutOutputType=1 and --fileOutputType=1 --fileOutputPath=/dev/stdout, you’ll see each processed output twice in your stdout: once from the stdout output type, and once from the file output type, which happens to point to the same place (/dev/stdout).

dnsmonster can be configured in 3 different ways. Command line options, Environment variables and a configuration file. You can also use any combination of them at the same time. The precedence order is as follows:

  • Command line options (Case-insensitive)
  • Environment variables (Always upper-case)
  • Configuration file (Case-sensitive, lowercase)
  • Default values (No configuration)

For example, if your configuration file specifies a devName but you also provide it as a command-line argument, dnsmonster will prioritize the CLI over the config file and ignore that parameter from the ini file.

Command line options

To see the current list of command-line options, run dnsmonster --help or checkout the repository’s README.md.

Environment variables

All the flags can also be set via environment variables. Keep in mind that the name of each parameter is always all upper-case and the prefix for all variables is DNSMONSTER_. Example:

$ export DNSMONSTER_PORT=53
$ export DNSMONSTER_DEVNAME=lo
$ sudo -E dnsmonster

Configuration file

You can run dnsmonster with a configuration file using the following command:

$ sudo dnsmonster --config=dnsmonster.ini

# Or you can use environment variables to set the configuration file path
$ export DNSMONSTER_CONFIG=dnsmonster.ini
$ sudo -E dnsmonster
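
For illustration, a minimal dnsmonster.ini equivalent to --devName=lo --stdoutOutputType=1 might look like the sketch below. The section and key names here are assumptions modelled on the [kafka_output] sample shown later in this document; run dnsmonster --help to confirm the exact names for your version.

```ini
; hypothetical minimal configuration — verify key names against dnsmonster --help
[capture]
DevName = lo

[stdout_output]
StdoutOutputType = 1
```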

2.1 - Performance

Performance considerations when configuring dnsmonster

Use afpacket

If you’re using dnsmonster as a sniffer and you’re not keeping up with the number of incoming packets, consider switching on afpacket with the --useAfpacket flag. Afpacket tends to drastically improve dnsmonster’s packet ingestion rate. If you’re still having packet drop issues, increase --afpacketBuffersizeMb to a higher value. The buffer will take up more memory on startup and will increase the startup time, depending on how much you have assigned to it.

In some tests, values above 4096MB tended to have a negative impact on the overall performance of the daemon. If you’re using a 4096MB buffer size and still seeing performance issues, there’s a good chance the issue isn’t on the capture side, but rather on the processing and output side.

Proper Output and packet handlers

Simply put, if you have an output that can accept 1000 inserts per second, but you have an incoming packet rate of 10,000 packets per second, you’re going to see a lot of packet drops. The packet drop will get worse and worse as time goes by as well. When selecting an output, consider the capacity of your technology and what you expect to be ingested.

If you are seeing a considerable amount of packet loss that gets worse as time goes on, consider testing with --stdoutOutputType=1, removing your current output, and redirecting the output to /dev/null. You can also tweak the number of workers converting the data to JSON to experiment further. Take the following example:

dnsmonster --devName=lo --packetHandlerCount=16 --stdoutOutputType=1 --useAfpacket | pv --rate --line-mode > /dev/null

In the above command, you can see the exact number of output lines per second while keeping an eye on metrics and packet loss, to check whether your packet loss is still present. By default, --stdoutOutputWorkerCount is set to 8. If you have a strong enough CPU, you can increase that number to see what maximum rate you can achieve. On a small server, you shouldn’t have a problem ingesting 500k packets per second.

Note that the --packetHandlerCount is also set to 16 to make sure enough workers are ingesting packets coming in. That’s also an important parameter to tweak to achieve the optimum performance. The default, 2, might be too low for you if you have hundreds of thousands of packets per second on an interface.

Sampling and BPF-based split of traffic

Sometimes there are simply too many packets to process. dnsmonster offers a few options to deal with this problem. --sampleRatio simply ignores packets by the defined ratio. The default is 1:1, meaning that for each incoming packet, one gets processed, i.e. 100%. You can tweak this number if your hardware isn’t capable of processing all the packets, or if dnsmonster has simply reached its limit.

For example, putting 2:7 as your sample ratio means for each 7 packets that come in, only the first two get processed.
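
To make the arithmetic concrete, here’s a small shell illustration of that 2:7 behaviour; awk stands in for the sampler and line numbers stand in for packets (this is not dnsmonster code, just a sketch of the selection logic):

```shell
# Simulate a 2:7 sample ratio: out of every 7 "packets" (lines),
# only the first two are passed through.
seq 1 14 | awk 'NR % 7 == 1 || NR % 7 == 2'
# prints 1, 2, 8 and 9 — two packets kept out of each group of seven
```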

If, after testing all the options, you’ve reached the conclusion that dnsmonster cannot handle more than what you need it to do, please raise an issue about it. In the meantime, you can also run multiple instances of dnsmonster looking at the same traffic, like so:

dnsmonster --devName=lo --stdoutOutputType=1 --filter="src portrange 1024-32000"
dnsmonster --devName=lo --stdoutOutputType=1 --filter="src portrange 32001-65535"

The above processes will split the traffic between them based on the port range. Note that only high ports are included, since the majority of clients use ports above 1024 to conduct a DNS query. You can change the filter to any BPF that makes sense for your environment.

Profile CPU and Memory

To take a look at what exactly is using your CPU and RAM, use the Golang profiler tools available through the --memprofile and --cpuprofile flags. To use them, issue the following:

# profile CPU
dnsmonster --devName=lo --stdoutOutputType=1 --cpuprofile=1

# you'll see something like this in the beginning of your logs
# 2022/04/11 19:13:51 profile: cpu profiling enabled, /tmp/profile452510705/cpu.pprof

# profile RAM
dnsmonster --devName=lo --stdoutOutputType=1 --memprofile=1

# you'll see something like this in the beginning of your logs
# 2022/04/11 19:15:00 profile: memory profiling enabled (rate 4096), /tmp/profile1290716652/mem.pprof

After dnsmonster exits gracefully, you can use Go’s pprof tool to open the generated profile in a browser and dig deep into the functions that are bottlenecks in the code. After installing pprof, use it like below:

~/go/bin/pprof -http 127.0.0.1:8882 /tmp/profile2392236212/mem.pprof

A browser session will automatically open with the performance metrics for your execution.

3 - Inputs and Filters

Set up an input to receive data

To get raw data into the dnsmonster pipeline, you must specify an input stream. Currently there are three supported input methods:

  • Live interface
  • Pcap file
  • dnstap socket

The configuration for inputs and packet processing is contained within the capture section of the configuration:

  • --devName: Enables live capture mode on the device. Only one interface per dnsmonster instance is supported.

  • --pcapFile: Enables offline pcap mode. You can specify “-” as the pcap file to read from stdin

  • --dnstapSocket: Enables dnstap mode. Accepts a socket path. Example: unix:///tmp/dnstap.sock, tcp://127.0.0.1:8080.

  • --port: Port selected to filter packets (default: 53). Works independently from BPF filter

  • --sampleRatio: Specifies the packet sampling ratio at capture time. The default is 1:1, meaning all packets passing the BPF will get processed.

  • --dedupCleanupInterval: If --dedup is enabled, cleans up the packet hash table used for de-duplication (default: 60s)

  • --dnstapPermission: Set the dnstap socket permission, only applicable when unix:// is used (default: 755)

  • --packetHandlerCount: Number of workers used to handle received packets (default: 2)

  • --tcpAssemblyChannelSize: Specifies the goroutine channel size for the TCP assembler. TCP assembler is used to de-fragment incoming fragmented TCP packets in a way that won’t slow down the process of “normal” UDP packets.

  • --tcpResultChannelSize: Size of the tcp result channel (default: 10000)

  • --tcpHandlerCount: Number of routines used to handle TCP DNS packets (default: 1)

  • --defraggerChannelSize: Size of the channel to send raw packets to be de-fragmented (default: 10000)

  • --defraggerChannelReturnSize: Size of the channel where the de-fragmented packets are sent to the output queue (default: 10000)

  • --packetChannelSize: Size of the packet handler channel (default: 1000)

  • --afpacketBuffersizeMb: Afpacket buffer size in MB (default: 64)

  • --filter: BPF filter applied to the packet stream.

  • --useAfpacket: Use this boolean flag to switch on afpacket sniff method on live interfaces

  • --noEtherframe: Use this boolean flag if the incoming packets (pcap file) do not contain the Ethernet frame

  • --dedup: Boolean flag to enable experimental de-duplication engine

  • --noPromiscuous: Boolean flag to prevent dnsmonster from automatically putting devName in promiscuous mode

The above flags are used in a variety of ways. Check the Filters and Masks and Input options sections for more detailed info.

3.1 - Filters and masks

There are a few ways to manipulate incoming packets at various steps of the dnsmonster pipeline. They operate at different levels of the stack and have different performance implications.

BPF

BPF is by far the most performant way to filter incoming packets. It’s only supported for live capture (--devName). It uses tcpdump’s pcap-filter language to filter out packets. There are plans to potentially move away from this method and accept base64-encoded BPF bytecode in the future.

Sample Ratio

Sample ratio (--sampleRatio) is an easy way to reduce the number of packets pushed to the pipeline purely by numbers. The default value is 1:1, meaning for each incoming packet, one gets pushed to the pipeline. You can change that if you have a huge number of packets or your output is not keeping up with the input. Check out the performance guide for more detail.

De-duplication

The experimental de-duplication feature (--dedup) provides a rudimentary packet de-duplication capability. Its functionality is very simple: it applies a non-cryptographic hash function (FNV-1) to the raw packets and builds a hash table of incoming packets as they come in. Note that the hashing happens before stripping the 802.1q, vxlan and ethernet layers, so the de-duplication operates purely on the packet bytes.

There’s also the --dedupCleanupInterval option to specify the cleanup interval for the hash table. Around the time of cleanup, a few duplicate packets may slip through, since the hash table is not time-bound on its own; it gets flushed completely at each interval.

De-duplication is applied after the sample ratio, for each packet.
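
The effect of the hash-table check can be sketched with a shell one-liner (illustrative only; dnsmonster uses an in-memory FNV-1 hash table, not awk):

```shell
# Pass a "packet" through only the first time it is seen within the
# interval; later identical ones are dropped, like the dedup hash table.
printf 'query-a\nquery-b\nquery-a\nquery-c\n' | awk '!seen[$0]++'
# prints query-a, query-b, query-c — the second query-a is dropped
```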

Port

There’s an additional filter specifying the port (--port) of each packet. Since the vast majority of DNS packets are served out of port 53, this parameter shouldn’t have any effect by default. Note that this filter is not applied to fragmented packets.

IP Masks

While processing the packets, the source and destination IPv4 and IPv6 addresses can be masked to a specified prefix length (the --maskSize4 and --maskSize6 options). Since this step happens after de-duplication, there could be seemingly duplicate entries in the output, purely because masked IP prefixes appear the same.
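
Conceptually, masking zeroes out the host part of the address before it reaches the output. As a rough shell sketch of what a 16-bit IPv4 mask does (not dnsmonster’s actual implementation, which works on the packet bytes):

```shell
# With a /16 mask, only the first two octets survive, so different
# clients inside 192.0.0.0/16 all collapse to the same masked address.
echo "192.0.2.123" | awk -F. '{print $1 "." $2 ".0.0"}'
# prints 192.0.0.0
```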

Allow and Skip Domain list

These two filters specify an allowlist and a skip list for the outputs. --skipDomainsFile is used to avoid writing noisy, repetitive data to your output. The skip domain list is a CSV-formatted file (or a URL pointing to one) with only two columns: a string representing part or all of an FQDN, and a matching logic for that string. dnsmonster supports three logics for each entry: prefix, suffix and fqdn. prefix and suffix mean that only the domains starting/ending with the given string will be skipped from being sent to the output, while fqdn requires a full match. Note that since the matching is done on DNS questions, your string will most likely have a trailing . that needs to be included in your skip list row as well (take a look at skipdomains.csv.sample for a better view). You can also use a full FQDN match to avoid writing highly noisy FQDNs into your database.
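
For illustration, a skip-domain file exercising all three logics might look like this (the entries are made up; see skipdomains.csv.sample in the repository for the authoritative reference):

```csv
tracking.,prefix
.in-addr.arpa.,suffix
www.example.com.,fqdn
```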

--allowDomainsFile provides the exact opposite of skip domain logic, meaning your output will be limited to the entries inside this list.

Both --skipDomainsFile and --allowDomainsFile have an automatic refresh interval and re-fetch the FQDNs using the --skipDomainsRefreshInterval and --allowDomainsRefreshInterval options.

For each output type, you can specify which of these tables are used. Check the output section for more detail regarding the output modes.

3.2 - Input options

Let’s go through some examples of how to set up dnsmonster inputs

Live interface

To start listening on an interface, simply put the name of the interface in the --devName= parameter. On Unix-like systems, the ip a or ifconfig commands give you a list of interfaces that you can use. In this mode, dnsmonster needs to run with higher privileges.

In Windows environments, to get a list of interfaces, open cmd.exe as Administrator and run the following: getmac.exe. You’ll see a table with your interfaces’ MAC address and a Transport Name column with something like this: \Device\Tcpip_{16000000-0000-0000-0000-145C4638064C}.

Then simply replace the word Tcpip_ with NPF_ and use the result as the --devName parameter, like so:

dnsmonster.exe --devName \Device\NPF_{16000000-0000-0000-0000-145C4638064C}

Pcap file

To analyze a pcap file, you can simply use the --pcapFile= option. You can also use the value - or /dev/stdin to read the pcap from stdin. This can be used for pcap-over-ip and zipped pcaps that you would like to analyze on the fly. For example, the following reads the packets as they’re being extracted, without saving the extracted pcap on disk:

lz4cat /path/to/a/huge/dns/capture.pcap.lz4 | dnsmonster --pcapFile=- --stdoutOutputType=1

Pcap-over-Ip

dnsmonster doesn’t support pcap-over-ip directly, but you can achieve the same results by combining a program like netcat or socat with dnsmonster to make pcap-over-ip work.

To connect to a remote pcap-over-ip server, use the following:

while true; do
  nc -w 10 REMOTE_IP REMOTE_PORT | dnsmonster --pcapFile=- --stdoutOutputType=1
done

To listen for pcap-over-ip, the following can be used:

while true; do
  nc -l -p REMOTE_PORT | dnsmonster --pcapFile=- --stdoutOutputType=1
done

If pcap-over-ip turns out to be a popular option, building native support for it shouldn’t be too difficult. Feel free to open a topic on the discussion page, or simply an issue on the repo, if this is something you care about.

dnstap

dnsmonster can listen on a dnstap TCP or Unix socket and process the dnstap logs as they come in, just like a network packet, since dnstap’s specification is very close to the packet itself. To learn more about dnstap, visit their website here.

To use dnstap as a TCP listener, use --dnstapSocket with a syntax like --dnstapSocket=tcp://0.0.0.0:5555. If you’re using a Unix socket to listen for dnstap packets, you can use unix:///tmp/dnstap.sock and set the socket file permissions with the --dnstapPermission option.

Currently, dnstap in client mode is unsupported, since its use case is very rare. If you need this function, you can use a TCP port proxy or socat to convert the TCP connection into a Unix socket and read it with dnsmonster.

4 - Outputs

Set up output(s) and gather metrics

dnsmonster follows a pipeline architecture for each individual packet. After capture and filtering, each processed packet arrives at the output dispatcher. The dispatcher sends a copy to each output module that has been configured to produce output. For instance, if you specify --stdoutOutputType=1 and --fileOutputType=1 --fileOutputPath=/dev/stdout, you’ll see each processed output twice in your stdout: once from the stdout output type, and once from the file output type, which happens to point to the same place (/dev/stdout).

In general, each output has its own configuration section. You can see the sections with the “_output” suffix when running dnsmonster --help from the command line. The most important parameter for each output is its “Type”. Each output has 5 different types:

  • Type 0: The output is disabled and generates no output (the default).
  • Type 1: An output module configured as Type 1 will ignore “SkipDomains” and “AllowDomains” and will generate output for all the incoming processed packets. Note that the output type does not nullify input filters, since it is applied after capture and early packet filters. Take a look at Filters and Masks to see the order in which the filters are applied.
  • Type 2: An output module configured as Type 2 will ignore “AllowDomains” and only apply the “SkipDomains” logic to the incoming processed packets.
  • Type 3: An output module configured as Type 3 will ignore “SkipDomains” and only apply the “AllowDomains” logic to the incoming processed packets.
  • Type 4: An output module configured as Type 4 will apply both the “SkipDomains” and “AllowDomains” logic to the incoming processed packets.

Other than Type, each output module may require additional configuration parameters. For more information, refer to each module’s documentation.

Output Formats

dnsmonster supports multiple output formats:

  • json: the standard JSON output. The output looks like the sample below:
{"Timestamp":"2020-08-08T00:19:42.567768Z","DNS":{"Id":54443,"Response":true,"Opcode":0,"Authoritative":false,"Truncated":false,"RecursionDesired":true,"RecursionAvailable":true,"Zero":false,"AuthenticatedData":false,"CheckingDisabled":false,"Rcode":0,"Question":[{"Name":"imap.gmail.com.","Qtype":1,"Qclass":1}],"Answer":[{"Hdr":{"Name":"imap.gmail.com.","Rrtype":1,"Class":1,"Ttl":242,"Rdlength":4},"A":"172.217.194.108"},{"Hdr":{"Name":"imap.gmail.com.","Rrtype":1,"Class":1,"Ttl":242,"Rdlength":4},"A":"172.217.194.109"}],"Ns":null,"Extra":null},"IPVersion":4,"SrcIP":"1.1.1.1","DstIP":"2.2.2.2","Protocol":"udp","PacketLength":64}
  • csv: the CSV output. The fields and headers are not customizable at the moment. To get custom output, look at gotemplate.
Year,Month,Day,Hour,Minute,Second,Ns,Server,IpVersion,SrcIP,DstIP,Protocol,Qr,OpCode,Class,Type,ResponseCode,Question,Size,Edns0Present,DoBit,Id
2020,8,8,0,19,42,567768000,default,4,2050551041,2050598324,17,1,0,1,1,0,imap.gmail.com.,64,0,0,54443
  • csv_no_headers: Looks exactly like the CSV output but without the header printed at the beginning
  • gotemplate: A customizable template for coming up with your own formatting. Let’s look at an example using the same packet we’ve looked at with JSON and CSV:
$ dnsmonster --pcapFile input.pcap --stdoutOutputType=1 --stdoutOutputFormat=gotemplate --stdoutOutputGoTemplate="timestamp=\"{{.Timestamp}}\" id={{.DNS.Id}} question={{(index .DNS.Question 0).Name}}"
timestamp="2020-08-08 00:19:42.567735 +0000 UTC" id=54443 question=imap.gmail.com.

Take a look at the official text/template docs for more info on your various options.
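
For quick ad-hoc inspection you can also post-process the JSON output with standard shell tools instead of writing a template. A sketch, assuming the JSON shape shown above (a JSON-aware tool like jq would be more robust; this avoids the extra dependency):

```shell
# Pull the first question name out of a dnsmonster JSON line.
echo '{"DNS":{"Question":[{"Name":"imap.gmail.com.","Qtype":1,"Qclass":1}]}}' \
  | grep -o '"Name":"[^"]*"' | head -n1 | cut -d'"' -f4
# prints imap.gmail.com.
```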

4.1 - Apache Kafka

Possibly the most versatile output supported by dnsmonster. The Kafka output allows you to connect to an endless list of supported sinks. It is the recommended output module for enterprise designs, since it offers fault tolerance and can sustain outages of the sink. dnsmonster’s Kafka output supports compression, TLS, and multiple brokers. To provide multiple brokers, specify the option multiple times.

Configuration Parameters

[kafka_output]
; What should be written to kafka. options:
;	0: Disable Output
;	1: Enable Output without any filters
;	2: Enable Output and apply skipdomains logic
;	3: Enable Output and apply allowdomains logic
;	4: Enable Output and apply both skip and allow domains logic
KafkaOutputType = 0

; kafka broker address(es), example: 127.0.0.1:9092. Used if kafkaOutputType is not none
KafkaOutputBroker =

; Kafka topic for logging
KafkaOutputTopic = dnsmonster

; Minimum capacity of the cache array used to send data to Kafka
KafkaBatchSize = 1000

; Kafka connection timeout in seconds
KafkaTimeout = 3

; Interval between sending results to Kafka if Batch size is not filled
KafkaBatchDelay = 1s

; Compress Kafka connection
KafkaCompress = false

; Use TLS for kafka connection
KafkaSecure = false

; Path of CA certificate that signs Kafka broker certificate
KafkaCACertificatePath =

; Path of TLS certificate to present to broker
KafkaTLSCertificatePath =

; Path of TLS certificate key
KafkaTLSKeyPath =

4.2 - ClickHouse

ClickHouse is a time-series database engine developed by Yandex. It uses a column-oriented design, which makes it a good candidate for storing hundreds of thousands of DNS queries per second with an extremely good compression ratio, as well as fast retrieval of data.

Currently, dnsmonster’s implementation requires the table name to be set to DNS_LOG. An SQL schema file is provided by the repository under the clickhouse directory. The Grafana dashboard and configuration set provided by dnsmonster also corresponds with the ClickHouse schema and can be used to visualize the data.

Configuration parameters

  • --clickhouseAddress: Address of the ClickHouse database to save the results (default: localhost:9000)
  • --clickhouseUsername: Username to connect to the ClickHouse database (default: empty)
  • --clickhousePassword: Password to connect to the ClickHouse database (default: empty)
  • --clickhouseDatabase: Database to connect to the ClickHouse database (default: default)
  • --clickhouseDelay: Interval between sending results to ClickHouse (default: 1s)
  • --clickhouseDebug: Debug ClickHouse connection (default: false)
  • --clickhouseCompress: Compress ClickHouse connection (default: false)
  • --clickhouseSecure: Use TLS for ClickHouse connection (default: false)
  • --clickhouseSaveFullQuery: Save full packet query and response in JSON format. (default: false)
  • --clickhouseOutputType: ClickHouse output type. Options: (default: 0)
    • 0: Disable Output
    • 1: Enable Output without any filters
    • 2: Enable Output and apply skipdomains logic
    • 3: Enable Output and apply allowdomains logic
    • 4: Enable Output and apply both skip and allow domains logic
  • --clickhouseBatchSize: Minimum capacity of the cache array used to send data to clickhouse. Set close to the queries per second received to prevent allocations (default: 100000)
  • --clickhouseWorkers: Number of ClickHouse output Workers (default: 1)
  • --clickhouseWorkerChannelSize: Channel Size for each ClickHouse Worker (default: 100000)

Note: the general option --skipTLSVerification applies to this module as well.

Retention Policy

The default retention policy for the ClickHouse tables is set to 30 days. You can change this number by rebuilding the containers using ./autobuild.sh. Since ClickHouse doesn’t use an insertion timestamp, the TTL looks at the incoming packet’s date in pcap files. So while importing old pcap files, ClickHouse may automatically start removing the data as it’s being written, and you won’t see any actual data in your Grafana. To fix that, change the TTL to a day older than the earliest packet inside the pcap file.

NOTE: to manually change the TTL, you need to directly connect to the ClickHouse server using the clickhouse-client binary and run the following SQL statements (this example changes it from 30 to 90 days):

ALTER TABLE DNS_LOG MODIFY TTL DnsDate + INTERVAL 90 DAY;

NOTE: The above command only changes TTL for the raw DNS log data, which is the majority of your capacity consumption. To make sure that you adjust the TTL for every single aggregation table, you can run the following:

ALTER TABLE DNS_LOG MODIFY TTL DnsDate + INTERVAL 90 DAY;
ALTER TABLE `.inner.DNS_DOMAIN_COUNT` MODIFY TTL DnsDate + INTERVAL 90 DAY;
ALTER TABLE `.inner.DNS_DOMAIN_UNIQUE` MODIFY TTL DnsDate + INTERVAL 90 DAY;
ALTER TABLE `.inner.DNS_PROTOCOL` MODIFY TTL DnsDate + INTERVAL 90 DAY;
ALTER TABLE `.inner.DNS_GENERAL_AGGREGATIONS` MODIFY TTL DnsDate + INTERVAL 90 DAY;
ALTER TABLE `.inner.DNS_EDNS` MODIFY TTL DnsDate + INTERVAL 90 DAY;
ALTER TABLE `.inner.DNS_OPCODE` MODIFY TTL DnsDate + INTERVAL 90 DAY;
ALTER TABLE `.inner.DNS_TYPE` MODIFY TTL DnsDate + INTERVAL 90 DAY;
ALTER TABLE `.inner.DNS_CLASS` MODIFY TTL DnsDate + INTERVAL 90 DAY;
ALTER TABLE `.inner.DNS_RESPONSECODE` MODIFY TTL DnsDate + INTERVAL 90 DAY;
ALTER TABLE `.inner.DNS_SRCIP_MASK` MODIFY TTL DnsDate + INTERVAL 90 DAY;

UPDATE: in recent versions of ClickHouse, the .inner tables no longer share names with their corresponding aggregation views. To modify the TTL you have to find the table names (in UUID format) using SHOW TABLES and repeat the ALTER command with those names.
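For those newer versions, the lookup-and-alter flow can be sketched as follows (the UUID shown is illustrative; use the names actually returned by SHOW TABLES on your server):

```sql
-- list the materialized views' backing tables; their names contain a UUID
SHOW TABLES FROM default LIKE '%inner%';

-- repeat for each backing table returned above (the UUID here is illustrative)
ALTER TABLE `.inner_id.1f4a6a10-d2ff-4a4c-8f62-2e8c1f4a6a10` MODIFY TTL DnsDate + INTERVAL 90 DAY;
```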

SAMPLE in ClickHouse SELECT queries

By default, the main table created by the tables.sql file (DNS_LOG) has the ability to sample down a result as needed, since each DNS question has a semi-unique UUID associated with it. For more information about SAMPLE queries in ClickHouse, please check out this document.
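As a sketch, a sampled count over the last day could look like this (DNS_LOG and DnsDate are the defaults from tables.sql; the 1/10 ratio is arbitrary):

```sql
-- approximate the total by querying a 10% sample and scaling the result back up
SELECT count() * 10 AS approx_total
FROM DNS_LOG SAMPLE 1/10
WHERE DnsDate >= today() - 1;
```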

Useful queries

  • List of unique domains visited over the past 24 hours
-- using domain_count table
SELECT DISTINCT Question FROM DNS_DOMAIN_COUNT WHERE t > Now() - toIntervalHour(24)

-- only the number
SELECT count(DISTINCT Question) FROM DNS_DOMAIN_COUNT WHERE t > Now() - toIntervalHour(24)

-- see memory usage of the above query in bytes
SELECT memory_usage FROM system.query_log WHERE query_kind='Select' AND  arrayExists(x-> x='default.DNS_DOMAIN_COUNT', tables) ORDER BY event_time DESC LIMIT 1 format Vertical

-- you can also get the memory usage of each query by query ID. There should be only 1 result so we will cut it off at one to optimize performance
SELECT sum(memory_usage) FROM system.query_log WHERE initial_query_id = '8de8fe3c-d46a-4a32-83da-4f4ba4dc49e5' format Vertical

4.3 - Elasticsearch/OpenSearch

Elasticsearch is a full-text search engine used widely across many security tools. dnsmonster supports Elasticsearch 7.x out of the box. Support for 6.x and 8.x has not been tested.

There is also a fork of Elasticsearch called Opendistro, later renamed to OpenSearch. Both are compatible with Elasticsearch 7.10.x, so they should be supported as well.

Configuration parameters

[elastic_output]
; What should be written to elastic. options:
;	0: Disable Output
;	1: Enable Output without any filters
;	2: Enable Output and apply skipdomains logic
;	3: Enable Output and apply allowdomains logic
;	4: Enable Output and apply both skip and allow domains logic
ElasticOutputType = 0

; elastic endpoint address, example: http://127.0.0.1:9200. Used if elasticOutputType is not none
ElasticOutputEndpoint =

; elastic index
ElasticOutputIndex = default

; Send data to Elastic in batch sizes
ElasticBatchSize = 1000

; Interval between sending results to Elastic if Batch size is not filled
ElasticBatchDelay = 1s

4.4 - InfluxDB

InfluxDB is a time series database used to store logs and metrics with a high ingestion rate.

Configuration options

[influx_output]
; What should be written to influx. options:
;	0: Disable Output
;	1: Enable Output without any filters
;	2: Enable Output and apply skipdomains logic
;	3: Enable Output and apply allowdomains logic
;	4: Enable Output and apply both skip and allow domains logic
InfluxOutputType = 0

; influx Server address, example: http://localhost:8086. Used if influxOutputType is not none
InfluxOutputServer =

; Influx Server Auth Token
InfluxOutputToken = dnsmonster

; Influx Server Bucket
InfluxOutputBucket = dnsmonster

; Influx Server Org
InfluxOutputOrg = dnsmonster

; Number of Influx output workers
InfluxOutputWorkers = 8

; Minimum capacity of the cache array used to send data to Influx
InfluxBatchSize = 1000

4.5 - Microsoft Sentinel

The Microsoft Sentinel output module is designed to send dnsmonster logs to Sentinel. This module also supports sending logs to any Log Analytics workspace, whether or not it is connected to Sentinel.

Please take a look at Microsoft’s official documentation to see how Customer ID and Shared key are obtained.

Configuration Parameters

[sentinel_output]
; What should be written to Microsoft Sentinel. options:
;	0: Disable Output
;	1: Enable Output without any filters
;	2: Enable Output and apply skipdomains logic
;	3: Enable Output and apply allowdomains logic
;	4: Enable Output and apply both skip and allow domains logic
SentinelOutputType = 0

; Sentinel Shared Key, either the primary or secondary, can be found in Agents Management page under Log Analytics workspace
SentinelOutputSharedKey =

; Sentinel Customer Id. can be found in Agents Management page under Log Analytics workspace
SentinelOutputCustomerId =

; Sentinel Output LogType
SentinelOutputLogType = dnsmonster

; Sentinel Output Proxy in URI format
SentinelOutputProxy =

; Sentinel Batch Size
SentinelBatchSize = 100

; Interval between sending results to Sentinel if Batch size is not filled
SentinelBatchDelay = 1s

4.6 - Splunk HEC

Splunk HTTP Event Collector (HEC) is a widely used component of Splunk to ingest raw and JSON data. dnsmonster uses the JSON output to push logs into a Splunk index. Various configuration options are supported. You can also use multiple HEC endpoints for load balancing and fault tolerance across multiple indexers. Note that the token and other settings are shared between endpoints.

Configuration Parameters

[splunk_output]
; What should be written to HEC. options:
;	0: Disable Output
;	1: Enable Output without any filters
;	2: Enable Output and apply skipdomains logic
;	3: Enable Output and apply allowdomains logic
;	4: Enable Output and apply both skip and allow domains logic
SplunkOutputType = 0

; splunk endpoint address, example: http://127.0.0.1:8088. Used if splunkOutputType is not none, can be specified multiple times for load balancing and HA
SplunkOutputEndpoint =

; Splunk HEC Token
SplunkOutputToken = 00000000-0000-0000-0000-000000000000

; Splunk Output Index
SplunkOutputIndex = temp

; Splunk Output Proxy in URI format
SplunkOutputProxy =

; Splunk Output Source
SplunkOutputSource = dnsmonster

; Splunk Output Sourcetype
SplunkOutputSourceType = json

; Send data to HEC in batch sizes
SplunkBatchSize = 1000

; Interval between sending results to HEC if Batch size is not filled
SplunkBatchDelay = 1s
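Since the endpoint option can be specified multiple times, a load-balanced setup could be sketched as below (the hostnames are illustrative, and repeating the key for a multi-value option mirrors the CLI behaviour; verify it against your config parser):

```ini
[splunk_output]
SplunkOutputType = 1
; repeat the endpoint key once per HEC node; the token and other settings are shared
SplunkOutputEndpoint = https://hec1.example.internal:8088
SplunkOutputEndpoint = https://hec2.example.internal:8088
SplunkOutputToken = 00000000-0000-0000-0000-000000000000
```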

4.7 - Stdout, syslog or Log File

Stdout, syslog and file outputs are supported by dnsmonster out of the box. They are especially useful if you have a SIEM agent reading the files as they are written. Note that dnsmonster does not handle log rotation or monitor disk capacity while writing to a file. You can use a tool like logrotate to perform cleanups on the log files. Signalling log rotation via SIGHUP has not been tested with dnsmonster.
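As an example, a minimal logrotate policy for a dnsmonster log file might look like the sketch below (the path and schedule are assumptions; copytruncate avoids relying on the untested SIGHUP signalling):

```
# /etc/logrotate.d/dnsmonster -- path and schedule are illustrative
/var/log/dnsmonster/dns.json {
    daily
    rotate 7
    compress
    missingok
    notifempty
    copytruncate
}
```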

Currently, Syslog output is only supported on Linux.

Configuration parameters

[file_output]
; What should be written to file. options:
;	0: Disable Output
;	1: Enable Output without any filters
;	2: Enable Output and apply skipdomains logic
;	3: Enable Output and apply allowdomains logic
;	4: Enable Output and apply both skip and allow domains logic
FileOutputType = 0

; Path to output file. Used if fileOutputType is not none
FileOutputPath =

; Output format for file. options: json, csv, csv_no_header, gotemplate. Note that csv splits the datetime into multiple fields
FileOutputFormat = json

; Go Template to format the output as needed
FileOutputGoTemplate = {{.}}

[stdout_output]
; What should be written to stdout. options:
;	0: Disable Output
;	1: Enable Output without any filters
;	2: Enable Output and apply skipdomains logic
;	3: Enable Output and apply allowdomains logic
;	4: Enable Output and apply both skip and allow domains logic
StdoutOutputType = 0

; Output format for stdout. options: json, csv, csv_no_header, gotemplate. Note that csv splits the datetime into multiple fields
StdoutOutputFormat = json

; Go Template to format the output as needed
StdoutOutputGoTemplate = {{.}}

; Number of workers
StdoutOutputWorkerCount = 8

[syslog_output]
; What should be written to Syslog server. options:
;	0: Disable Output
;	1: Enable Output without any filters
;	2: Enable Output and apply skipdomains logic
;	3: Enable Output and apply allowdomains logic
;	4: Enable Output and apply both skip and allow domains logic
SyslogOutputType = 0

; Syslog endpoint address, example: udp://127.0.0.1:514, tcp://127.0.0.1:514. Used if syslogOutputType is not none
SyslogOutputEndpoint = udp://127.0.0.1:514
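To illustrate the gotemplate format mentioned above, the sketch below prints one line per packet to stdout (the field names are assumptions based on dnsmonster's JSON output and may differ between versions; verify them against your build):

```
dnsmonster --devName eth0 \
          --stdoutOutputType=1 \
          --stdoutOutputFormat=gotemplate \
          --stdoutOutputGoTemplate="{{.Timestamp}} {{.SrcIP}} -> {{.DstIP}}"
```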

4.8 - PostgreSQL

PostgreSQL is regarded as the world’s most advanced open source database. dnsmonster has experimental support for outputting to PostgreSQL and compatible database engines such as CockroachDB.

Configuration options


# [psql_output]
# What should be written to PostgreSQL. options:
#	0: Disable Output
#	1: Enable Output without any filters
#	2: Enable Output and apply skipdomains logic
#	3: Enable Output and apply allowdomains logic
#	4: Enable Output and apply both skip and allow domains logic
--psqlOutputType=0

# Psql endpoint used. must be in uri format. example: postgres://username:password@hostname:port/database?sslmode=disable
--psqlEndpoint=

# Number of PSQL workers
--psqlWorkers=1

# Psql Batch Size
--psqlBatchSize=1

# Interval between sending results to Psql if Batch size is not filled. Any value larger than zero takes precedence over Batch Size
--psqlBatchDelay=0s

# Timeout for any INSERT operation before we consider them failed
--psqlBatchTimeout=5s

# Save full packet query and response in JSON format.
--psqlSaveFullQuery

4.9 - Metrics

Each enabled input and output comes with a set of metrics in order to monitor performance and troubleshoot your running instance. dnsmonster uses the go-metrics library which makes it easy to register metrics on the fly and in a modular way.

Currently, three metric outputs are supported:

  • stderr
  • statsd
  • prometheus

Configuration parameters

[metric]
; Metric Endpoint Service. Choices: stderr, statsd, prometheus
MetricEndpointType = stderr

; Statsd endpoint. Example: 127.0.0.1:8125 
MetricStatsdAgent =

; Prometheus Registry endpoint. Example: http://0.0.0.0:2112/metric
MetricPrometheusEndpoint =

; Interval between sending results to Metric Endpoint
MetricFlushInterval = 10s
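If you choose the prometheus endpoint type, a matching scrape job on the Prometheus side could be sketched like this (the job name and target host are assumptions; the port and path mirror the example endpoint above):

```yaml
scrape_configs:
  - job_name: dnsmonster
    metrics_path: /metric
    static_configs:
      - targets: ["dnsmonster-host:2112"]
```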

5 - Tutorials

Some Design Templates

All-In-One Test Environment

docker-compose

The diagram above shows an overview of the autobuild output. Running ./autobuild.sh creates multiple containers:

  • a dnsmonster container per selected interface from the host, looking at the raw traffic. The host’s interface list is prompted when running autobuild.sh, allowing you to select one or more interfaces.
  • a clickhouse container to collect dnsmonster’s outputs and save all the logs and data to their respective directories inside the host. Both paths are prompted in autobuild.sh. The default tables and their TTLs will be set up automatically.
  • a grafana container connecting back to clickhouse. It automatically sets up the connection to ClickHouse, and sets up the builtin dashboards based on the default ClickHouse tables. Note that the Grafana container needs an internet connection to successfully set up its plugins. If you don’t have an internet connection, the dnsmonster and clickhouse containers will start working without any issues, and the error produced by Grafana can be ignored.

All-in-one Demo

AIO Demo

5.1 - ClickHouse Cloud

use dnsmonster with ClickHouse Cloud

ClickHouse Cloud is a Serverless ClickHouse offering by the ClickHouse team. In this small tutorial I’ll go through the steps of building your DNS monitoring with it. At the time of writing this post, ClickHouse Cloud is in preview and some of the features might change over time.

Create a ClickHouse Cluster

First, let’s create a ClickHouse instance by signing up and logging into the ClickHouse Cloud portal and clicking “New Service” in the top right corner. You will be asked to provide a name and a region for your database. For the purpose of this tutorial, I will name the database dnsmonster in the us-east-2 region. Other parameters, such as the size and number of servers, may be present when you define your cluster, but overall everything should look pretty much the same.

After clicking create, you’ll see the connection settings for your instance. The default username to log in is default and the password is generated randomly. Save that password for later use, since the portal won’t show it again!

And that’s it! You have a fully managed ClickHouse cluster running in AWS. Now let’s create our tables and views using the credentials we just got.

Create and configure Tables

When you check out the dnsmonster repository from GitHub, there is a replicated table file with table definitions suited for ClickHouse Cloud. Note that the “traditional” table design won’t work on ClickHouse Cloud, since the managed cluster won’t allow non-replicated tables. This policy is in place to ensure the high availability and integrity of the tables’ data. Download the .sql file and save it anywhere on disk, for example /tmp/tables_replicated.sql. Now let’s use the clickhouse-client tool to create the tables.

clickhouse-client --host INSTANCEID.REGION.PROVIDER.clickhouse.cloud --secure --port 9440 --password RANDOM_PASSWORD --multiquery < /tmp/tables_replicated.sql

Replace the all-caps variables with your server instance details, and this should create your primary tables. Everything should now be in place. Point the dnsmonster service at the ClickHouse instance and it should work without any issues.

dnsmonster --devName lo \
          --packetHandlerCount 8 \
          --clickhouseAddress INSTANCEID.REGION.PROVIDER.clickhouse.cloud:9440 \
          --clickhouseOutputType 1 \
          --clickhouseBatchSize 7000 \
          --clickhouseWorkers 16 \
          --clickhouseSecure \
          --clickhouseUsername default \
          --clickhousePassword "RANDOM_PASSWORD" \
          --clickhouseCompress \
          --serverName my_dnsmonster \
          --maskSize4 16 \
          --maskSize6 64

Compressing the ClickHouse INSERT connection (--clickhouseCompress) makes it efficient and fast; I’ve gotten better results by turning it on. Keep in mind that tweaking packetHandlerCount, as well as the number of ClickHouse workers, batch size, etc., has a major impact on overall performance. In my tests, I’ve been able to exceed ~250,000 packets per second easily on my fibre connection. Keep in mind that you can substitute command line arguments with environment variables or a config file. Refer to the Configuration section of the documentation for more info.
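For instance, some of the flags above could come from the environment instead (the variable names assume dnsmonster's convention of upper-casing the flag and prefixing it with DNSMONSTER_; verify against your version):

```
# names assume the DNSMONSTER_ + UPPERCASED-FLAG convention
export DNSMONSTER_CLICKHOUSEADDRESS="INSTANCEID.REGION.PROVIDER.clickhouse.cloud:9440"
export DNSMONSTER_CLICKHOUSEOUTPUTTYPE=1
export DNSMONSTER_CLICKHOUSESECURE=true
dnsmonster --devName lo
```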

Configuring Grafana and dashboards

Now that the data is being pushed into ClickHouse, you can leverage Grafana with the pre-built dashboard to help you gain visibility over your data. Let’s start with running an instance of Grafana in a docker container.

docker run --name dnsmonster_grafana -p 3000:3000 grafana/grafana:8.4.3

then browse to localhost:3000 with admin as both username and password, and install the ClickHouse plugin for Grafana. There are two choices in the Grafana store and both should work fine out of the box; I’ve tested the Altinity plugin for ClickHouse, but there’s also an official ClickHouse Grafana plugin to choose from.

After installing the plugin, you can add your ClickHouse server as a datasource using the same address, port and password you used to run dnsmonster. After connecting Grafana to ClickHouse, you can import the pre-built dashboard from here either via the GUI or the CLI. Once your dashboard is imported, you can point it to your datasource address and most panels should start showing data. Most, but not all.

One final step, to make sure everything runs smoothly, is to create the dictionaries. Download the 4 dictionary files located here, either manually or by cloning the git repo. I’ll assume they’re in your /tmp/ directory. Now let’s go back to clickhouse-client and quickly make that happen.

clickhouse-client --host INSTANCEID.REGION.PROVIDER.clickhouse.cloud --secure --port 9440 --password RANDOM_PASSWORD 
CREATE DICTIONARY dns_class (Id Uint64, Name String) PRIMARY KEY Id LAYOUT(FLAT()) SOURCE(HTTP(url "https://raw.githubusercontent.com/mosajjal/dnsmonster/main/clickhouse/dictionaries/dns_class.tsv" format TSV)) LIFETIME(MIN 0 MAX 0);
CREATE DICTIONARY dns_opcode (Id Uint64, Name String) PRIMARY KEY Id LAYOUT(FLAT()) SOURCE(HTTP(url "https://raw.githubusercontent.com/mosajjal/dnsmonster/main/clickhouse/dictionaries/dns_opcode.tsv" format TSV)) LIFETIME(MIN 0 MAX 0);
CREATE DICTIONARY dns_response (Id Uint64, Name String) PRIMARY KEY Id LAYOUT(FLAT()) SOURCE(HTTP(url "https://raw.githubusercontent.com/mosajjal/dnsmonster/main/clickhouse/dictionaries/dns_response.tsv" format TSV)) LIFETIME(MIN 0 MAX 0);
CREATE DICTIONARY dns_type (Id Uint64, Name String) PRIMARY KEY Id LAYOUT(FLAT()) SOURCE(HTTP(url "https://raw.githubusercontent.com/mosajjal/dnsmonster/main/clickhouse/dictionaries/dns_type.tsv" format TSV)) LIFETIME(MIN 0 MAX 0);

And that’s about it. With above commands, the full stack of Grafana, ClickHouse and dnsmonster should work perfectly. No more managing ClickHouse clusters manually! You can also combine this with the Kubernetes tutorial and provide a cloud-native, serverless DNS monitoring platform at scale.

5.2 - Kubernetes

use dnsmonster to monitor Kubernetes DNS traffic

In this guide, I’ll go through the steps to inject a custom configuration into Kubernetes’ coredns DNS server to provide a dnstap logger, and set up a dnsmonster pod to receive the logs, process them and send them to intended outputs.

dnsmonster deployment

In order to make dnsmonster see the dnstap connection coming from coredns pod, we need to create the dnsmonster Service inside the same namespace (kube-system or equivalent)

cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: dnsmonster-dnstap
  name: dnsmonster-dnstap
  namespace: kube-system
spec:
  # change the replica count to how many you might need to comfortably ingest the data
  replicas: 1
  selector:
    matchLabels:
      k8s-app: dnsmonster-dnstap
  template:
    metadata:
      labels:
        k8s-app: dnsmonster-dnstap
    spec:
      containers:
      - name: dnsm-dnstap
        image: ghcr.io/mosajjal/dnsmonster:v0.9.3
        args: 
          - "--dnstapSocket=tcp://0.0.0.0:7878"
          - "--stdoutOutputType=1"
        imagePullPolicy: IfNotPresent
        ports:
          - containerPort: 7878
---
apiVersion: v1
# https://kubernetes.io/docs/concepts/services-networking/connect-applications-service/#creating-a-service
# as per above documentation, each service will have a unique IP address that won't change for the lifespan of the service
kind: Service
metadata:
  name: dnsmonster-dnstap
  namespace: kube-system
spec:
  selector:
    k8s-app: dnsmonster-dnstap
  ports:
  - name: dnsmonster-dnstap
    protocol: TCP
    port: 7878
    targetPort: 7878
EOF

Now we can get the static IP assigned to the service to use in the coredns custom ConfigMap. Note that since CoreDNS itself provides DNS, it does not support an FQDN as a dnstap endpoint.

SVCIP=$(kubectl get service dnsmonster-dnstap --output go-template --template='{{.spec.clusterIP}}')

locate and edit the coredns config

Let’s see if we can view and manipulate the configuration inside the coredns pods. Using the command below, we can get a list of running coredns containers:

kubectl get pod --output yaml --all-namespaces | grep coredns

In the output of the above command, you should be able to see many objects associated with coredns, most notably coredns-custom. The coredns-custom ConfigMap allows us to customize the coredns configuration file and enable its builtin plugins. Many cloud providers have built the coredns-custom ConfigMap into their offering. Take a look at the AKS, Oracle Cloud and DigitalOcean docs for more details.

In Amazon’s EKS, there’s no coredns-custom, so the configuration needs to be edited in the main configuration file instead. On top of that, EKS will keep overriding your configuration with the “default” value through a DNS add-on; that add-on needs to be disabled for any customization of the coredns configuration. Take a look at this issue for more information.

The command below has been tested on DigitalOcean managed Kubernetes:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns-custom
  namespace: kube-system
data:
  log.override: |
    dnstap tcp://$SVCIP:7878 full
EOF

After running the above command, you will see the logs inside your dnsmonster pod. As commented in the yaml definitions, customizing the configuration parameters should be fairly straightforward.

6 - FAQ

Why should I use dnsmonster

I’ve broken this question into two: why you need to monitor your DNS, and why dnsmonster is a good choice for doing so.

Do I need passive DNS capture capability

DNS is one of, if not the most, prevalent indicators of compromise in attacks. The vast majority of external communications by malware or backdoors (~92% according to Cisco) involve some sort of DNS connectivity in their chain. Here are some great articles on the importance of DNS security monitoring

Why dnsmonster specifically?

dnsmonster is one of the few products supporting a wide range of inputs (pcap files, dnstap, live interfaces on Windows and *nix, afpacket) and a variety of outputs with minimum configuration and maximum performance. It can natively send data to your favorite database service or a Kafka topic, and has a builtin capability to send its metrics to a metrics endpoint. Check out the full feature set of dnsmonster in the “Getting Started” section.

In addition, dnsmonster also offers a fantastic performance by utilizing all CPU cores available on the machine, and has builtin buffers to cope with sudden traffic spikes.

Why did you name it dnsmonster

When I first tested dnsmonster on a giant DNS pcap file (220+ million DNS queries and responses) and saw it outperform other products in the same category, I told one of my mates that it “devoured those packets like the cookie monster”, and that’s how the monster within dnsmonster was born.

What OS should I use to run DNSmonster

dnsmonster will always offer first-class support for the modern Linux kernel (4.x) so it is recommended that you use dnsmonster on a modern Linux distribution. It can also be compiled for Windows, *BSD and Mac OS, but many of the performance tweaks will not work as well as they do for Linux.

For example, when dnsmonster is built on non-Unix systems, it stops manipulating the JSON objects with sonic.

As for architecture, dnsmonster builds successfully against arm7, aarch64 and amd64, but the performance benchmark has not been done to determine which architecture works best with it.

Why is dnsmonster not working for me

There could be several reasons behind why dnsmonster is not working. The best way to start troubleshooting dnsmonster is to have a Go compiler handy so you can build dnsmonster from the source and try the following:

  • Try building the master branch and run it with stdoutOutput to see if there is any output
  • Try running dnsmonster with or without afpacket support and various buffer sizes
  • Use a different packet capture method than dnsmonster to see if the packets are visible (tcpdump and netsniff-ng are good examples)
  • Try piping the packets from tcpdump to dnsmonster with stdoutOutput and see if that makes any difference. like so:
    • sudo tcpdump -nni eth0 -w - | dnsmonster --pcapFile=- --stdoutOutputType=1
  • Pay attention to the port variable if your DNS packets are being sent on a port other than 53. That parameter is separate from the BPF filter. Speaking of which, make sure your BPF is not too restrictive.
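For instance, if your resolvers answer on a non-standard port, the port option and the BPF filter need to agree; a sketch (the port value is illustrative, and the flag names should be verified against dnsmonster --help for your version):

```
sudo dnsmonster --devName=eth0 \
               --port=5353 \
               --filter="port 5353" \
               --stdoutOutputType=1
```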

If none of the above works, feel free to open an issue with the details of your problem. If you are planning to attach a pcap file as part of your issue, make sure to anonymize it.

How do I upgrade between versions

Before the product hits 1.x.x, breaking changes between releases are expected. Read the release notes between your current version and the desired version one by one to see whether you need to upgrade in increments.

After 1.x.x, the plan is to maintain backwards compatibility within major versions (e.g. every 1.x.x installation will work as part of an upgrade). However, that will not necessarily be the case for ClickHouse tables. Since ClickHouse is a fast-moving product, the table schema may need to change regardless of dnsmonster’s major release.

The JSON output fields, which are the basis for the majority of dnsmonster outputs, are bound to Miekg’s dns library. The library is fairly stable and has used the same data structures for years. For dnsmonster, the plan is to keep the JSON schema the same within each major release so SIEM parsers such as ASIM and CIM can maintain functionality. dnsmonster also supports go-template output, similar to kubectl, making it easy to customize and standardize your output to your needs.

How fast is dnsmonster

dnsmonster has demonstrated ingestion of 200,000 packets per second on a beefy server with ClickHouse running on the same machine with an SSD storage backend. Since then, the performance of both packet ingestion and the output pipeline has improved, to the point that you can ingest the same number of packets per second on a commodity laptop. For the majority of use cases, dnsmonster will not be the bottleneck of your data collection.

If you have a heavy workload that you have tested with dnsmonster, I would be happy to receive your feedback and share the numbers with the community.

Which output do I use

It depends. I would recommend sticking with your current toolset. The majority of organizations have built a syslog or Kafka pipeline to get data into their ingestion point, and both are fully supported by dnsmonster. If you want to test the product and its output, you can use file and stdout quite easily. Keep in mind that for file, you should consider your disk IO if you’re writing a lot of data to disk.

If you’re keen to build a new solution from scratch, I would recommend looking at ClickHouse. dnsmonster was originally built with ClickHouse in mind, and ClickHouse remains one of the better tools to ingest DNS logs. Take a look at how Cloudflare leverages ClickHouse to monitor 1.1.1.1 here

Why am I dropping packets

There could be many reasons behind packet loss. I went through some of them with possible solutions in the performance section.

Is there a Slack or Discord I can join

Not yet. At the moment, the repository’s Discussions page serves this purpose. If that proves less than ideal, I’m open to having a Discord/Slack/Telegram channel dedicated to dnsmonster. Let me know!

How to contribute

I have broken contribution down into different sections:

  • For security and bug disclosure, please visit SECURITY.md in the main repository to get more info on how to report vulnerabilities responsibly
  • For bugfixes, please open an issue before submitting a PR. That way other contributors know what is being worked on and there will be less duplicate work. Also, sometimes a bugfix is specific to a particular setup and there may be mitigations other than changing the code
  • For new features and output modules, please raise an issue first. That way the work can be assigned and scheduled for the next major release, and we can get together in the Discussions to set the requirements

There are also many //todo comments in the code, feel free to take a stab at those.

Last but not least, this very documentation needs your help! On the right hand side of each page, you can see some helper links to raise an issue with a page, or propose an edit or even create a new page.