Documentation
Welcome to the dnsmonster documentation!
The documentation is broken down into different sections. Getting Started focuses on installation and post-installation work such as compiling dnsmonster from source, setting up services, shell completions and more. Configuration gets into the details of how to configure dnsmonster, and how to identify and solve potential performance bottlenecks. The majority of your configuration is done inside the Input and Output sections. You'll learn where you can put filters on incoming traffic, sample inputs, and mask IP addresses before the packets even reach the processor. After processing, you'll be able to exclude certain FQDNs from being sent to output, or include only certain domains to be logged.
All of the above will generate a ton of useful metrics for your DNS infrastructure. dnsmonster has a built-in metrics system that can integrate with your favourite metrics aggregator such as prometheus or statsd.
1 - Getting Started
Getting Started with dnsmonster
Passive DNS monitoring framework built on Golang.
dnsmonster implements a packet sniffer for DNS traffic. It can accept traffic from a pcap file, a live interface or a dnstap socket, and can be used to index and store hundreds of thousands of DNS queries per second; it has been shown to be capable of indexing 200k+ DNS queries per second on a commodity computer. It aims to be scalable, simple and easy to use, and to help security teams understand the details of an enterprise's DNS traffic. dnsmonster doesn't try to follow DNS conversations; rather, it aims to index DNS packets as soon as they come in. It also doesn't aim to breach the privacy of the end-users: with the ability to mask Layer 3 IPs (IPv4 and IPv6), it enables teams to perform trend analysis on aggregated data without being able to trace the queries back to an individual.
Blogpost
Warning
The code before version 1.x is considered beta quality and is subject to breaking changes. Please check the release notes for each tag to see the list of breaking scenarios between each release, and how to mitigate potential data loss.
Main features
- Ability to use Linux's afpacket and zero-copy packet capture.
- Supports BPF
- Ability to mask IP addresses to enhance privacy
- Ability to have a pre-processing sampling ratio
- Ability to have a list of "skip" fqdns to avoid writing certain domains/suffixes/prefixes to storage
- Ability to have a list of "allow" domains, used to log access to certain domains
- Hot-reload of skip and allow domain files/urls
- Modular output with configurable logic per output stream.
- Automatic data retention policy using ClickHouse's TTL attribute
- Built-in Grafana dashboard for ClickHouse output.
- Ability to be shipped as a single, statically linked binary
- Ability to be configured using environment variables, command line options or a configuration file
- Ability to sample outputs using ClickHouse's SAMPLE capability
- Ability to send metrics using prometheus and statsd
- High compression ratio thanks to ClickHouse's built-in LZ4 storage
- Supports DNS over TCP, fragmented DNS (udp/tcp) and IPv6
- Supports dnstap over Unix socket or TCP
- Built-in SIEM integration with Splunk and Microsoft Sentinel
1.1 - installation
Learn how to install dnsmonster on your platform using Docker, prebuilt binaries, or compiling it from the source on any platform Go supports
dnsmonster has been built with minimal dependencies. At runtime, the only optional dependency for dnsmonster is libpcap. By building dnsmonster without libpcap, you lose the ability to set bpf filters on your live packet captures.
Installation methods
Prebuilt binaries
Each release of dnsmonster ships with two binaries: one for Linux amd64, built statically against an Alpine-based image, and one for Windows amd64, which depends on a capture library being installed on the OS. I've tested the Windows binary with the latest version of Wireshark installed on the system and had no issues running the executable.
Prebuilt packages
Per each release, the statically-linked binary mentioned above is also wrapped into deb and rpm packages with no dependencies, making it easy to deploy in Debian- and RHEL-based distributions. Note that the packages don't generate any service files or configuration templates at installation time.
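As a quick sketch (the exact package file names vary per release and are illustrative here), installing the packages looks like this:
# Debian/Ubuntu
sudo dpkg -i dnsmonster_*_amd64.deb
# RHEL/Fedora
sudo rpm -i dnsmonster-*.x86_64.rpm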
Run as a container
The container build process only generates a Linux amd64 output. Since dnsmonster uses raw packet capture functionality, the Docker/Podman daemon must grant the capture capabilities to the container:
sudo docker run --rm -it --net=host --cap-add NET_RAW --cap-add NET_ADMIN --name dnsmonster ghcr.io/mosajjal/dnsmonster:latest --devName lo --stdoutOutputType=1
Check out the configuration section to understand the provided command line arguments.
Build from the source
- With libpcap:
Make sure you have the go, libpcap-devel and linux-headers packages installed. The names of the packages might differ based on your distribution. After this, simply clone the repository and build it:
git clone https://github.com/mosajjal/dnsmonster --depth 1 /tmp/dnsmonster
cd /tmp/dnsmonster
go get
go build -o dnsmonster ./cmd/dnsmonster
- Without libpcap:
dnsmonster only uses one function from libpcap, and that's converting the tcpdump-style filters into BPF bytecode. If you can live without BPF support, you can build dnsmonster without libpcap. Note that on any other platform (*BSD, Windows, Darwin), the packet capture falls back to libpcap, so there it becomes a hard dependency:
git clone https://github.com/mosajjal/dnsmonster --depth 1 /tmp/dnsmonster
cd /tmp/dnsmonster
go get
go build -o dnsmonster -tags nolibpcap ./cmd/dnsmonster
The above build also works on ARMv7 (RPi4) and AArch64.
Build Statically
If you have a copy of libpcap.a, you can statically link it to dnsmonster and build a fully static binary. In the commands below, change /root/libpcap-1.9.1/libpcap.a to the location of your copy.
git clone https://github.com/mosajjal/dnsmonster --depth 1 /tmp/dnsmonster
cd /tmp/dnsmonster/
go get
go build --ldflags "-L /root/libpcap-1.9.1/libpcap.a -linkmode external -extldflags \"-I/usr/include/libnl3 -lnl-genl-3 -lnl-3 -static\"" -a -o dnsmonster ./cmd/dnsmonster
For more information on how the statically linked binary is created, take a look at Dockerfiles in the root of the repository responsible for generating the published binaries.
1.2 - post-installation
Set up services and shell completions for dnsmonster
Post-install
After you install dnsmonster, you might need to take a few extra steps to set up services so that dnsmonster runs automatically on system startup. These steps aren't included in the installation process by default.
Systemd service
If you're using a modern and popular distro like Debian, Ubuntu, Fedora, Arch or RHEL, you're probably using systemd as your init system. To run dnsmonster as a service, create a file in /etc/systemd/system/ named dnsmonster.service and define your systemd unit there. Using dnsmonster as the service name is entirely optional.
cat > /etc/systemd/system/dnsmonster.service << EOF
[Unit]
Description=Dnsmonster Service
Wants=network-online.target
After=network-online.target
[Service]
Type=simple
Restart=always
RestartSec=3
ExecStart=/sbin/dnsmonster --config /etc/dnsmonster.ini
[Install]
WantedBy=multi-user.target
EOF
The above systemd service looks at /etc/dnsmonster.ini as its configuration file. Check out the configuration section to see how that configuration file is generated.
To start the service and enable it at boot time, run the following:
sudo systemctl enable --now dnsmonster.service
You can also build a systemd service that takes the interface name dynamically and runs a dnsmonster instance per interface. To do so, create a service unit like this:
cat > /etc/systemd/system/dnsmonster@.service << EOF
[Unit]
Description=Dnsmonster Service
Wants=network-online.target
After=network-online.target
[Service]
Type=simple
Restart=always
RestartSec=3
ExecStart=/sbin/dnsmonster --devName=%i --config /etc/dnsmonster.ini
[Install]
WantedBy=multi-user.target
EOF
The above unit creates a dynamic systemd service that can be enabled for multiple interfaces. For example, to run the service for the loopback interface in Linux (lo), run the following:
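# assuming the template unit above was saved as dnsmonster@.service
sudo systemctl enable --now dnsmonster@lo.service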
Note that the above example only works if you're not specifying a dnstap or a local pcap file as an input inside the configuration file.
init.d service
bash and fish completion
2 - Configuration
Learn about the command line arguments and what they mean
To run dnsmonster, one input and at least one output must be defined. The input can be any of devName for live packet capture, pcapFile to read from a pcap file, or a dnstapSocket address to listen on. Currently, running dnsmonster with more than one input stream at a time isn't supported. For output, however, more than one channel is supported. Sometimes it's also possible to have multiple instances of the same output (for example Splunk) to provide load balancing and high availability.
Note that when specifying multiple output streams, the output data is copied to all of them. For example, if you put stdoutOutputType=1 and --fileOutputType=1 --fileOutputPath=/dev/stdout, you'll see each processed output twice in your stdout: one coming from the stdout output type, and the other from the file output type, which happens to have the same address (/dev/stdout).
dnsmonster can be configured in 3 different ways: command line options, environment variables and a configuration file. You can also use any combination of them at the same time. The precedence order is as follows:
- Command line options (case-insensitive)
- Environment variables (always upper-case)
- Configuration file (case-sensitive, lowercase)
- Default values (no configuration)
For example, if your configuration file specifies a devName but you also provide it as a command-line argument, dnsmonster will prioritize the CLI over the config file and will ignore that parameter from the ini file.
Command line options
To see the current list of command-line options, run dnsmonster --help or check out the repository's README.md.
Environment variables
All the flags can also be set via environment variables. Keep in mind that the name of each parameter is always all upper-case and the prefix for all the variables is DNSMONSTER_. Example:
$ export DNSMONSTER_PORT=53
$ export DNSMONSTER_DEVNAME=lo
$ sudo -E dnsmonster
Configuration file
You can run dnsmonster with a configuration file using the following command:
$ sudo dnsmonster --config=dnsmonster.ini
# Or you can use environment variables to set the configuration file path
$ export DNSMONSTER_CONFIG=dnsmonster.ini
$ sudo -E dnsmonster
2.1 - Performance
Performance considerations when configuring dnsmonster
Use afpacket
If you're using dnsmonster as a sniffer and you're not keeping up with the number of packets coming in, consider switching on afpacket by using the flag --useAfpacket. Afpacket tends to drastically improve dnsmonster's packet ingestion rate. If you're still having packet drop issues, increase --afpacketBuffersizeMb to a higher value. The buffer size will take up more memory on startup, and will increase the startup time depending on how much you have assigned to it.
In some tests, values above 4096MB tend to have a negative impact on the overall performance of the daemon. If you're using a 4096MB buffer size and still seeing performance issues, there's a good chance the issue isn't on the capture side, but rather on the process and output side.
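As a minimal sketch (the interface name and buffer size are illustrative), a tuned afpacket capture could look like this:
dnsmonster --devName=eth0 --useAfpacket --afpacketBuffersizeMb=256 --stdoutOutputType=1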
Proper Output and packet handlers
Simply put, if you have an output that can accept 1,000 inserts per second but an incoming packet rate of 10,000 packets per second, you're going to see a lot of packet drops, and the drops will get worse as time goes by. When selecting an output, consider the capacity of your technology and the volume you expect to ingest.
If you are seeing a considerable amount of packet loss that gets worse as time goes on, consider testing with --stdoutOutputType=1, removing your current output, and redirecting the output to /dev/null. You can also tweak the number of workers converting the data to JSON to experiment further. Take the following example:
dnsmonster --devName=lo --packetHandlerCount=16 --stdoutOutputType=1 --useAfpacket | pv --rate --line-mode > /dev/null
In the above command, you can see the exact output lines per second while maintaining a view on metrics and packet loss, to see if your packet loss is still present. By default, --stdoutOutputWorkerCount is set to 8. If you have a strong enough CPU, you can increase that amount to see what the maximum rate is you can achieve. On a small server, you shouldn't have a problem ingesting 500k packets per second.
Note that --packetHandlerCount is also set to 16 to make sure enough workers are ingesting incoming packets. That's also an important parameter to tweak to achieve optimum performance. The default, 2, might be too low for you if you have hundreds of thousands of packets per second on an interface.
Sampling and BPF-based split of traffic
Sometimes, the packets are simply too much to process. dnsmonster offers a few options to deal with this problem. --sampleRatio simply ignores packets by the defined ratio. The default is 1:1, meaning for each incoming packet, one gets processed, aka 100%. You can tweak this number if your hardware isn't capable of processing all the packets, or dnsmonster has simply reached its limit.
For example, putting 2:7 as your sample ratio means that for every 7 packets that come in, only the first two get processed.
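As a sketch, applying that 2:7 ratio on a live interface (the interface name is illustrative) would look like this:
dnsmonster --devName=eth0 --stdoutOutputType=1 --sampleRatio=2:7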
If after testing all the options you've reached the conclusion that dnsmonster cannot handle more than what you need it to do, please raise an issue about it; you can also run multiple instances of dnsmonster looking at the same traffic, like so:
dnsmonster --devName=lo --stdoutOutputType=1 --filter="src portrange 1024-32000"
dnsmonster --devName=lo --stdoutOutputType=1 --filter="src portrange 32001-65535"
The above processes will split the traffic between them based on the port range. Note that only high ports are included, since the majority of clients use ports above 1024 to conduct a DNS query. You can change the filter to any BPF that makes sense for your environment.
Profile CPU and Memory
To take a look at what exactly is using your CPU and RAM, take a look at the Golang profiler tools available through the --memprofile and --cpuprofile flags. To use them, issue the following:
# profile CPU
dnsmonster --devName=lo --stdoutOutputType=1 --cpuprofile=1
# you'll see something like this in the beginning of your logs
# 2022/04/11 19:13:51 profile: cpu profiling enabled, /tmp/profile452510705/cpu.pprof
# profile RAM
dnsmonster --devName=lo --stdoutOutputType=1 --memprofile=1
# you'll see something like this in the beginning of your logs
# 2022/04/11 19:15:00 profile: memory profiling enabled (rate 4096), /tmp/profile1290716652/mem.pprof
After dnsmonster exits gracefully, you can use Go's pprof tooling to open the generated pprof file in a browser and dig deep into the functions that are bottlenecks in the code. After installing pprof, use it like below:
~/go/bin/pprof -http 127.0.0.1:8882 /tmp/profile2392236212/mem.pprof
A browser session will automatically open with the performance metrics for your execution.
3 - Inputs and Filters
Set up an input to receive data
To get the raw data into the dnsmonster pipeline, you must specify an input stream. Currently there are three supported input methods:
- Live interface
- Pcap file
- dnstap socket
The configuration for inputs and packet processing is contained within the capture section of the configuration:
- --devName: Enables live capture mode on the device. Only one interface per dnsmonster instance is supported.
- --pcapFile: Enables offline pcap mode. You can specify "-" as the pcap file to read from stdin.
- --dnstapSocket: Enables dnstap mode. Accepts a socket path. Example: unix:///tmp/dnstap.sock, tcp://127.0.0.1:8080.
- --port: Port selected to filter packets (default: 53). Works independently from the BPF filter.
- --sampleRatio: Specifies the packet sampling ratio at capture time. Default is 1:1, meaning all packets passing the BPF will get processed.
- --dedupCleanupInterval: In case --dedup is enabled, cleans up the packet hash table used for it (default: 60s).
- --dnstapPermission: Sets the dnstap socket permission, only applicable when unix:// is used (default: 755).
- --packetHandlerCount: Number of workers used to handle received packets (default: 2).
- --tcpAssemblyChannelSize: Specifies the goroutine channel size for the TCP assembler. The TCP assembler is used to de-fragment incoming fragmented TCP packets in a way that won't slow down the processing of "normal" UDP packets.
- --tcpResultChannelSize: Size of the TCP result channel (default: 10000).
- --tcpHandlerCount: Number of routines used to handle TCP DNS packets (default: 1).
- --defraggerChannelSize: Size of the channel to send raw packets to be de-fragmented (default: 10000).
- --defraggerChannelReturnSize: Size of the channel where the de-fragmented packets are sent to the output queue (default: 10000).
- --packetChannelSize: Size of the packet handler channel (default: 1000).
- --afpacketBuffersizeMb: afpacket buffer size in MB (default: 64).
- --filter: BPF filter applied to the packet stream.
- --useAfpacket: Boolean flag to switch on the afpacket sniff method on live interfaces.
- --noEtherframe: Boolean flag to use if the incoming packets (pcap file) do not contain the Ethernet frame.
- --dedup: Boolean flag to enable the experimental de-duplication engine.
- --noPromiscuous: Boolean flag to prevent dnsmonster from automatically putting the devName in promiscuous mode.
The above flags are used in a variety of ways. Check the Filters and Masks and Input options sections for more detailed info.
3.1 - Filters and masks
There are a few ways to manipulate incoming packets at various steps of the dnsmonster pipeline. They operate at different levels of the stack and have different performance implications.
BPF
Applied at kernel level
BPF is by far the most performant way to filter incoming packets. It's only supported on live capture (--devName). It uses tcpdump's pcap-filter language to filter out packets.
Sample Ratio
Applied at capture level
Sample ratio (--sampleRatio) is an easy way to reduce the number of packets being pushed to the pipeline purely by numbers. The default value is 1:1, meaning for each incoming packet, one gets pushed to the pipeline. You can change that if you have a huge number of packets or your output is not catching up with the input. Check out the performance guide for more detail.
De-duplication
Applied at capture level
The experimental de-duplication feature (--dedup) provides a rudimentary packet de-duplication capability. The functionality is very simple: it uses a non-cryptographic hash function (FNV-1) on the raw packets and builds a hash table of incoming packets as they come in. Note that the hashing happens before stripping the 802.1q, vxlan and ethernet layers, so the de-duplication happens purely on the packet bytes.
There's also the option --dedupCleanupInterval to specify the cleanup interval for the hash table. Around the time of cleanup, there could be a few duplicate packets since the hash table is not time-bound on its own; it gets flushed completely at the interval.
Applied after Sample Ratio for each packet.
Port
Applied at early process level
There's an additional filter specifying the port (--port) of each packet. Since the vast majority of DNS packets are served out of port 53, this parameter shouldn't have any effect by default. Note that this filter is not applied to fragmented packets.
IP Masks
Applied at process level
While processing the packets, the source and destination IPv4 and IPv6 addresses can be masked by a specified number of bits (--maskSize4 and --maskSize6 options). Since this step happens after de-duplication, there could be seemingly duplicate entries in the output purely because masked IP prefixes appear the same.
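As a brief sketch (the interface name is illustrative, and this assumes the mask size is the number of prefix bits to keep), masking clients to /24 and /64 prefixes would look like this:
dnsmonster --devName=eth0 --stdoutOutputType=1 --maskSize4=24 --maskSize6=64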
Allow and Skip Domain list
Applied at output level
These two filters specify an allow list and a skip list for the domain outputs. --skipDomainsFile is used to avoid writing noisy, repetitive data to your output. The skip domain list is a CSV-formatted file (or a URL containing the file) with only two columns: a string representing part or all of an FQDN, and a matching logic for that particular string. dnsmonster supports three logics for each entry: prefix, suffix and fqdn. prefix and suffix mean that only the domains starting/ending with the given string will be skipped from being sent to output, while fqdn requires a full match and is useful to avoid writing highly noisy FQDNs into your database. Note that since the processing is done on DNS questions, your string will most likely have a trailing . that needs to be included in your skip list row as well (take a look at skipdomains.csv.sample for a better view).
--allowDomainsFile provides the exact opposite of the skip domain logic, meaning your output will be limited to the entries inside this list.
Both --skipDomainsFile and --allowDomainsFile have an automatic refresh interval and re-fetch the FQDNs using the --skipDomainsRefreshInterval and --allowDomainsRefreshInterval options.
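As an illustrative sketch (the file path, entries and refresh interval are made up; see skipdomains.csv.sample in the repository for the authoritative format), a skip list and its use could look like this:
cat > /etc/dnsmonster/skipdomains.csv << EOF
www.,prefix
.arpa.,suffix
time.apple.com.,fqdn
EOF
dnsmonster --devName=eth0 --stdoutOutputType=2 --skipDomainsFile=/etc/dnsmonster/skipdomains.csv --skipDomainsRefreshInterval=60s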
For each output type, you can specify which of these tables are used. Check the output section for more detail regarding the output modes.
3.2 - Input options
Let's go through some examples of how to set up dnsmonster inputs.
Live interface
To start listening on an interface, simply put the name of the interface in the --devName= parameter. On unix-like systems, the ip a or ifconfig command gives you a list of interfaces that you can use. In this mode, dnsmonster needs to run with higher privileges.
In Windows environments, to get a list of interfaces, open cmd.exe as Administrator and run getmac.exe. You'll see a table with your interfaces' MAC addresses and a Transport Name column containing something like this: \Device\Tcpip_{16000000-0000-0000-0000-145C4638064C}.
Then you simply replace the word Tcpip_ with NPF_ and use it as the --devName parameter, like so:
dnsmonster.exe --devName \Device\NPF_{16000000-0000-0000-0000-145C4638064C}
Pcap file
To analyze a pcap file, you can simply use the --pcapFile= option. You can also use the value - or /dev/stdin to read the pcap from stdin. This can be used with pcap-over-ip and zipped pcaps that you would like to analyze on the fly. For example, the following reads the packets as they're being extracted, without saving the extracted pcap on disk:
lz4cat /path/to/a/huge/dns/capture.pcap.lz4 | dnsmonster --pcapFile=- --stdoutOutputType=1
Pcap-over-Ip
dnsmonster doesn't support pcap-over-ip directly, but you can achieve the same result by combining a program like netcat or socat with dnsmonster.
To connect to a remote pcap-over-ip server, use the following:
while true; do
nc -w 10 REMOTE_IP REMOTE_PORT | dnsmonster --pcapFile=- --stdoutOutputType=1
done
To listen for pcap-over-ip, the following can be used:
while true; do
nc -l -p REMOTE_PORT | dnsmonster --pcapFile=- --stdoutOutputType=1
done
If pcap-over-ip becomes a popular enough option, building native support for it shouldn't be too difficult. Feel free to open a topic on the discussion page, or simply an issue on the repo, if this is something you care about.
dnstap
dnsmonster can listen on a dnstap TCP or Unix socket and process the dnstap logs as they come in just like network packets, since dnstap's specification is very close to the packet itself. To learn more about dnstap, visit their website here.
To use dnstap as a TCP listener, use --dnstapSocket with a syntax like --dnstapSocket=tcp://0.0.0.0:5555. If you're using a Unix socket to listen for dnstap packets, you can use unix:///tmp/dnstap.sock and set the socket file permission with the --dnstapPermission option.
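For example, a sketch of a Unix-socket listener (the socket path and permission are illustrative):
dnsmonster --dnstapSocket=unix:///tmp/dnstap.sock --dnstapPermission=666 --stdoutOutputType=1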
Currently, dnstap in client mode is unsupported since its use case is very rare. If you need this functionality, you can use a TCP port proxy or socat to convert the TCP connection into a Unix socket and read it from dnsmonster.
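A rough sketch of such a bridge using socat (host, port and socket path are illustrative; double-check the connection direction for your setup):
# dnsmonster listens on a local unix socket...
dnsmonster --dnstapSocket=unix:///tmp/dnstap.sock --stdoutOutputType=1 &
# ...while socat connects to the remote dnstap TCP endpoint and relays the stream into that socket
socat TCP:REMOTE_HOST:REMOTE_PORT UNIX-CONNECT:/tmp/dnstap.sock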
4 - Outputs
Set up output(s) and gather metrics
dnsmonster follows a pipeline architecture for each individual packet. After the capture and filter stages, each processed packet arrives at the output dispatcher. The dispatcher sends a copy of the output to each individual output module that has been configured to produce output. For instance, if you specify stdoutOutputType=1 and --fileOutputType=1 --fileOutputPath=/dev/stdout, you'll see each processed output twice in your stdout: one coming from the stdout output type, and the other from the file output type, which happens to have the same address (/dev/stdout).
In general, each output has its own configuration section. You can see the sections with the "_output" suffix when running dnsmonster --help from the command line. The most important parameter for each output is its "Type". Each output has 5 different types:
- Type 0: The output is disabled.
- Type 1: An output module configured as Type 1 will ignore "SkipDomains" and "AllowDomains" and will generate output for all the incoming processed packets. Note that the output type does not nullify input filters, since it is applied after capture and early packet filters. Take a look at Filters and Masks to see the order of the filters applied.
- Type 2: An output module configured as Type 2 will ignore "AllowDomains" and only apply the "SkipDomains" logic to the incoming processed packets.
- Type 3: An output module configured as Type 3 will ignore "SkipDomains" and only apply the "AllowDomains" logic to the incoming processed packets.
- Type 4: An output module configured as Type 4 will apply both "SkipDomains" and "AllowDomains" logic to the incoming processed packets.
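As a brief sketch (the file path is illustrative), a Type 3 stdout output limited to an allow list would look something like this:
dnsmonster --devName=lo --stdoutOutputType=3 --allowDomainsFile=/etc/dnsmonster/allowdomains.csv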
Other than Type, each output module may require additional configuration parameters. For more information, refer to each module's documentation.
dnsmonster supports multiple output formats:
json: the standard JSON output. The output looks like the sample below:
{"Timestamp":"2020-08-08T00:19:42.567768Z","DNS":{"Id":54443,"Response":true,"Opcode":0,"Authoritative":false,"Truncated":false,"RecursionDesired":true,"RecursionAvailable":true,"Zero":false,"AuthenticatedData":false,"CheckingDisabled":false,"Rcode":0,"Question":[{"Name":"imap.gmail.com.","Qtype":1,"Qclass":1}],"Answer":[{"Hdr":{"Name":"imap.gmail.com.","Rrtype":1,"Class":1,"Ttl":242,"Rdlength":4},"A":"172.217.194.108"},{"Hdr":{"Name":"imap.gmail.com.","Rrtype":1,"Class":1,"Ttl":242,"Rdlength":4},"A":"172.217.194.109"}],"Ns":null,"Extra":null},"IPVersion":4,"SrcIP":"1.1.1.1","DstIP":"2.2.2.2","Protocol":"udp","PacketLength":64}
csv: the CSV output. The fields and headers are non-customizable at the moment. To get a custom output, please look at gotemplate.
Year,Month,Day,Hour,Minute,Second,Ns,Server,IpVersion,SrcIP,DstIP,Protocol,Qr,OpCode,Class,Type,ResponseCode,Question,Size,Edns0Present,DoBit,Id
2020,8,8,0,19,42,567768000,default,4,2050551041,2050598324,17,1,0,1,1,0,imap.gmail.com.,64,0,0,54443
csv_no_headers: looks exactly like the CSV output but with no header printed at the beginning.
gotemplate: a customizable template to come up with your own formatting. Let's look at a few examples with the same packet we've looked at using JSON and CSV:
$ dnsmonster --pcapFile input.pcap --stdoutOutputType=1 --stdoutOutputFormat=gotemplate --stdoutOutputGoTemplate="timestamp=\"{{.Timestamp}}\" id={{.DNS.Id}} question={{(index .DNS.Question 0).Name}}"
timestamp="2020-08-08 00:19:42.567735 +0000 UTC" id=54443 question=imap.gmail.com.
Take a look at the official docs for more info regarding text/template and your various options.
4.1 - Apache Kafka
Possibly the most versatile output supported by dnsmonster. The Kafka output allows you to connect to an endless list of supported sinks. It is the recommended output module for enterprise designs, since it offers fault tolerance and can sustain outages of the sink. dnsmonster's Kafka output supports compression, TLS, and multiple brokers. In order to provide multiple brokers, you need to specify the broker option multiple times.
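For instance, a sketch of the command-line form with two brokers (the addresses are illustrative; the flag names mirror the ini options below):
dnsmonster --devName=lo --kafkaOutputType=1 --kafkaOutputBroker=10.0.0.1:9092 --kafkaOutputBroker=10.0.0.2:9092 --kafkaOutputTopic=dnsmonster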
Configuration Parameters
[kafka_output]
; What should be written to kafka. options:
; 0: Disable Output
; 1: Enable Output without any filters
; 2: Enable Output and apply skipdomains logic
; 3: Enable Output and apply allowdomains logic
; 4: Enable Output and apply both skip and allow domains logic
KafkaOutputType = 0
; kafka broker address(es), example: 127.0.0.1:9092. Used if kafkaOutputType is not none
KafkaOutputBroker =
; Kafka topic for logging
KafkaOutputTopic = dnsmonster
; Minimum capacity of the cache array used to send data to Kafka
KafkaBatchSize = 1000
; Kafka connection timeout in seconds
KafkaTimeout = 3
; Interval between sending results to Kafka if Batch size is not filled
KafkaBatchDelay = 1s
; Compress Kafka connection
KafkaCompress = false
; Compression Type[gzip, snappy, lz4, zstd] default is snappy
KafkaCompressiontype = snappy
; Use TLS for kafka connection
KafkaSecure = false
; Path of CA certificate that signs Kafka broker certificate
KafkaCACertificatePath =
; Path of TLS certificate to present to broker
KafkaTLSCertificatePath =
; Path of TLS certificate key
KafkaTLSKeyPath =
4.2 - Parquet
The Parquet output module is designed to send dnsmonster logs to parquet files.
Configuration Parameters
[parquet_output]
; What should be written to parquet file. options:
; 0: Disable Output
; 1: Enable Output without any filters
; 2: Enable Output and apply skipdomains logic
; 3: Enable Output and apply allowdomains logic
; 4: Enable Output and apply both skip and allow domains logic
parquetoutputtype = 0
; Path to output folder. Used if parquetoutputtype is not none
parquetoutputpath =
; Number of records to write to parquet file before flushing
parquetflushbatchsize = 10000
; Number of workers to write to parquet file
parquetworkercount = 4
; Size of the write buffer in bytes
parquetwritebuffersize = 256000
4.3 - ClickHouse
ClickHouse is a time-series database engine developed by Yandex. It uses a column-oriented design which makes it a good candidate to store hundreds of thousands of DNS queries per second with extremely good compression ratio as well as fast retrieval of data.
Currently, dnsmonster's implementation requires the table name to be set to DNS_LOG. An SQL schema file is provided in the repository under the clickhouse directory. The Grafana dashboard and configuration set provided by dnsmonster also correspond with the ClickHouse schema and can be used to visualize the data.
Configuration parameters
--clickhouseAddress: Address of the ClickHouse database to save the results (default: localhost:9000)
--clickhouseUsername: Username to connect to the ClickHouse database (default: empty)
--clickhousePassword: Password to connect to the ClickHouse database (default: empty)
--clickhouseDatabase: Database to connect to the ClickHouse database (default: default)
--clickhouseDelay: Interval between sending results to ClickHouse (default: 1s)
--clickhouseDebug: Debug ClickHouse connection (default: false)
--clickhouseCompress: Compress ClickHouse connection (default: false)
--clickhouseSecure: Use TLS for ClickHouse connection (default: false)
--clickhouseSaveFullQuery: Save full packet query and response in JSON format (default: false)
--clickhouseOutputType: ClickHouse output type. Options: (default: 0)
- 0: Disable Output
- 1: Enable Output without any filters
- 2: Enable Output and apply skipdomains logic
- 3: Enable Output and apply allowdomains logic
- 4: Enable Output and apply both skip and allow domains logic
--clickhouseBatchSize: Minimum capacity of the cache array used to send data to ClickHouse. Set close to the queries per second received to prevent allocations (default: 100000)
--clickhouseWorkers: Number of ClickHouse output workers (default: 1)
--clickhouseWorkerChannelSize: Channel size for each ClickHouse worker (default: 100000)
Note: the general option --skipTLSVerification applies to this module as well.
Retention Policy
The default retention policy for the ClickHouse tables is set to 30 days. You can change the number by building the containers using ./autobuild.sh. Since ClickHouse doesn't have an internal timestamp, the TTL will look at the incoming packet's date in pcap files. So while importing old pcap files, ClickHouse may automatically start removing the data as it's being written, and you won't see any actual data in your Grafana. To fix that, you can change the TTL to a day older than your earliest packet inside the pcap file.
NOTE: to manually change the TTL, you need to connect directly to the ClickHouse server using the clickhouse-client binary and run the following SQL statement (this example changes it from 30 to 90 days):
ALTER TABLE DNS_LOG MODIFY TTL DnsDate + INTERVAL 90 DAY;
NOTE: The above command only changes TTL for the raw DNS log data, which is the majority of your capacity consumption. To make sure that you adjust the TTL for every single aggregation table, you can run the following:
ALTER TABLE DNS_LOG MODIFY TTL DnsDate + INTERVAL 90 DAY;
ALTER TABLE `.inner.DNS_DOMAIN_COUNT` MODIFY TTL DnsDate + INTERVAL 90 DAY;
ALTER TABLE `.inner.DNS_DOMAIN_UNIQUE` MODIFY TTL DnsDate + INTERVAL 90 DAY;
ALTER TABLE `.inner.DNS_PROTOCOL` MODIFY TTL DnsDate + INTERVAL 90 DAY;
ALTER TABLE `.inner.DNS_GENERAL_AGGREGATIONS` MODIFY TTL DnsDate + INTERVAL 90 DAY;
ALTER TABLE `.inner.DNS_EDNS` MODIFY TTL DnsDate + INTERVAL 90 DAY;
ALTER TABLE `.inner.DNS_OPCODE` MODIFY TTL DnsDate + INTERVAL 90 DAY;
ALTER TABLE `.inner.DNS_TYPE` MODIFY TTL DnsDate + INTERVAL 90 DAY;
ALTER TABLE `.inner.DNS_CLASS` MODIFY TTL DnsDate + INTERVAL 90 DAY;
ALTER TABLE `.inner.DNS_RESPONSECODE` MODIFY TTL DnsDate + INTERVAL 90 DAY;
ALTER TABLE `.inner.DNS_SRCIP_MASK` MODIFY TTL DnsDate + INTERVAL 90 DAY;
UPDATE: in the latest version of ClickHouse, the .inner tables don't have the same name as the corresponding aggregation views. To modify the TTL you have to find the table names in UUID format using SHOW TABLES and repeat the ALTER command with those UUIDs.
SAMPLE in clickhouse SELECT queries
By default, the main table created by the tables.sql file (DNS_LOG) has the ability to sample down a result as needed, since each DNS question has a semi-unique UUID associated with it. For more information about SAMPLE queries in ClickHouse, please check out this document.
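As a small sketch (the sampling fraction and date filter are illustrative, and this assumes DNS_LOG was created from the provided schema with its SAMPLE BY clause), a sampled count looks like this:
clickhouse-client --query "SELECT count() * 10 FROM DNS_LOG SAMPLE 1/10 WHERE DnsDate >= today()"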
Useful queries
- List of unique domains visited over the past 24 hours
-- using domain_count table
SELECT DISTINCT Question FROM DNS_DOMAIN_COUNT WHERE t > Now() - toIntervalHour(24)
-- only the number
SELECT count(DISTINCT Question) FROM DNS_DOMAIN_COUNT WHERE t > Now() - toIntervalHour(24)
-- see memory usage of the above query in bytes
SELECT memory_usage FROM system.query_log WHERE query_kind='Select' AND arrayExists(x-> x='default.DNS_DOMAIN_COUNT', tables) ORDER BY event_time DESC LIMIT 1 format Vertical
-- you can also get the memory usage of each query by query ID. There should be only 1 result so we will cut it off at one to optimize performance
SELECT sum(memory_usage) FROM system.query_log WHERE initial_query_id = '8de8fe3c-d46a-4a32-83da-4f4ba4dc49e5' format Vertical
4.4 - Elasticsearch/OpenSearch
Elasticsearch is a full-text search engine and is used widely across a lot of security tools. dnsmonster supports Elastic 7.x out of the box. Support for 6.x and 8.x has not been tested.
There is also a fork of Elasticsearch called Opendistro, later renamed to OpenSearch. Both are compatible with Elastic 7.10.x, so they should be supported as well.
Configuration parameters
[elastic_output]
; What should be written to elastic. options:
; 0: Disable Output
; 1: Enable Output without any filters
; 2: Enable Output and apply skipdomains logic
; 3: Enable Output and apply allowdomains logic
; 4: Enable Output and apply both skip and allow domains logic
ElasticOutputType = 0
; elastic endpoint address, example: http://127.0.0.1:9200. Used if elasticOutputType is not none
ElasticOutputEndpoint =
; elastic index
ElasticOutputIndex = default
; Send data to Elastic in batch sizes
ElasticBatchSize = 1000
; Interval between sending results to Elastic if Batch size is not filled
ElasticBatchDelay = 1s
4.5 - InfluxDB
InfluxDB is a time series database used to store logs and metrics with high ingestion rate.
Configuration options
[influx_output]
; What should be written to influx. options:
; 0: Disable Output
; 1: Enable Output without any filters
; 2: Enable Output and apply skipdomains logic
; 3: Enable Output and apply allowdomains logic
; 4: Enable Output and apply both skip and allow domains logic
InfluxOutputType = 0
; influx Server address, example: http://localhost:8086. Used if influxOutputType is not none
InfluxOutputServer =
; Influx Server Auth Token
InfluxOutputToken = dnsmonster
; Influx Server Bucket
InfluxOutputBucket = dnsmonster
; Influx Server Org
InfluxOutputOrg = dnsmonster
; Number of workers used to send data to Influx
InfluxOutputWorkers = 8
; Minimum capacity of the cache array used to send data to Influx
InfluxBatchSize = 1000
4.6 - Microsoft Sentinel
The Microsoft Sentinel output module is designed to send dnsmonster logs to Sentinel. In addition, this module supports sending the logs to any Log Analytics workspace, whether or not it is connected to Sentinel.
Please take a look at Microsoft’s official documentation to see how Customer ID and Shared key are obtained.
Configuration Parameters
[sentinel_output]
; What should be written to Microsoft Sentinel. options:
; 0: Disable Output
; 1: Enable Output without any filters
; 2: Enable Output and apply skipdomains logic
; 3: Enable Output and apply allowdomains logic
; 4: Enable Output and apply both skip and allow domains logic
SentinelOutputType = 0
; Sentinel Shared Key, either the primary or secondary, can be found in Agents Management page under Log Analytics workspace
SentinelOutputSharedKey =
; Sentinel Customer Id. can be found in Agents Management page under Log Analytics workspace
SentinelOutputCustomerId =
; Sentinel Output LogType
SentinelOutputLogType = dnsmonster
; Sentinel Output Proxy in URI format
SentinelOutputProxy =
; Sentinel Batch Size
SentinelBatchSize = 100
; Interval between sending results to Sentinel if Batch size is not filled
SentinelBatchDelay = 1s
4.7 - Splunk HEC
Splunk HTTP Event Collector (HEC) is a widely used component of Splunk to ingest raw and JSON data. dnsmonster uses the JSON output to push the logs into a Splunk index. Various configurations are also supported. You can also use multiple HEC endpoints to have load balancing and fault tolerance across multiple index heads. Note that the token and other settings are shared between the endpoints.
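As a sketch (the endpoints and token are illustrative; the flag names mirror the ini options below), two HEC endpoints can be specified like this:
dnsmonster --devName=lo --splunkOutputType=1 --splunkOutputEndpoint=https://hec1.example.com:8088 --splunkOutputEndpoint=https://hec2.example.com:8088 --splunkOutputToken=00000000-0000-0000-0000-000000000000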
Configuration Parameters
[splunk_output]
; What should be written to HEC. options:
; 0: Disable Output
; 1: Enable Output without any filters
; 2: Enable Output and apply skipdomains logic
; 3: Enable Output and apply allowdomains logic
; 4: Enable Output and apply both skip and allow domains logic
SplunkOutputType = 0
; splunk endpoint address, example: http://127.0.0.1:8088. Used if splunkOutputType is not none, can be specified multiple times for load balance and HA
SplunkOutputEndpoint =
; Splunk HEC Token
SplunkOutputToken = 00000000-0000-0000-0000-000000000000
; Splunk Output Index
SplunkOutputIndex = temp
; Splunk Output Proxy in URI format
SplunkOutputProxy =
; Splunk Output Source
SplunkOutputSource = dnsmonster
; Splunk Output Sourcetype
SplunkOutputSourceType = json
; Send data to HEC in batch sizes
SplunkBatchSize = 1000
; Interval between sending results to HEC if Batch size is not filled
SplunkBatchDelay = 1s
4.8 - Stdout, syslog or Log File
Stdout, syslog and file are supported outputs for dnsmonster out of the box. They are especially useful if you have a SIEM agent reading the files as they come in. Note that dnsmonster does not handle log rotation or monitor disk capacity while writing to a file. You can use a tool like logrotate to perform cleanups on the log files. The signalling on log rotation (SIGHUP) has not been tested with dnsmonster.
The JSON schema used to send the logs can be configured to be compatible with the Open Cybersecurity Schema Framework (OCSF) as well.
Currently, syslog output is only supported on Linux.
Configuration parameters
[file_output]
; What should be written to file. options:
; 0: Disable Output
; 1: Enable Output without any filters
; 2: Enable Output and apply skipdomains logic
; 3: Enable Output and apply allowdomains logic
; 4: Enable Output and apply both skip and allow domains logic
FileOutputType = 0
; Path to output file. Used if fileOutputType is not none
FileOutputPath =
; Output format for file. options:json, json-ocsf, csv, csv_no_header, gotemplate. note that the csv splits the datetime format into multiple fields
FileOutputFormat = json
; Go Template to format the output as needed
FileOutputGoTemplate = {{.}}
[stdout_output]
; What should be written to stdout. options:
; 0: Disable Output
; 1: Enable Output without any filters
; 2: Enable Output and apply skipdomains logic
; 3: Enable Output and apply allowdomains logic
; 4: Enable Output and apply both skip and allow domains logic
StdoutOutputType = 0
; Output format for stdout. options:json,csv, csv_no_header, gotemplate. note that the csv splits the datetime format into multiple fields
StdoutOutputFormat = json
; Go Template to format the output as needed
StdoutOutputGoTemplate = {{.}}
; Number of workers
StdoutOutputWorkerCount = 8
[syslog_output]
; What should be written to Syslog server. options:
; 0: Disable Output
; 1: Enable Output without any filters
; 2: Enable Output and apply skipdomains logic
; 3: Enable Output and apply allowdomains logic
; 4: Enable Output and apply both skip and allow domains logic
SyslogOutputType = 0
; Syslog endpoint address, example: udp://127.0.0.1:514, tcp://127.0.0.1:514. Used if syslogOutputType is not none
SyslogOutputEndpoint = udp://127.0.0.1:514
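As a brief sketch (the output path is illustrative; the flag names mirror the ini options above), writing JSON logs to a file looks like this:
dnsmonster --devName=lo --fileOutputType=1 --fileOutputPath=/var/log/dnsmonster.json --fileOutputFormat=json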
4.9 - VictoriaLogs
The VictoriaLogs output module is designed to send dnsmonster logs to VictoriaLogs.
Configuration Parameters
[victoria_output]
; Victoria Output Endpoint. example: http://localhost:9428/insert/jsonline?_msg_field=rcode_id&_time_field=time
victoriaoutputendpoint =
; What should be written to Victoria. options:
; 0: Disable Output
; 1: Enable Output without any filters
; 2: Enable Output and apply skipdomains logic
; 3: Enable Output and apply allowdomains logic
; 4: Enable Output and apply both skip and allow domains logic
victoriaoutputtype = 0
; Victoria Output Proxy in URI format
victoriaoutputproxy =
; Number of workers
victoriaoutputworkers = 8
; Victoria Batch Size
victoriabatchsize = 100
; Interval between sending results to Victoria if Batch size is not filled. Any value larger than zero takes precedence over Batch Size
victoriabatchdelay = 0s
4.10 - Zinc Search
The Zinc Search output module is designed to send dnsmonster logs to ZincSearch.
Configuration Parameters
[zinc_output]
; What should be written to zinc. options:
; 0: Disable Output
; 1: Enable Output without any filters
; 2: Enable Output and apply skipdomains logic
; 3: Enable Output and apply allowdomains logic
; 4: Enable Output and apply both skip and allow domains logic
zincoutputtype = 0
; index used to save data in Zinc
zincoutputindex = dnsmonster
; zinc endpoint address, example: http://127.0.0.1:9200/api/default/_bulk. Used if zincOutputType is not none
zincoutputendpoint =
; zinc username, example: [email protected]. Used if zincOutputType is not none
zincoutputusername =
; zinc password, example: password. Used if zincOutputType is not none
zincoutputpassword =
; Send data to Zinc in batch sizes
zincbatchsize = 1000
; Interval between sending results to Zinc if Batch size is not filled
zincbatchdelay = 1s
; Zinc request timeout
zinctimeout = 10s
4.11 - PostgreSQL
PostgreSQL is regarded as the world's most advanced open source database. dnsmonster has experimental support for outputting to PostgreSQL and other compatible database engines (e.g. CockroachDB).
Configuration options
# [psql_output]
# What should be written to Psql. options:
# 0: Disable Output
# 1: Enable Output without any filters
# 2: Enable Output and apply skipdomains logic
# 3: Enable Output and apply allowdomains logic
# 4: Enable Output and apply both skip and allow domains logic
--psqlOutputType=0
# Psql endpoint used. must be in uri format. example: postgres://username:password@hostname:port/database?sslmode=disable
--psqlEndpoint=
# Number of PSQL workers
--psqlWorkers=1
# Psql Batch Size
--psqlBatchSize=1
# Interval between sending results to Psql if Batch size is not filled. Any value larger than zero takes precedence over Batch Size
--psqlBatchDelay=0s
# Timeout for any INSERT operation before we consider them failed
--psqlBatchTimeout=5s
# Save full packet query and response in JSON format.
--psqlSaveFullQuery
4.12 - Metrics
Each enabled input and output comes with a set of metrics in order to monitor performance and troubleshoot your running instance. dnsmonster uses the go-metrics library, which makes it easy to register metrics on the fly and in a modular way.
Currently, three metric outputs are supported: stderr, statsd and prometheus.
Configuration parameters
[metric]
; Metric Endpoint Service. Choices: stderr, statsd, prometheus
MetricEndpointType = stderr
; Statsd endpoint. Example: 127.0.0.1:8125
MetricStatsdAgent =
; Prometheus Registry endpoint. Example: http://0.0.0.0:2112/metric
MetricPrometheusEndpoint =
; Interval between sending results to Metric Endpoint
MetricFlushInterval = 10s
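As a sketch (the endpoint is illustrative; the flag names mirror the ini options above), exposing Prometheus metrics while printing output to stdout looks like this:
dnsmonster --devName=lo --stdoutOutputType=1 --metricEndpointType=prometheus --metricPrometheusEndpoint=http://0.0.0.0:2112/metric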
5 - Tutorials
Some Design Templates
All-In-One Test Environment
The diagram above shows an overview of the autobuild output. Running ./autobuild.sh creates multiple containers:
- a dnsmonster container per selected interface from the host to look at the raw traffic. The host's interface list will be prompted when running autobuild.sh, allowing you to select one or more interfaces.
- a clickhouse container to collect dnsmonster's outputs and save all the logs and data to their respective directories inside the host. Both paths will be prompted in autobuild.sh. The default tables and their TTL will be implemented automatically.
- a grafana container connecting back to clickhouse. It automatically sets up the connection to ClickHouse and sets up the built-in dashboards based on the default ClickHouse tables. Note that the Grafana container needs an internet connection to successfully set up the plugins. If you don't have an internet connection, the dnsmonster and clickhouse containers will start working without any issues, and the error produced by Grafana can be ignored.
All-in-one Demo
5.1 - ClickHouse Cloud
use dnsmonster with ClickHouse Cloud
ClickHouse Cloud is a Serverless ClickHouse offering by the ClickHouse team. In this small tutorial I’ll go through the steps of building your DNS monitoring with it. At the time of writing this post, ClickHouse Cloud is in preview and some of the features might change over time.
Create a ClickHouse Cluster
First, let's create a ClickHouse instance by signing up and logging into the ClickHouse Cloud portal and clicking on "New Service" in the top right corner. You will be asked to provide a name and a region for your database. For the purpose of this tutorial, I will set the name of the database to dnsmonster in the us-east-2 region. There's a good chance that other parameters will be present when you define your cluster, such as the size and number of servers, but overall everything should look pretty much the same.
After clicking on create, you'll see the connection settings for your instance. The default username to log in is default and the password is generated randomly. Save that password for later use since the portal won't show it again!
And that's it! You have a fully managed ClickHouse cluster running in AWS. Now let's create our tables and views using the credentials we just got.
When you check out the dnsmonster repository from GitHub, there is a replicated table file with the table definitions suited for ClickHouse Cloud. Note that the "traditional" table design won't work on ClickHouse Cloud, since the managed cluster won't allow non-replicated tables. This policy has been put in place to ensure the high availability and integrity of the tables' data. Download the .sql file and save it anywhere on your disk, for example /tmp/tables_replicated.sql. Now let's use the clickhouse-client tool to create the tables.
clickhouse-client --host INSTANCEID.REGION.PROVIDER.clickhouse.cloud --secure --port 9440 --password RANDOM_PASSWORD --multiquery < /tmp/tables_replicated.sql
Replace the all-caps variables with your server instance details and this should create your primary tables. Everything should now be in place for us to use dnsmonster. We can point the dnsmonster service to the ClickHouse instance and it should work without any issues.
dnsmonster --devName lo \
--packetHandlerCount 8 \
--clickhouseAddress INSTANCEID.REGION.PROVIDER.clickhouse.cloud:9440 \
--clickhouseOutputType 1 \
--clickhouseBatchSize 7000 \
--clickhouseWorkers 16 \
--clickhouseSecure \
--clickhouseUsername default \
--clickhousePassword "RANDOM_PASSWORD" \
--clickhouseCompress \
--serverName my_dnsmonster \
--maskSize4 16 \
--maskSize6 64
Compressing the ClickHouse INSERT connection (--clickhouseCompress) will make it efficient and fast; I've gotten better results by turning it on. Keep in mind that tweaking packetHandlerCount as well as the number of ClickHouse workers, batch size etc. will have a major impact on the overall performance. In my tests, I've been able to exceed ~250,000 packets per second easily on my fibre connection. Keep in mind that you can substitute command line arguments with environment variables or a config file; refer to the Configuration section of the documents for more info.
Configuring Grafana and dashboards
Now that the data is being pushed into ClickHouse, you can leverage Grafana with the pre-built dashboard to help you gain visibility over your data. Let’s start with running an instance of Grafana in a docker container.
docker run --name dnsmonster_grafana -p 3000:3000 grafana/grafana:8.4.3
Then browse to localhost:3000 with admin as both the username and password, and install the ClickHouse plugin for Grafana. There are two choices in the Grafana store and both should work fine out of the box; I've tested the Altinity plugin for ClickHouse, but there's also an official ClickHouse Grafana plugin to choose from.
After installing the plugin, you can add your ClickHouse server as a datasource using the same address, port and password you used to run dnsmonster. After connecting Grafana to ClickHouse, you can import the pre-built dashboard from here either via the GUI or the CLI. Once your dashboard is imported, you can point it to your datasource address and most panels should start showing data. Most, but not all.
One final step to make sure everything is running smoothly is to create the dictionaries. Download the 4 dictionary files located here, either manually or by cloning the git repo. I'll assume they're in your /tmp/ directory. Now let's go back to clickhouse-client and quickly make that happen:
clickhouse-client --host INSTANCEID.REGION.PROVIDER.clickhouse.cloud --secure --port 9440 --password RANDOM_PASSWORD
CREATE DICTIONARY dns_class (Id Uint64, Name String) PRIMARY KEY Id LAYOUT(FLAT()) SOURCE(HTTP(url "https://raw.githubusercontent.com/mosajjal/dnsmonster/main/clickhouse/dictionaries/dns_class.tsv" format TSV)) LIFETIME(MIN 0 MAX 0)
CREATE DICTIONARY dns_opcode (Id Uint64, Name String) PRIMARY KEY Id LAYOUT(FLAT()) SOURCE(HTTP(url "https://raw.githubusercontent.com/mosajjal/dnsmonster/main/clickhouse/dictionaries/dns_opcode.tsv" format TSV)) LIFETIME(MIN 0 MAX 0)
CREATE DICTIONARY dns_response (Id Uint64, Name String) PRIMARY KEY Id LAYOUT(FLAT()) SOURCE(HTTP(url "https://raw.githubusercontent.com/mosajjal/dnsmonster/main/clickhouse/dictionaries/dns_response.tsv" format TSV)) LIFETIME(MIN 0 MAX 0)
CREATE DICTIONARY dns_type (Id Uint64, Name String) PRIMARY KEY Id LAYOUT(FLAT()) SOURCE(HTTP(url "https://raw.githubusercontent.com/mosajjal/dnsmonster/main/clickhouse/dictionaries/dns_type.tsv" format TSV)) LIFETIME(MIN 0 MAX 0)
And that's about it. With the above commands, the full stack of Grafana, ClickHouse and dnsmonster should work perfectly. No more managing ClickHouse clusters manually! You can also combine this with the Kubernetes tutorial and provide a cloud-native, serverless DNS monitoring platform at scale.
5.2 - Kubernetes
use dnsmonster to monitor Kubernetes DNS traffic
In this guide, I'll go through the steps to inject a custom configuration into Kubernetes' coredns DNS server to provide a dnstap logger, and to set up a dnsmonster pod to receive the logs, process them and send them to the intended outputs.
dnsmonster deployment
In order for dnsmonster to see the dnstap connection coming from the coredns pod, we need to create the dnsmonster Service inside the same namespace (kube-system or equivalent).
Warning
Avoid naming your services and pods "dnsmonster". The reason is that Kubernetes injects a few environment variables with the DNSMONSTER_ prefix into your dnsmonster instance, and the dnsmonster binary will interpret those as command line input.
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
k8s-app: dnsmonster-dnstap
name: dnsmonster-dnstap
namespace: kube-system
spec:
# change the replica count to how many you might need to comfortably ingest the data
replicas: 1
selector:
matchLabels:
k8s-app: dnsmonster-dnstap
template:
metadata:
labels:
k8s-app: dnsmonster-dnstap
spec:
containers:
- name: dnsm-dnstap
image: ghcr.io/mosajjal/dnsmonster:v0.9.3
args:
- "--dnstapSocket=tcp://0.0.0.0:7878"
- "--stdoutOutputType=1"
imagePullPolicy: IfNotPresent
ports:
- containerPort: 7878
---
apiVersion: v1
# https://kubernetes.io/docs/concepts/services-networking/connect-applications-service/#creating-a-service
# as per above documentation, each service will have a unique IP address that won't change for the lifespan of the service
kind: Service
metadata:
name: dnsmonster-dnstap
namespace: kube-system
spec:
selector:
k8s-app: dnsmonster-dnstap
ports:
- name: dnsmonster-dnstap
protocol: TCP
port: 7878
targetPort: 7878
EOF
Now we can get the static IP assigned to the service to use it in the coredns custom ConfigMap. Note that since CoreDNS itself is providing DNS, it does not support an FQDN as a dnstap endpoint.
SVCIP=$(kubectl get service dnsmonster-dnstap --output go-template --template='{{.spec.clusterIP}}')
Locate and edit the coredns config
Let's see if we can view and manipulate the configuration inside the coredns pods. Using the command below, we can get a list of running coredns containers:
kubectl get pod --output yaml --all-namespaces | grep coredns
In the above output, you should be able to see many objects associated with coredns, most notably coredns-custom. The coredns-custom ConfigMap allows us to customize the coredns configuration file and enable built-in plugins for it. Many cloud providers have built the coredns-custom ConfigMap into their offering. Take a look at the AKS, Oracle Cloud and DigitalOcean docs for more details.
In Amazon's EKS, there's no coredns-custom, so the configuration needs to be edited in the main configuration file instead. On top of that, EKS will keep overriding your configuration with the "default" value through a DNS add-on; that add-on needs to be disabled for any customization of the coredns configuration as well. Take a look at this issue for more information.
The command below has been tested on DigitalOcean managed Kubernetes:
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
name: coredns-custom
namespace: kube-system
data:
log.override: |
dnstap tcp://$SVCIP:7878 full
EOF
After running the above command, you will see the logs inside your dnsmonster pod. As commented in the yaml definitions, customizing the configuration parameters should be fairly straightforward.
6 - FAQ
Why should I use dnsmonster
I've broken this question into two: why do you need to monitor your DNS, and why is dnsmonster a good choice to do so?
Do I need passive DNS capture capability
DNS is one of, if not the most, prevalent indicators of compromise in all attacks. The vast majority of the external communication of malware or backdoors (~92% according to Cisco) has some sort of DNS connectivity in its chain. Here are some great articles on the importance of DNS security monitoring.
Why dnsmonster specifically?
dnsmonster is one of the few products supporting a wide range of inputs (pcap file, dnstap, live interface on Windows and *nix, afpacket) and a variety of outputs with minimum configuration and maximum performance. It can natively send data to your favorite database service or a Kafka topic, and has a built-in capability to send its metrics to a metrics endpoint. Check out the full feature set of dnsmonster in the "Getting Started" section.
In addition, dnsmonster offers fantastic performance by utilizing all the CPU cores available on the machine, and has built-in buffers to cope with sudden traffic spikes.
Why did you name it dnsmonster
When I first tested dnsmonster on a giant DNS pcap file (220+ million DNS queries and responses) and saw it outperform other products in the same category, I described it to one of my mates as having "devoured those packets like the cookie monster", and that's how the monster within dnsmonster was born.
What OS should I use to run DNSmonster
dnsmonster will always offer first-class support for the modern Linux kernel (4.x), so it is recommended that you use dnsmonster on a modern Linux distribution. It can also be compiled for Windows, *BSD and macOS, but many of the performance tweaks will not work as well as they do on Linux.
For example, when dnsmonster is built on non-Unix systems, it stops manipulating the JSON objects with sonic.
As for architecture, dnsmonster builds successfully for arm7, aarch64 and amd64, but benchmarking has not been done to determine which architecture works best with it.
Why is dnsmonster not working for me
There could be several reasons why dnsmonster is not working for you. The best way to start troubleshooting is to have a Go compiler handy so you can build dnsmonster from source, and try the following:
- Try building the master branch and running it with stdoutOutput to see if there is any output
- Try running dnsmonster with and without afpacket support and various buffer sizes
- Use a different packet capture method than dnsmonster to see if the packets are visible (tcpdump and netsniff-ng are good examples)
- Try piping the packets from tcpdump to dnsmonster with stdoutOutput and see if that makes any difference, like so:
sudo tcpdump -nni eth0 -w - | dnsmonster --pcapFile=- --stdoutOutputType=1
- Pay attention to the port variable if your DNS packets are being sent on a port other than 53. That parameter is different from the BPF. Speaking of which, make sure your BPF is not too restrictive.
If none of the above works, feel free to open an issue with the details of your problem. If you are planning to attach a pcap file as part of your issue, make sure to anonymize it.
How do I upgrade between versions
Before the product hits 1.x.x, breaking changes between releases are expected. Read the release notes between your current version and the desired version one by one to see if you need to upgrade in increments or not.
After 1.x.x, the plan is to maintain backwards compatibility within major versions (e.g. every 1.x.x installation will work as part of an upgrade). However, that will not necessarily be the case for ClickHouse tables. Since ClickHouse is a fast-moving product, there might be a need to change the schema of the tables regardless of dnsmonster's major release.
The JSON output fields, which are the basis for the majority of dnsmonster outputs, are bound to Miekg's dns library. The library seems to be fairly stable and has used the same data structures for years. For dnsmonster, the plan is to keep the JSON schema the same for each major release, so SIEM parsers such as ASIM and CIM can maintain functionality. dnsmonster also supports go-template output, similar to kubectl, which makes it easy to customize and standardize your output to cater to your needs.
How fast is dnsmonster
dnsmonster has demonstrated ingestion of 200,000 packets per second on a beefy server with ClickHouse running on the same machine with an SSD storage backend. Since then, the performance of dnsmonster for both packet ingestion and the output pipeline has been improved, to the point that you can ingest the same number of packets per second on a commodity laptop. I would say that for the majority of use cases, dnsmonster will not be the bottleneck of your data collection.
If you have a heavy workload that you have tested with dnsmonster, I would be happy to receive your feedback and share the numbers with the community.
Which output do I use
It depends. I would recommend sticking with the current toolset you have. The majority of organizations have built a syslog or kafka pipeline to get the data into their ingestion point, and both are fully supported by dnsmonster. If you want to test the product and its output, you can use file and stdout quite easily. Keep in mind that for file, you should consider your disk IO if you're writing a ton of data to disk.
If you're keen to build a new solution from scratch, I would recommend looking at ClickHouse. dnsmonster was originally built with ClickHouse in mind, and ClickHouse remains one of the better tools to ingest DNS logs. Take a look at how Cloudflare is leveraging ClickHouse to monitor 1.1.1.1 here.
Why am I dropping packets
There could be many reasons behind packet loss. I went through some of them with possible solutions in the performance section.
Is there a Slack or Discord I can join
Not yet. At the moment, the repo's discussions page serves this purpose. If that proves to be less than ideal, I'm open to having a Discord/Slack/Telegram channel dedicated to dnsmonster. Let me know!
How to contribute
I have broken contribution into different sections:
- For security and bug disclosure, please visit SECURITY.md in the main repository to get more info on how to report vulnerabilities responsibly.
- For bugfixes, please open an issue before submitting a PR. That way the other contributors know what is being worked on and there will be less duplicate work on bugfixes. Also, sometimes a bugfix is specific to a particular client and there could be mitigations other than changing the code.
- For new features and output modules, please raise an issue first. That way the work can be assigned and scheduled for the next major release, and we can get together in the discussions to set the requirements.
There are also many //todo comments in the code; feel free to take a stab at those.
Last but not least, this very documentation needs your help! On the right hand side of each page, you can see some helper links to raise an issue with a page, or propose an edit or even create a new page.