Prometheus Service Discovery using Puppet

In this blog post I would like to share how we use some neat Puppet features and modules to install Prometheus and to let it automagically discover the services it should monitor.

Introduction

At Moxio we prefer to work smart instead of hard since there's more to life than just work. We also really value quality, peace of mind and a good night's sleep. This is why we have automated systems to constantly monitor and validate how the services we provide to our customers perform at all stages of the software development life cycle.

For operations and maintenance we had a monitoring solution in place that already was hugely beneficial and provided more insight. It still however required a human being to proactively interpret the data it collected to see if everything was still going A-okay. Recently we set out to improve this system by taking that process mostly out of our hands too.

We wanted something more modern and broadly adopted since the development and adoption of our existing solution, StatsD, seemed quite stagnant. Ideally it should provide some transition method so we could slowly migrate to the new system. We also wanted to be able to let the server actively detect and warn us when things are going wrong. This is where, after some comparison with other solutions, Prometheus seemed to be the best fit.

Setting up and configuring Prometheus and all of the relevant exporters can get quite involved and cumbersome without an infrastructure automation tool, probably more so than with other monitoring solutions. It does however provide great flexibility, support and effectiveness without requiring tons of resources from your servers. Luckily we had already been using Puppet to automate our infrastructure for quite some time, and there was already a great Prometheus module available to get us up to speed quickly.

Service Discovery

Prometheus uses a polling system and as such needs to know where to find the data it needs to scrape. Whilst running it in parallel to our existing monitoring system we quickly came to the conclusion that we needed a way to let Prometheus figure that out for itself. There are several service discovery mechanisms available to achieve this, for example with data from your EC2 or Kubernetes cluster. That's at most only partly useful if you're running on mixed infrastructure platforms like us.

More platform-agnostic service discovery mechanisms like Consul and Nerve are also supported but require extra services to be installed and maintained. Luckily we can also use file-based service discovery to plug in our own custom mechanism, and that's where Puppet's exported resources come into play!

Exported Resources

But what are exported resources, and how can they help? The description in the Puppet documentation on exported resources already gives a hint:

An exported resource declaration specifies a desired state for a resource, and publishes the resource for use by other nodes. It does not manage the resource on the target system. Any node, including the node that exports it, can collect the exported resource and manage its own copy of it.

A diagram might help to make this a bit less abstract. We have a resource that is defined in the catalogs of node-a and node-b. We collect these resources on node-c, where they get realized with the specified definition:

 

Note: For this to work you need PuppetDB installed and the storeconfigs option enabled on your Puppet master.
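To make the export and collect steps concrete, here is a minimal sketch that is unrelated to Prometheus. The class names and the `example_hosts` tag are made up for illustration, and it assumes the `host` resource type is available (it is built in on older Puppet versions and lives in a separate core module on newer ones).

```puppet
# On every node that should be known to others (for example node-a and node-b).
# The @@ prefix exports the resource instead of applying it locally.
class profile::example::export_host {
  @@host { $facts['networking']['fqdn']:
    ip  => $facts['networking']['ip'],
    tag => 'example_hosts',
  }
}

# On the collecting node (for example node-c): realize every exported host
# resource carrying the tag, including any this node exported itself.
class profile::example::collect_hosts {
  Host <<| tag == 'example_hosts' |>>
}
```

The exported resources only show up on the collecting node after the exporting nodes have completed a Puppet run, so a new node typically needs one run of its own plus one run of the collector before it appears.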

The missing link...

So, to recapitulate, the idea is to use Puppet's exported resources to supply the file-based service discovery mechanism with the targets Prometheus should monitor. For this we have to generate JSON or YAML files containing a list of targets, as documented. It is also possible to add custom labels here, for example an environment label so that we can easily differentiate between production, staging and development servers.
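As an illustration of the file format only, the sketch below writes a static targets file by hand. The file name is hypothetical and the hosts are borrowed from the inventory used later in this post; in the real setup the contents are generated rather than hard-coded.

```puppet
# Hypothetical static targets file, shown only to illustrate the format that
# file-based service discovery reads: a list of target groups, each with a
# list of targets and an optional map of labels shared by those targets.
$example_targets = @(YAML)
  ---
  - targets:
      - 'webserver01:9100'
      - 'database01:9100'
    labels:
      environment: production
  | YAML

file { '/etc/prometheus/example-targets.yaml':
  ensure  => file,
  content => $example_targets,
}
```

Every target in a group receives the group's labels, which is why a shared environment label fits naturally here.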

We could simply create a single file for each target, use a wildcard in our filename pattern and be done with it. However, exporters are highly specialized and we will often have multiple exporters running per node. This means we would quickly end up with a significant number of targets and thus many small configuration files. Instead we are going to create a single YAML file per job that contains a list of targets and the shared labels that apply to them.

Having multiple resources manage a single part of a system is generally not a good idea, however, and it is something the Puppet language actively tries to prevent. The very useful and easy-to-use concat module has our backs though: it allows us to define the contents of a single file with multiple resources, called fragments, in a safe way. Check out the documentation if you want to know more.
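A minimal sketch of the concat pattern, using a throwaway file path and made-up fragment titles, could look like this:

```puppet
# One resource owns the file itself...
concat { '/tmp/example-list.txt':
  ensure_newline => true,
}

# ...and any number of fragments contribute content to it.
# The order attribute determines where each fragment ends up in the file.
concat::fragment { 'example header':
  target  => '/tmp/example-list.txt',
  content => '# managed by Puppet',
  order   => '01',
}

concat::fragment { 'example entry':
  target  => '/tmp/example-list.txt',
  content => 'first entry',
  order   => '10',
}
```

Because each fragment is a resource of its own, fragments can be declared in different classes, or exported by different nodes, without two resources ever claiming the same file.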

Tying it all together

So how do all of these pieces fit together in code? I'll give you an example using the excellent roles and profiles method, omitting the roles layer to keep it a bit more concise.

Our inventory: site.pp

node 'monitor01' {
  include profile::prometheus::server
  include profile::prometheus::node_exporter
}

node 'database01' {
  include profile::prometheus::node_exporter
}

node 'webserver01' {
  include profile::prometheus::node_exporter
}

How we set up and configure the Prometheus server

class profile::prometheus::server {
  # Install and configure the prometheus server..
  class { 'prometheus::server': 
    scrape_configs => [
      # Here we configure our job and how it discovers its targets
      {
        'job_name' => 'node',
        'file_sd_configs' => [
          {
            'files' => ['/etc/prometheus/node-targets.yaml']
          }
        ], 
      },
    ]
  }

  # The YAML file that will contain a list of all of the node targets..
  concat { '/etc/prometheus/node-targets.yaml':
    ensure_newline => true,
    owner          => 'prometheus',
    group          => 'prometheus',
    mode           => '0660',
    require        => Class['prometheus::server'],
  }

  # This header must come first so that the result is a valid YAML file.
  # It is also where we can add custom labels; labels and targets belong to the same target group.
  concat::fragment { 'node-targets-header':
    target  => '/etc/prometheus/node-targets.yaml',
    content => "---\n- labels:\n    environment: production\n  targets:\n",
    order   => 0,
  }

  # This is where the magic happens and how we collect the resources exported by the other nodes! \o/
  Profile::Prometheus::Target <<| |>>
}

The resource that does all the magic: it gets exported and collected

define profile::prometheus::target (
  String[1] $job,
  String[1] $host,
) {
  concat::fragment { "${job} target ${host}":
    target  => "/etc/prometheus/${job}-targets.yaml",
    content => "  - ${host}",
  }
}

We define and export the target resource in the class where we manage the exporter that is to be monitored

class profile::prometheus::node_exporter {
  include prometheus::node_exporter

  $fqdn = $::facts['networking']['fqdn']
  @@profile::prometheus::target { "${fqdn} - node_exporter":
    job  => 'node',
    host => "${fqdn}:9100",
  }
}
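Adding another exporter follows the same pattern. The sketch below is hypothetical: it assumes the Prometheus module provides a `prometheus::apache_exporter` class and that the exporter listens on port 9117, so adjust both to your own setup.

```puppet
# Hypothetical profile for a second exporter (class name and port are assumptions).
class profile::prometheus::apache_exporter {
  include prometheus::apache_exporter

  $fqdn = $facts['networking']['fqdn']

  # Export a target for the 'apache' job; the target define turns this into
  # a fragment of /etc/prometheus/apache-targets.yaml on the collecting node.
  @@profile::prometheus::target { "${fqdn} - apache_exporter":
    job  => 'apache',
    host => "${fqdn}:9117",
  }
}
```

For this to work, the server profile would also need a matching concat resource and header fragment for `/etc/prometheus/apache-targets.yaml`, plus a scrape job that reads it. Otherwise the collected fragments would point at a target file that nothing manages.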

It adds some complexity but saves a lot of effort in the long run. A diagram to help get an overview of how it all fits together:

 

Closing words

We have been using Prometheus for a few months now with great success. Puppet helps tremendously with maintaining everything. Combined with specialized exporters and the Alertmanager we now have a powerful system that proactively monitors and warns us when things go wrong, giving us more peace of mind and time for other things we love to do.

Also, huge thanks to Vox Pupuli, who maintain the Prometheus Puppet module. We made some bugfixes and implemented support for the Apache exporter; it was an easy process with good feedback to get it all merged. Consider making time to contribute back to the open source software you use if you have the means to do so.

Wiebe Verweij
