An intro to metrics collection with Prometheus and visualization with Grafana at the Colorado School of Mines
MINES.EDU
About Us
Colorado School of Mines and the CIARC group
Mines numbers
CIARC
Resources
Wendian
78 Intel Skylake CPU nodes + 5 Intel with GPU nodes + 4 Power nodes
874TB BeeGFS filesystem
AuN
144 Intel Sandy Bridge CPU nodes
350TB GPFS filesystem
Mio
202 Intel nodes – several generations
2 Intel nodes with GPU
2 Power8 nodes with GPU
176TB GPFS filesystem
Orebits
800TB ZFS – exported to campus with SMB or NFS
Motivation
Can anything replace Ganglia?
Why replace Ganglia?
A side by side comparison
Ganglia
Grafana
Introduction
Architecture overview and some definitions of terms
Architecture
Credit: https://prometheus.io/assets/architecture.png
Data gathering
Exporters – Run as a service and expose a collection of metrics over HTTP (typically at /metrics) in the Prometheus text-based exposition format when scraped
Metric – Any time series data. Types are counter, gauge, histogram, and summary.
Label – An optional key-value pair added to a metric to give the data dimensionality
Pushgateway – A service that allows for pushing metrics from ephemeral and batch jobs. It is not meant to turn Prometheus into a push-based system.
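To make the terms above concrete, here is a minimal sketch of what an exporter returns when scraped: plain text in the Prometheus exposition format, with a HELP line, a TYPE line, and one sample per label combination. The metric name `demo_jobs_running` and the helper functions are hypothetical and exist only for illustration; a real exporter would normally use an official client library rather than build the text by hand.

```python
def escape_label_value(value: str) -> str:
    # The text format escapes backslash, double quote, and newline in label values.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metric(name, metric_type, help_text, samples):
    """Render one metric family as exposition text.

    samples is a list of (labels_dict, value) pairs.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        if labels:
            pairs = ",".join(
                f'{key}="{escape_label_value(val)}"'
                for key, val in sorted(labels.items())
            )
            lines.append(f"{name}{{{pairs}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical gauge with two labels (names chosen for illustration only).
text = render_metric(
    "demo_jobs_running",
    "gauge",
    "Number of running jobs (hypothetical metric).",
    [({"cluster": "wendian", "host": "c001"}, 3)],
)
print(text, end="")
```

Each distinct combination of label values is stored as its own time series, which is what "dimensionality" means in practice.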
Sampling of exporters from: https://prometheus.io/docs/instrumenting/exporters/
Data Collection
Prometheus – The server that collects metrics from exporters and stores them in the TSDB
Service discovery – A way to dynamically discover services to be scraped. Most useful in dynamic cloud environments.
TSDB – Time Series Database. Prometheus's custom on-disk storage for metric streams
Scraping – The retrieval of streams of metrics from targets, which are usually exporter services running on remote hosts.
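As a rough illustration of what happens to a scraped response, the sketch below parses exposition text into (name, labels, value) samples. It is a simplified stand-in, not Prometheus's own parser: it skips comment lines, ignores any optional timestamp, and does not unescape label values. The function name `parse_exposition` is made up for this example; the two metric names are ones that node_exporter commonly reports.

```python
import re

# metric_name{label="value",...} value [timestamp]
SAMPLE_RE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore anything this simplified pattern cannot read
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


scraped = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 1234.5
"""

for name, labels, value in parse_exposition(scraped):
    print(name, labels, value)
```

On each scrape, Prometheus additionally attaches the target's own labels (for example the job and instance) before writing the samples to the TSDB.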
Alerting
Data viewing
Grafana plugins - https://grafana.com/grafana/plugins
Installation
Exporters, Prometheus Server, and Grafana
node_exporter
Prometheus Server
[Unit]
Description=Prometheus Server
Documentation=https://prometheus.io/docs/introduction/overview/
After=network-online.target
[Service]
User=prometheus
Restart=on-failure
# Change the paths below if Prometheus is installed
# in a different location or runs as a different user
ExecStart=/home/prometheus/prometheus/prometheus \
--config.file=/home/prometheus/prometheus/prometheus.yml \
--storage.tsdb.path=/home/prometheus/prometheus/data
[Install]
WantedBy=multi-user.target
Grafana server
Configuration and Customization
Typical workflow overview
Add exporter targets to Prometheus
prometheus.yml (excerpt)
  - job_name: 'compute'
    file_sd_configs:
      - files:
          - /etc/prometheus/nodes/*.yml

/etc/prometheus/nodes/c001.yml
[
  {
    "targets": [ "c001:9100" ],
    "labels": {
      "cluster": "wendian",
      "host": "c001",
      "role": "compute"
    }
  }
]
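With many compute nodes, per-node target files like the one for c001 are easier to generate than to write by hand. The following is a minimal sketch of that idea. The helper names, the three-node range, and the zero-padded `cNNN` naming scheme are assumptions for illustration; port 9100 and the label keys follow the c001 example. The output is JSON, which is also valid YAML, so it can be saved under the `*.yml` glob used in the excerpt.

```python
import json


def make_target_group(host, cluster, role, port=9100):
    # One target group: a list of host:port targets plus the labels
    # that Prometheus attaches to every series scraped from them.
    return [
        {
            "targets": [f"{host}:{port}"],
            "labels": {"cluster": cluster, "host": host, "role": role},
        }
    ]


def render_target_file(host, cluster="wendian", role="compute"):
    # JSON is a subset of YAML, so this text can be saved as <host>.yml.
    return json.dumps(make_target_group(host, cluster, role), indent=2) + "\n"


if __name__ == "__main__":
    # Hypothetical node range; print instead of writing under /etc/prometheus.
    for number in range(1, 4):
        host = f"c{number:03d}"
        print(f"# contents for {host}.yml")
        print(render_target_file(host), end="")
```

Because file-based service discovery watches the matching files for changes, adding or removing a node file should not require a Prometheus restart.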
Add data source to Grafana
Import a dashboard
Let’s see a demo…