1 of 52

How ClickHouse Inspired Us to Build a High Performance Time Series Database

2 of 52

About me

  • I’m Aliaksandr Valialkin, CTO and core developer at VictoriaMetrics
  • I like programming in Go
  • I’m fond of performance optimizations
  • I like ClickHouse :)
  • Follow me @GitHub - https://github.com/valyala

3 of 52

What is VictoriaMetrics?

  • An easy-to-use time series database and monitoring solution
  • Can be used as a drop-in replacement for Prometheus and Graphite
  • Provides a monitoring-optimized query language - MetricsQL (inspired by PromQL)
  • Optimized for low resource usage (disk space, disk IO, CPU, RAM)
  • Optimized for high performance

4 of 52

The history of VictoriaMetrics

5 of 52

The history of VictoriaMetrics: ad analytics and PostgreSQL

  • A long time ago we used PostgreSQL for ad analytics

[Diagram: ad click events and ad view events flow into PostgreSQL, which serves analytical reports]

6 of 52

The history of VictoriaMetrics: ad analytics and PostgreSQL

  • A long time ago we used PostgreSQL for ad analytics
  • PostgreSQL was great until the stream of stored events started exceeding 100K rows/sec

[Diagram: ad click events and ad view events arrive at 100K events/sec; PostgreSQL (“I’m slow :(”) struggles to serve analytical reports]

7 of 52

The history of VictoriaMetrics: ad analytics and PostgreSQL

8 of 52

The history of VictoriaMetrics: ad analytics and PostgreSQL

  • There were issues with query performance as well
  • These issues were solved by creating many tables with pre-computed aggregates
  • The number of different aggregate tables was constantly growing...
  • ...as well as the time needed for aggregate calculations

9 of 52

The history of VictoriaMetrics: ad analytics and PostgreSQL

  • We started searching for a new database for ad analytics
  • CitusDB looked promising, but it was hard to set up and operate
  • Postgres-XL was even harder to set up and operate
  • We tried MemSQL. It failed because of high RAM usage :(

10 of 52

The history of VictoriaMetrics: ad analytics and PostgreSQL

  • We started searching for a new database for ad analytics
  • CitusDB looked promising, but it was hard to set up and operate
  • Postgres-XL was even harder to set up and operate
  • We tried MemSQL. It failed because of high RAM usage :(
  • Then we discovered ClickHouse!

11 of 52

ClickHouse for ad analytics

12 of 52

The history of VictoriaMetrics: ad analytics and ClickHouse

  • ClickHouse fit perfectly for our use case!
  • We successfully migrated ad analytics from PostgreSQL to ClickHouse in one month
  • ClickHouse was accepting up to 300K events per second on a single server with 50 columns per event
  • It easily scaled to a cluster of a dozen nodes, which accepted 3M events/sec without issues
  • ClickHouse could scan up to 1000x more rows per second than PostgreSQL on the same hardware!
  • We dropped the majority of the pre-computed aggregate tables, since ClickHouse could calculate arbitrary ad analytics reports over raw events in the blink of an eye!

13 of 52

The history of VictoriaMetrics: ad analytics and ClickHouse

  • We were happy users of ClickHouse, so we decided to give back to the community

14 of 52

The history of VictoriaMetrics: ad analytics and ClickHouse

  • We were happy users of ClickHouse, so we decided to give back to the community
  • We created a Grafana datasource, which allows building graphs and reports in Grafana from ClickHouse data. It is now supported by Altinity - https://github.com/Vertamedia/clickhouse-grafana

15 of 52

The history of VictoriaMetrics: ad analytics and ClickHouse

  • We were happy users of ClickHouse, so we decided to give back to the community
  • We created a Grafana datasource, which allows building graphs and reports in Grafana from ClickHouse data. It is now supported by Altinity - https://github.com/Vertamedia/clickhouse-grafana
  • Then we created a proxy for authorizing and load-balancing insert and select requests among ClickHouse clusters, replicas and nodes - https://github.com/Vertamedia/chproxy

16 of 52

The history of VictoriaMetrics: ad analytics and ClickHouse

  • We were happy users of ClickHouse, so we decided to give back to the community
  • We created a Grafana datasource, which allows building graphs and reports in Grafana from ClickHouse data. It is now supported by Altinity - https://github.com/Vertamedia/clickhouse-grafana
  • Then we created a proxy for authorizing and load-balancing insert and select requests among ClickHouse clusters, replicas and nodes - https://github.com/Vertamedia/chproxy
  • Then I created a fast reader of TSV data from ClickHouse - https://github.com/valyala/tsvreader

17 of 52

The history of VictoriaMetrics: ad analytics and ClickHouse

18 of 52

19 of 52

The history of VictoriaMetrics: Prometheus and ClickHouse

  • We were using Zabbix for infrastructure and application monitoring
  • The experience wasn’t very good, so we started searching for a replacement
  • We discovered Prometheus at the beginning of 2017
  • It was great!
  • Unfortunately it started to slow down as the number of monitored apps grew
  • So we decided to try ClickHouse as a remote storage for Prometheus

20 of 52

ClickHouse as data storage for Prometheus

21 of 52

The history of VictoriaMetrics: ClickHouse as metrics storage

  • Prometheus collects samples with the following structure:

http_requests_total{path="/foo",job="webserver",instance="host-1:80"} 12 34567

22 of 52

The history of VictoriaMetrics: ClickHouse as metrics storage

  • Prometheus collects samples with the following structure:

http_requests_total{path="/foo",job="webserver",instance="host-1:80"} 12 34567

  • metric_name: http_requests_total
  • labels: {path="/foo",job="webserver",instance="host-1:80"}
  • value: 12
  • timestamp: 34567

23 of 52

The history of VictoriaMetrics: ClickHouse as metrics storage

  • Let’s store Prometheus samples in the following ClickHouse table:

samples (
    metric_name String,
    labels String,
    timestamp Int64,
    value Float64
) ORDER BY (metric_name, labels, timestamp)

24 of 52

The history of VictoriaMetrics: ClickHouse as metrics storage

  • Let’s store Prometheus samples in the following ClickHouse table:

samples (
    metric_name String,
    labels String,
    timestamp Int64,
    value Float64
) ORDER BY (metric_name, labels, timestamp)

  • This allows quickly locating samples for a particular metric
  • Samples with particular labels can be found via regexp matching over the `labels` column. But such matching is neither easy nor fast (see the sketch below)
  • `metric_name` and `labels` are duplicated for every sample
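
For illustration, a label filter over this single-table scheme could be expressed in ClickHouse SQL roughly as follows (a minimal sketch, assuming the table above; `match()` applies an RE2 regexp to every row that passes the primary key filter on `metric_name`):

SELECT timestamp, value
FROM samples
WHERE metric_name = 'http_requests_total'
  AND match(labels, 'job="webserver"')
ORDER BY timestamp

The primary key narrows the scan to a single metric name, but every remaining row still pays the cost of a regexp over the full `labels` string.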

25 of 52

The history of VictoriaMetrics: ClickHouse as metrics storage

  • Let’s move `metric_name` and `labels` columns into a separate table:

metric_to_series (metric_name String, labels String, series_id UInt64) ORDER BY (metric_name, labels)

26 of 52

The history of VictoriaMetrics: ClickHouse as metrics storage

  • Let’s move `metric_name` and `labels` columns into a separate table:

metric_to_series (metric_name String, labels String, series_id UInt64) ORDER BY (metric_name, labels)

  • Then the `samples` table will look like:

samples (series_id UInt64, timestamp Int64, value Float64) ORDER BY (series_id, timestamp)

  • This reduces the overhead of storing `metric_name` and `labels` for each sample

27 of 52

The history of VictoriaMetrics: ClickHouse as metrics storage

  • Let’s move `metric_name` and `labels` columns into a separate table:

metric_to_series (metric_name String, labels String, series_id UInt64) ORDER BY (metric_name, labels)

  • Then the `samples` table will look like:

samples (series_id UInt64, timestamp Int64, value Float64) ORDER BY (series_id, timestamp)

  • This reduces the overhead of storing `metric_name` and `labels` for each sample
  • But searching by labels is still hard (see the sketch below)
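
A query over this two-table scheme might be sketched in ClickHouse SQL as follows (names follow the slides; note how the label filter still degenerates into a substring scan over the `labels` column):

SELECT timestamp, value
FROM samples
WHERE series_id IN (
    SELECT series_id
    FROM metric_to_series
    WHERE metric_name = 'http_requests_total'
      AND labels LIKE '%job="webserver"%'
)
ORDER BY timestamp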

28 of 52

The history of VictoriaMetrics: ClickHouse as metrics storage

  • Let’s introduce an additional table:

label_to_series (label_name_value String, series_id UInt64) ORDER BY (label_name_value, series_id)

  • The `label_name_value` column contains `label=value` strings
  • This table simplifies and speeds up searching by labels (see the sketch below)
  • Such a table is known as an “inverted index”
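
With the inverted index, finding series that match several label filters conceptually becomes an intersection of sorted `series_id` lists. A minimal sketch in ClickHouse SQL, assuming one row per unique (label_name_value, series_id) pair:

SELECT series_id
FROM label_to_series
WHERE label_name_value IN ('job=webserver', 'instance=host-1:80')
GROUP BY series_id
HAVING count() = 2

Only series carrying both labels pass the HAVING clause.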

29 of 52

The history of VictoriaMetrics: ClickHouse as metrics storage

  • The `samples` table can be optimized further with special codecs for timestamp and value:

samples (timestamp Int64 CODEC(DoubleDelta), value Float64 CODEC(Gorilla))

  • These codecs reduce disk space usage for typical time series data

30 of 52

The history of VictoriaMetrics: ClickHouse as metrics storage

  • The `samples` table can be optimized further with special codecs for timestamp and value:

samples (timestamp Int64 CODEC(DoubleDelta), value Float64 CODEC(Gorilla))

  • These codecs reduce disk space usage for typical time series data (see the DDL sketch below)
  • Unfortunately these codecs weren’t available in ClickHouse in 2017 :(
  • These codecs were added after the feature request - https://github.com/ClickHouse/ClickHouse/issues/838
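
As actual ClickHouse DDL this could look as follows (a sketch; the MergeTree engine and the LZ4 fallback codec are assumptions, not from the slides):

CREATE TABLE samples (
    series_id UInt64,
    timestamp Int64 CODEC(DoubleDelta, LZ4),
    value Float64 CODEC(Gorilla, LZ4)
) ENGINE = MergeTree
ORDER BY (series_id, timestamp)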

31 of 52

The history of VictoriaMetrics: ClickHouse as metrics storage

  • The `metric_to_series` and `label_to_series` tables can also be optimized further by using the LowCardinality(String) type:

metric_to_series (metric_name LowCardinality(String), labels String, series_id UInt64)

label_to_series (label_name_value LowCardinality(String), series_id UInt64)

  • This saves disk space and improves query performance

32 of 52

The history of VictoriaMetrics: ClickHouse as metrics storage

  • The `metric_to_series` and `label_to_series` tables can also be optimized further by using the LowCardinality(String) type:

metric_to_series (metric_name LowCardinality(String), labels String, series_id UInt64)

label_to_series (label_name_value LowCardinality(String), series_id UInt64)

  • This saves disk space and improves query performance (see the DDL sketch below)
  • But the LowCardinality type wasn’t available in ClickHouse in 2017 :(
  • It was added in response to the feature request - https://github.com/ClickHouse/ClickHouse/issues/1567
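
In ClickHouse DDL form (a sketch under the same engine assumption as above):

CREATE TABLE label_to_series (
    label_name_value LowCardinality(String),
    series_id UInt64
) ENGINE = MergeTree
ORDER BY (label_name_value, series_id)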

33 of 52

The history of VictoriaMetrics: ClickHouse as metrics storage

  • Further optimizations:
    • The `metric_name` column can be encoded as a label with the special name `__name__`, as Prometheus does: http_requests_total{job="webserver"} becomes {__name__="http_requests_total", job="webserver"}
    • There should be a table for fast lookup of metric_name{labels} from `series_id` during queries

34 of 52

The history of VictoriaMetrics: ClickHouse as metrics storage

  • The final database structure would consist of the following tables:

metric_to_series (metric_name_with_labels String, series_id UInt64) ORDER BY (metric_name_with_labels)

This table is used by the app, which accepts new samples and needs to determine the corresponding series_id for each incoming sample

35 of 52

The history of VictoriaMetrics: ClickHouse as metrics storage

  • The final database structure would consist of the following tables:

metric_to_series (metric_name_with_labels String, series_id UInt64) ORDER BY (metric_name_with_labels)

series_to_metric (series_id UInt64, metric_name_with_labels String) ORDER BY (series_id)

This table is needed for converting a found series_id back to the human-readable metric_name{labels} form during queries

36 of 52

The history of VictoriaMetrics: ClickHouse as metrics storage

  • The final database structure would consist of the following tables:

metric_to_series (metric_name_with_labels String, series_id UInt64) ORDER BY (metric_name_with_labels)

series_to_metric (series_id UInt64, metric_name_with_labels String) ORDER BY (series_id)

label_to_series (label_name_value LowCardinality(String), series_id UInt64) ORDER BY (label_name_value, series_id)

This table is used for fast lookups of series_id values for the given label filters during queries. For example, http_requests_total{job="webserver"} should find series_id values for series with both {__name__="http_requests_total"} and {job="webserver"} labels

37 of 52

The history of VictoriaMetrics: ClickHouse as metrics storage

  • The final database structure would consist of the following tables:

metric_to_series (metric_name_with_labels String, series_id UInt64) ORDER BY (metric_name_with_labels)

series_to_metric (series_id UInt64, metric_name_with_labels String) ORDER BY (series_id)

label_to_series (label_name_value LowCardinality(String), series_id UInt64) ORDER BY (label_name_value, series_id)

samples (series_id UInt64, timestamp Int64 CODEC(DoubleDelta), value Float64 CODEC(Gorilla)) ORDER BY (series_id, timestamp)

This table stores (timestamp, value) pairs for the ingested samples
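
Expressed as complete ClickHouse DDL, the final scheme might look like this (a sketch; engine choices and the LZ4 fallback codecs are assumptions):

CREATE TABLE metric_to_series (
    metric_name_with_labels String,
    series_id UInt64
) ENGINE = MergeTree
ORDER BY (metric_name_with_labels);

CREATE TABLE series_to_metric (
    series_id UInt64,
    metric_name_with_labels String
) ENGINE = MergeTree
ORDER BY (series_id);

CREATE TABLE label_to_series (
    label_name_value LowCardinality(String),
    series_id UInt64
) ENGINE = MergeTree
ORDER BY (label_name_value, series_id);

CREATE TABLE samples (
    series_id UInt64,
    timestamp Int64 CODEC(DoubleDelta, LZ4),
    value Float64 CODEC(Gorilla, LZ4)
) ENGINE = MergeTree
ORDER BY (series_id, timestamp);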

38 of 52

The history of VictoriaMetrics: ClickHouse as metrics storage

  • This approach looks good
  • But it has the following issues:
    • It needs an external app, which accepts the incoming samples and quickly adds missing entries to the `metric_to_series`, `series_to_metric` and `label_to_series` tables. The app must also buffer incoming data in order to reduce the frequency of inserts into ClickHouse
    • It needs an external app for providing Prometheus-compatible querying API
    • This increases operational complexity
    • The on-disk compression level can be improved further

39 of 52

Meet VictoriaMetrics!

40 of 52

Meet VictoriaMetrics!

  • So we decided to create a specialized time series database from scratch in order to solve the issues mentioned above
  • It should meet the following requirements:
    • It must be fast
    • It must be easy to set up and operate
    • It must scale both vertically (more CPU and RAM) and horizontally (multiple nodes)
    • It must be easy to code

41 of 52

Meet VictoriaMetrics!

  • So we decided to create a specialized time series database from scratch in order to solve the issues mentioned above
  • It should meet the following requirements:
    • It must be fast
    • It must be easy to set up and operate
    • It must scale both vertically (more CPU and RAM) and horizontally (multiple nodes)
    • It must be easy to code
  • I don’t like C++ because of its complexity, but I like writing fast code in Go
  • So VictoriaMetrics is written in Go :)

42 of 52

Meet VictoriaMetrics!

  • Conceptually VictoriaMetrics uses the same database scheme discussed previously:
    • metric_to_series (metric_name_with_labels string, series_id uint64)
    • series_to_metric (series_id uint64, metric_name_with_labels string)
    • label_to_series (label_name_value string, series_id uint64)
    • samples (series_id uint64, timestamp int64, value float64)
  • But it doesn’t rely on external libraries or apps for these tables
  • Instead, it implements specially optimized persistent data structures for them

43 of 52

VictoriaMetrics: persistent data structures

  • The first three tables - `metric_to_series`, `series_to_metric` and `label_to_series` - are stored in a mergeset (aka indexdb). It stores sorted strings. It is optimized for fast data insertion, fast lookups and fast range scans by string prefix
  • The last table - `samples` - is stored in a separate data structure similar to MergeTree in ClickHouse

44 of 52

VictoriaMetrics: data ingestion path

  • VictoriaMetrics accepts samples in various formats: InfluxDB, Graphite, OpenTSDB, Prometheus, DataDog, CSV, JSON, etc.
  • The ingested samples are buffered in memory
  • The indexdb is dynamically updated when samples with new metric_name{labels} are ingested into the database
  • The buffered samples are converted to ClickHouse-like parts and flushed to disk every second

45 of 52

VictoriaMetrics: `samples` table internals

  • A table is split into per-month partitions. This allows instant removal of the data outside the retention period (see the ClickHouse-style sketch after this list)
  • Each partition consists of ClickHouse-like parts, which are merged in background into bigger parts
  • Each part is split into blocks
  • Each block contains samples only for a single time series. Samples are sorted by timestamp in each block
  • Block size is limited to 8K samples, so a block fits in the CPU cache for maximum processing speed
  • Blocks are processed independently of each other, so they can be processed in parallel
  • `timestamp` and `value` columns are encoded separately with the most efficient codecs in order to get the maximum compression rate - https://faun.pub/victoriametrics-achieving-better-compression-for-time-series-data-than-gorilla-317bc1f95932
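
VictoriaMetrics implements this storage itself, but the per-month partitioning idea maps directly onto ClickHouse DDL. A sketch (the partition expression is an assumption, with `timestamp` stored in milliseconds):

CREATE TABLE samples (
    series_id UInt64,
    timestamp Int64 CODEC(DoubleDelta, LZ4),
    value Float64 CODEC(Gorilla, LZ4)
) ENGINE = MergeTree
PARTITION BY toYYYYMM(toDateTime(intDiv(timestamp, 1000)))
ORDER BY (series_id, timestamp);

-- A whole month outside the retention period can be removed instantly:
ALTER TABLE samples DROP PARTITION 201902;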

46 of 52

VictoriaMetrics: query path

  • Query is split into two steps:
    • Selecting time series matching the given label filters on the given time range
    • Processing samples for the selected time series according to the query

47 of 52

VictoriaMetrics: query path

  • Query is split into two steps:
    • Selecting time series matching the given label filters on the given time range
    • Processing samples for the selected time series according to the query
  • Matching time series are searched via the `label_to_series` table. The table may contain billions of entries, so VictoriaMetrics uses various optimization tricks (composite index, per-day index, optimized bitset, unpacked data cache, search cache) in order to speed up the search

48 of 52

VictoriaMetrics: query path

  • Query is split into two steps:
    • Selecting time series matching the given label filters on the given time range
    • Processing samples for the selected time series according to the query
  • Matching time series are searched via the `label_to_series` table. The table may contain billions of entries, so VictoriaMetrics uses various optimization tricks (composite index, per-day index, optimized bitset, unpacked data cache, search cache) in order to speed up the search
  • Then VictoriaMetrics unpacks and processes blocks of samples for the found time series over the given time range. Blocks are unpacked and processed in parallel on all the available CPU cores in order to improve query performance
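
Both steps can again be illustrated with ClickHouse SQL over the scheme above (a sketch; the time range bounds are example millisecond timestamps, and in VictoriaMetrics both steps run inside a single process):

-- Step 1: find series matching {__name__="http_requests_total", job="webserver"}
-- Step 2: fetch their samples on the given time range
SELECT series_id, timestamp, value
FROM samples
WHERE series_id IN (
    SELECT series_id
    FROM label_to_series
    WHERE label_name_value IN ('__name__=http_requests_total', 'job=webserver')
    GROUP BY series_id
    HAVING count() = 2
)
  AND timestamp BETWEEN 1600000000000 AND 1600003600000
ORDER BY series_id, timestamp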

49 of 52

VictoriaMetrics: query path

50 of 52

VictoriaMetrics: horizontal scalability

51 of 52

VictoriaMetrics: the end result

  • Fast time series database and monitoring solution inspired by ClickHouse
  • Easy to set up and operate
  • Integrates with Prometheus, Grafana, InfluxDB, Graphite, DataDog and OpenTSDB
  • Scales vertically and horizontally
  • Successfully handles tens of trillions (i.e. more than 10^13) of samples per node in production - https://docs.victoriametrics.com/CaseStudies.html

52 of 52

Thank you!