1 of 96

Go optimizations in VictoriaMetrics

Aliaksandr Valialkin, CTO at VictoriaMetrics

2 of 96

About me

  • https://github.com/valyala
  • I’m fond of Go and performance optimizations
  • Fasthttp author
  • VictoriaMetrics core developer

3 of 96

Agenda

  • What is VictoriaMetrics?
  • What is time series?
  • What does time series database do?
  • Time series database architecture
  • Inverted index implementations, issues and optimizations
  • Specialized bitset implementation in Go

4 of 96

What is VictoriaMetrics?

5 of 96

What is VictoriaMetrics?

  • Time series database

6 of 96

What is VictoriaMetrics?

  • Time series database
  • It is based on architecture ideas from ClickHouse
    • MergeTree data structure
    • Parallel computations on all the CPU cores
    • Process data in blocks that fit CPU cache

7 of 96

What is VictoriaMetrics?

  • Time series database
  • It is based on architecture ideas from ClickHouse
    • MergeTree data structure
    • Parallel computations on all the CPU cores
    • Process data in blocks that fit CPU cache
  • Provides the best on-disk data compression

8 of 96

What is VictoriaMetrics?

  • Time series database
  • It is based on architecture ideas from ClickHouse
    • MergeTree data structure
    • Parallel computations on all the CPU cores
    • Process data in blocks that fit CPU cache
  • Provides the best on-disk data compression
  • Scales vertically and horizontally

9 of 96

What is VictoriaMetrics?

  • Time series database
  • It is based on architecture ideas from ClickHouse
    • MergeTree data structure
    • Parallel computations on all the CPU cores
    • Process data in blocks that fit CPU cache
  • Provides the best on-disk data compression
  • Scales vertically and horizontally
  • Fast

10 of 96

What is VictoriaMetrics?

  • Time series database
  • It is based on architecture ideas from ClickHouse
    • MergeTree data structure
    • Parallel computations on all the CPU cores
    • Process data in blocks that fit CPU cache
  • Provides the best on-disk data compression
  • Scales vertically and horizontally
  • Fast
  • Written in Go

11 of 96

What is time series?

12 of 96

What is time series?

  • A series of (timestamp, value) pairs

13 of 96

What is time series?

  • A series of (timestamp, value) pairs
  • Pairs are ordered by timestamp

14 of 96

What is time series?

  • A series of (timestamp, value) pairs
  • Pairs are ordered by timestamp
  • Value is float64

15 of 96

What is time series?

  • A series of (timestamp, value) pairs
  • Pairs are ordered by timestamp
  • Value is float64
  • Each time series is uniquely identified by a key

16 of 96

What is time series?

  • A series of (timestamp, value) pairs
  • Pairs are ordered by timestamp
  • Value is float64
  • Each time series is uniquely identified by a key
  • A key contains non-empty set of (label=value) pairs

17 of 96

What is time series?

  • A series of (timestamp, value) pairs
  • Pairs are ordered by timestamp
  • Value is float64
  • Each time series is uniquely identified by a key
  • A key contains non-empty set of (label=value) pairs
  • Example:

{__name__=”cpu_usage”, instance=”my-server”, datacenter=”us-east”}

(t1, 10), (t2, 20), (t3, 12), …. (tN, 15)

18 of 96

Time series applications

19 of 96

Time series applications

  • DevOps - CPU, RAM, network, rps, errors count

20 of 96

Time series applications

  • DevOps - CPU, RAM, network, rps, errors count
  • IoT - temperature, pressure, geo coordinates

21 of 96

Time series applications

  • DevOps - CPU, RAM, network, rps, errors count
  • IoT - temperature, pressure, geo coordinates
  • Finance - stock prices

22 of 96

Time series applications

  • DevOps - CPU, RAM, network, rps, errors count
  • IoT - temperature, pressure, geo coordinates
  • Finance - stock prices
  • Industrial monitoring - sensors in wind turbines, factories, robots

23 of 96

Time series applications

  • DevOps - CPU, RAM, network, rps, errors count
  • IoT - temperature, pressure, geo coordinates
  • Finance - stock prices
  • Industrial monitoring - sensors in wind turbines, factories, robots
  • Cars - engine health, tire pressure, speed, distance

24 of 96

Time series applications

  • DevOps - CPU, RAM, network, rps, errors count
  • IoT - temperature, pressure, geo coordinates
  • Finance - stock prices
  • Industrial monitoring - sensors in wind turbines, factories, robots
  • Cars - engine health, tire pressure, speed, distance
  • Aircraft and aerospace - black box, spaceship telemetry

25 of 96

Time series applications

  • DevOps - CPU, RAM, network, rps, errors count
  • IoT - temperature, pressure, geo coordinates
  • Finance - stock prices
  • Industrial monitoring - sensors in wind turbines, factories, robots
  • Cars - engine health, tire pressure, speed, distance
  • Aircraft and aerospace - black box, spaceship telemetry
  • Healthcare - cardiogram, blood pressure

26 of 96

What does time series database do?

27 of 96

What does time series database do?

  • Stores (timestamp, value) data points under the given key

28 of 96

What does time series database do?

  • Stores (timestamp, value) data points under the given key
  • Provides an API for querying time series:

29 of 96

What does time series database do?

  • Stores (timestamp, value) data points under the given key
  • Provides an API for querying time series:
    • By key

30 of 96

What does time series database do?

  • Stores (timestamp, value) data points under the given key
  • Provides an API for querying time series:
    • By key
    • By (label=value) pair

31 of 96

What does time series database do?

  • Stores (timestamp, value) data points under the given key
  • Provides an API for querying time series:
    • By key
    • By (label=value) pair
    • By a set of (label=value) pairs

32 of 96

What does time series database do?

  • Stores (timestamp, value) data points under the given key
  • Provides an API for querying time series:
    • By key
    • By (label=value) pair
    • By a set of (label=value) pairs
    • By a set of (label=<regexp>) pairs

33 of 96

What does time series database do?

  • Stores (timestamp, value) data points under the given key
  • Provides an API for querying time series:
    • By key
    • By (label=value) pair
    • By a set of (label=value) pairs
    • By a set of (label=<regexp>) pairs
    • Example: {__name__=”cpu_usage”, datacenter=~”us-.+”} would select all the cpu_usage time series for all the datacenters in US

34 of 96

What does time series database do?

  • Stores (timestamp, value) data points under the given key
  • Provides an API for querying time series:
    • By key
    • By (label=value) pair
    • By a set of (label=value) pairs
    • By a set of (label=<regexp>) pairs
    • Example: {__name__=”cpu_usage”, datacenter=~”us-.+”} would select all the cpu_usage time series for all the datacenters in US
  • Provides query language for time series data: PromQL, InfluxQL, Flux, Q

35 of 96

Time series database architecture

Inverted index

Data points

2. timeseries_id: (timestamp, value)

1. {label=value} set

insert

4. a set of timeseries_ids

3. {label=value} filters

select

36 of 96

Query life

37 of 96

Query life

  • Select all the timeseries_ids from inverted index that match the given set of (label=value or regexp) pairs

38 of 96

Query life

  • Select all the timeseries_ids from inverted index that match the given set of (label=value or regexp) pairs
  • Select all the data points for the given timeseries_ids set in the given time range

39 of 96

Query life

  • Select all the timeseries_ids from inverted index that match the given set of (label=value or regexp) pairs
  • Select all the data points for the given timeseries_ids set in the given time range
  • Perform additional processing for the selected data points

40 of 96

Inverted index

41 of 96

Inverted index

  • A (K: V) map

42 of 96

Inverted index

  • A (K: V) map
  • K is (label=value) pair

43 of 96

Inverted index

  • A (K: V) map
  • K is (label=value) pair
  • V is a set of timeseries_ids for all the time series with the given (label=value) pair

44 of 96

Inverted index

  • A (K: V) map
  • K is (label=value) pair
  • V is a set of timeseries_ids for all the time series with the given (label=value) pair
  • Quickly finds all the timeseries_ids for the given (label=value) pair

45 of 96

Inverted index

  • A (K: V) map
  • K is (label=value) pair
  • V is a set of timeseries_ids for all the time series with the given (label=value) pair
  • Quickly finds all the timeseries_ids for the given (label=value) pair
  • Quickly finds all the timeseries_ids for the given set of (label=value) pairs

46 of 96

Inverted index implementations

47 of 96

Inverted index: naive implementation

var invertedIndex = make(map[string][]int)

func getMetricIDs(labelValues []string) []int {

metricIDs := invertedIndex[labelValue[0]]

for _, labelValue := range labelValues[1:] {

newMetricIDs := invertedIndex[labelValue]

metricIDs = intersectInts(metricIDs, newMetricIDs)

}

return metricIDs

}

48 of 96

Inverted index: naive implementation issues

49 of 96

Inverted index: naive implementation issues

  • Missing persistence - data is lost on process restart

50 of 96

Inverted index: naive implementation issues

  • Missing persistence - data is lost on process restart
  • Inverted index must fit RAM - doesn’t scale to big number of time series

51 of 96

Inverted index: LevelDB

  • Store {(label=value): timeseries_id} rows in LevelDB
  • Extract all the timeseries_ids from all the rows for the given (label=value) pair

52 of 96

Inverted index: LevelDB

var invertedIndex *LevelDB

func getMetricIDs(labelValues []string) []int {

metricIDs := invertedIndex.GetValues(labelValue[0])

for _, labelValue := range labelValues[1:] {

newMetricIDs := invertedIndex.GetValues(labelValue)

metricIDs = intersectInts(metricIDs, newMetricIDs)

}

return metricIDs

}

53 of 96

Inverted index: LevelDB issues

54 of 96

Inverted index: LevelDB issues

  • Slower than the naive implementation

55 of 96

Inverted index: LevelDB issues

  • Slower than the naive implementation
  • Cgo overhead for LevelDB and RocksDB

56 of 96

Inverted index: mergeset

57 of 96

Inverted index: mergeset

  • Based on MergeTree data structure from ClickHouse

58 of 96

Inverted index: mergeset

  • Based on MergeTree data structure from ClickHouse
  • Optimized for fast inverted index lookups in VictoriaMetrics

59 of 96

Inverted index: mergeset

  • Based on MergeTree data structure from ClickHouse
  • Optimized for fast inverted index lookups in VictoriaMetrics
  • Written in pure Go

60 of 96

Inverted index: mergeset

  • Based on MergeTree data structure from ClickHouse
  • Optimized for fast inverted index lookups in VictoriaMetrics
  • Written in pure Go
  • The API is similar to LevelDB or RocksDB

61 of 96

Inverted index: production issues

  • High churn rate

62 of 96

Inverted index: production issues

  • High churn rate
  • High number of time series matching the given (label=value) pair (hundreds of millions)

63 of 96

Inverted index: production issues

  • High churn rate
  • High number of time series matching the given (label=value) pair (hundreds of millions)
  • Matching (label=<regexp>)

64 of 96

High churn rate

65 of 96

High churn rate

  • Certain labels are constantly changed

66 of 96

High churn rate

  • Certain labels are constantly changed
  • For instance, {deployment_id=”<deployment_id>”} in Kubernetes

67 of 96

High churn rate

  • Certain labels are constantly changed
  • For instance, {deployment_id=”<deployment_id>”} in Kubernetes
  • Each deployment starts new set of time series

68 of 96

High churn rate

  • Certain labels are constantly changed
  • For instance, {deployment_id=”<deployment_id>”} in Kubernetes
  • Each deployment starts new set of time series
  • The number of {datacenter=”us-east”} time series grows over time

69 of 96

High churn rate

  • Certain labels are constantly changed
  • For instance, {deployment_id=”<deployment_id>”} in Kubernetes
  • Each deployment starts new set of time series
  • The number of {datacenter=”us-east”} time series grows over time
  • But the number of {datacenter=”us-east”} time series for the last day remains constant

70 of 96

High churn rate

  • Certain labels are constantly changed
  • For instance, {deployment_id=”<deployment_id>”} in Kubernetes
  • Each deployment starts new set of time series
  • The number of {datacenter=”us-east”} time series grows over time
  • But the number of {datacenter=”us-east”} time series for the last day remains constant
  • How to provide constant speed for selecting {datacenter=”us-east”} time series for the last day?

71 of 96

High churn rate solutions

72 of 96

Partition inverted index by time

  • Pros:
    • Simple
    • Fast
  • Cons:
    • duplicates inverted index data for long-living time series

Partition 1

Partition 2

Partition 3

Partition N

...

time

73 of 96

Per-day timeseries_ids sets for active time series

  • Pros:
    • Index for long-living time series is stored only once
  • Cons:
    • harder to implement
    • needs to scan bigger timeseries_ids sets

time

Day 1 timeseries_ids

Day 2 timeseries_ids

Day N timeseries_ids

...

74 of 96

Solutions for high number of time series matching (label=value)

75 of 96

Store multiple timeseries_ids per mergeset row

  • Pros:
    • requires less memory (especially for long (label=value) pairs)
    • improves scan speed
  • Cons:
    • hard to implement
    • hard to debug

Original rows: (label=value) timeseries_id1

(label=value) timeseries_id2

(label=value) timeseries_idN

Optimized row: (label=value) timeseries_id1, timeseries_id2, … timeseries_idN

76 of 96

Shard inverted index by time series key (a set of (label=value) pairs)

  • Pros:
    • scales to multiple CPUs
  • Cons:
    • requires more CPU resources

Shard 1

CPU 1

Shard 2

CPU 2

Shard N

CPU N

Merge timeseries_ids

...

77 of 96

Optimizations for timeseries_ids sets intersection

  • {datacenter=”us-east”, job=”my-app”, instance=”my-host”} requires intersection of three timeseries_ids sets:
    • (datacenter=”us-east”)
    • (job=”my-app”)
    • (instance=”my-host”)
  • How to quickly intersect timeseries_ids sets?

78 of 96

Timeseries_ids sets intersection: naive approach

// intersectInts returns the intersection of a and b sets

func interesectInts(a, b []int) []int {

var result []int

for _, x := range a {

for _, y := range b {

if x == y {

result = append(result, x)

break

}

}

}

return result

}

79 of 96

What’s wrong with the naive approach?

80 of 96

What’s wrong with the naive approach?

  • It works slowly with big sets
  • It has O(N^2) complexity
  • Let len(a) == 1M, len(b) == 1M
  • Then the max number of iterations equals to len(a)*len(b)=1M*1M=1 trillion

81 of 96

Timeseries_ids intersection: map

// intersectInts returns the intersection of a and b sets

func interesectInts(a, b []int) []int {

m := make(map[int]bool)

for _, x := range a {

m[x] = true

}

var result []int

for _, y := range b {

if m[y] {

result = append(result, y)

}

}

return result

}

82 of 96

Set intersection with map

  • Pros:
    • Has O(N) complexity
  • Cons:
    • Has high overhead on hashing

83 of 96

Timeseries_ids intersection: customized bitset

// intersectInts returns the intersection of a and b sets

func interesectInts(a, b []uint64) []uint64 {

s := &uint64set.Set{}

for _, x := range a {

s.Add(x)

}

var result []uint64

for _, y := range b {

if s.Has(y) {

result = append(result, y)

}

}

return result

}

84 of 96

Set intersection with customized bitset

  • Pros:
    • Up to 10x better performance comparing to map-based intersection
    • Lower memory usage for big sets (>1M items)
  • Cons:
    • Non-trivial implementation
    • Memory usage can explode if improperly used

85 of 96

Customized bitset: implementation details

  • Located at lib/uint64set
  • Optimized for dense serial timeseries_ids where higher 32 bits are constant
  • Doesn’t provide data persistence

86 of 96

lib/uint64set API

package uint64set // import "github.com/VictoriaMetrics/VictoriaMetrics/lib/uint64set"

type Set struct {

// Has unexported fields.

}

Set is a fast set for uint64.

It should work faster than map[uint64]struct{} for semi-sparse uint64 values

such as MetricIDs generated by lib/storage.

It is unsafe calling Set methods from concurrent goroutines.

func (s *Set) Add(x uint64)

func (s *Set) AppendTo(dst []uint64) []uint64

func (s *Set) Clone() *Set

func (s *Set) Del(x uint64)

func (s *Set) Has(x uint64) bool

func (s *Set) Len() int

87 of 96

lib/uint64set internals

type Set struct {

itemsCount int

buckets []*bucket32

}

type bucket32 struct {

hi uint32

b16his []uint16

buckets []*bucket16

}

type bucket16 struct {

bits [wordsPerBucket]uint64

}

const (

bitsPerBucket = 1 << 16

wordsPerBucket = bitsPerBucket / 64

)

88 of 96

lib/uint64set internals: Set.Add

// Add adds x to s.

func (s *Set) Add(x uint64) {

hi := uint32(x >> 32)

lo := uint32(x)

for _, b32 := range s.buckets {

if b32.hi == hi {

if b32.add(lo) {

s.itemsCount++

}

return

}

}

s.addAlloc(hi, lo)

}

89 of 96

lib/uint64set internals: bucket32.add

func (b *bucket32) add(x uint32) bool {

hi := uint16(x >> 16)

lo := uint16(x)

if len(b.buckets) > maxUnsortedBuckets {

return b.addSlow(hi, lo)

}

for i, hi16 := range b.b16his {

if hi16 == hi {

return b.buckets[i].add(lo)

}

}

b.addAllocSmall(hi, lo)

return true

}

90 of 96

lib/uint64set internals: bucket16.add

func (b *bucket16) add(x uint16) bool {

wordNum, bitMask := getWordNumBitMask(x)

word := &b.bits[wordNum]

ok := *word&bitMask == 0

*word |= bitMask

return ok

}

91 of 96

More optimizations

92 of 96

More optimizations

  • We covered small subset of VictoriaMetrics optimizations

93 of 96

More optimizations

  • We covered small subset of VictoriaMetrics optimizations
  • There are many more optimizations in the code

94 of 96

More optimizations

  • We covered small subset of VictoriaMetrics optimizations
  • There are many more optimizations in the code
  • The majority of these optimizations are applied after `go tool pprof` analysis

95 of 96

More optimizations

  • We covered small subset of VictoriaMetrics optimizations
  • There are many more optimizations in the code
  • The majority of these optimizations are applied after `go tool pprof` analysis
  • Investigate VictoriaMetrics Go code - it is free and open source: https://github.com/VictoriaMetrics/VictoriaMetrics

96 of 96

Questions?

Aliaksandr Valialkin, CTO at VictoriaMetrics