Go optimizations in VictoriaMetrics
Aliaksandr Valialkin, CTO at VictoriaMetrics
About me
Agenda
What is VictoriaMetrics?
What is VictoriaMetrics?
What is VictoriaMetrics?
What is VictoriaMetrics?
What is VictoriaMetrics?
What is VictoriaMetrics?
What is VictoriaMetrics?
What is time series?
What is time series?
What is time series?
What is time series?
What is time series?
What is time series?
What is time series?
{__name__=”cpu_usage”, instance=”my-server”, datacenter=”us-east”}
(t1, 10), (t2, 20), (t3, 12), …. (tN, 15)
Time series applications
Time series applications
Time series applications
Time series applications
Time series applications
Time series applications
Time series applications
Time series applications
What does time series database do?
What does time series database do?
What does time series database do?
What does time series database do?
What does time series database do?
What does time series database do?
What does time series database do?
What does time series database do?
What does time series database do?
Time series database architecture
Inverted index
Data points
2. timeseries_id: (timestamp, value)
1. {label=value} set
insert
4. a set of timeseries_ids
3. {label=value} filters
select
Query life
Query life
Query life
Query life
Inverted index
Inverted index
Inverted index
Inverted index
Inverted index
Inverted index
Inverted index implementations
Inverted index: naive implementation
var invertedIndex = make(map[string][]int)
func getMetricIDs(labelValues []string) []int {
metricIDs := invertedIndex[labelValue[0]]
for _, labelValue := range labelValues[1:] {
newMetricIDs := invertedIndex[labelValue]
metricIDs = intersectInts(metricIDs, newMetricIDs)
}
return metricIDs
}
Inverted index: naive implementation issues
Inverted index: naive implementation issues
Inverted index: naive implementation issues
Inverted index: LevelDB
Inverted index: LevelDB
var invertedIndex *LevelDB
func getMetricIDs(labelValues []string) []int {
metricIDs := invertedIndex.GetValues(labelValue[0])
for _, labelValue := range labelValues[1:] {
newMetricIDs := invertedIndex.GetValues(labelValue)
metricIDs = intersectInts(metricIDs, newMetricIDs)
}
return metricIDs
}
Inverted index: LevelDB issues
Inverted index: LevelDB issues
Inverted index: LevelDB issues
Inverted index: mergeset
Inverted index: mergeset
Inverted index: mergeset
Inverted index: mergeset
Inverted index: mergeset
Inverted index: production issues
Inverted index: production issues
Inverted index: production issues
High churn rate
High churn rate
High churn rate
High churn rate
High churn rate
High churn rate
High churn rate
High churn rate solutions
Partition inverted index by time
Partition 1
Partition 2
Partition 3
Partition N
...
time
Per-day timeseries_ids sets for active time series
time
Day 1 timeseries_ids
Day 2 timeseries_ids
Day N timeseries_ids
...
Solutions for high number of time series matching (label=value)
Store multiple timeseries_ids per mergeset row
Original rows: (label=value) timeseries_id1
(label=value) timeseries_id2
…
(label=value) timeseries_idN
Optimized row: (label=value) timeseries_id1, timeseries_id2, … timeseries_idN
Shard inverted index by time series key (a set of (label=value) pairs)
Shard 1
CPU 1
Shard 2
CPU 2
Shard N
CPU N
Merge timeseries_ids
...
Optimizations for timeseries_ids sets intersection
Timeseries_ids sets intersection: naive approach
// intersectInts returns the intersection of a and b sets
func interesectInts(a, b []int) []int {
var result []int
for _, x := range a {
for _, y := range b {
if x == y {
result = append(result, x)
break
}
}
}
return result
}
What’s wrong with the naive approach?
What’s wrong with the naive approach?
Timeseries_ids intersection: map
// intersectInts returns the intersection of a and b sets
func interesectInts(a, b []int) []int {
m := make(map[int]bool)
for _, x := range a {
m[x] = true
}
var result []int
for _, y := range b {
if m[y] {
result = append(result, y)
}
}
return result
}
Set intersection with map
Timeseries_ids intersection: customized bitset
// intersectInts returns the intersection of a and b sets
func interesectInts(a, b []uint64) []uint64 {
s := &uint64set.Set{}
for _, x := range a {
s.Add(x)
}
var result []uint64
for _, y := range b {
if s.Has(y) {
result = append(result, y)
}
}
return result
}
Set intersection with customized bitset
Customized bitset: implementation details
lib/uint64set API
package uint64set // import "github.com/VictoriaMetrics/VictoriaMetrics/lib/uint64set"
type Set struct {
// Has unexported fields.
}
Set is a fast set for uint64.
It should work faster than map[uint64]struct{} for semi-sparse uint64 values
such as MetricIDs generated by lib/storage.
It is unsafe calling Set methods from concurrent goroutines.
func (s *Set) Add(x uint64)
func (s *Set) AppendTo(dst []uint64) []uint64
func (s *Set) Clone() *Set
func (s *Set) Del(x uint64)
func (s *Set) Has(x uint64) bool
func (s *Set) Len() int
lib/uint64set internals
type Set struct {
itemsCount int
buckets []*bucket32
}
type bucket32 struct {
hi uint32
b16his []uint16
buckets []*bucket16
}
type bucket16 struct {
bits [wordsPerBucket]uint64
}
const (
bitsPerBucket = 1 << 16
wordsPerBucket = bitsPerBucket / 64
)
lib/uint64set internals: Set.Add
// Add adds x to s.
func (s *Set) Add(x uint64) {
hi := uint32(x >> 32)
lo := uint32(x)
for _, b32 := range s.buckets {
if b32.hi == hi {
if b32.add(lo) {
s.itemsCount++
}
return
}
}
s.addAlloc(hi, lo)
}
lib/uint64set internals: bucket32.add
func (b *bucket32) add(x uint32) bool {
hi := uint16(x >> 16)
lo := uint16(x)
if len(b.buckets) > maxUnsortedBuckets {
return b.addSlow(hi, lo)
}
for i, hi16 := range b.b16his {
if hi16 == hi {
return b.buckets[i].add(lo)
}
}
b.addAllocSmall(hi, lo)
return true
}
lib/uint64set internals: bucket16.add
func (b *bucket16) add(x uint16) bool {
wordNum, bitMask := getWordNumBitMask(x)
word := &b.bits[wordNum]
ok := *word&bitMask == 0
*word |= bitMask
return ok
}
More optimizations
More optimizations
More optimizations
More optimizations
More optimizations
Questions?
Aliaksandr Valialkin, CTO at VictoriaMetrics