Inside the Map Implementation
Keith Randall
@GopherCon, 2016/07/12
Can’t cover it all in ½ hour!
What are maps?
Associative containers mapping keys to values.
Key type must have an == operation (no maps, slices, or funcs as keys).
Operations run in constant expected time.
Use Go
maps
Use Go
maps
Stock Trading
High-frequency trading bot
var nasdaq chan struct {
symbol string
price float64
}
m := map[string]float64{}
for x := range nasdaq {
last := m[x.symbol]
if x.price > last {
fmt.Printf(“buy %s!\n”, x.symbol)
}
if x.price < last {
fmt.Printf(“sell %s!\n”, x.symbol)
}
m[x.symbol] = x.price
}
NASDAQ stock prices
MSFT 50.81�AAPL 94.56�GOOG 706.63�CSCO 26.72�ORCL 39.47�INTC 29.99�VOD 34.13�QCOM 52.79�AMZN 697.45�AMGN 150.83�…
High-frequency trading bot
var nasdaq chan struct {
symbol string
price float64
}
m := map[string]float64{}
for x := range nasdaq {
last := m[x.symbol]
if x.price > last {
fmt.Printf(“buy %s!\n”, x.symbol)
}
if x.price < last {
fmt.Printf(“sell %s!\n”, x.symbol)
}
m[x.symbol] = x.price
}
NASDAQ stock prices
MSFT 50.81�AAPL 94.56�GOOG 706.63�CSCO 26.72�ORCL 39.47�INTC 29.99�VOD 34.13�QCOM 52.79�AMZN 697.45�AMGN 150.83�…
High-frequency trading bot
var nasdaq chan struct {
symbol string
price float64
}
m := map[string]float64{}
for x := range nasdaq {
last := m[x.symbol]
if x.price > last {
fmt.Printf(“buy %s!\n”, x.symbol)
}
if x.price < last {
fmt.Printf(“sell %s!\n”, x.symbol)
}
m[x.symbol] = x.price
}
NASDAQ stock prices
MSFT 50.81�AAPL 94.56�GOOG 706.63�CSCO 26.72�ORCL 39.47�INTC 29.99�VOD 34.13�QCOM 52.79�AMZN 697.45�AMGN 150.83�…
High-frequency trading bot
var nasdaq chan struct {
symbol string
price float64
}
m := map[string]float64{}
for x := range nasdaq {
last := m[x.symbol]
if x.price > last {
fmt.Printf(“buy %s!\n”, x.symbol)
}
if x.price < last {
fmt.Printf(“sell %s!\n”, x.symbol)
}
m[x.symbol] = x.price
}
NASDAQ stock prices
MSFT 50.81�AAPL 94.56�GOOG 706.63�CSCO 26.72�ORCL 39.47�INTC 29.99�VOD 34.13�QCOM 52.79�AMZN 697.45�AMGN 150.83�…
High-frequency trading bot
var nasdaq chan struct {
symbol string
price float64
}
m := map[string]float64{}
for x := range nasdaq {
last := m[x.symbol]
if x.price > last {
fmt.Printf(“buy %s!\n”, x.symbol)
}
if x.price < last {
fmt.Printf(“sell %s!\n”, x.symbol)
}
m[x.symbol] = x.price
}
NASDAQ stock prices
MSFT 50.81�AAPL 94.56�GOOG 706.63�CSCO 26.72�ORCL 39.47�INTC 29.99�VOD 34.13�QCOM 52.79�AMZN 697.45�AMGN 150.83�…
Simple implementation
type entry struct {
k string
v float64�}
type map []entry
func lookup(m map, k string) float64 {
for _, e := range m {
if e.k == k {
return e.v
}
}
return 0
}
Simple implementation
type entry struct {
k string
v float64�}
type map []entry
func lookup(m map, k string) float64 {
for _, e := range m {
if e.k == k {
return e.v
}
}
return 0
}
Speed Fail!
Idea
Idea: split data up into buckets!
AAPL 94.56�CSCO 26.72�AMZN 697.45�AMGN 150.83�
MSFT 50.81�GOOG 706.63�INTC 29.99�
ORCL 39.47�QCOM 52.79�
VOD 34.13�
bucket = (symbol[0]-’A’)*4/(‘Z’-’A’)
A-F
G-M
N-S
T-Z
Idea: split data up into buckets!
AAPL 94.56�CSCO 26.72�AMZN 697.45�AMGN 150.83�
MSFT 50.81�GOOG 706.63�INTC 29.99�
ORCL 39.47�QCOM 52.79�
VOD 34.13�
bucket = (symbol[0]-’A’)*4/(‘Z’-’A’)
A-F
G-M
N-S
T-Z
Speed Fixed!
URLDAQ
http://google.com�http://intel.com
http://cisco.com
http://apple.com
http://amazon.com
http://amgen.com
http://microsoft.com
http://oracle.com
http://qualcomm.com
http://vodafone.com
A-F
G-M
N-S
T-Z
URLDAQ
http://google.com�http://intel.com
http://cisco.com
http://apple.com
http://amazon.com
http://amgen.com
http://microsoft.com
http://oracle.com
http://qualcomm.com
http://vodafone.com
A-F
G-M
N-S
T-Z
Bucketing by first letter isn’t very good.
Is there a way to reliably split the data into buckets?
Speed Fail!
Hash function
Choose a bucket for each key so that entries are distributed as evenly as possible.
bucket = h(key)
Required for correctness:
Want for performance:
Good news: Such hash functions exist.
Bad news: Hard to get right.
Go gets it right for you. → no user-defined hash functions.
Map bucket in Go
ei = 8-bit slots for extra hash bits
e0 e1 e2 e3 e4 e5 e6 e7
MSFT 50.81�CSCO 26.72�ORCL 39.47�INTC 29.99�QCOM 52.79�AMZN 697.45�--- ---
--- ---
overflow
overflow points to another bucket if we need more than 8 slots.
8 slots for data
Map in Go
m
len
lg(#buckets)
bucket array
hash seed
...
bucket 0
bucket 1
bucket 2
bucket 3
map header
bucket 2
overflow
bucket array
var m map[string]float64
Lookup
v = m[k]
v = runtime.lookup(m, k)
compiles to
Lookup
v = m[k]
v = runtime.lookup(m, k)
compiles to
where the runtime has the function
func<K,V> lookup(m map[K]V, k K) V
Lookup
v = m[k]
v = runtime.lookup(m, k)
compiles to
where the runtime has the function
func<K,V> lookup(m map[K]V, k K) V
Generics Fail!
Faking generics
type _type struct {
size uintptr
equal func(unsafe.Pointer, unsafe.Pointer) bool
hash func(unsafe.Pointer, uintptr) uintptr
...
}
type mapType struct {
key *_type
value *_type
...
}
Lookup
v = m[k]
compiles to
where the runtime has the function
func lookup(t *mapType,
m *mapHeader,
k unsafe.Pointer) unsafe.Pointer
pk := unsafe.Pointer(&k)
pv := runtime.lookup(typeOf(m), m, pk)
v = *(*V)pv
Lookup implementation, page 1 of 2
// lookup looks up a key in a map and returns a pointer to the associated value.�// t = type of the map�// m = map�// key = pointer to key�func lookup(t *mapType, m *mapHeader, key unsafe.Pointer) unsafe.Pointer {� if m == nil || m.count == 0 {� return zero� }� � hash := t.key.hash(key, m.seed) // hash := hashfn(key)� bucket := hash & (1<<m.logB-1) // bucket := hash % nbuckets� extra := byte(hash >> 56) // extra := top 8 bits of hash� b := (*bucket)(add(m.buckets, bucket*t.bucketsize)) // b := &m.buckets[bucket]�
Lookup implementation, page 2 of 2
for {� for i := 0; i < 8; i++ {� if b.extra[i] != extra { // check 8 extra hash bits� continue� }� k := add(b, dataOffset+i*t.key.size) // pointer to ki in bucket� if t.key.equal(key, k) {� // return pointer to vi� return add(b, dataOffset+8*t.key.size+i*t.value.size)� }� }� b = b.overflow� if b == nil {� return zero� }� }�}�
Growing the map
When buckets get too full, we need to grow the map.
“too full” = average of 6.5 entries per bucket
The process of copying is done incrementally, a little bit during each insert or delete. During copying, operations on the map are a bit more expensive.
Maps in other languages
| c++ | Java | Python | Go |
&m[k] allowed | yes | no | no | no |
Modify during iteration | no | no | no | yes |
Provide your own ==, hash | yes | yes | yes | no |
Adversary safe | no | yes* | no+ | yes |
Speed
Space
map[int64]int64
16 bytes/entry
Conclusions
Use maps!
Conclusions
Use maps!
Conclusions
Use maps!
Conclusions
Use maps!
Conclusions
Use maps!
Properties
Iterators
Unlike most languages, Go provides definite semantics of iterators in the presence of inserts and deletes.
Still to do
Insider tips