1 of 57

Faster Apps,

No Memory Thrash

Get Your Ruby Memory Config Right

Noah Gibbs, AppFolio Inc - @codefolio

2 of 57

I’m a Ruby Fellow for AppFolio.

They pay me to do this stuff, and to write at engineering.appfolio.com.

Thank you, AppFolio!

3 of 57

4 of 57

Randomly: Have You Seen the Sendai City Museum? The handmade maps are gorgeous.

Beautiful, aren’t they?

5 of 57

Okay, Now Some Ruby.

6 of 57

Big, Small and Tiny Objects

7 of 57

Tiny Objects

The smallest Ruby objects exist inside their references. No extra allocations.
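
A minimal sketch of how to check this, assuming CRuby and its standard objspace library: values stored inside the reference report zero bytes of their own. The helper name tiny? is made up for this example, and which values qualify can vary by Ruby build.

```ruby
require "objspace"

# Hypothetical helper: true when the value has no allocation of its own.
def tiny?(value)
  # memsize_of reports 0 for values stored directly in the reference.
  ObjectSpace.memsize_of(value).zero?
end

# Small integers, nil, true and literal symbols are common examples.
[42, nil, true, :name].each do |value|
  puts "#{value.inspect}: tiny? #{tiny?(value)}"
end

# A String is a real object, so it needs a Slot.
puts "\"hello\": tiny? #{tiny?("hello")}"
```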

8 of 57

Small Objects

Small objects live within a 40-byte Slot. 408 Slots are allocated per Page.
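
The numbers above can be read from a running CRuby, at least as a rough sketch. GC::INTERNAL_CONSTANTS is an implementation detail and its keys differ between versions, so every lookup below is guarded and may print nil. The method name slot_layout is made up for this example.

```ruby
# Hypothetical helper: reads Slot and Page sizes if this Ruby exposes them.
def slot_layout
  constants = defined?(GC::INTERNAL_CONSTANTS) ? GC::INTERNAL_CONSTANTS : {}
  {
    # Older Rubies call this RVALUE_SIZE, newer ones BASE_SLOT_SIZE.
    slot_bytes: constants[:RVALUE_SIZE] || constants[:BASE_SLOT_SIZE],
    slots_per_page: constants[:HEAP_PAGE_OBJ_LIMIT]
  }
end

layout = slot_layout
puts "Slot size:      #{layout[:slot_bytes].inspect} bytes"
puts "Slots per Page: #{layout[:slots_per_page].inspect}"
```

The values depend on the Ruby version and platform, so they may not match the 40 and 408 quoted here.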

9 of 57

Big Objects

Objects that don’t fit in a Slot get allocated one by one. They each also keep a Slot.
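
To illustrate the difference, here is a small sketch using the standard objspace library. The helper name bytes_used is an assumption for this example, and the exact byte counts vary by Ruby version.

```ruby
require "objspace"

# Hypothetical helper: approximate bytes an object occupies.
def bytes_used(object)
  ObjectSpace.memsize_of(object)
end

small = "a" * 10        # short enough to live inside its Slot
big   = "a" * 100_000   # its bytes need a separate allocation

puts "small String: #{bytes_used(small)} bytes"
puts "big String:   #{bytes_used(big)} bytes"
```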

10 of 57

Garbage Collection (GC)

11 of 57

GC

Ruby has a generational Mark/Sweep GC. Very good, not best-in-the-world.

12 of 57

Major, Minor

A Major GC checks all objects, while Minor GC checks only newer (“young”) objects.

13 of 57

Manual GC

You can start minor or major GC with an API call:

GC.start # major

GC.start(full_mark: false) # minor
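
One way to confirm which kind of GC ran is to compare GC.stat counters before and after. This is a sketch; gc_runs is a made-up helper name, and Ruby may occasionally promote a requested minor GC to a major one.

```ruby
# Hypothetical helper: counts major and minor GCs that ran during the block.
def gc_runs
  before = GC.stat
  yield
  after = GC.stat
  {
    major: after[:major_gc_count] - before[:major_gc_count],
    minor: after[:minor_gc_count] - before[:minor_gc_count]
  }
end

p gc_runs { GC.start }                    # requests a major GC
p gc_runs { GC.start(full_mark: false) }  # requests a minor GC
```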

14 of 57

Grow, Collect, then Expand:

The GC Cycle

15 of 57

Phase 1: Grow

Your program creates objects. The objects use Slots and heap memory.

16 of 57

Phase 2: Collect

When you need more memory or have allocated a certain number of bytes, GC starts.

17 of 57

Phase 3: Expand

If there still aren’t enough Slots, allocate more. If we crossed a threshold, raise it.
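
The three phases can be watched from inside a process. The sketch below keeps a large number of objects alive and compares GC.stat before and after; grow_heap is a made-up name and the counts will differ from run to run.

```ruby
# Hypothetical helper: allocate and keep `count` objects, then report
# what the GC did in the meantime.
def grow_heap(count)
  before = GC.stat
  kept = Array.new(count) { Object.new }   # Phase 1: grow
  after = GC.stat
  {
    kept: kept.size,
    # Phase 2: collections that ran while allocating
    gc_runs: after[:count] - before[:count],
    # Phase 3: Slots added because collection alone was not enough
    slots_added: after[:heap_available_slots] - before[:heap_available_slots],
    slots_now: after[:heap_available_slots]
  }
end

p grow_heap(500_000)
```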

18 of 57

Over Time

A long-running Ruby app expands to its “natural” size asymptotically.

19 of 57

The Best Way Isn’t Easy

20 of 57

The Problem

Creating more garbage objects makes GC slower, and makes expansion slower.

21 of 57

Best: Cut Waste

“Do nothing” is very fast. More efficient algorithms are good too.

22 of 57

Precalculate

Can you save results somehow? Caching is the next-fastest after not doing anything at all.
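
A minimal caching sketch, using a Hash with a default block so each result is computed once and reused. The memoize helper and the squaring example are assumptions chosen for illustration.

```ruby
# Hypothetical helper: wraps a calculation in a Hash-backed cache.
def memoize(&calculation)
  Hash.new { |cache, key| cache[key] = calculation.call(key) }
end

calls = 0
squares = memoize do |n|
  calls += 1   # count how often the real work happens
  n * n
end

squares[12]
squares[12]    # second lookup is served from the cache
puts "result: #{squares[12]}, calculations: #{calls}"
```

A cache keeps its entries alive, so it trades memory for speed. Bound its size if the set of keys is large.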

23 of 57

Fewer, Bigger

It’s often better to use a single bigger object than several smaller ones.
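
As a rough illustration, GC.stat can count how many objects a piece of code allocates. The allocations helper is a made-up name; the comparison is between 1,000 small pair Arrays and one flat Array holding the same integers.

```ruby
# Hypothetical helper: objects allocated while the block runs.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# Several smaller objects: one Array per pair, plus the outer Array.
many_small = allocations { Array.new(1_000) { |i| [i, i * 2] } }

# A single bigger object: one flat Array of 2,000 integers.
one_big = allocations { Array.new(2_000) { |i| i } }

puts "pairs: #{many_small} objects, flat: #{one_big} objects"
```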

24 of 57

Destructive Ops

Destructive operations like gsub! and concat can save CPU and memory.
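
A short sketch of the difference, with made-up sample data. Note that gsub! returns nil when nothing matched, which is why the helper below returns the string explicitly.

```ruby
# Hypothetical helper: replaces dashes in place and returns the same String.
def underscore!(text)
  text.gsub!("-", "_")   # returns nil when nothing matched
  text
end

text = String.new("a-b-c")
copy = text.gsub("-", "_")    # allocates a new String
same = underscore!(text)      # edits text itself

puts copy.equal?(text)  # false
puts same.equal?(text)  # true

# concat appends to the existing buffer instead of building new Strings.
buffer = String.new
%w[one two three].each { |word| buffer.concat(word) }
puts buffer
```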

25 of 57

What Ruby Tells You About Memory: GC.stat

26 of 57

Data Firehose

GC.stat has everything, and its keys change a bit between Ruby versions.

27 of 57

Useful Parts

  • :heap_available_slots, :heap_live_slots, :heap_free_slots
  • :major_gc_count, :minor_gc_count
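
A small sketch that prints just these keys. gc_summary is a made-up helper name, and a key missing from your Ruby version shows up as nil.

```ruby
# Hypothetical helper: the handful of GC.stat entries listed above.
def gc_summary
  keys = %i[
    heap_available_slots heap_live_slots heap_free_slots
    major_gc_count minor_gc_count
  ]
  stats = GC.stat
  keys.map { |key| [key, stats[key]] }.to_h
end

gc_summary.each do |key, value|
  puts format("%-22s %s", key, value.inspect)
end
```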

28 of 57

What You Tell Ruby: Environment Variables

29 of 57

Environment Variables

  • RUBY_GC_HEAP_INIT_SLOTS
  • RUBY_GC_HEAP_FREE_SLOTS_GOAL_RATIO
  • RUBY_GC_MALLOC_LIMIT
  • RUBY_GC_OLDMALLOC_LIMIT

30 of 57

Env Vars: Why?

By setting the initial size (malloc limit, Slots, etc) you can speed up startup.
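
To see the effect, one option is to boot a child Ruby with and without a variable set and compare its Slot count right after startup. This is a sketch: boot_slots is a made-up helper, the value 600000 is arbitrary, and newer Ruby versions may ignore or rename some of these variables.

```ruby
require "rbconfig"

# Hypothetical helper: starts a child Ruby with extra environment
# variables and returns the number of Slots it reports after boot.
def boot_slots(env = {})
  code = "print GC.stat(:heap_available_slots)"
  # Warnings from the child go to the null device to keep output clean.
  IO.popen([env, RbConfig.ruby, "-e", code], err: File::NULL, &:read).to_i
end

puts "default:   #{boot_slots}"
puts "pre-sized: #{boot_slots("RUBY_GC_HEAP_INIT_SLOTS" => "600000")}"
```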

31 of 57

EnvMem: A Memory Tool

32 of 57

Expansion

Your process expands in phases. Startup can be a bit slow. Can we fix it?

33 of 57

Fast Start

We can start the new process “after” those expansions. Env vars do that.

34 of 57

Cycle Back

Which settings to use? EnvMem gets them from your process’s GC.stat.
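
The idea can be sketched in a few lines. This is NOT EnvMem's real algorithm: the mapping from GC.stat keys to variables below is a guess made for illustration, and env_lines is a made-up name.

```ruby
# Hypothetical mapping from GC.stat values to environment variables.
# Illustration only; EnvMem's actual choices may differ.
def env_lines(stats = GC.stat)
  mapping = {
    "RUBY_GC_HEAP_INIT_SLOTS" => stats[:heap_live_slots],
    "RUBY_GC_MALLOC_LIMIT"    => stats[:malloc_increase_bytes_limit],
    "RUBY_GC_OLDMALLOC_LIMIT" => stats[:oldmalloc_increase_bytes_limit]
  }
  mapping.reject { |_name, value| value.nil? }   # skip keys this Ruby lacks
         .map { |name, value| "export #{name}=#{value}" }
end

puts env_lines
```

Run it in a warmed-up process so the numbers reflect the app at its working size.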

35 of 57

Installing EnvMem

gem install env_mem

# In Gemfile

gem "env_mem"

36 of 57

Running EnvMem

# In your app

File.open("my_file", "w") { |f| f.print GC.stat.inspect }

# Afterward

env_mem my_file > env_script.sh

37 of 57

Does It Work?

Last year, I showed that CRuby has about 5%-7% warmup time for Rails Ruby Bench.

38 of 57

Does It Work?

With EnvMem there’s no measurable warmup. Just noise.

39 of 57

A Quick Win

40 of 57

Easy Answers

This is hard. Is there anything very simple and faster?

41 of 57

Easy Answers

This is hard. Is there anything very simple and faster?

Yes.

42 of 57

Allocators

Your OS comes with a memory allocator: malloc. But there are others.

43 of 57

jemalloc

Especially on Linux, you’re better off with jemalloc. Ruby already supports it.

44 of 57

Build with jemalloc

# Configure the Ruby source directly

./configure --with-jemalloc

# Or use rvm instead

rvm install 2.5.0 -C --with-jemalloc

# Or rbenv / ruby-build

RUBY_CONFIGURE_OPTS="--with-jemalloc" rbenv install 2.5.0
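
After building, a quick heuristic check from Ruby itself is to look for jemalloc in the interpreter's link flags. This is a sketch: linked_with_jemalloc? is a made-up name, and it will not notice jemalloc loaded through LD_PRELOAD.

```ruby
require "rbconfig"

# Hypothetical helper: a Ruby configured --with-jemalloc usually lists
# the library in its link flags.
def linked_with_jemalloc?
  flags = RbConfig::CONFIG.values_at("MAINLIBS", "LIBS").compact.join(" ")
  flags.include?("jemalloc")
end

puts "jemalloc linked: #{linked_with_jemalloc?}"
```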

45 of 57

How Much Faster?

46 of 57

How Much Faster?

I measure about 10%-12% end-to-end total speedup. Some of that is probably better caching because jemalloc uses less memory.

47 of 57

Solutions and Speedups

48 of 57

What Helps?

  • LESS WASTE
  • Good env variables
  • jemalloc
  • Latest Ruby

49 of 57

Advanced Methods

For Debugging

50 of 57

Slot Fragmentation

51 of 57

Full Slots, Stuck

A Slot that is in use can’t move, so its Page can’t be released until all 408 of its Slots are free.

52 of 57

Check Fragmentation

s = GC.stat

used_ratio = s[:heap_live_slots].to_f / (s[:heap_eden_pages] * 408)

fragmentation = 1.0 - used_ratio

53 of 57

What Ruby Tells You: GC::Profiler

54 of 57

GC Profile Mode

Want more detail about specific garbage collections as they happen? GC::Profiler.enable

55 of 57

GC Profiling

GC::Profiler.enable

# run code that creates garbage (A)

puts GC::Profiler.report

# prints a table of GCs that happened

# during the code in (A)
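
A fuller sketch of the same idea, wrapped so the profiler is always switched off again. profile_gc is a made-up helper, and the garbage loop is just sample work.

```ruby
# Hypothetical helper: profiles GC activity during the block.
def profile_gc
  GC::Profiler.enable
  yield
  {
    runs: GC::Profiler.raw_data.size,    # one entry per recorded GC
    seconds: GC::Profiler.total_time,    # total time spent in GC
    report: GC::Profiler.result          # the same table report prints
  }
ensure
  GC::Profiler.disable
  GC::Profiler.clear
end

summary = profile_gc do
  200_000.times { "garbage " * 4 }   # throwaway Strings
  GC.start                           # make sure a GC happens in the block
end

puts "GC runs: #{summary[:runs]}, time: #{summary[:seconds]}s"
puts summary[:report]
```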

56 of 57

Source Code:

Noah Gibbs, AppFolio - Tw: @codefolio

These slides: http://bit.ly/kaigi2018-gibbs

57 of 57

Questions?
