1 of 93

Performance Bugs and

Low-Level Ruby Observability APIs

Ivo Anjo / ivo.anjo@datadoghq.com

slides dtdg.co/rubykaigi2025 👀!!!

slides dtdg.co/rubykaigi2025

2 of 93

Good morning

2

2

松山

slides dtdg.co/rubykaigi2025

3 of 93

Who am I

3

3

Ivo Anjo

Senior Software Engineer @ Datadog

I work on the�Datadog Ruby Continuous Profiler! 🎉�� Making tools to uncover perf insights 👀

slides dtdg.co/rubykaigi2025

4 of 93

Modern monitoring and security as a Service 🐕

Including profiling, distributed tracing, test optimization, security, errors, logs for Ruby

?

slides dtdg.co/rubykaigi2025

5 of 93

Low-Level Ruby Observability APIs?

Low-Level

🔩 Close to the CRuby VM implementation, to details you should usually ignore!

😅 JRuby and TruffleRuby do offer some amazing tools as well, that I won’t be covering today

Observability APIs

🔍 X-Ray what and how exactly your application runs

5

5

slides dtdg.co/rubykaigi2025

6 of 93

But …why?

Observability x “How would you know?” problem

✅ If your tests are passing…

✅ …and the app returns the right result

⏱ ...but takes > 5 seconds to return it…

6

6

slides dtdg.co/rubykaigi2025

7 of 93

How would you know?

  • If you have an n+1 query problem
  • If this service is slow because another service is slow
  • If there’s an unrelated background “noisy neighbour” thread hogging CPU and the Global VM Lock (GVL)
  • If the Garbage Collector is working really hard

7

7

slides dtdg.co/rubykaigi2025

8 of 93

You can use Ruby’s Observability APIs to understand what’s going on!

…Or possibly lots of puts as well, shout out to Aaron 😉 https://tenderlovemaking.com/2016/02/05/i-am-a-puts-debuggerer/

8

8

slides dtdg.co/rubykaigi2025

9 of 93

Observability also helps with the problem

✅ If your tests are passing…

✅ …and the app returns the right result… in staging

❌ ...but breaks in production

9

9

slides dtdg.co/rubykaigi2025

10 of 93

Related: “Ruby is Slow” problem

Is Ruby really slow?

Or is the service…

🐢Doing N+1 queries

⏳Waiting on a too small connection pool

🧵Using too many threads

…etc?

10

10

slides dtdg.co/rubykaigi2025

11 of 93

Existing tools?

🔧 All of these tools are based on the APIs we’re discussing

🤔 Different combinations create very different tools!

11

11

slides dtdg.co/rubykaigi2025

12 of 93

Existing tools?

✨ I want to see new and interesting combinations!

💡 What can be added and improved?

12

12

slides dtdg.co/rubykaigi2025

13 of 93

Let’s look at what Ruby offers and what problems we can investigate with it 👀

13

13

slides dtdg.co/rubykaigi2025

14 of 93

Where the magic happens 🧙

include/ruby/debug.h:

  • TracePoint APIs
  • Postponed Job API
  • Frame-profiling APIs
  • Debug inspector APIs

include/ruby/thread.h:

  • GVL instrumentation API
  • Thread specific storage APIs

14

14

slides dtdg.co/rubykaigi2025

15 of 93

Yes… Accessing these APIs means writing some C code 😅

github.com/ivoanjo/lowlevel-toolkit

lowlevel-toolkit is a new gem I’ll be using to show how to use these Ruby APIs

15

15

slides dtdg.co/rubykaigi2025

16 of 93

TracePoint APIs

slides dtdg.co/rubykaigi2025

17 of 93

Wait… 🤔

Doesn’t Ruby already�provide the TracePoint�class? 👀

17

17

slides dtdg.co/rubykaigi2025

18 of 93

Yes! ✅

You can use TracePoint with all these events from Ruby code

but…

18

18

line: Execute an expression or statement on a new line

class: Start a class or module definition

end: Finish a class or module definition

call: Call a Ruby method

return: Return from a Ruby method

c_call: Call a C-language routine

c_return: Return from a C-language routine

raise: Raise an exception

rescue: Rescue an exception

b_call: Event hook at block entry

b_return: Event hook at block ending

a_call: Event hook at all calls

a_return: Event hook at all returns

thread_begin: Event hook at thread beginning

thread_end: Event hook at thread ending

fiber_switch: Event hook at fiber switch

script_compiled: New Ruby code compiled

slides dtdg.co/rubykaigi2025

19 of 93

Internal events are only available from C code!

(C code can still use all other events as well)

19

19

#define RUBY_INTERNAL_EVENT_SWITCH 0x040000 /**< Thread switched. */

#define RUBY_INTERNAL_EVENT_NEWOBJ 0x100000 /**< Object allocated. */

#define RUBY_INTERNAL_EVENT_FREEOBJ 0x200000 /**< Object swept. */

#define RUBY_INTERNAL_EVENT_GC_START 0x400000 /**< GC started. */

#define RUBY_INTERNAL_EVENT_GC_END_MARK 0x800000 /**< GC ended mark phase. */

#define RUBY_INTERNAL_EVENT_GC_END_SWEEP 0x1000000 /**< GC ended sweep phase. */

#define RUBY_INTERNAL_EVENT_GC_ENTER 0x2000000 /**< `gc_enter()` is called. */

#define RUBY_INTERNAL_EVENT_GC_EXIT 0x4000000 /**< `gc_exit()` is called. */

slides dtdg.co/rubykaigi2025

20 of 93

Internal events are separate:

They are emitted at special situations

when it’s not possible to access many Ruby APIs

20

20

#define RUBY_INTERNAL_EVENT_SWITCH 0x040000 /**< Thread switched. */

#define RUBY_INTERNAL_EVENT_NEWOBJ 0x100000 /**< Object allocated. */

#define RUBY_INTERNAL_EVENT_FREEOBJ 0x200000 /**< Object swept. */

#define RUBY_INTERNAL_EVENT_GC_START 0x400000 /**< GC started. */

#define RUBY_INTERNAL_EVENT_GC_END_MARK 0x800000 /**< GC ended mark phase. */

#define RUBY_INTERNAL_EVENT_GC_END_SWEEP 0x1000000 /**< GC ended sweep phase. */

#define RUBY_INTERNAL_EVENT_GC_ENTER 0x2000000 /**< `gc_enter()` is called. */

#define RUBY_INTERNAL_EVENT_GC_EXIT 0x4000000 /**< `gc_exit()` is called. */

slides dtdg.co/rubykaigi2025

21 of 93

Let’s look at a few ;)

21

21

#define RUBY_INTERNAL_EVENT_SWITCH 0x040000 /**< Thread switched. */

#define RUBY_INTERNAL_EVENT_NEWOBJ 0x100000 /**< Object allocated. */

#define RUBY_INTERNAL_EVENT_FREEOBJ 0x200000 /**< Object swept. */

#define RUBY_INTERNAL_EVENT_GC_START 0x400000 /**< GC started. */

#define RUBY_INTERNAL_EVENT_GC_END_MARK 0x800000 /**< GC ended mark phase. */

#define RUBY_INTERNAL_EVENT_GC_END_SWEEP 0x1000000 /**< GC ended sweep phase. */

#define RUBY_INTERNAL_EVENT_GC_ENTER 0x2000000 /**< `gc_enter()` is called. */

#define RUBY_INTERNAL_EVENT_GC_EXIT 0x4000000 /**< `gc_exit()` is called. */

slides dtdg.co/rubykaigi2025

22 of 93

Investigating object allocation: NEWOBJ event

Are my app’s web requests delayed by GC?

22

22

slides dtdg.co/rubykaigi2025

23 of 93

Investigating object allocation: NEWOBJ event

Useful for optimization:

🖥️ Creating many objects is expensive (CPU use)

♻️ Makes the Garbage Collector work very often

23

23

slides dtdg.co/rubykaigi2025

24 of 93

Investigating object allocation: NEWOBJ event

24

24

pp(LowlevelToolkit.track_objects_created do

Object.new

Time.utc(2025, 4, 17, 1, 0, 0)

"Hello, world!"

end)

# => [#<Object:0x000078e011750cc8>, 2025-04-17 1:00:00 UTC,� "Hello, world!"]

slides dtdg.co/rubykaigi2025

25 of 93

Investigating object allocation: NEWOBJ event

25

25

VALUE track_objects_created(VALUE _) {� VALUE result = rb_ary_new();� VALUE tp = rb_tracepoint_new(0, RUBY_INTERNAL_EVENT_NEWOBJ, on_newobj_event, (void *) result);� rb_tracepoint_enable(tp); rb_yield(Qnil); rb_tracepoint_disable(tp);� return filter_hidden_objects(result);�}

void on_newobj_event(VALUE tpval, void *data) {� VALUE obj = rb_tracearg_object(rb_tracearg_from_tracepoint(tpval));� if (!rb_objspace_internal_object_p(obj)) rb_ary_push((VALUE) data, obj);�}

slides dtdg.co/rubykaigi2025

26 of 93

Investigating object allocation: NEWOBJ event

26

26

VALUE track_objects_created(VALUE _) {� VALUE result = rb_ary_new();� VALUE tp = rb_tracepoint_new(0, RUBY_INTERNAL_EVENT_NEWOBJ, on_newobj_event, (void *) result);� rb_tracepoint_enable(tp); rb_yield(Qnil); rb_tracepoint_disable(tp);� return filter_hidden_objects(result);�}

VALUE filter_hidden_objects(VALUE result) {� for (int i = 0; i < RARRAY_LEN(result); i++)� if (!RBASIC_CLASS(rb_ary_entry(result, i))) rb_ary_store(result, i, Qnil);� return result;�}

slides dtdg.co/rubykaigi2025

27 of 93

Investigating object allocation: NEWOBJ event

27

27

pp(LowlevelToolkit.track_objects_created do

Object.new

Time.utc(2025, 4, 17, 1, 0, 0)

"Hello, world!"

end)

# => [#<Object:0x000078e011750cc8>, 2025-04-17 1:00:00 UTC,� "Hello, world!"]

Objects did not get garbage collected! 😊

slides dtdg.co/rubykaigi2025

28 of 93

Investigating object allocation: NEWOBJ event

Things to keep in mind:

  • Objects are empty when NEWOBJ gets called
  • Cannot call into most Ruby APIs inside the callback�➜ Leads to very confusing crashes (Ask me how I know… 😅)
  • Be careful with internal and hidden objects!

28

28

slides dtdg.co/rubykaigi2025

29 of 93

Investigating object allocation: NEWOBJ event

🤔 Internal and hidden objects?

Ruby uses objects to keep internal details such as:

  • caches
  • method data (info, instructions, etc)
  • included modules

29

29

3.4.1 :001 > ObjectSpace.count_objects

=>

{TOTAL: 596944,

FREE: 80120,

T_OBJECT: 210201,

T_CLASS: 2507,

T_MODULE: 197,

T_FLOAT: 416,

T_STRING: 78085,

T_REGEXP: 400,

T_ARRAY: 105246,

T_HASH: 6218,

T_STRUCT: 2893,

T_BIGNUM: 2,

T_FILE: 5,

T_DATA: 60876,

T_MATCH: 11110,

T_COMPLEX: 1,

T_RATIONAL: 2,

T_SYMBOL: 388,

T_IMEMO: 38018,

T_ICLASS: 259}

slides dtdg.co/rubykaigi2025

30 of 93

Investigating object allocation: NEWOBJ event

🤔 Internal and hidden objects?

The Ruby VM hides these objects from Ruby code, but many low-level APIs actually show them! 👀

30

30

3.4.1 :001 > ObjectSpace.count_objects

=>

{TOTAL: 596944,

FREE: 80120,

T_OBJECT: 210201,

T_CLASS: 2507,

T_MODULE: 197,

T_FLOAT: 416,

T_STRING: 78085,

T_REGEXP: 400,

T_ARRAY: 105246,

T_HASH: 6218,

T_STRUCT: 2893,

T_BIGNUM: 2,

T_FILE: 5,

T_DATA: 60876,

T_MATCH: 11110,

T_COMPLEX: 1,

T_RATIONAL: 2,

T_SYMBOL: 388,

T_IMEMO: 38018,

T_ICLASS: 259}

slides dtdg.co/rubykaigi2025

31 of 93

Investigating Garbage Collector activity

When is garbage collection happening?

31

slides dtdg.co/rubykaigi2025

32 of 93

Investigating Garbage Collector activity

♻️ Useful to reduce impact of GC (Reduce latency)

🧠 GC.stat++

32

#define RUBY_INTERNAL_EVENT_GC_START 0x400000 /**< GC started. */

#define RUBY_INTERNAL_EVENT_GC_END_MARK 0x800000 /**< GC ended mark phase. */

#define RUBY_INTERNAL_EVENT_GC_END_SWEEP 0x1000000 /**< GC ended sweep phase. */

#define RUBY_INTERNAL_EVENT_GC_ENTER 0x2000000 /**< `gc_enter()` is called. */

#define RUBY_INTERNAL_EVENT_GC_EXIT 0x4000000 /**< `gc_exit()` is called. */

slides dtdg.co/rubykaigi2025

33 of 93

Investigating Garbage Collector activity

33

33

VALUE print_gc_timing(VALUE _) {

rb_tracepoint_new(0, RUBY_INTERNAL_EVENT_GC_ENTER | RUBY_INTERNAL_EVENT_GC_EXIT, on_gc_event, NULL); �… }

void on_gc_event(VALUE tpval, void *_) {� static uint64_t gc_start_time = 0;� rb_event_flag_t event = rb_tracearg_event_flag(rb_tracearg_from_tracepoint(tpval));

if (event == RUBY_INTERNAL_EVENT_GC_ENTER) {� gc_start_time = get_monotonic_time_ns();

} else if (event == RUBY_INTERNAL_EVENT_GC_EXIT) {� fprintf(stdout, "GC worked for %.2f ms\n", ns_to_ms(get_monotonic_time_ns() - gc_start_time));� }�}

slides dtdg.co/rubykaigi2025

34 of 93

Investigating Garbage Collector activity

34

34

LowlevelToolkit.print_gc_timing do

puts "Minor GC:"

GC.start(full_mark: false)

puts "Major GC:"

GC.start(full_mark: true)

end

# => Minor GC:

# => GC worked for 0.78 ms

# => Major GC:

# => GC worked for 2.60 ms

slides dtdg.co/rubykaigi2025

35 of 93

Investigating Garbage Collector activity

Things to keep in mind:

  • Cannot allocate any objects 😉
    • Ruby will immediately stop 🛑
  • Cannot call into most Ruby APIs inside the callback

35

35

slides dtdg.co/rubykaigi2025

36 of 93

Event hook…?

Hint:

  • TracePoint is actually built atop the low-level “event hook” API
  • Slightly less overhead than TracePoint if called often

36

36

/* undocumented advanced tracing APIs */

void rb_add_event_hook2(rb_event_hook_func_t func, rb_event_flag_t events, VALUE data, rb_event_hook_flag_t hook_flag);

void rb_thread_add_event_hook2(VALUE thval, rb_event_hook_func_t func, rb_event_flag_t events, VALUE data, rb_event_hook_flag_t hook_flag);

slides dtdg.co/rubykaigi2025

37 of 93

Postponed Job API

slides dtdg.co/rubykaigi2025

38 of 93

Run code safely - Postponed job API

Ask Ruby to run code at a “safepoint”

38

rb_postponed_job_handle_t rb_postponed_job_preregister(

unsigned int flags, rb_postponed_job_func_t func, void *data);

void rb_postponed_job_trigger(rb_postponed_job_handle_t h);

slides dtdg.co/rubykaigi2025

39 of 93

“Safepoint”?

* Ruby doesn’t actually call it a “safepoint” ⁉️

Term used by Java/Go/.net/Truffle/etc… to mean a moment in execution where it’s safe to perform certain operations.

39

slides dtdg.co/rubykaigi2025

40 of 93

“Safepoint”?

* Ruby doesn’t actually call it a “safepoint” ⁉️

In the case of the Ruby VM, this means we’ll get called once it’s safe to call into Ruby code, and the GVL is held. (Not in the middle of GC, allocation, etc)

40

slides dtdg.co/rubykaigi2025

41 of 93

Run code safely - Postponed job APIs

In many low-level APIs we’ll cover today:

  • You can’t create objects 📦
  • You can’t call into methods written in Ruby 🔇
  • You can’t call most VM APIs 🚧
  • A different thread may own the GVL 🔀

41

slides dtdg.co/rubykaigi2025

42 of 93

Run code safely - Postponed job APIs

Postponed jobs API is a way of saying to Ruby:� “Hey, when you have a moment… call this function please :)

And thus it can be used to do a lot more things indirectly from the other low-level APIs

42

slides dtdg.co/rubykaigi2025

43 of 93

Run code safely - Postponed job APIs

43

43

at_finish = -> do� kind = GC.latest_gc_info[:major_by]� puts "GC finished (#{kind ? "major (#{kind})" : "minor"})"�end�LowlevelToolkit.on_gc_finish(at_finish) do� GC.start(full_mark: false)� GC.start(full_mark: true)�end

# => GC finished (minor)�# => GC finished (major (force))

slides dtdg.co/rubykaigi2025

44 of 93

Run code safely - Postponed job APIs

44

44

static rb_postponed_job_handle_t postponed_id =� rb_postponed_job_preregister(0, postponed, NULL);

�VALUE on_gc_finish(VALUE _, VALUE user_callback) {

callback = user_callback;� VALUE tp = rb_tracepoint_new(0, RUBY_INTERNAL_EVENT_GC_EXIT, on_gc_finish_event, NULL);� … }

void on_gc_finish_event(VALUE _, void *__) {� rb_postponed_job_trigger(postponed_id);�}

void postponed(void *_) { rb_funcall(callback, rb_intern("call"), 0); }

slides dtdg.co/rubykaigi2025

45 of 93

Run code safely - Postponed job APIs

45

45

at_finish = -> do� kind = GC.latest_gc_info[:major_by]� puts "GC finished (#{kind ? "major (#{kind})" : "minor"})"�end�LowlevelToolkit.on_gc_finish(at_finish) do� GC.start(full_mark: false)� GC.start(full_mark: true)�end

# => GC finished (minor)�# => GC finished (major (force))

slides dtdg.co/rubykaigi2025

46 of 93

Run code safely - Postponed job APIs

Things to keep in mind:

  • API redesigned in Ruby 3.3 ✅
    • Previous API had rare thread-safety/async-safety issue
  • Trigger flag is thread-local 🧵
    • Calling trigger from a background thread doesn’t quite work how you may expect

46

46

slides dtdg.co/rubykaigi2025

47 of 93

Run code safely - Postponed job APIs

Things to keep in mind:

  • Limit of 32 different postponed jobs can be registered
    • No more after that!
    • Ruby GC uses one so… actually 31
  • “Safepoint bias”
    • Next safepoint may be in a different method
    • Important for profilers

47

47

slides dtdg.co/rubykaigi2025

48 of 93

Frame-profiling APIs

slides dtdg.co/rubykaigi2025

49 of 93

Get a backtrace cheaply/safely - Frame-profiling APIs

49

49

int rb_profile_frames(int start, int limit, VALUE *buff, int *lines);

int rb_profile_thread_frames(VALUE thread, int start, int limit, VALUE *buff, int *lines);

VALUE rb_profile_frame_path(VALUE frame);

VALUE rb_profile_frame_absolute_path(VALUE frame);

VALUE rb_profile_frame_label(VALUE frame);

VALUE rb_profile_frame_base_label(VALUE frame);

VALUE rb_profile_frame_full_label(VALUE frame);

VALUE rb_profile_frame_first_lineno(VALUE frame);

VALUE rb_profile_frame_classpath(VALUE frame);

VALUE rb_profile_frame_singleton_method_p(VALUE frame);

VALUE rb_profile_frame_method_name(VALUE frame);

VALUE rb_profile_frame_qualified_method_name(VALUE frame);

slides dtdg.co/rubykaigi2025

50 of 93

Get a backtrace cheaply/safely - Frame-profiling APIs

What methods are being called and where?

50

slides dtdg.co/rubykaigi2025

51 of 93

Get a backtrace cheaply/safely - Frame-profiling APIs

  • Very commonly used by profilers (Datadog, Stackprof, Vernier, pf2, etc…)
  • Get a backtrace without allocating new objects or calling into Ruby code
  • Can be used in signal handler “async-signal safe”

51

slides dtdg.co/rubykaigi2025

52 of 93

Get a backtrace cheaply/safely - Frame-profiling APIs

  • Very similar to Thread#backtrace_locations in practice

  • Designed to be used with other low-level APIs we’ve looked at

52

52

slides dtdg.co/rubykaigi2025

53 of 93

Get a backtrace cheaply/safely - Frame-profiling APIs

53

53

def hello� Object.new� nil�end�LowlevelToolkit.track_last_allocation_at do� hello� pp LowlevelToolkit.last_allocation_at�end

# => [["examples/newobj_backtrace.rb", "hello"], ["examples/newobj_backtrace.rb", "block in <main>"], ["examples/newobj_backtrace.rb", "<main>"]]

slides dtdg.co/rubykaigi2025

54 of 93

Get a backtrace cheaply/safely - Frame-profiling APIs

54

54

VALUE track_last_allocation_at(VALUE _) {� stack = rb_ary_new_capa(MAX_DEPTH);� rb_tracepoint_new(0, RUBY_INTERNAL_EVENT_NEWOBJ, on_newobj_event, NULL);� …�}

void on_newobj_event(VALUE tpval, void *_) {

if (stack == Qnil || rb_objspace_internal_object_p(…)) return;

VALUE buffer[MAX_DEPTH]; // Temporary buffer for calling APIint depth = rb_profile_frames(0, MAX_DEPTH, buffer, NULL);

rb_ary_clear(stack);� for (int i = 0; i < depth; i++) rb_ary_push(stack, buffer[i]);�}

slides dtdg.co/rubykaigi2025

55 of 93

Get a backtrace cheaply/safely - Frame-profiling APIs

55

55

VALUE last_allocation_at(VALUE _) {

VALUE raw_stack = stack;� stack = Qnil; // Pause recording of stacks while we're doing the copying below

VALUE result = rb_ary_new();� for (int i = 0; i < RARRAY_LEN(raw_stack); i++) {

VALUE entry = rb_ary_entry(raw_stack, i);� VALUE file = rb_profile_frame_path(entry);� rb_ary_push(result, file, rb_profile_frame_label(entry));

}� stack = raw_stack; // Resume recording again� return result;�}

slides dtdg.co/rubykaigi2025

56 of 93

Get a backtrace cheaply/safely - Frame-profiling APIs

56

56

def hello� Object.new� nil�end�LowlevelToolkit.track_last_allocation_at do� hello� pp LowlevelToolkit.last_allocation_at�end

# => [["examples/newobj_backtrace.rb", "hello"], ["examples/newobj_backtrace.rb", "block in <main>"], ["examples/newobj_backtrace.rb", "<main>"]]

slides dtdg.co/rubykaigi2025

57 of 93

Get a backtrace cheaply/safely - Frame-profiling APIs

Things to keep in mind:

  • If you store the VALUEs you get from Ruby in a C data structure, remember that you need to tell the Ruby GC about it 👀

57

57

slides dtdg.co/rubykaigi2025

58 of 93

Debug inspector APIs

slides dtdg.co/rubykaigi2025

59 of 93

Navigate the current active backtrace/methods - Debug inspector

59

59

VALUE rb_debug_inspector_open(rb_debug_inspector_func_t func, void *data);

VALUE rb_debug_inspector_backtrace_locations(const rb_debug_inspector_t *dc);

VALUE rb_debug_inspector_frame_self_get(const rb_debug_inspector_t *dc, long index);

VALUE rb_debug_inspector_frame_class_get(const rb_debug_inspector_t *dc, long index);

VALUE rb_debug_inspector_frame_binding_get(const rb_debug_inspector_t *dc, long index);

VALUE rb_debug_inspector_frame_iseq_get(const rb_debug_inspector_t *dc, long index);

VALUE rb_debug_inspector_frame_depth(const rb_debug_inspector_t *dc, long index);

VALUE rb_debug_inspector_current_depth(void);

slides dtdg.co/rubykaigi2025

60 of 93

Navigate the current active backtrace/methods - Debug inspector

Backtraces? Again? Why do we need a second API?

  • Debug inspector is more expensive than frame profiling
    • Not often used by profilers
  • Allocates objects, needs to call into Ruby code, etc

Very different use-case: Built for debuggers!

60

60

slides dtdg.co/rubykaigi2025

61 of 93

Navigate the current active backtrace/methods - Debug inspector

Allows going deeper than any other backtrace API allows:

  • Access objects that methods are getting called on
  • Access binding objects for pending method calls
    • Read and change arguments, local variables, etc 🤩
  • Access instruction sequence objects!

61

61

slides dtdg.co/rubykaigi2025

62 of 93

Navigate the current active backtrace/methods - Debug inspector

62

62

VALUE who_called_me(VALUE _) {� return rb_debug_inspector_open(callback, (void *) !NULL); }�VALUE who_called_me_binding(VALUE _) {� return rb_debug_inspector_open(callback, NULL); }

VALUE callback(const rb_debug_inspector_t *dbg_context, void *return_self) {

return (return_self != NULL) ?� // Depth: 0 -> us; 1 -> caller; 2 -> caller of caller� rb_debug_inspector_frame_self_get(dbg_context, 2) :� rb_debug_inspector_frame_binding_get(dbg_context, 2);�}

slides dtdg.co/rubykaigi2025

63 of 93

Navigate the current active backtrace/methods - Debug inspector

63

63

class Hello� def say_hello(to, secret) = to.hello!�end

class Spy� def hello!

puts "I was called by #{LowlevelToolkit.who_called_me}"� puts "Secret is '#{LowlevelToolkit.who_called_me_binding.local_variable_get(:secret)}'"� end�end

Hello.new.say_hello(Spy.new, "trustno1")

# => I was called by #<Hello:0x00007ff298ae5898>�# => Secret is 'trustno1'

slides dtdg.co/rubykaigi2025

64 of 93

Navigate the current active backtrace/methods - Debug inspector

Things to keep in mind:�

  • There’s already gems for this! 😊
    • debug_inspector/binding_of_caller also allow easy use of this API from Ruby code ;)

64

64

slides dtdg.co/rubykaigi2025

65 of 93

GVL instrumentation API

slides dtdg.co/rubykaigi2025

66 of 93

Wait, what’s this GVL thing again?

Global VM Lock (GVL) in 1 slide:

  • Similar to Python’s GIL - Global Interpreter Lock 🐍
  • Protects CRuby VM state 🔒
  • Is not global anymore – one per Ractor! 👀
  • Prevents Ruby code execution parallelism ⛓️
    • Waiting can happen in parallel ⏳
  • Impacts application latency

66

66

slides dtdg.co/rubykaigi2025

67 of 93

GVL instrumentation API

slides dtdg.co/rubykaigi2025

68 of 93

Track Ruby thread activity - GVL Instrumentation API

68

68

#define RUBY_INTERNAL_THREAD_EVENT_STARTED 1 << 0

/** Triggered when a thread attempt to acquire the GVL. */

#define RUBY_INTERNAL_THREAD_EVENT_READY 1 << 1 /** acquiring GVL */

/** Triggered when a thread successfully acquired the GVL. */

#define RUBY_INTERNAL_THREAD_EVENT_RESUMED 1 << 2 /** acquired GVL */

/** Triggered when a thread released the GVL. */

#define RUBY_INTERNAL_THREAD_EVENT_SUSPENDED 1 << 3 /** released GVL */

/** Triggered when a thread exits. */

#define RUBY_INTERNAL_THREAD_EVENT_EXITED 1 << 4 /** thread terminated */

#define RUBY_INTERNAL_THREAD_EVENT_MASK 0xff /** All Thread events */

rb_internal_thread_event_hook_t *rb_internal_thread_add_event_hook(

rb_internal_thread_event_callback func, rb_event_flag_t events,

void *data);

bool rb_internal_thread_remove_event_hook(rb_internal_thread_event_hook_t * hook);

slides dtdg.co/rubykaigi2025

69 of 93

Track Ruby thread activity - GVL Instrumentation API

Are my web requests being delayed by other requests/threads?

69

69

slides dtdg.co/rubykaigi2025

70 of 93

Track Ruby thread activity - GVL Instrumentation API

Useful to know when threads are

  • running (“RESUMED”)
  • waiting for something to happen (“SUSPENDED”)
  • want the GVL (“READY”)

(screenshot from gvl-tracing gem!)

70

70

slides dtdg.co/rubykaigi2025

71 of 93

Track Ruby thread activity - GVL Instrumentation API

71

71

def counter_loop(counter = 0) = (counter += 1 while counter < 150_000_000)�

pp(LowlevelToolkit.track_wants_gvl do

t1 = Thread.new { counter_loop }.tap { |it| it.name = "t1" }� t2 = Thread.new { counter_loop }.tap { |it| it.name = "t2" }

t1.join; t2.join

end

slides dtdg.co/rubykaigi2025

72 of 93

Track Ruby thread activity - GVL Instrumentation API

72

72

VALUE track_wants_gvl(VALUE _) {� VALUE seen_threads = rb_ary_new();

rb_internal_thread_add_event_hook(on_thread_event, RUBY_INTERNAL_THREAD_EVENT_STARTED | RUBY_INTERNAL_THREAD_EVENT_READY | RUBY_INTERNAL_THREAD_EVENT_RESUMED, seen_threads);

}

slides dtdg.co/rubykaigi2025

73 of 93

Track Ruby thread activity - GVL Instrumentation API

73

73

void on_thread_event(event_id, rb_internal_thread_event_data_t *event_data, void *data) {

VALUE thread = event_data->thread;

if (event_id == RUBY_INTERNAL_THREAD_EVENT_STARTED) {�

} else if (event_id == RUBY_INTERNAL_THREAD_EVENT_READY) {� rb_internal_thread_specific_set(thread, wants_gvl_at_key, get_monotonic_time_ns());

} else if (event_id == RUBY_INTERNAL_THREAD_EVENT_RESUMED) { /* Simplified to fit slides… */

uint64_t wants_gvl_at = rb_internal_thread_specific_get(thread, wants_gvl_at_key);� uint64_t total_wants = rb_internal_thread_specific_get(thread, total_wants_key);� rb_internal_thread_specific_set(� thread, total_wants_key, (total_wants + (get_monotonic_time_ns() - wants_gvl_at)));� }�}

slides dtdg.co/rubykaigi2025

74 of 93

Track Ruby thread activity - GVL Instrumentation API

74

74

def counter_loop(counter = 0) = (counter += 1 while counter < 150_000_000)��before = Time.nowpp(LowlevelToolkit.track_wants_gvl do� t1 = Thread.new { counter_loop }.tap { |it| it.name = "t1" }� t2 = Thread.new { counter_loop }.tap { |it| it.name = "t2" }� t1.join; t2.join

end.map { |thread, wants_gvl_ns| [thread, wants_gvl_ns / 1_000_000_000.0] })

puts "Total time: #{Time.now - before}"

# => [[#<Thread:0x00007520acd952c8@t1 examples/track_wants_gvl.rb:8 dead>, 1.825822156], [#<Thread:0x00007520acd94f30@t2 examples/track_wants_gvl.rb:9 dead>, 1.854341141]]�# => Total time: 3.715163175

slides dtdg.co/rubykaigi2025

75 of 93

Track Ruby thread activity - GVL Instrumentation API

Things to keep in mind:

  • Only “RESUMED” event gets called with GVL being held
  • For the others, cannot call most Ruby APIs
  • rb_internal_thread_specific provides safe way to attach info to threads without GVL
  • gvltools gem already offers a lot of nice counters 😎

75

75

slides dtdg.co/rubykaigi2025

76 of 93

Putting it all together:�Let’s build a new profiler!

77 of 93

Let’s build a new kind of Ruby profiler: a release GVL profiler

When is the GVL being explicitly released in our app? 🔓🥏

  • Doing network calls, reading files, native database drivers
  • When doing “heavy work” in native gem

By knowing when the GVL gets released, we can see the impact of these things in our application!

77

77

slides dtdg.co/rubykaigi2025

78 of 93

Release GVL profiler

78

78

def counter_loop(counter = 0) = (counter += 1 while counter < 150_000_000)

LowlevelToolkit.release_gvl_profiler(filename_prefix: "output") do

counter_loop

Net::HTTP.start("www.google.com") { |it| it.get("/") }

big_data = File.read("/dev/zero", 300_000_000)

Dir.children("/usr/lib")

Zlib::Deflate.deflate(big_data[0, 100_000_000])

end

slides dtdg.co/rubykaigi2025

79 of 93

Release GVL profiler - Time

79

79

LowlevelToolkit.release_gvl_profiler(filename_prefix: "output") do

counter_loop # Doesn't show up, GVL not released

Net::HTTP.start("www.google.com") { |it| it.get("/") }

big_data = File.read("/dev/zero", 300_000_000)

Dir.children("/usr/lib")

Zlib::Deflate.deflate(big_data[0, 100_000_000])

end

slides dtdg.co/rubykaigi2025

80 of 93

Release GVL profiler - Counts

80

80

LowlevelToolkit.release_gvl_profiler(filename_prefix: "output") do

counter_loop # Doesn't show up, GVL not released

Net::HTTP.start("www.google.com") { |it| it.get("/") }

big_data = File.read("/dev/zero", 300_000_000)

Dir.children("/usr/lib")

Zlib::Deflate.deflate(big_data[0, 100_000_000])

end

slides dtdg.co/rubykaigi2025

81 of 93

P.s.: It’s often surprisingly easy to generate output in a format that can easily be visualized with existing UI tools :) 📊📈😎

(Speedscope, perfetto, firefox profiler, … excel?)

81

81

slides dtdg.co/rubykaigi2025

82 of 93

Release GVL profiler

82

82

VALUE release_gvl_profiler(int argc, VALUE *argv, RB_UNUSED_VAR(VALUE _)) {

… // Deal with file name

VALUE result = rb_hash_new();

rb_internal_thread_event_hook_t *hook = rb_internal_thread_add_event_hook(

on_thread_event,

RUBY_INTERNAL_THREAD_EVENT_SUSPENDED | RUBY_INTERNAL_THREAD_EVENT_RESUMED,

(void *) result

);

rb_yield(Qnil);

rb_internal_thread_remove_event_hook(hook);

write_stacks(filename_prefix, result);

return result;

}

slides dtdg.co/rubykaigi2025

83 of 93

Release GVL profiler

83

83

void on_thread_event(event_id, const rb_internal_thread_event_data_t *event_data, void *data) {� VALUE thread = event_data->thread;� if (event_id == RUBY_INTERNAL_THREAD_EVENT_SUSPENDED && !ruby_thread_has_gvl_p()) {� rb_internal_thread_specific_set(thread, release_gvl_at_key, get_monotonic_time_ns());

} else if (event_id == RUBY_INTERNAL_THREAD_EVENT_RESUMED) { /* Simplified to fit slides… */� uint64_t release_gvl_at = rb_internal_thread_specific_get(thread, release_gvl_at_key);

VALUE frames = rb_make_backtrace();

VALUE stats = rb_hash_aref((VALUE) data, frames);� if (stats == Qnil) { stats = … /* Initialize with zeros */ }

uint64_t time_spent = get_monotonic_time_ns() - release_gvl_at;� rb_ary_store(stats, 0, ULL2NUM(NUM2ULL(rb_ary_entry(stats, 0)) + time_spent)); // Time� rb_ary_store(stats, 1, ULL2NUM(NUM2ULL(rb_ary_entry(stats, 1)) + 1)); // Counts�}}

slides dtdg.co/rubykaigi2025

84 of 93

Release GVL profiler

84

84

void write_stacks(VALUE filename_prefix, VALUE result) {

VALUE keys = rb_funcall(result, rb_intern("keys"), 0);

long key_count = RARRAY_LEN(keys);

for (long i = 0; i < key_count; i++) {

VALUE frames = rb_ary_entry(keys, i);

VALUE stats = rb_hash_aref(result, frames);

long frame_count = RARRAY_LEN(frames);

for (long j = frame_count - 1; j >= 0; j--) { … } // Print each frame

fprintf(time_file, " %llu\n", NUM2ULL(rb_ary_entry(stats, 0)));

fprintf(counts_file, " %llu\n", NUM2ULL(rb_ary_entry(stats, 1)));

}

… }

slides dtdg.co/rubykaigi2025

85 of 93

Release GVL profiler - Example result

85

85

examples/release_gvl_profiler.rb:6:in '<main>';examples/release_gvl_profiler.rb:6:in 'LowlevelToolkit.release_gvl_profiler';examples/release_gvl_profiler.rb:10:in 'block in <main>';examples/release_gvl_profiler.rb:10:in 'Dir.children' 180

examples/release_gvl_profiler.rb:6:in '<main>';examples/release_gvl_profiler.rb:6:in 'LowlevelToolkit.release_gvl_profiler';examples/release_gvl_profiler.rb:10:in 'block in <main>';examples/release_gvl_profiler.rb:10:in 'Dir.children' 370759

Count

Time (ns)

slides dtdg.co/rubykaigi2025

86 of 93

That’s it! A full new profiler in 107 lines of code!

86

86

slides dtdg.co/rubykaigi2025

87 of 93

“Performance bugs”

slides dtdg.co/rubykaigi2025

88 of 93

“Performance bugs”?

Are performance bugs really bugs? Should we call them issues? Problems?

The important part is their impact and thus priority!

88

slides dtdg.co/rubykaigi2025

89 of 93

Performance bugs

  • Is this breaking production? 🚨
  • How is this affecting our user experience? 😕
  • Will this take down our site on black friday? 🛒
  • Is this costing us a lot of money? 💸

Fix it!

89

slides dtdg.co/rubykaigi2025

90 of 93

Performance bugs

  • Is this breaking production? 🚨
  • How is this affecting our user experience? 😕
  • Will this take down our site on black friday? 🛒
  • Is this costing us a lot of money? 💸

(Observability tools can help answer these questions!)

90

slides dtdg.co/rubykaigi2025

91 of 93

Performance bugs

  • Is this breaking production? 🚨
  • How is this affecting our user experience? 😕
  • Will this take down our site on black friday? 🛒
  • Is this costing us a lot of money? 💸

If not… use your best judgement ;)

Sometimes worth improving, others not really 😉

91

slides dtdg.co/rubykaigi2025

92 of 93

Performance bugs

What tools will you build with debug.h?

I’m looking forward to seeing observability tools on stage during this + future RubyKaigis! 😍

92

slides dtdg.co/rubykaigi2025

93 of 93

Thank you

slides dtdg.co/rubykaigi2025