Performance Bugs and
Low-Level Ruby Observability APIs
slides dtdg.co/rubykaigi2025
Good morning
2
2
松山
slides dtdg.co/rubykaigi2025
Who am I
3
3
Ivo Anjo
Senior Software Engineer @ Datadog
I work on the�Datadog Ruby Continuous Profiler! 🎉��❤ Making tools to uncover perf insights 👀
slides dtdg.co/rubykaigi2025
Modern monitoring and security as a Service 🐕
Including profiling, distributed tracing, test optimization, security, errors, logs for Ruby
?
slides dtdg.co/rubykaigi2025
Low-Level Ruby Observability APIs?
Low-Level
🔩 Close to the CRuby VM implementation, to details you should usually ignore!
😅 JRuby and TruffleRuby do offer some amazing tools as well, that I won’t be covering today
Observability APIs
🔍 X-Ray what and how exactly your application runs
5
5
slides dtdg.co/rubykaigi2025
But …why?
Observability x “How would you know?” problem
✅ If your tests are passing…
✅ …and the app returns the right result
⏱ ...but takes > 5 seconds to return it…
6
6
slides dtdg.co/rubykaigi2025
How would you know?
7
7
slides dtdg.co/rubykaigi2025
You can use Ruby’s Observability APIs to understand what’s going on!
…Or possibly lots of puts as well, shout out to Aaron 😉 https://tenderlovemaking.com/2016/02/05/i-am-a-puts-debuggerer/
8
8
slides dtdg.co/rubykaigi2025
Observability also helps with the problem
✅ If your tests are passing…
✅ …and the app returns the right result… in staging
❌ ...but breaks in production
9
9
slides dtdg.co/rubykaigi2025
Related: “Ruby is Slow” problem
Is Ruby really slow?
Or is the service…
🐢Doing N+1 queries
⏳Waiting on a too small connection pool
🧵Using too many threads
…etc?
10
10
slides dtdg.co/rubykaigi2025
Existing tools?
🔧 All of these tools are based on the APIs we’re discussing
🤔 Different combinations create very different tools!
11
11
slides dtdg.co/rubykaigi2025
Existing tools?
✨ I want to see new and interesting combinations!
💡 What can be added and improved?
12
12
slides dtdg.co/rubykaigi2025
Let’s look at what Ruby offers and what problems we can investigate with it 👀
13
13
slides dtdg.co/rubykaigi2025
Where the magic happens 🧙
14
14
slides dtdg.co/rubykaigi2025
Yes… Accessing these APIs means writing some C code 😅
github.com/ivoanjo/lowlevel-toolkit
lowlevel-toolkit is a new gem I’ll be using to show how to use these Ruby APIs
15
15
slides dtdg.co/rubykaigi2025
TracePoint APIs
slides dtdg.co/rubykaigi2025
Wait… 🤔
Doesn’t Ruby already�provide the TracePoint�class? 👀
17
17
slides dtdg.co/rubykaigi2025
Yes! ✅
You can use TracePoint with all these events from Ruby code
but…
18
18
line: Execute an expression or statement on a new line
class: Start a class or module definition
end: Finish a class or module definition
call: Call a Ruby method
return: Return from a Ruby method
c_call: Call a C-language routine
c_return: Return from a C-language routine
raise: Raise an exception
rescue: Rescue an exception
b_call: Event hook at block entry
b_return: Event hook at block ending
a_call: Event hook at all calls
a_return: Event hook at all returns
thread_begin: Event hook at thread beginning
thread_end: Event hook at thread ending
fiber_switch: Event hook at fiber switch
script_compiled: New Ruby code compiled
slides dtdg.co/rubykaigi2025
Internal events are only available from C code!
(C code can still use all other events as well)
19
19
#define RUBY_INTERNAL_EVENT_SWITCH 0x040000 /**< Thread switched. */
#define RUBY_INTERNAL_EVENT_NEWOBJ 0x100000 /**< Object allocated. */
#define RUBY_INTERNAL_EVENT_FREEOBJ 0x200000 /**< Object swept. */
#define RUBY_INTERNAL_EVENT_GC_START 0x400000 /**< GC started. */
#define RUBY_INTERNAL_EVENT_GC_END_MARK 0x800000 /**< GC ended mark phase. */
#define RUBY_INTERNAL_EVENT_GC_END_SWEEP 0x1000000 /**< GC ended sweep phase. */
#define RUBY_INTERNAL_EVENT_GC_ENTER 0x2000000 /**< `gc_enter()` is called. */
#define RUBY_INTERNAL_EVENT_GC_EXIT 0x4000000 /**< `gc_exit()` is called. */
slides dtdg.co/rubykaigi2025
Internal events are separate:
They are emitted at special situations
when it’s not possible to access many Ruby APIs
20
20
#define RUBY_INTERNAL_EVENT_SWITCH 0x040000 /**< Thread switched. */
#define RUBY_INTERNAL_EVENT_NEWOBJ 0x100000 /**< Object allocated. */
#define RUBY_INTERNAL_EVENT_FREEOBJ 0x200000 /**< Object swept. */
#define RUBY_INTERNAL_EVENT_GC_START 0x400000 /**< GC started. */
#define RUBY_INTERNAL_EVENT_GC_END_MARK 0x800000 /**< GC ended mark phase. */
#define RUBY_INTERNAL_EVENT_GC_END_SWEEP 0x1000000 /**< GC ended sweep phase. */
#define RUBY_INTERNAL_EVENT_GC_ENTER 0x2000000 /**< `gc_enter()` is called. */
#define RUBY_INTERNAL_EVENT_GC_EXIT 0x4000000 /**< `gc_exit()` is called. */
slides dtdg.co/rubykaigi2025
Let’s look at a few ;)
21
21
#define RUBY_INTERNAL_EVENT_SWITCH 0x040000 /**< Thread switched. */
#define RUBY_INTERNAL_EVENT_NEWOBJ 0x100000 /**< Object allocated. */
#define RUBY_INTERNAL_EVENT_FREEOBJ 0x200000 /**< Object swept. */
#define RUBY_INTERNAL_EVENT_GC_START 0x400000 /**< GC started. */
#define RUBY_INTERNAL_EVENT_GC_END_MARK 0x800000 /**< GC ended mark phase. */
#define RUBY_INTERNAL_EVENT_GC_END_SWEEP 0x1000000 /**< GC ended sweep phase. */
#define RUBY_INTERNAL_EVENT_GC_ENTER 0x2000000 /**< `gc_enter()` is called. */
#define RUBY_INTERNAL_EVENT_GC_EXIT 0x4000000 /**< `gc_exit()` is called. */
slides dtdg.co/rubykaigi2025
Investigating object allocation: NEWOBJ event
Are my app’s web requests delayed by GC?
22
22
slides dtdg.co/rubykaigi2025
Investigating object allocation: NEWOBJ event
Useful for optimization:
🖥️ Creating many objects is expensive (CPU use)
♻️ Makes the Garbage Collector work very often
23
23
slides dtdg.co/rubykaigi2025
Investigating object allocation: NEWOBJ event
24
24
pp(LowlevelToolkit.track_objects_created do
Object.new
Time.utc(2025, 4, 17, 1, 0, 0)
"Hello, world!"
end)
# => [#<Object:0x000078e011750cc8>, 2025-04-17 1:00:00 UTC,� "Hello, world!"]
slides dtdg.co/rubykaigi2025
Investigating object allocation: NEWOBJ event
25
25
VALUE track_objects_created(VALUE _) {� VALUE result = rb_ary_new();� VALUE tp = rb_tracepoint_new(0, RUBY_INTERNAL_EVENT_NEWOBJ, on_newobj_event, (void *) result);� rb_tracepoint_enable(tp); rb_yield(Qnil); rb_tracepoint_disable(tp);� return filter_hidden_objects(result);�}
void on_newobj_event(VALUE tpval, void *data) {� VALUE obj = rb_tracearg_object(rb_tracearg_from_tracepoint(tpval));� if (!rb_objspace_internal_object_p(obj)) rb_ary_push((VALUE) data, obj);�}
slides dtdg.co/rubykaigi2025
Investigating object allocation: NEWOBJ event
26
26
VALUE track_objects_created(VALUE _) {� VALUE result = rb_ary_new();� VALUE tp = rb_tracepoint_new(0, RUBY_INTERNAL_EVENT_NEWOBJ, on_newobj_event, (void *) result);� rb_tracepoint_enable(tp); rb_yield(Qnil); rb_tracepoint_disable(tp);� return filter_hidden_objects(result);�}
VALUE filter_hidden_objects(VALUE result) {� for (int i = 0; i < RARRAY_LEN(result); i++)� if (!RBASIC_CLASS(rb_ary_entry(result, i))) rb_ary_store(result, i, Qnil);� return result;�}
slides dtdg.co/rubykaigi2025
Investigating object allocation: NEWOBJ event
27
27
pp(LowlevelToolkit.track_objects_created do
Object.new
Time.utc(2025, 4, 17, 1, 0, 0)
"Hello, world!"
end)
# => [#<Object:0x000078e011750cc8>, 2025-04-17 1:00:00 UTC,� "Hello, world!"]
Objects did not get garbage collected! 😊
slides dtdg.co/rubykaigi2025
Investigating object allocation: NEWOBJ event
Things to keep in mind:
28
28
slides dtdg.co/rubykaigi2025
Investigating object allocation: NEWOBJ event
🤔 Internal and hidden objects?
Ruby uses objects to keep internal details such as:
29
29
3.4.1 :001 > ObjectSpace.count_objects
=>
{TOTAL: 596944,
FREE: 80120,
T_OBJECT: 210201,
T_CLASS: 2507,
T_MODULE: 197,
T_FLOAT: 416,
T_STRING: 78085,
T_REGEXP: 400,
T_ARRAY: 105246,
T_HASH: 6218,
T_STRUCT: 2893,
T_BIGNUM: 2,
T_FILE: 5,
T_DATA: 60876,
T_MATCH: 11110,
T_COMPLEX: 1,
T_RATIONAL: 2,
T_SYMBOL: 388,
T_IMEMO: 38018,
T_ICLASS: 259}
slides dtdg.co/rubykaigi2025
Investigating object allocation: NEWOBJ event
🤔 Internal and hidden objects?
The Ruby VM hides these objects from Ruby code, but many low-level APIs actually show them! 👀
30
30
3.4.1 :001 > ObjectSpace.count_objects
=>
{TOTAL: 596944,
FREE: 80120,
T_OBJECT: 210201,
T_CLASS: 2507,
T_MODULE: 197,
T_FLOAT: 416,
T_STRING: 78085,
T_REGEXP: 400,
T_ARRAY: 105246,
T_HASH: 6218,
T_STRUCT: 2893,
T_BIGNUM: 2,
T_FILE: 5,
T_DATA: 60876,
T_MATCH: 11110,
T_COMPLEX: 1,
T_RATIONAL: 2,
T_SYMBOL: 388,
T_IMEMO: 38018,
T_ICLASS: 259}
slides dtdg.co/rubykaigi2025
Investigating Garbage Collector activity
When is garbage collection happening?
31
slides dtdg.co/rubykaigi2025
Investigating Garbage Collector activity
♻️ Useful to reduce impact of GC (Reduce latency)
🧠 GC.stat++
32
#define RUBY_INTERNAL_EVENT_GC_START 0x400000 /**< GC started. */
#define RUBY_INTERNAL_EVENT_GC_END_MARK 0x800000 /**< GC ended mark phase. */
#define RUBY_INTERNAL_EVENT_GC_END_SWEEP 0x1000000 /**< GC ended sweep phase. */
#define RUBY_INTERNAL_EVENT_GC_ENTER 0x2000000 /**< `gc_enter()` is called. */
#define RUBY_INTERNAL_EVENT_GC_EXIT 0x4000000 /**< `gc_exit()` is called. */
slides dtdg.co/rubykaigi2025
Investigating Garbage Collector activity
33
33
VALUE print_gc_timing(VALUE _) {
rb_tracepoint_new(0, RUBY_INTERNAL_EVENT_GC_ENTER | RUBY_INTERNAL_EVENT_GC_EXIT, on_gc_event, NULL); �… }
void on_gc_event(VALUE tpval, void *_) {� static uint64_t gc_start_time = 0;� rb_event_flag_t event = rb_tracearg_event_flag(rb_tracearg_from_tracepoint(tpval));
if (event == RUBY_INTERNAL_EVENT_GC_ENTER) {� gc_start_time = get_monotonic_time_ns();
} else if (event == RUBY_INTERNAL_EVENT_GC_EXIT) {� fprintf(stdout, "GC worked for %.2f ms\n", ns_to_ms(get_monotonic_time_ns() - gc_start_time));� }�}
slides dtdg.co/rubykaigi2025
Investigating Garbage Collector activity
34
34
LowlevelToolkit.print_gc_timing do
puts "Minor GC:"
GC.start(full_mark: false)
puts "Major GC:"
GC.start(full_mark: true)
end
# => Minor GC:
# => GC worked for 0.78 ms
# => Major GC:
# => GC worked for 2.60 ms
slides dtdg.co/rubykaigi2025
Investigating Garbage Collector activity
Things to keep in mind:
35
35
slides dtdg.co/rubykaigi2025
Event hook…?
Hint:
36
36
/* undocumented advanced tracing APIs */
void rb_add_event_hook2(rb_event_hook_func_t func, rb_event_flag_t events, VALUE data, rb_event_hook_flag_t hook_flag);
void rb_thread_add_event_hook2(VALUE thval, rb_event_hook_func_t func, rb_event_flag_t events, VALUE data, rb_event_hook_flag_t hook_flag);
slides dtdg.co/rubykaigi2025
Postponed Job API
slides dtdg.co/rubykaigi2025
Run code safely - Postponed job API
Ask Ruby to run code at a “safepoint”
38
rb_postponed_job_handle_t rb_postponed_job_preregister(
unsigned int flags, rb_postponed_job_func_t func, void *data);
void rb_postponed_job_trigger(rb_postponed_job_handle_t h);
slides dtdg.co/rubykaigi2025
“Safepoint”?
* Ruby doesn’t actually call it a “safepoint” ⁉️
Term used by Java/Go/.net/Truffle/etc… to mean a moment in execution where it’s safe to perform certain operations.
39
slides dtdg.co/rubykaigi2025
“Safepoint”?
* Ruby doesn’t actually call it a “safepoint” ⁉️
In the case of the Ruby VM, this means we’ll get called once it’s safe to call into Ruby code, and the GVL is held. (Not in the middle of GC, allocation, etc)
40
slides dtdg.co/rubykaigi2025
Run code safely - Postponed job APIs
In many low-level APIs we’ll cover today:
41
slides dtdg.co/rubykaigi2025
Run code safely - Postponed job APIs
Postponed jobs API is a way of saying to Ruby:� “Hey, when you have a moment… call this function please :)”
And thus it can be used to do a lot more things indirectly from the other low-level APIs
42
slides dtdg.co/rubykaigi2025
Run code safely - Postponed job APIs
43
43
at_finish = -> do� kind = GC.latest_gc_info[:major_by]� puts "GC finished (#{kind ? "major (#{kind})" : "minor"})"�end��LowlevelToolkit.on_gc_finish(at_finish) do� GC.start(full_mark: false)� GC.start(full_mark: true)�end
# => GC finished (minor)�# => GC finished (major (force))
slides dtdg.co/rubykaigi2025
Run code safely - Postponed job APIs
44
44
static rb_postponed_job_handle_t postponed_id =� rb_postponed_job_preregister(0, postponed, NULL);
�VALUE on_gc_finish(VALUE _, VALUE user_callback) {
callback = user_callback;� VALUE tp = rb_tracepoint_new(0, RUBY_INTERNAL_EVENT_GC_EXIT, on_gc_finish_event, NULL);� … }
void on_gc_finish_event(VALUE _, void *__) {� rb_postponed_job_trigger(postponed_id);�}
void postponed(void *_) { rb_funcall(callback, rb_intern("call"), 0); }
slides dtdg.co/rubykaigi2025
Run code safely - Postponed job APIs
45
45
at_finish = -> do� kind = GC.latest_gc_info[:major_by]� puts "GC finished (#{kind ? "major (#{kind})" : "minor"})"�end��LowlevelToolkit.on_gc_finish(at_finish) do� GC.start(full_mark: false)� GC.start(full_mark: true)�end
# => GC finished (minor)�# => GC finished (major (force))
slides dtdg.co/rubykaigi2025
Run code safely - Postponed job APIs
Things to keep in mind:
46
46
slides dtdg.co/rubykaigi2025
Run code safely - Postponed job APIs
Things to keep in mind:
47
47
slides dtdg.co/rubykaigi2025
Frame-profiling APIs
slides dtdg.co/rubykaigi2025
Get a backtrace cheaply/safely - Frame-profiling APIs
49
49
int rb_profile_frames(int start, int limit, VALUE *buff, int *lines);
int rb_profile_thread_frames(VALUE thread, int start, int limit, VALUE *buff, int *lines);
VALUE rb_profile_frame_path(VALUE frame);
VALUE rb_profile_frame_absolute_path(VALUE frame);
VALUE rb_profile_frame_label(VALUE frame);
VALUE rb_profile_frame_base_label(VALUE frame);
VALUE rb_profile_frame_full_label(VALUE frame);
VALUE rb_profile_frame_first_lineno(VALUE frame);
VALUE rb_profile_frame_classpath(VALUE frame);
VALUE rb_profile_frame_singleton_method_p(VALUE frame);
VALUE rb_profile_frame_method_name(VALUE frame);
VALUE rb_profile_frame_qualified_method_name(VALUE frame);
slides dtdg.co/rubykaigi2025
Get a backtrace cheaply/safely - Frame-profiling APIs
What methods are being called and where?
50
slides dtdg.co/rubykaigi2025
Get a backtrace cheaply/safely - Frame-profiling APIs
51
slides dtdg.co/rubykaigi2025
Get a backtrace cheaply/safely - Frame-profiling APIs
52
52
slides dtdg.co/rubykaigi2025
Get a backtrace cheaply/safely - Frame-profiling APIs
53
53
def hello� Object.new� nil�end��LowlevelToolkit.track_last_allocation_at do� hello� pp LowlevelToolkit.last_allocation_at�end�
# => [["examples/newobj_backtrace.rb", "hello"], ["examples/newobj_backtrace.rb", "block in <main>"], ["examples/newobj_backtrace.rb", "<main>"]]
slides dtdg.co/rubykaigi2025
Get a backtrace cheaply/safely - Frame-profiling APIs
54
54
VALUE track_last_allocation_at(VALUE _) {� stack = rb_ary_new_capa(MAX_DEPTH);� rb_tracepoint_new(0, RUBY_INTERNAL_EVENT_NEWOBJ, on_newobj_event, NULL);� …�}
void on_newobj_event(VALUE tpval, void *_) {
if (stack == Qnil || rb_objspace_internal_object_p(…)) return;
VALUE buffer[MAX_DEPTH]; // Temporary buffer for calling API� int depth = rb_profile_frames(0, MAX_DEPTH, buffer, NULL);
rb_ary_clear(stack);� for (int i = 0; i < depth; i++) rb_ary_push(stack, buffer[i]);�}
slides dtdg.co/rubykaigi2025
Get a backtrace cheaply/safely - Frame-profiling APIs
55
55
VALUE last_allocation_at(VALUE _) {
VALUE raw_stack = stack;� stack = Qnil; // Pause recording of stacks while we're doing the copying below
VALUE result = rb_ary_new();� for (int i = 0; i < RARRAY_LEN(raw_stack); i++) {
VALUE entry = rb_ary_entry(raw_stack, i);� VALUE file = rb_profile_frame_path(entry);� rb_ary_push(result, … file, rb_profile_frame_label(entry));
}� stack = raw_stack; // Resume recording again� return result;�}
slides dtdg.co/rubykaigi2025
Get a backtrace cheaply/safely - Frame-profiling APIs
56
56
def hello� Object.new� nil�end��LowlevelToolkit.track_last_allocation_at do� hello� pp LowlevelToolkit.last_allocation_at�end�
# => [["examples/newobj_backtrace.rb", "hello"], ["examples/newobj_backtrace.rb", "block in <main>"], ["examples/newobj_backtrace.rb", "<main>"]]
slides dtdg.co/rubykaigi2025
Get a backtrace cheaply/safely - Frame-profiling APIs
Things to keep in mind:
57
57
slides dtdg.co/rubykaigi2025
Debug inspector APIs
slides dtdg.co/rubykaigi2025
Navigate the current active backtrace/methods - Debug inspector
59
59
VALUE rb_debug_inspector_open(rb_debug_inspector_func_t func, void *data);
VALUE rb_debug_inspector_backtrace_locations(const rb_debug_inspector_t *dc);
VALUE rb_debug_inspector_frame_self_get(const rb_debug_inspector_t *dc, long index);
VALUE rb_debug_inspector_frame_class_get(const rb_debug_inspector_t *dc, long index);
VALUE rb_debug_inspector_frame_binding_get(const rb_debug_inspector_t *dc, long index);
VALUE rb_debug_inspector_frame_iseq_get(const rb_debug_inspector_t *dc, long index);
VALUE rb_debug_inspector_frame_depth(const rb_debug_inspector_t *dc, long index);
VALUE rb_debug_inspector_current_depth(void);
slides dtdg.co/rubykaigi2025
Navigate the current active backtrace/methods - Debug inspector
Backtraces? Again? Why do we need a second API?
Very different use-case: Built for debuggers!
60
60
slides dtdg.co/rubykaigi2025
Navigate the current active backtrace/methods - Debug inspector
Allows going deeper than any other backtrace API allows:
61
61
slides dtdg.co/rubykaigi2025
Navigate the current active backtrace/methods - Debug inspector
62
62
VALUE who_called_me(VALUE _) {� return rb_debug_inspector_open(callback, (void *) !NULL); }�VALUE who_called_me_binding(VALUE _) {� return rb_debug_inspector_open(callback, NULL); }
VALUE callback(const rb_debug_inspector_t *dbg_context, void *return_self) {
return (return_self != NULL) ?� // Depth: 0 -> us; 1 -> caller; 2 -> caller of caller� rb_debug_inspector_frame_self_get(dbg_context, 2) :� rb_debug_inspector_frame_binding_get(dbg_context, 2);�}
slides dtdg.co/rubykaigi2025
Navigate the current active backtrace/methods - Debug inspector
63
63
class Hello� def say_hello(to, secret) = to.hello!�end
class Spy� def hello!
puts "I was called by #{LowlevelToolkit.who_called_me}"� puts "Secret is '#{LowlevelToolkit.who_called_me_binding.local_variable_get(:secret)}'"� end�end
Hello.new.say_hello(Spy.new, "trustno1")
# => I was called by #<Hello:0x00007ff298ae5898>�# => Secret is 'trustno1'
slides dtdg.co/rubykaigi2025
Navigate the current active backtrace/methods - Debug inspector
Things to keep in mind:�
64
64
slides dtdg.co/rubykaigi2025
GVL instrumentation API
slides dtdg.co/rubykaigi2025
Wait, what’s this GVL thing again?
Global VM Lock (GVL) in 1 slide:
66
66
slides dtdg.co/rubykaigi2025
GVL instrumentation API
slides dtdg.co/rubykaigi2025
Track Ruby thread activity - GVL Instrumentation API
68
68
#define RUBY_INTERNAL_THREAD_EVENT_STARTED 1 << 0
/** Triggered when a thread attempt to acquire the GVL. */
#define RUBY_INTERNAL_THREAD_EVENT_READY 1 << 1 /** acquiring GVL */
/** Triggered when a thread successfully acquired the GVL. */
#define RUBY_INTERNAL_THREAD_EVENT_RESUMED 1 << 2 /** acquired GVL */
/** Triggered when a thread released the GVL. */
#define RUBY_INTERNAL_THREAD_EVENT_SUSPENDED 1 << 3 /** released GVL */
/** Triggered when a thread exits. */
#define RUBY_INTERNAL_THREAD_EVENT_EXITED 1 << 4 /** thread terminated */
#define RUBY_INTERNAL_THREAD_EVENT_MASK 0xff /** All Thread events */
rb_internal_thread_event_hook_t *rb_internal_thread_add_event_hook(
rb_internal_thread_event_callback func, rb_event_flag_t events,
void *data);
bool rb_internal_thread_remove_event_hook(rb_internal_thread_event_hook_t * hook);
slides dtdg.co/rubykaigi2025
Track Ruby thread activity - GVL Instrumentation API
Are my web requests being delayed by other requests/threads?
69
69
slides dtdg.co/rubykaigi2025
Track Ruby thread activity - GVL Instrumentation API
Useful to know when threads are
(screenshot from gvl-tracing gem!)
70
70
slides dtdg.co/rubykaigi2025
Track Ruby thread activity - GVL Instrumentation API
71
71
def counter_loop(counter = 0) = (counter += 1 while counter < 150_000_000)�
�pp(LowlevelToolkit.track_wants_gvl do
t1 = Thread.new { counter_loop }.tap { |it| it.name = "t1" }� t2 = Thread.new { counter_loop }.tap { |it| it.name = "t2" }
t1.join; t2.join
end
slides dtdg.co/rubykaigi2025
Track Ruby thread activity - GVL Instrumentation API
72
72
VALUE track_wants_gvl(VALUE _) {� VALUE seen_threads = rb_ary_new();
rb_internal_thread_add_event_hook(on_thread_event, RUBY_INTERNAL_THREAD_EVENT_STARTED | RUBY_INTERNAL_THREAD_EVENT_READY | RUBY_INTERNAL_THREAD_EVENT_RESUMED, seen_threads);
…
}
slides dtdg.co/rubykaigi2025
Track Ruby thread activity - GVL Instrumentation API
73
73
void on_thread_event(event_id, rb_internal_thread_event_data_t *event_data, void *data) {
VALUE thread = event_data->thread;
if (event_id == RUBY_INTERNAL_THREAD_EVENT_STARTED) {� …
} else if (event_id == RUBY_INTERNAL_THREAD_EVENT_READY) {� rb_internal_thread_specific_set(thread, wants_gvl_at_key, get_monotonic_time_ns());
} else if (event_id == RUBY_INTERNAL_THREAD_EVENT_RESUMED) { /* Simplified to fit slides… */
uint64_t wants_gvl_at = rb_internal_thread_specific_get(thread, wants_gvl_at_key);� uint64_t total_wants = rb_internal_thread_specific_get(thread, total_wants_key);� rb_internal_thread_specific_set(� thread, total_wants_key, (total_wants + (get_monotonic_time_ns() - wants_gvl_at)));� }�}
slides dtdg.co/rubykaigi2025
Track Ruby thread activity - GVL Instrumentation API
74
74
def counter_loop(counter = 0) = (counter += 1 while counter < 150_000_000)��before = Time.now�pp(LowlevelToolkit.track_wants_gvl do� t1 = Thread.new { counter_loop }.tap { |it| it.name = "t1" }� t2 = Thread.new { counter_loop }.tap { |it| it.name = "t2" }� t1.join; t2.join
end.map { |thread, wants_gvl_ns| [thread, wants_gvl_ns / 1_000_000_000.0] })
puts "Total time: #{Time.now - before}"
# => [[#<Thread:0x00007520acd952c8@t1 examples/track_wants_gvl.rb:8 dead>, 1.825822156], [#<Thread:0x00007520acd94f30@t2 examples/track_wants_gvl.rb:9 dead>, 1.854341141]]�# => Total time: 3.715163175
slides dtdg.co/rubykaigi2025
Track Ruby thread activity - GVL Instrumentation API
Things to keep in mind:
75
75
slides dtdg.co/rubykaigi2025
Putting it all together:�Let’s build a new profiler!
Let’s build a new kind of Ruby profiler: a release GVL profiler
When is the GVL being explicitly released in our app? 🔓🥏
By knowing when the GVL gets released, we can see the impact of these things in our application!
77
77
slides dtdg.co/rubykaigi2025
Release GVL profiler
78
78
def counter_loop(counter = 0) = (counter += 1 while counter < 150_000_000)
LowlevelToolkit.release_gvl_profiler(filename_prefix: "output") do
counter_loop
Net::HTTP.start("www.google.com") { |it| it.get("/") }
big_data = File.read("/dev/zero", 300_000_000)
Dir.children("/usr/lib")
Zlib::Deflate.deflate(big_data[0, 100_000_000])
end
slides dtdg.co/rubykaigi2025
Release GVL profiler - Time
79
79
LowlevelToolkit.release_gvl_profiler(filename_prefix: "output") do
counter_loop # Doesn't show up, GVL not released
Net::HTTP.start("www.google.com") { |it| it.get("/") }
big_data = File.read("/dev/zero", 300_000_000)
Dir.children("/usr/lib")
Zlib::Deflate.deflate(big_data[0, 100_000_000])
end
slides dtdg.co/rubykaigi2025
Release GVL profiler - Counts
80
80
LowlevelToolkit.release_gvl_profiler(filename_prefix: "output") do
counter_loop # Doesn't show up, GVL not released
Net::HTTP.start("www.google.com") { |it| it.get("/") }
big_data = File.read("/dev/zero", 300_000_000)
Dir.children("/usr/lib")
Zlib::Deflate.deflate(big_data[0, 100_000_000])
end
slides dtdg.co/rubykaigi2025
P.s.: It’s often surprisingly easy to generate output in a format that can easily be visualized with existing UI tools :) 📊📈😎
(Speedscope, perfetto, firefox profiler, … excel?)
81
81
slides dtdg.co/rubykaigi2025
Release GVL profiler
82
82
VALUE release_gvl_profiler(int argc, VALUE *argv, RB_UNUSED_VAR(VALUE _)) {
… // Deal with file name
VALUE result = rb_hash_new();
rb_internal_thread_event_hook_t *hook = rb_internal_thread_add_event_hook(
on_thread_event,
RUBY_INTERNAL_THREAD_EVENT_SUSPENDED | RUBY_INTERNAL_THREAD_EVENT_RESUMED,
(void *) result
);
rb_yield(Qnil);
rb_internal_thread_remove_event_hook(hook);
write_stacks(filename_prefix, result);
return result;
}
slides dtdg.co/rubykaigi2025
Release GVL profiler
83
83
void on_thread_event(event_id, const rb_internal_thread_event_data_t *event_data, void *data) {� VALUE thread = event_data->thread;� if (event_id == RUBY_INTERNAL_THREAD_EVENT_SUSPENDED && !ruby_thread_has_gvl_p()) {� rb_internal_thread_specific_set(thread, release_gvl_at_key, get_monotonic_time_ns());
} else if (event_id == RUBY_INTERNAL_THREAD_EVENT_RESUMED) { /* Simplified to fit slides… */� uint64_t release_gvl_at = rb_internal_thread_specific_get(thread, release_gvl_at_key);
VALUE frames = rb_make_backtrace();
VALUE stats = rb_hash_aref((VALUE) data, frames);� if (stats == Qnil) { stats = … /* Initialize with zeros */ }
uint64_t time_spent = get_monotonic_time_ns() - release_gvl_at;� rb_ary_store(stats, 0, ULL2NUM(NUM2ULL(rb_ary_entry(stats, 0)) + time_spent)); // Time� rb_ary_store(stats, 1, ULL2NUM(NUM2ULL(rb_ary_entry(stats, 1)) + 1)); // Counts�}}
slides dtdg.co/rubykaigi2025
Release GVL profiler
84
84
void write_stacks(VALUE filename_prefix, VALUE result) {
…
VALUE keys = rb_funcall(result, rb_intern("keys"), 0);
long key_count = RARRAY_LEN(keys);
for (long i = 0; i < key_count; i++) {
VALUE frames = rb_ary_entry(keys, i);
VALUE stats = rb_hash_aref(result, frames);
long frame_count = RARRAY_LEN(frames);
for (long j = frame_count - 1; j >= 0; j--) { … } // Print each frame
fprintf(time_file, " %llu\n", NUM2ULL(rb_ary_entry(stats, 0)));
fprintf(counts_file, " %llu\n", NUM2ULL(rb_ary_entry(stats, 1)));
}
… }
slides dtdg.co/rubykaigi2025
Release GVL profiler - Example result
85
85
examples/release_gvl_profiler.rb:6:in '<main>';examples/release_gvl_profiler.rb:6:in 'LowlevelToolkit.release_gvl_profiler';examples/release_gvl_profiler.rb:10:in 'block in <main>';examples/release_gvl_profiler.rb:10:in 'Dir.children' 180
examples/release_gvl_profiler.rb:6:in '<main>';examples/release_gvl_profiler.rb:6:in 'LowlevelToolkit.release_gvl_profiler';examples/release_gvl_profiler.rb:10:in 'block in <main>';examples/release_gvl_profiler.rb:10:in 'Dir.children' 370759
Count
Time (ns)
slides dtdg.co/rubykaigi2025
That’s it! A full new profiler in 107 lines of code!
86
86
slides dtdg.co/rubykaigi2025
“Performance bugs”
slides dtdg.co/rubykaigi2025
“Performance bugs”?
Are performance bugs really bugs? Should we call them issues? Problems?
The important part is their impact and thus priority!
88
slides dtdg.co/rubykaigi2025
Performance bugs
Fix it!
89
slides dtdg.co/rubykaigi2025
Performance bugs
(Observability tools can help answer these questions!)
90
slides dtdg.co/rubykaigi2025
Performance bugs
If not… use your best judgement ;)
Sometimes worth improving, others not really 😉
91
slides dtdg.co/rubykaigi2025
Performance bugs
What tools will you build with debug.h?
I’m looking forward to seeing observability tools on stage during this + future RubyKaigis! 😍
92
slides dtdg.co/rubykaigi2025
Thank you
Ivo Anjo
ivo.anjo@datadoghq.com / @knux�dtdg.co/rubykaigi2025�github.com/ivoanjo/lowlevel-toolkit
slides dtdg.co/rubykaigi2025