RISC-V Linux Tracing (K/Uprobe)
Guo Ren
<guoren@kernel.org>
<guoren@linux.alibaba.com>
Self Introduction
For more details, refer to T-Head
My work
Related work
Statistics in linux-5.8
GOAL: Let RISC-V & C-SKY support the features above this year
Agenda
Kprobe
How does probing 'Inst 2' at a kprobe point work?
kprobe_breakpoint_handler()
{
kprobe_pre_handler()
xol_exec/simulate(Inst 2)
kprobe_post_handler()
regs->pc = &inst3
}
Inst 0
Inst 1
Inst 2
Inst 3
Inst 4
Inst 0
Inst 1
ebreak
Inst 3
Inst 4
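The handler flow above can be sketched as a toy user-space model. This is only an illustration under stated assumptions: the `toy_*` names, the instruction ids and the index-based "pc" are hypothetical and not kernel code; only the order of steps (pre-handler, simulate the replaced instruction, post-handler, continue at Inst 3) follows the slide.

```c
#include <stddef.h>
#include <stdint.h>

#define EBREAK_INSN 0x00100073u /* RISC-V 32-bit ebreak encoding */

/* Hypothetical toy machine: "executing" an instruction just records it. */
struct toy_cpu {
    size_t pc;            /* index into the text array */
    uint32_t trace[16];   /* instructions actually executed, in order */
    size_t ntrace;
    int pre_hits;
    int post_hits;
};

struct toy_kprobe {
    size_t addr;          /* probed index */
    uint32_t saved_insn;  /* original instruction replaced by ebreak */
};

static void toy_exec(struct toy_cpu *cpu, uint32_t insn)
{
    cpu->trace[cpu->ntrace++] = insn;
}

/* Arm the probe: save the original instruction and write ebreak over it. */
static void toy_arm_kprobe(uint32_t *text, struct toy_kprobe *kp, size_t addr)
{
    kp->addr = addr;
    kp->saved_insn = text[addr];
    text[addr] = EBREAK_INSN;
}

/* Mirrors the slide's kprobe_breakpoint_handler() pseudocode. */
static void toy_breakpoint_handler(struct toy_cpu *cpu,
                                   const struct toy_kprobe *kp)
{
    cpu->pre_hits++;               /* kprobe_pre_handler() */
    toy_exec(cpu, kp->saved_insn); /* xol_exec/simulate(Inst 2) */
    cpu->post_hits++;              /* kprobe_post_handler() */
    cpu->pc = kp->addr + 1;        /* regs->pc = &inst3 */
}

/* Run the toy text; an ebreak traps into the handler. */
static void toy_run(struct toy_cpu *cpu, const uint32_t *text, size_t n,
                    const struct toy_kprobe *kp)
{
    while (cpu->pc < n) {
        if (text[cpu->pc] == EBREAK_INSN) {
            toy_breakpoint_handler(cpu, kp);
        } else {
            toy_exec(cpu, text[cpu->pc]);
            cpu->pc++;
        }
    }
}
```

Arming a probe at index 2 of a five-instruction array and calling `toy_run()` executes all five original instructions in order, with the pre- and post-handlers each hit once.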
Trap exception
Emulate replaced instruction (xol/simulate/opt_exe)
Single step execute the replaced instruction
void do_trap_break(struct pt_regs *regs)
{
#ifdef CONFIG_KPROBES
if (kprobe_single_step_handler(regs))
return;
if (kprobe_breakpoint_handler(regs))
return;
#endif
Xol slot:
<Replaced Instruction>
ebreak
Exception return
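A minimal sketch of how such an XOL slot could be filled: copy the replaced instruction, then append an ebreak so that execution traps back after the single step. The helper names are hypothetical, and using the 32-bit ebreak in the slot is an assumption. The length rule (low two bits `0b11` means a 32-bit instruction, otherwise 16-bit) is the standard RISC-V encoding for RV64GC.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Length of a RISC-V instruction from its first 16-bit parcel.
 * Encodings longer than 32 bits are ignored here (not used by RV64GC).
 */
static size_t rv_insn_len(uint16_t first_parcel)
{
    return (first_parcel & 0x3) == 0x3 ? 4 : 2;
}

/*
 * Fill an XOL slot: <replaced instruction> followed by ebreak.
 * Hypothetical helper; returns the number of bytes written.
 * The caller must provide a slot of at least 8 bytes.
 */
static size_t build_xol_slot(uint8_t *slot, const uint8_t *insn)
{
    /* 0x00100073 (ebreak) in little-endian byte order */
    static const uint8_t ebreak[4] = {0x73, 0x00, 0x10, 0x00};
    uint16_t parcel = (uint16_t)(insn[0] | (insn[1] << 8));
    size_t len = rv_insn_len(parcel);

    memcpy(slot, insn, len);
    memcpy(slot + len, ebreak, sizeof(ebreak));
    return len + sizeof(ebreak);
}
```

For a compressed instruction such as `1141` (addi sp,sp,-16) the slot is 6 bytes long; for a 32-bit instruction such as `000127b7` (lui a5,0x12) it is 8 bytes.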
Simulate replaced instruction
Pending replaced instruction(s) with trampoline
An optimized way to execute replaced instruction used by OPTPROBES
I’ve talked about this in another topic - Kprobes Jump Optimized for more Archs
trampoline:
Inst 2
Jmp -> &Inst 3
detour_buffer()
{
kprobe_trampoline_handler()
Jmp -> inst2_trampoline
Inst 0
Inst 1
Inst 2
Inst 3
Inst 4
Inst 0
Inst 1
J kp
...
Inst 4
Kretprobe
kprobe_trampoline()
{
ri->rp->handler(ri, regs);
Jmp to ‘&Inst 3’;
}
Caller:
Inst 0
Inst 1
call
Inst 3
Callee:
break point
Inst 1
Inst 2
ret
int register_kretprobe(struct kretprobe *rp) {
rp->kp.pre_handler = pre_handler_kretprobe;
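The return-address hijack behind kretprobe can be sketched as a toy model. Everything named `toy_*` and the trampoline address are hypothetical; the sketch only shows the idea from the slide: the entry probe saves the real return address and redirects `ra` to a trampoline, and the trampoline runs the user handler before jumping to the saved address (Inst 3 in the caller).

```c
#include <stdint.h>

#define TOY_TRAMPOLINE 0xdead0000ull /* hypothetical trampoline address */

struct toy_regs {
    uint64_t pc;
    uint64_t ra;
};

struct toy_kretprobe_instance {
    uint64_t ret_addr;   /* real return address saved at function entry */
    int handler_calls;   /* how many times the user handler ran */
};

/* Entry probe (break point at Callee): hijack the return address. */
static void toy_pre_handler_kretprobe(struct toy_kretprobe_instance *ri,
                                      struct toy_regs *regs)
{
    ri->ret_addr = regs->ra;
    regs->ra = TOY_TRAMPOLINE;
}

/* The callee's "ret": jump to whatever ra currently holds. */
static void toy_ret(struct toy_regs *regs)
{
    regs->pc = regs->ra;
}

/* Trampoline: run the return handler, then resume in the caller. */
static void toy_kretprobe_trampoline(struct toy_kretprobe_instance *ri,
                                     struct toy_regs *regs)
{
    ri->handler_calls++;     /* stands in for ri->rp->handler(ri, regs) */
    regs->pc = ri->ret_addr; /* Jmp to '&Inst 3' */
}
```

Calling the three helpers in order (entry probe, `toy_ret()`, trampoline) leaves `pc` at the original return address with the handler called exactly once.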
Uprobe
Kprobe on ftrace
# cat /sys/kernel/debug/kprobes/list
(current)
ffffffe00020af7e k _do_fork+0x1a [FTRACE]
(should be)
ffffffe00020af7e k _do_fork+0x0 [FTRACE]
Suggested by Masami (use -fpatchable-function-entry in ftrace); ref: mailing list. I now agree with that, and it should be implemented immediately.
Current ftrace detour mechanism by Alan
-pg -fno-omit-frame-pointer -fno-optimize-sibling-calls -Wl,--no-relax :
000000000001065c <funca>:
1065c: 1141 addi sp,sp,-16
1065e: e406 sd ra,8(sp)
10660: e022 sd s0,0(sp)
10662: 0800 addi s0,sp,16
10664: 8786 mv a5,ra
10666: 853e mv a0,a5
10668: 00000097 auipc ra,0x0 -> nop
1066c: ed8080e7 jalr -296(ra) <_mcount@plt> -> nop
10670: 000127b7 lui a5,0x12
10674: 06c7a783 lw a5,108(a5) # 1206c <a>
10678: 2785 addiw a5,a5,1
1067a: 0007871b sext.w a4,a5
1067e: 000127b7 lui a5,0x12
10682: 06e7a623 sw a4,108(a5) # 1206c <a>
10686: 4781 li a5,0
10688: 853e mv a0,a5
1068a: 60a2 ld ra,8(sp)
1068c: 6402 ld s0,0(sp)
1068e: 0141 addi sp,sp,16
10690: 8082 ret
When ftrace is enabled, replace the nops with a jump to ftrace_xxx_caller:
000000000001065c <funca>:
1065c: 1141 addi sp,sp,-16
1065e: e406 sd ra,8(sp)
10660: e022 sd s0,0(sp)
10662: 0800 addi s0,sp,16
10664: 8786 mv a5,ra
10666: 853e mv a0,a5
10668: 00010001 nop -> auipc ra, 0x
1066c: 00010001 nop -> jalr ftrace_(regs_)caller
10670: 000127b7 lui a5,0x12
10674: 06c7a783 lw a5,108(a5) # 1206c <a>
10678: 2785 addiw a5,a5,1
1067a: 0007871b sext.w a4,a5
1067e: 000127b7 lui a5,0x12
10682: 06e7a623 sw a4,108(a5) # 1206c <a>
10686: 4781 li a5,0
10688: 853e mv a0,a5
1068a: 60a2 ld ra,8(sp)
1068c: 6402 ld s0,0(sp)
1068e: 0141 addi sp,sp,16
10690: 8082 ret
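The auipc/jalr pair that replaces the nops can be computed from the call site and the target address. A minimal sketch, assuming a hypothetical helper `rv_make_call()` (not the kernel's own function); the encodings are the standard RISC-V ones, and the example values come from the listing above (call site `0x10668`, `jalr -296(ra)`).

```c
#include <stdint.h>

#define RV_REG_RA 1u

struct rv_call {
    uint32_t auipc; /* auipc ra, hi20 */
    uint32_t jalr;  /* jalr  ra, lo12(ra) */
};

/*
 * Hypothetical helper: build "auipc ra, hi20; jalr ra, lo12(ra)" for a
 * call from pc to target. Adding 0x800 before taking the upper bits
 * compensates for jalr sign-extending its 12-bit immediate.
 */
static struct rv_call rv_make_call(uint64_t pc, uint64_t target)
{
    uint64_t offset = target - pc; /* two's-complement difference */
    uint32_t hi20 = (uint32_t)((offset + 0x800) >> 12) & 0xfffff;
    uint32_t lo12 = (uint32_t)offset & 0xfff;
    struct rv_call c;

    c.auipc = (hi20 << 12) | (RV_REG_RA << 7) | 0x17;
    c.jalr = (lo12 << 20) | (RV_REG_RA << 15) | (RV_REG_RA << 7) | 0x67;
    return c;
}
```

With `pc = 0x10668` and `target = 0x10668 - 296`, this yields `00000097` and `ed8080e7`, the two words shown at `10668` and `1066c` in the `-pg` listing.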
-fpatchable-function-entry, solution 1:
00000000000103fe <funca>:
103fe: 0001 nop
10400: 0001 nop
10402: 00010001 nop, nop
10406: 00010001 nop, nop
1040a: 0001 nop
1040c: 0001 nop
1040e: 1141 addi sp,sp,-16
10410: e422 sd s0,8(sp)
10412: 0800 addi s0,sp,16
10414: 8301a783 lw a5,-2000(gp) # 12030 <a>
10418: 2785 addiw a5,a5,1
1041a: 0007871b sext.w a4,a5
1041e: 82e1a823 sw a4,-2000(gp) # 12030 <a>
10422: 4781 li a5,0
10424: 853e mv a0,a5
10426: 6422 ld s0,8(sp)
10428: 0141 addi sp,sp,16
1042a: 8082 ret
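A source-level sketch that would produce a padded entry like the one above. The function body is reconstructed from the disassembly (increment the global `a`, return 0), and the count of 8 nops is inferred from the listing, so both are assumptions. The per-function attribute below has the same effect as building the whole file with `-fpatchable-function-entry=8`.

```c
/* Global counter, matching <a> in the disassembly. */
int a;

/*
 * Ask the compiler to emit 8 nops at the function entry. The nop count
 * is an assumption read off the listing (8 compressed nops = 16 bytes
 * on RV64GC), which leaves room for four 32-bit instructions.
 */
__attribute__((patchable_function_entry(8)))
int funca(void)
{
    a = a + 1;
    return 0;
}
```

The nops do nothing until ftrace patches them, so calling `funca()` simply increments `a` and returns 0.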
00000000000103fe <funca>:
103fe: e406xxxx sd ra,-8(sp)
10402: 00000097 auipc ra, 0x
10406: ed80xxxx jalr <ftrace_xxx_caller>
1040a: xxxxxxxx ld ra, -8(sp)
1040e: 1141 addi sp,sp,-16
10410: e422 sd s0,8(sp)
10412: 0800 addi s0,sp,16
10414: 8301a783 lw a5,-2000(gp) # 12030 <a>
10418: 2785 addiw a5,a5,1
1041a: 0007871b sext.w a4,a5
1041e: 82e1a823 sw a4,-2000(gp) # 12030 <a>
10422: 4781 li a5,0
10424: 853e mv a0,a5
10426: 6422 ld s0,8(sp)
10428: 0141 addi sp,sp,16
1042a: 8082 ret
-fpatchable-function-entry, solution 2:
00000000000103fe <funca>:
103fe: 0001 nop
10400: 0001 nop
10402: 00010001 nop, nop
10406: 00010001 nop, nop
1040a: 0001 nop
1040c: 0001 nop
1040e: 1141 addi sp,sp,-16
10410: e422 sd s0,8(sp)
10412: 0800 addi s0,sp,16
10414: 8301a783 lw a5,-2000(gp) # 12030 <a>
10418: 2785 addiw a5,a5,1
1041a: 0007871b sext.w a4,a5
1041e: 82e1a823 sw a4,-2000(gp) # 12030 <a>
10422: 4781 li a5,0
10424: 853e mv a0,a5
10426: 6422 ld s0,8(sp)
10428: 0141 addi sp,sp,16
1042a: 8082 ret
00000000000103fe <funca>:
103fe: 0001 nop
10400: xxxx mv x9,ra
10402: 00000097 auipc ra, xxxxx
10406: ed80xxxx jalr <ftrace_xxx_caller>
1040a: 0001 nop
1040c: 0001 nop
1040e: 1141 addi sp,sp,-16
10410: e422 sd s0,8(sp)
10412: 0800 addi s0,sp,16
10414: 8301a783 lw a5,-2000(gp) # 12030 <a>
10418: 2785 addiw a5,a5,1
1041a: 0007871b sext.w a4,a5
1041e: 82e1a823 sw a4,-2000(gp) # 12030 <a>
10422: 4781 li a5,0
10424: 853e mv a0,a5
10426: 6422 ld s0,8(sp)
10428: 0141 addi sp,sp,16
1042a: 8082 ret
(similar to arm64 x9-x17)
Vote: A / B
A: (solution 1)
00000000000103fe <funca>:
103fe: 1141 addi sp, sp, -8
10400: e406 sd ra, 0(sp)
10402: 00000097 auipc ra, 0x
10406: ed80xxxx jalr <ftrace_xxx_caller>
1040a: 60a2 ld ra, 0(sp)
1040c: 0141 addi sp, sp, 8
B: (solution 2)
00000000000103fe <funca>:
10400: ???? mv x9, ra
10402: 00000097 auipc ra, xxxxx
10406: ed80xxxx jalr <ftrace_xxx_caller>
(Need to modify riscv elf abi)
Kprobes Jump Optimized
Performance problem of break-point kprobe
k = unoptimized kprobe, b = boosted (single-step skipped), o = optimized kprobe
x86-64: Intel(R) Xeon(R) E5410, 2.33GHz, 4656.90 bogomips
k = 0.99 usec; b = 0.43; o = 0.06;
void do_trap_break(struct pt_regs *regs)
{
#ifdef CONFIG_KPROBES
if (kprobe_single_step_handler(regs))
return;
if (kprobe_breakpoint_handler(regs))
return;
#endif
SS_slot:
<Replaced Instruction>
ebreak
Exception return
kprobe_breakpoint_handler()
{
kprobe_pre_handler()
single_step/simulate(Inst 2)
regs->pc = &SS_slot
}
Inst 0
Inst 1
Inst 2
Inst 3
Inst 4
Inst 0
Inst 1
ebreak
Inst 3
Inst 4
Kprobes Jump Optimization
Detour_buffer - optinsn_slot filled with optprobe_template
save_regs
optimized_callback(op, pt_regs)
restore_regs
direct exec Inst2 + Inst3
jump back to &Inst 4
Inst 0
Inst 1
ebreak
Inst 3
Inst 4
Inst 0
Inst 1
branch
...
Inst 4
X86 vs. Arm32 vs. Powerpc
X86
User
...
hole
Kernel
...
Module 1.5GB
Text 512MB
0
0x800000000000
0xffff800000000000
0xffffffffffffffff
0xffffffff80000000
0xffffffffa0000000
0xffffffffff000000
x86_64 mm layout
Arm32
B/BL support (-128M, 128M) offsets. The ARM64 virtual address arrangement guarantees that all kernel and module text is within +/-128M. Why no arm64? Barry Song raised this issue:
https://www.spinics.net/lists/arm-kernel/msg828788.html
User (3GB - 14MB ...)
TEXT
modules (14MB)
Kernel
0
0xc0000000
Arm32 mm layout
...
0xffffffff
Powerpc
arch/powerpc/kernel/optprobes_head.S:
#define OPT_SLOT_SIZE 65536
.balign 4
/*
* Reserve an area to allocate slots for detour buffer.
* This is part of .text section (rather than vmalloc area)
* as this needs to be within 32MB of the probed address.
*/
.global optinsn_slot
optinsn_slot:
.space OPT_SLOT_SIZE
arch/powerpc/kernel/optprobes.c:
static void *__ppc_alloc_insn_page(void)
{
if (insn_page_in_use)
return NULL;
insn_page_in_use = true;
return &optinsn_slot;
}

kernel/kprobes.c:
void __weak *alloc_insn_page(void)
{
return module_alloc(PAGE_SIZE);
}
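Whether a probe can be jump-optimized depends on the detour buffer being reachable by a single direct branch, which is why powerpc reserves `optinsn_slot` inside `.text`. A minimal range check, assuming a hypothetical helper `in_branch_range()` and a signed-immediate range of `[-range, range)`; the 32MB figure is taken from the powerpc comment above.

```c
#include <stdbool.h>
#include <stdint.h>

#define SZ_1M (1024LL * 1024)

/*
 * Hypothetical helper: can a direct branch at 'from' reach 'to' when
 * the instruction encodes a signed offset in [-range, range)?
 */
static bool in_branch_range(uint64_t from, uint64_t to, int64_t range)
{
    int64_t delta = (int64_t)(to - from);

    return delta >= -range && delta < range;
}
```

For example, a slot 31MB away from the probed address passes the 32MB check, while a slot 33MB away (as a vmalloc or module allocation could easily be) does not.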
Awards
Gold
Silver
Bronze
ARM32
X86
Powerpc
Puzzles of RISC-V & C-SKY (Similar to ftrace)
RV64GC:
C-SKY:
No proper Jump Instruction to use
Solution for rv64
Solution for csky32
Refer to arch/csky/kernel/ftrace.c:
Shadow Program Counter (SPC)
Classic 5-stages pipeline Processor
Shadow Program Counter (SPC)
SPC
Add
4
MUX
Predictors (almost the same as BTB)
lrw2spc (Introduce a new instruction)
Put lrw2spc before the replaced instructions to get the benefits below:
lrw2spc, imm -> spc = Mem[pc +/- imm]
<Detour buffer>:
optprobe_template_restore_begin:
restore regs
optprobe_template_restore_orig_insn:
lrw2spc, &constant_val
<replaced instructions>
jump back to original exec_path
...
Constant_val:
64bit value
The Benefits
Two key points of hardware implementation
VOTE: Yes / No
From the Linux kprobe & ftrace point of view, is this worth implementing?
C.JALL (6-byte trampoline solution)
For RISC-V/tech-code-size TG
Kprobes Trampoline Optimization
Detour_buffer - optinsn_slot filled with optprobe_template
save_regs
optimized_callback(op, pt_regs)
restore_regs
direct exec Inst2 + Inst3
jump back to &Inst 4
Inst 0
Inst 1
ebreak
Inst 3
Inst 4
Inst 0
Inst 1
branch
...
Inst 4
X86, arm32 trampoline solution in linux
RISC-V ftrace trampoline patch:
00000000000103fe <funca>:
103fe: 0001 nop
10400: 0001 nop
10402: 00010001 nop, nop
10406: 00010001 nop, nop
1040a: 0001 nop
1040c: 0001 nop
1040e: 1141 addi sp,sp,-16
10410: e422 sd s0,8(sp)
10412: 0800 addi s0,sp,16
10414: 8301a783 lw a5,-2000(gp) # 12030 <a>
10418: 2785 addiw a5,a5,1
1041a: 0007871b sext.w a4,a5
1041e: 82e1a823 sw a4,-2000(gp) # 12030 <a>
10422: 4781 li a5,0
10424: 853e mv a0,a5
10426: 6422 ld s0,8(sp)
10428: 0141 addi sp,sp,16
1042a: 8082 ret
00000000000103fe <funca>:
103fe: e406xxxx sd ra,-8(sp)
10402: 00000097 auipc ra, 0x
10406: ed80xxxx jalr <ftrace_xxx_caller>
1040a: xxxxxxxx ld ra, -8(sp)
1040e: 1141 addi sp,sp,-16
10410: e422 sd s0,8(sp)
10412: 0800 addi s0,sp,16
10414: 8301a783 lw a5,-2000(gp) # 12030 <a>
10418: 2785 addiw a5,a5,1
1041a: 0007871b sext.w a4,a5
1041e: 82e1a823 sw a4,-2000(gp) # 12030 <a>
10422: 4781 li a5,0
10424: 853e mv a0,a5
10426: 6422 ld s0,8(sp)
10428: 0141 addi sp,sp,16
1042a: 8082 ret
-fpatchable-function-entry, solution 2:
00000000000103fe <funca>:
103fe: 0001 nop
10400: 0001 nop
10402: 00010001 nop, nop
10406: 00010001 nop, nop
1040a: 0001 nop
1040c: 0001 nop
1040e: 1141 addi sp,sp,-16
10410: e422 sd s0,8(sp)
10412: 0800 addi s0,sp,16
10414: 8301a783 lw a5,-2000(gp) # 12030 <a>
10418: 2785 addiw a5,a5,1
1041a: 0007871b sext.w a4,a5
1041e: 82e1a823 sw a4,-2000(gp) # 12030 <a>
10422: 4781 li a5,0
10424: 853e mv a0,a5
10426: 6422 ld s0,8(sp)
10428: 0141 addi sp,sp,16
1042a: 8082 ret
00000000000103fe <funca>:
103fe: 0001 nop
10400: xxxx mv x9,ra
10402: 00000097 auipc ra, xxxxx
10406: ed80xxxx jalr <ftrace_xxx_caller>
1040a: 0001 nop
1040c: 0001 nop
1040e: 1141 addi sp,sp,-16
10410: e422 sd s0,8(sp)
10412: 0800 addi s0,sp,16
10414: 8301a783 lw a5,-2000(gp) # 12030 <a>
10418: 2785 addiw a5,a5,1
1041a: 0007871b sext.w a4,a5
1041e: 82e1a823 sw a4,-2000(gp) # 12030 <a>
10422: 4781 li a5,0
10424: 853e mv a0,a5
10426: 6422 ld s0,8(sp)
10428: 0141 addi sp,sp,16
1042a: 8082 ret
(similar to arm64 x9-x17)
C.JALL (2B_opcode + 4B_UIMM)
phy_reg = sign_ext(Memory[pc_offset(PC, 2)][31 + uimm:0]); JAL(phy_reg);
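A toy model of the proposed C.JALL target fetch, under loud assumptions: the instruction is a proposal, not an existing RISC-V instruction; the sketch treats the `uimm` field as 0 (a plain 32-bit literal), reads the literal little-endian from the 4 bytes right after the 2-byte opcode, and treats it as an absolute address. `toy_cjall_target()` is a hypothetical name.

```c
#include <stdint.h>

/*
 * Toy model only: return the jump target for a C.JALL located at
 * mem[pc]. Layout assumed: 2-byte opcode, then a 4-byte literal.
 */
static uint64_t toy_cjall_target(const uint8_t *mem, uint64_t pc)
{
    const uint8_t *p = mem + pc + 2; /* pc_offset(PC, 2): skip the opcode */
    uint32_t imm = (uint32_t)p[0] |
                   ((uint32_t)p[1] << 8) |
                   ((uint32_t)p[2] << 16) |
                   ((uint32_t)p[3] << 24);

    /* sign_ext(...): widen the 32-bit literal to 64 bits */
    return (uint64_t)(int64_t)(int32_t)imm;
}
```

Under these assumptions, a literal of `0x80000000` yields the target `0xffffffff80000000`, while a literal of `0x00010540` yields `0x10540`.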
-fpatchable-function-entry, solution 3:
00000000000103fe <funca>:
103fe: 0001 nop
10400: 0001 nop
10402: 00010001 nop, nop
10406: 00010001 nop, nop
1040a: 0001 nop
1040c: 0001 nop
1040e: 1141 addi sp,sp,-16
10410: e422 sd s0,8(sp)
10412: 0800 addi s0,sp,16
10414: 8301a783 lw a5,-2000(gp) # 12030 <a>
10418: 2785 addiw a5,a5,1
1041a: 0007871b sext.w a4,a5
1041e: 82e1a823 sw a4,-2000(gp) # 12030 <a>
10422: 4781 li a5,0
10424: 853e mv a0,a5
10426: 6422 ld s0,8(sp)
10428: 0141 addi sp,sp,16
1042a: 8082 ret
00000000000103fe <funca>:
103fe: 0001 nop (push x9)
10400: xxxx auipc x9,0
10402: xxxx c.jall <ftrace_xxx_caller>
10406: xxxxxxxx <ftrace_xxx_caller addr value>
1040a: 00010001 nop, nop
1040c: 0001 nop
1040e: 1141 addi sp,sp,-16
10410: e422 sd s0,8(sp)
10412: 0800 addi s0,sp,16
10414: 8301a783 lw a5,-2000(gp) # 12030 <a>
10418: 2785 addiw a5,a5,1
1041a: 0007871b sext.w a4,a5
1041e: 82e1a823 sw a4,-2000(gp) # 12030 <a>
10422: 4781 li a5,0
10424: 853e mv a0,a5
10426: 6422 ld s0,8(sp)
10428: 0141 addi sp,sp,16
1042a: 8082 ret
(similar to arm64 x9-x17)
Thank you
Testchip: 860*4 & 910*3 in it