rv64ilp32 - The future of 32-bit Linux
Guo Ren <guoren@kernel.org>
https://lwn.net/Articles/838807/
32-bit is waiting for its death.
32ilp32 vs. 64ilp32 vs. 64lp64
| | 32ilp32 (Traditional 32-bit ABI) | 64ilp32 (New 32-bit ABI) | 64lp64 (Traditional 64-bit ABI) |
| pointer | 32 | 32 | 64 |
| ISA | 32 | 64 | 64 |
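The ABI names in the table describe C data models: ILP32 means int, long and pointers are all 32 bits wide, while LP64 means long and pointers are 64 bits wide. A minimal sketch that reports which model the current build uses (the helper name `data_model_name` is hypothetical, chosen only for this illustration):

```c
#include <stddef.h>

/* Hypothetical helper: name the C data model of the current build. */
static const char *data_model_name(void)
{
    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 4)
        return "ILP32";  /* int, long and pointers are all 32-bit */
    if (sizeof(int) == 4 && sizeof(long) == 8 && sizeof(void *) == 8)
        return "LP64";   /* long and pointers are 64-bit */
    return "other";      /* any other combination, e.g. LLP64 */
}
```

Built for a 64ilp32 target, this would be expected to report "ILP32" even though the instructions are 64-bit, since the data model is a property of the ABI and not of the ISA.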
It is absolutely idiotic to have 64-bit pointers when I compile a program that uses less than 4 gigabytes of RAM. When such pointer values appear inside a struct, they not only waste half the memory, they effectively throw away half of the cache. – Knuth (2008)
https://www-cs-faculty.stanford.edu/~knuth/news08.html (A Flame About 64-bit Pointers)
LP64 wastes 25% more memory
Test Environment:
ilp32 = (4096 - 3406) = 690 pages
lp64 = (4096 - 3231) = 865 pages
(865 - 690) / 690 ≈ 25%
| sizeof(xxx) | ILP32 | LP64 |
| struct page | 32 | 64 |
| list_head | 8 | 16 |
| hlist_node | 8 | 16 |
| vm_area_struct … | 68 | 136 |
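The doubling in the table follows directly from pointer width, because these structures consist mostly of pointers. A small sketch using simplified stand-ins for the kernel's list types (the field layout mirrors the kernel definitions; `pointer_struct_size` is a hypothetical helper for the arithmetic):

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel's list types: two pointers each. */
struct list_head {
    struct list_head *next, *prev;
};

struct hlist_node {
    struct hlist_node *next, **pprev;
};

/* Size in bytes of a struct made only of pointers of a given width. */
static size_t pointer_struct_size(size_t n_pointers, size_t pointer_bytes)
{
    return n_pointers * pointer_bytes;
}
```

With 4-byte pointers, `pointer_struct_size(2, 4)` gives 8, and with 8-byte pointers it gives 16, matching the list_head and hlist_node rows. `sizeof(struct list_head)` on the build host shows the same effect for whichever ABI is in use.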
Unmatched
Why did they choose 64-bit ISA?
WHY?
32-bit ISA of Application Processor !?
32-bit mode: throws away half of every register
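[Figure: memory hierarchy, from "faster but costlier" to "slower but cheaper": Registers (0.2ns), Cache (1~40ns), Memory (80~140ns), SSD/HD (> 10us). Capacity labels: MB~GB, GB~TB, TB~PB. Inside the processor, the ALU operates on the registers.]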
memcpy/memload/memset performance in Linux kernel
lw/sw vs. ld/sd
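The point of the lw/sw versus ld/sd comparison can be illustrated in portable C: copying the same buffer in 64-bit units needs half as many load/store pairs as copying it in 32-bit units. This is only a sketch of the idea, not the kernel's memcpy; the function names are hypothetical and the returned counts stand in for instruction counts.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy in 32-bit units (like lw/sw); returns the number of load/store pairs. */
static size_t copy_words32(void *dst, const void *src, size_t len)
{
    size_t pairs = 0;
    for (size_t off = 0; off + 4 <= len; off += 4, pairs++) {
        uint32_t w;
        memcpy(&w, (const char *)src + off, sizeof(w)); /* load  */
        memcpy((char *)dst + off, &w, sizeof(w));       /* store */
    }
    return pairs;
}

/* Copy in 64-bit units (like ld/sd); returns the number of load/store pairs. */
static size_t copy_words64(void *dst, const void *src, size_t len)
{
    size_t pairs = 0;
    for (size_t off = 0; off + 8 <= len; off += 8, pairs++) {
        uint64_t w;
        memcpy(&w, (const char *)src + off, sizeof(w)); /* load  */
        memcpy((char *)dst + off, &w, sizeof(w));       /* store */
    }
    return pairs;
}
```

For a 64-byte buffer, the 32-bit version issues 16 pairs and the 64-bit version issues 8. Tail bytes that do not fill a whole unit are ignored here to keep the sketch short.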
Does a pure 32-bit ISA make the chip area smaller?
ARM said: “Compared to Cortex-A35, the Cortex-A32 offers same 32-bit performance but consumes 10% less power and has a 13% smaller core.”
For RISC-V:
64-bit ISA is a visionary and wise choice!
WISE!
Our Solution: rv64ilp32
Run 32-bit pointers on the RISC-V 64-bit ISA
The world’s first 64ilp32 ABI Linux kernel!
PATCH [01 - 11] u64ilp32
PATCH [12 - 36] s64ilp32
u64ilp32: User space support is similar to x86-x32, mips-n32, and arm64-ilp32.
s64ilp32 - The world’s first 64ilp32 ABI Linux kernel!
[RFC PATCH V2] rv64ilp32 patches:
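[Figure: RISC-V software stack by privilege level and ABI.
RV64: M-mode (opensbi) m64lp64; S-mode kernel s64ilp32 or s64lp64; U-mode userspace u32ilp32, u64ilp32 under s64ilp32 and u32ilp32, u64ilp32, u64lp64 under s64lp64.
RV32: M-mode (opensbi) m32ilp32; S-mode kernel s32ilp32; U-mode userspace u32ilp32.]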
s64ilp32 vs. s32ilp32
GENERIC_LIB_ASHLDI3
GENERIC_LIB_ASHRDI3
GENERIC_LIB_LSHRDI3
GENERIC_LIB_UCMPDI2
…
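These options pull in generic C versions of the compiler's 64-bit helper routines (for example, the 64-bit left shift behind GENERIC_LIB_ASHLDI3), which a 32-bit ISA needs because it has no native 64-bit shift or compare. The sketch below shows roughly what such a helper has to do with 32-bit operations, next to the single native shift available on a 64-bit ISA. The names `u64_halves`, `shl64_with_32bit_ops`, `shl64_native` and `join_halves` are hypothetical and are not the kernel's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit value held in two 32-bit registers, as on a 32-bit ISA. */
struct u64_halves {
    uint32_t lo, hi;
};

/* 64-bit left shift built from 32-bit operations only. */
static struct u64_halves shl64_with_32bit_ops(struct u64_halves v, unsigned shift)
{
    struct u64_halves r;

    assert(shift < 64);
    if (shift == 0) {
        r = v;
    } else if (shift < 32) {
        r.lo = v.lo << shift;
        r.hi = (v.hi << shift) | (v.lo >> (32 - shift)); /* carry bits across */
    } else {
        r.lo = 0;
        r.hi = v.lo << (shift - 32);
    }
    return r;
}

/* On a 64-bit ISA the same operation is one native shift. */
static uint64_t shl64_native(uint64_t v, unsigned shift)
{
    assert(shift < 64);
    return v << shift;
}

/* Reassemble the two halves for comparison. */
static uint64_t join_halves(struct u64_halves v)
{
    return ((uint64_t)v.hi << 32) | v.lo;
}
```

Both paths produce the same result for every shift count from 0 to 63; the difference is the number of instructions and the branch needed on the 32-bit path.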
Fedora 38 with “s64ilp32 + u32ilp32”
1800+ rv32 fedora packages
Next: [RFC PATCH V3] s64ilp32 + u64lp64
s64ilp32 + u64lp64 (2GB)
s64lp64 + u64lp64 (128TB)
Proof of Concept:
Next: [RFC PATCH V3] s64ilp32 + u64lp64
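[Figure: RISC-V software stack by privilege level and ABI.
RV64: M-mode (opensbi) m64lp64; S-mode kernel s64ilp32 or s64lp64; U-mode userspace u32ilp32, u64ilp32, u64lp64, the same (=) under both kernels.
RV32: M-mode (opensbi) m32ilp32; S-mode kernel s32ilp32; U-mode userspace u32ilp32.]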
Final Goal: s64ilp32 + u64ilp32
Reuse the 64-bit system call table, then delete the 32-bit ISA and its 32-bit system call table from Linux.
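[Figure: RISC-V software stack by privilege level and ABI.
RV64: M-mode (opensbi) m64lp64; S-mode kernel s64ilp32 or s64lp64; U-mode userspace u32ilp32, u64ilp32, u64lp64, the same (=) under both kernels.
RV32: M-mode (opensbi) m32ilp32; S-mode kernel s32ilp32; U-mode userspace u32ilp32.]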
rv64ilp32 ensures these chips succeed!
SUCCESS!
The future of 32-bit Linux - rv64ilp32
END
Backup
Linux doesn’t like 32-bit ISAs
eBPF JIT
ref: https://lore.kernel.org/netdev/20200220041608.30289-1-lukenels@cs.washington.edu/
Using native 64-bit ALU instructions improves crypto algorithms
/*
 * On some 32-bit architectures (h8300), GCC ends up using
 * over 1 KB of stack if we inline the round calculation into the loop
 * in keccakf(). On the other hand, on 64-bit architectures with plenty
 * of [64-bit wide] general purpose registers, not inlining it severely
 * hurts performance. So let's use 64-bitness as a heuristic to decide
 * whether to inline or not.
 */
#ifdef CONFIG_64BIT
#define SHA3_INLINE inline
#else
#define SHA3_INLINE noinline
#endif

/* update the state with given number of rounds */
static SHA3_INLINE void keccakf_round(u64 st[25])
{
	u64 t[5], tt, bc[5];

	/* Theta */
	bc[0] = st[0] ^ st[5] ^ st[10] ^ st[15 …
Using native 64-bit loads/stores improves user space access
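static int rseq_get_rseq_cs(struct task_struct *t, struct rseq_cs *rseq_cs)
{
	…
#ifdef CONFIG_64BIT
	/* A single native 64-bit load fetches the user pointer. */
	if (get_user(ptr, &t->rseq->rseq_cs))
		return -EFAULT;
#else
	/* Without 64-bit loads, fall back to a generic copy. */
	if (copy_from_user(&ptr, &t->rseq->rseq_cs, sizeof(ptr)))
		return -EFAULT;
#endif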
Using native 64-bit atomics to implement CMPXCHG_DOUBLE
mm/slub.c:
	if (s->flags & __CMPXCHG_DOUBLE) {
		ret = __update_freelist_fast(slab, freelist_old, counters_old,
					     freelist_new, counters_new);
	} else {
		ret = __update_freelist_slow(slab, freelist …
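The idea behind the fast path can be sketched in user-space C11: when pointers are 32 bits wide, a freelist pointer and a 32-bit counters word fit together in one 64-bit word, so a single 64-bit compare-and-swap updates both at once. This is an illustration under that assumption, not the slub code; `pack_pair` and `update_pair_fast` are hypothetical names, and the freelist is represented as a plain 32-bit value.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Pack a 32-bit freelist pointer and a 32-bit counters word into 64 bits. */
static uint64_t pack_pair(uint32_t freelist, uint32_t counters)
{
    return ((uint64_t)counters << 32) | freelist;
}

/*
 * Replace (freelist_old, counters_old) with (freelist_new, counters_new)
 * in one 64-bit compare-and-swap. Returns true only if both old values
 * still matched, so the pair is never observed half-updated.
 */
static bool update_pair_fast(_Atomic uint64_t *slot,
                             uint32_t freelist_old, uint32_t counters_old,
                             uint32_t freelist_new, uint32_t counters_new)
{
    uint64_t expected = pack_pair(freelist_old, counters_old);

    return atomic_compare_exchange_strong(slot, &expected,
                                          pack_pair(freelist_new, counters_new));
}
```

A pure 32-bit ISA without a 64-bit atomic would have to take a lock for the same update, which is what the slow path in the snippet above stands for.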
Improvements
Sign-extend addressing
Traditional x86-x32, mips-n32, and arm64-ilp32 all use zero-extend addressing, and the compiler needs to insert additional zero-extend instructions, which causes code size and performance problems.
So rv64ilp32 introduces a new solution called sign-extend addressing.
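The difference between the two schemes is how a 32-bit pointer is widened to a 64-bit register value. On RV64, 32-bit operations such as addw and lw already write sign-extended results, so a sign-extended address typically needs no extra instruction. A minimal sketch of the two widenings (function names are hypothetical):

```c
#include <stdint.h>

/* Zero-extend: upper 32 bits are always cleared. */
static uint64_t zero_extend_addr(uint32_t p)
{
    return (uint64_t)p;
}

/*
 * Sign-extend: upper 32 bits copy bit 31.
 * The uint32_t -> int32_t conversion assumes two's complement wrap-around,
 * which mainstream compilers provide.
 */
static uint64_t sign_extend_addr(uint32_t p)
{
    return (uint64_t)(int64_t)(int32_t)p;
}
```

For addresses below 2GB (bit 31 clear), both give the same 64-bit value. They differ only for the upper half of the 32-bit range, where 0x80000000 becomes 0x0000000080000000 under zero-extension and 0xffffffff80000000 under sign-extension.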
Stack size optimization
Traditional x86-x32, mips-n32, and arm64-ilp32 all save callee-saved registers as 64-bit values, which wastes half of that stack space in ILP32 scenarios.
So rv64ilp32 plans to use a 32-bit stack layout for callee-saved registers.
Current Problems
y2038 problem
Traditional 32-bit Linux will stop working in 2038, when the 32-bit time_t overflows; this is a long-standing historical problem.
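The overflow date follows from the range of a signed 32-bit seconds counter starting at 1970-01-01 00:00:00 UTC. A small sketch of the arithmetic (helper names are hypothetical):

```c
#include <stdint.h>

#define SECONDS_PER_DAY 86400

/* Whole days a signed 32-bit seconds counter can represent after the epoch. */
static int32_t days_until_overflow(void)
{
    return INT32_MAX / SECONDS_PER_DAY;
}

/* Seconds into that last day at which the counter reaches its maximum. */
static int32_t seconds_into_last_day(void)
{
    return INT32_MAX % SECONDS_PER_DAY;
}

/* One second later, a 32-bit counter wraps to the most negative value. */
static int32_t next_second_wrapped(int32_t t)
{
    return (int32_t)((uint32_t)t + 1u);
}
```

The counter covers 24855 whole days plus 11647 seconds (03:14:07), which places the last representable moment at 2038-01-19 03:14:07 UTC. The next tick wraps to INT32_MIN, which a 32-bit time_t interprets as a date in December 1901.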
For rv64ilp32:
GCC problem
https://github.com/Liaoshihua/RV64-ILP32/