1 of 65

Linux on RISC-V

Drew Fustini <dfustini@baylibre.com>

ELC-E Dublin 2022

2 of 65

$ whoami

3 of 65

RISC-V: a Free and Open ISA

  • Started by a computer architecture research group at University of California Berkeley in 2010 led by Krste Asanovic
  • V as in the roman numeral five, because it is the 5th RISC instruction set to come out of UC Berkeley
  • Free and Open because the specifications are published under an open source license: Creative Commons Attribution 4.0 International
    • Volume 1, Unprivileged Spec v. 20191213 [PDF]
    • Volume 2, Privileged Spec v. 20211203 [PDF]

4 of 65

What is different about RISC-V?

  • Simple clean-slate design
    • Avoids any dependencies on microarchitecture style (in-order, out-of-order, etc)
  • Modular design
    • Suitable for everything from microcontrollers to supercomputers
  • Stable base
    • Base integer ISAs and standard extensions are frozen
    • Additions via optional extensions, not new versions

(source: Instruction Sets Want to be Free, Krste Asanovic)

5 of 65

RISC-V base integer ISAs

  • RV32I: 32-bit
    • less than 50 instructions needed!
  • RV64I: 64-bit
    • Most important for Linux
  • RV128I: 128-bit
    • Future-proof address space

(source: RISC-V Summit 2019: State of the Union, Krste Asanovic)

6 of 65

RISC-V base integer registers

  • XLEN defines the register width
    • XLEN=32 for RV32I
    • XLEN=64 for RV64I
  • 32 registers named x0 to x31
  • Dedicated PC register
  • Base ISA talk by Andrew Waterman explains the instruction encoding scheme

7 of 65

RISC-V ABI

  • x1 to x31 are all equally general-use registers as far as the processor is concerned
  • RISC-V psABI defines standard functions for these registers [PDF]
    • s0 to s11 are preserved across function calls
    • argument registers a0 to a7 and the temporary registers t0 to t6 are not

8 of 65

RISC-V Standard Extensions

  • M: integer multiply/divide
  • A: atomic memory operations
  • F, D, Q: floating point, double-precision, quad-precision
  • G: “general purpose” ISA, equivalent to IMAFD
  • C: compressed instructions conserve memory and cache
  • Most Linux distributions target RV64GC

(source: RISC-V Summit 2019: State of the Union, Krste Asanovic)

9 of 65

Ratified in 2021

10 of 65

  • RISC-V is a highly modular and extensible architecture
    • Flexibility to pick and choose what is right for your processor design, but that flexibility creates a large number of possible combinations
  • RISC-V Profiles specify a much smaller set of ISA choices that represent the most common use-cases
    • RVM for microcontrollers intended to run bare-metal code or an RTOS
    • RVA for application processors designed to run full operating system like Linux
  • RISC-V Summit talk by Greg Favor [slides]

11 of 65

Learn more about RISC-V

  • Get up-to-speed quick with the RISC-V Reader

12 of 65

RISC-V International

13 of 65

  • Bi-weekly virtual meetup for the community to interact in real-time
    • Primary focus on RISC-V support in open source software and RISC-V dev boards
    • Call for participation is open! No prepared talk or slides required!
  • Schedule

14 of 65

“Is RISC-V an Open Source processor?”

  • RISC-V is a set of specifications under an open source license
  • RISC-V implementations can be open source or proprietary
  • Open specifications make open source implementations possible
  • An open ISA makes it possible to design an open source processor

15 of 65

RISC-V open source cores

  • Academia
    • Rocket and BOOM from Berkeley, PULP family of cores from ETH Zurich
  • Industry

16 of 65

RISC-V software ecosystem

  • RISC-V already has a well supported software ecosystem
  • Operating systems: Linux, BSDs, FreeRTOS, Zephyr
  • Toolchains & libraries: gcc, glibc, gdb, binutils, clang/llvm, newlib
  • Languages and Runtimes: V8, Node.js, Rust, Go, OpenJDK, Python

17 of 65

  • Three privilege modes
    • User (U-mode): application
    • Supervisor (S-mode): OS kernel
    • Machine (M-mode): firmware
  • Environment Call (ECALL) instruction
    • Transfer control to a higher privileged mode
    • Userspace program (u-mode) uses ECALL to make system call into OS kernel (s-mode)

18 of 65

Control and Status Registers (CSRs)

  • CSR have their own dedicated instructions to read and write
  • CSR are specific to a mode (e.g. m-mode and s-mode)
  • Machine Status (mstatus) is an important CSR

19 of 65

RISC-V Virtual Memory

  • satp CSR (Supervisor Address Translation and Protection) controls supervisor-mode address translation and protection
  • Sv32: 3 level page table
  • Sv39: 3 level page table
  • Sv48: 4 level page table
  • Sv57: 5 level page table

20 of 65

RISC-V Trap Handling

  • Exceptions occur synchronously
  • Interrupts occur asynchronously
  • <x>cause CSR indicates which interrupt or exception occurred
    • mcause for m-mode / scause for s-mode
  • Corresponding bit is set in <x>E/IP CSR

(source: RISC-V Privileged Architecture [PDF], Allen Baum)

21 of 65

What is a Hart?

  • Hart is a hardware thread
  • Each RISC-V core contains an independent instruction fetch unit
  • A RISC-V core with multi-threading (SMT) would contain multiple harts
  • Each hart is a processor from the perspective of Linux
    • Imagine a RISC-V laptop which has 2 cores with 2 harts per core
    • Linux would see 4 processors

22 of 65

RISC-V Interrupts

  • Local per-hart interrupts
    • CLINT (Core Local Interruptor)
  • Global interrupts
    • PLIC (Platform Level Interrupt Controller)

(source: RISC-V Fast Interrupts [PDF], Krste Asanovic)

23 of 65

  • Developed on the AIA SIG mailing list: tech-aia
  • APLIC (Advanced Platform-Level Interrupt Controller) replaces PLIC
  • Adds IMSIC (Incoming Message-Signaled Interrupt Controller) for PCIe
  • AIA is complimented by ACLINT (Advanced Core Local Interruptor)
    • Backwards compatible with the CLINT but restructured to be more efficient
    • RISC-V Summit talk by Anup Patel and John Hauser [slides]

24 of 65

RISC-V Boot Flow

Boot ROM

M-mode

First stage bootloader

(U-Boot SPL or vendor firmware)

U-Boot

S-mode

Linux

kernel

25 of 65

RISC-V Boot Flow

Boot ROM

M-mode

U-Boot

S-mode

Linux

kernel

SBI

First stage bootloader

(U-Boot SPL or vendor firmware)

26 of 65

  • Non-ISA RISC-V specification
    • This means it does not add or modify and RISC-V instructions
  • The calling convention between S-mode and M-mode
    • Allows supervisor-mode (s-mode) software like the Linux to be portable across RISC-V implementations by abstracting platform specific functionality

27 of 65

  • Originally required by the UNIX-Class Platform Specification
  • Small core along with a set of optional modular extensions
    • Base extension - query basic information about the machine
    • Timer extension - program the clock for the next event
    • IPI extension - send an inter-processor interrupt to harts defined in mask
    • RFENCE extension - instructs remote harts to execute FENCE.I instruction

28 of 65

SBI Extensions

  • Hart State Management (HSM)
    • S-mode can request to stop, start or suspend a hart
  • System Reset
    • Supervisor-mode software can request system-level reboot or shutdown
  • Performance Monitoring Unit
    • Interface for supervisor-mode to configure and use the RISC-V hardware performance counters with assistance from the machine-mode
    • ”Performance Monitoring in RISC-V using perf” by Atish Patra

29 of 65

Hypervisor extension

  • Hypervisor Supervisor mode (HS-mode) where host kernel runs
  • Virtualized Supervisor mode (VS-mode) where the guest kernel runs

30 of 65

  • Open source implementation of SBI
    • Core library
    • Platform specific libraries
    • Full reference firmware for some platforms
  • Provides runtime services to S-mode software
    • SBI extensions present on a platform define the available runtime services
    • Unimplemented instructions will trap and OpenSBI can emulate

(source: OpenSBI Deep Dive, Anup Patel)

31 of 65

  • No need to add code to OpenSBI for each new platform
    • First-stage bootloader, like U-Boot SPL, is expected to pass a Device Tree�to OpenSBI which describes all the platform specific functionality
  • The same OpenSBI binary can be used across platforms
    • Many RISC-V boards and emulators now use Generic Platform
    • Linux distros do not need to ship a different OpenSBI build for each board

32 of 65

UEFI Support

  • UEFI is a standard interface between firmware and operating systems, and it is used on most x86 and arm64 platforms
  • U-Boot and TianoCore EDK2 both have UEFI implementations on RISC-V
  • Grub2 can be used as an UEFI payload on RISC-V
  • UEFI support for RISC-V added in Linux 5.10

33 of 65

UEFI Support

  • Boot hart ID is known only at boot and it is needed before ACPI tables or DT properties can be parsed
  • Hart ID is passed in the a0 register on non-UEFI systems, but the UEFI application calling conventions do not allow this
  • RISC-V EFI Boot Protocol allows the OS to discover the boot hart ID
  • The public review process has completed, and Sunil V L has added support to the Linux kernel for RISCV_EFI_BOOT_PROTOCOL

34 of 65

  • Goal is to support “off-the-shelf” software by standardizing the interface between hardware platforms and operating systems
  • OS-A Platform
    • “A” as in application, this is a category of platforms that support full OS like Linux
    • OS-A Common Requirements
    • OS-A Embedded Platform
    • OS-A Server Platform
  • RVM-CSI Platform for RISC-V microcontrollers

35 of 65

  • OS-A common requirements for Embedded and Server
    • Must comply with the RVA22U and RVA22S ISA profiles as defined in RISC-V ISA Profiles
    • Common requirements for Debug, Timers, Interrupt Controllers
    • Requires serial console with UART 8250 or UART 16550
    • Requirements for runtime services such as SBI extensions
    • Software components must comply with the RISC-V Calling Convention specification and the RISC-V ELF specification

36 of 65

  • OS-A Embedded Platform
    • Target might be a single board computer or mobile device
    • PMU counters and events for performance monitoring
    • Boot process must comply with Embedded Base Boot Requirements (EBBR) spec
    • EBBR requires a subset of the UEFI spec which U-Boot has implemented
    • Device Tree (DT) is the required mechanism for the hardware discovery and config
    • GPT partitioning required for shared storage

37 of 65

  • OS-A Server Platform
    • Goal is for an enterprise Linux distro like RHEL to “just work” on server-class hardware that complies with this
    • System peripheral requirements like PCIe, watchdog timer, system date/time
    • RAS (Reliability, Availability, and Serviceability) requirements like ECC RAM
    • ACPI is the required mechanism for the hardware discovery and configuration
    • Must comply with the RISC-V ACPI Platform Requirements Specification

38 of 65

  • Defines mandatory ACPI tables and objects for RISC-V server platforms
  • New tables are needed for RISC-V
    • RISC-V Hart Capabilities Table (RHCT)
    • RISC-V Timer Description Table (RTDT)
  • More details in ‘ACPI for RISC-V: Enabling Server Class Platforms
    • Sunil V L from Ventana Microsystems at RISC-V Summit [slides]

39 of 65

RISC-V emulation in QEMU

  • Support for RISC-V in mainline QEMU
  • Boots 32-bit and 64-bit mainline Linux kernel
  • Machine configs to boot same binaries as some RISC-V dev boards
  • Supports the new Hypervisor and Vector extensions

40 of 65

RISC-V in the Linux kernel

  • Initial port by Palmer Dabbelt was merged into Linux 4.15 back in 2018
  • Palmer continues to maintain the riscv tree
  • Development happens on the linux-riscv mailing list
  • View the archive on lore.kernel.org
  • IRC: #riscv on libera.chat

41 of 65

Added in the past year

  • KVM RISC-V support by Anup Patel added in Linux 5.16
    • Add KVM support for the Hypervisor specification
  • SBI SRST extension support by Anup Patel added in Linux 5.17
    • Support for the SBI SRST (system reset) extension which allows systems that do not have an explicit driver in Linux to reboot

42 of 65

  • Add Sv57 page table support by Qinglin Pan
    • use 5-level page table to support Sv57 which expands the virtual address space to �57 bits (128 petabytes)
  • Improve RISC-V Perf support by Atish Patra

43 of 65

  • RISC-V CPU Idle support by Anup Patel
    • cpuidle and suspend drivers now support the SBI HSM extension
  • Provide framework for RISC-V ISA extensions by Atish Patra
    • Linux was no longer correctly parsing the RISC-V ISA string as the number of RISC-V extensions has grown and extension names are no longer a single character
    • This series implements a generic framework to parse multi-letter ISA extensions.
    • Based on initial work by Tsukasa OI

44 of 65

  • Support for page-based memory attributes
    • We’ll dive into that topic in a few slides...
  • Support for running rv32 binaries on rv64 systems via the compat subsystem. This is useful for systems with very limited RAM.
  • Support new generic ticket-based spinlocks, which allows us to also move to qrwlock
  • Support for kexec_file()

45 of 65

Upcoming Linux 6.0

  • Pull requests originally sent for 5.20 but Linus decided to call it 6.0
  • [PATCH v7 0/4] Add Sstc extension support by Atish Patra
    • Traditionally, an SBI call is necessary to generate timer interrupts as S-mode does not have access to the M-mode time compare registers.
    • This results in significant latency for the kernel to generate timer interrupts at kernel
    • For virtualized environments, it’s even worse as the KVM handles the SBI call and uses a software timer to emulate the time compare register.
    • Sstc extension allows kernel to program a timer and receive interrupt without supervisor execution environment (M-mode/HS mode) intervention.

46 of 65

Work in progress

  • [PATCH v10 00/16] riscv: Add vector ISA support by Greentime Hu
    • Vector ISA support based on the ratified Vector 1.0 extension
    • Defines new structure __riscv_v_state in struct thread_struct to save/restore the vector related registers. It is used for both kernel space and user space.
  • [PATCH v9 0/7] RISC-V IPI Improvements by Anup Patel
    • Traditionally, RISC-V S-mode software like the Linux kernel calls to into M-mode runtime firmware like OpenSBI to issue IPIs (inter-processor interrupts)
    • AIA (advanced interrupt architecture) provides the ability for S-mode to issue IPIs without any assistance from M-mode. This improves efficiency.

47 of 65

Linux distro: Fedora

  • Aims to provide a complete Fedora experience on RISC-V
  • Talk by Wei Fu, RISC-V Ambassador and Red Hat engineer [slides]
  • Installation instructions for QEMU and RISC-V dev boards

48 of 65

Linux distro: Debian

  • riscv64 is port of Debian to RISC-V
    • “a port in Debian terminology means to provide the software normally available in the Debian archive (over 20,000 source packages) ready to install and run”
  • 95% of packages are built for RISC-V
    • The Debian port uses RV64GC as the hardware baseline and the lp64d ABI

49 of 65

Linux distro: Ubuntu

  • riscv64 supported since the release of Ubuntu 20.04 LTS.
  • Ubuntu 22.04 pre-installed SD-card image for several boards & QEMU
  • Starting with Ubuntu 22.04, a server install image is made available to install Ubuntu on NVMe drive of the SiFive Unmatched board

50 of 65

Linux distros

  • OpenSuSE
    • RISC-V support is still under development with Tumbleweed images for some boards
  • Arch Linux
    • Community effort has 95% of core packages building for RISC-V
  • Gentoo
    • riscv64 stages are available on the Gentoo download page

51 of 65

OpenEmbedded and Yocto

  • meta-riscv: general hardware-specific BSP overlay for RISC-V devices
    • works with different OpenEmbedded/Yocto distributions and layer stacks
    • Supports both QEMU and RISC-V dev boards

52 of 65

BuildRoot

  • RISC-V port is now supported in the upstream BuildRoot project
  • “Embedded Linux from scratch in 45 minutes (on RISC-V)”
    • Tutorial by Michael Opdenacker at FOSDEM 2021
    • Use Buildroot to compile OpenSBI, U-Boot, Linux and BusyBox
    • Boot the system in QEMU

53 of 65

  • Mass production low cost SoC with a single T-Head C906 core at 1 GHz
  • Allwinner reused peripheral IP from their existing ARM SoCs and the Linux kernel already has drivers for most of them
  • However, T-Head MMU has a non-standard ‘enhanced’ mode that is needed to support DMA with devices on non-coherent interconnects
  • Linux needs to enable that ‘enhanced’ MMU mode to function

54 of 65

T-Head PTE format

  • T-Head used bits in the PTE (Page Table Entry) to specify memory type

� … but those bits were already marked reserved in RISC-V priv spec

| 63 | 62 | 61 | 60 | 59-8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0

SO C B SH RSW D A G U X W R V

^ ^ ^ ^

BIT(63): SO - Strong Order

BIT(62): C - Cacheable

BIT(61): B - Bufferable

BIT(60): SH - Shareable��0000 - NC Weakly-ordered, Non-cacheable, Non-bufferable, Non-shareable

0111 - PMA Weakly-ordered, Cacheable, Bufferable, Shareable

1000 - IO Strongly-ordered, Non-cacheable, Non-bufferable, Non-shareable

55 of 65

Page-Based Memory Types extension

  • Svpbmt proposed by Virtual Memory TG and ratified at the end of 2021
    • “S” = supervisor-mode (privileged architecture), “v” for virtual memory

Here is the svpbmt PTE format:

| 63 | 62-61 | 60-8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0

N MT RSW D A G U X W R V

^

RISC-V

Encoding &

MemType RISC-V Description

---------- ------------------------------------------------

00 - PMA Normal Cacheable, No change to implied PMA memory type

01 - NC Non-cacheable, idempotent, weakly-ordered Main Memory

10 - IO Non-cacheable, non-idempotent, strongly-ordered I/O memory

11 - Rsvd Reserved for future standard use

56 of 65

Svpbmt support in Linux

  • [PATCH v10 00/12] riscv: support for Svpbmt and D1 memory types
    • by Heiko Stuebner based on initial work by Alibaba kernel engineer Guo Ren
    • Implements the official RISC-V Svpbmt extension
    • The standard Svpbmt and custom T-Head PTE formats both use the highest bits to determine memory type but the encoding and semantics are different
    • T-Head PTE format is supported through boot-time instruction patching using the Linux Alternatives Framework. Heiko presented on this topic at Embedded World 2022.
    • Patch series was merged and landed in Linux 5.19 release

57 of 65

  • Instructions to manage cache are important for SoCs which that lack cache coherent interconnects
  • Zicbom extension (“Z” prefix means Unpriv spec) was ratified at the end of 2021, and it defines cache-block management (CBO) instructions:
    • CBO.CLEAN guarantee store by hart can be read from mem by non-coherent device
    • CBO.INVAL guarantee hart can load data written to memory by non-coherent device
    • CBO.FLUSH guarantees both
  • Support for Non-Coherent I/O Devices in RISC-V from RV Summit [slides]

58 of 65

CMO support in Linux

* cbo.clean rs1

* | 31 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |

* 0...01 rs1 010 00000 0001111

*

* cbo.flush rs1

* | 31 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |

* 0...10 rs1 010 00000 0001111

*

* cbo.inval rs1

* | 31 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |

* 0...00 rs1 010 00000 0001111

#define CBO_INVAL_A0 ".long 0x15200F"

#define CBO_CLEAN_A0 ".long 0x25200F"

#define CBO_FLUSH_A0 ".long 0x05200F"

59 of 65

CMO support in Linux

  • T-Head implemented custom cache instructions before Zicbom existed

* dcache.ipa rs1 (invalidate, physical address)

* | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |

* 0000001 01010 rs1 000 00000 0001011

* dache.iva rs1 (invalida, virtual address)

* 0000001 00110 rs1 000 00000 0001011

*

* dcache.cpa rs1 (clean, physical address)

* | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |

* 0000001 01001 rs1 000 00000 0001011

* dcache.cva rs1 (clean, virtual address)

* 0000001 00100 rs1 000 00000 0001011

*

* dcache.cipa rs1 (clean then invalidate, physical address)

* | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |

* 0000001 01011 rs1 000 00000 0001011

* dcache.civa rs1 (... virtual address)

* 0000001 00111 rs1 000 00000 0001011

#define THEAD_INVAL_A0 ".long 0x0265000b"

#define THEAD_CLEAN_A0 ".long 0x0245000b"

#define THEAD_FLUSH_A0 ".long 0x0275000b"

60 of 65

CMO support in Linux

  • While the Zicbom and T-Head instructions are different, they provide the same functionality, so the T-Head variant handled with the existing alternatives mechanism
  • Allwinner D1 needs these cache instructions for peripherals like�MMC (SD card), USB, and Ethernet to work
  • Support will be in the Linux 6.0 release as Heiko’s patch series was included in Palmer’s 5.20* PR to Linus

61 of 65

  • The RISC-V instruction set architecture is developed in the open: in-progress drafts are available for all to review and to experiment with implementations. New module or extension drafts can change during the development process - sometimes in ways that are incompatible with previous drafts.
  • This flexibility can present a challenge for RISC-V Linux maintenance.
  • General rule: only frozen or ratified extensions will be supported in Linux
  • This policy will be updated to be more specified after Linux Plumbers

62 of 65

  • During Linux Plumbers Conference earlier this week
  • Live stream (individual videos of talks posted later)

63 of 65

  • Initiative from RISC-V International to get Linux-capable boards into the hands of open source developers
    • Launched in 2021 with the Allwinner D1 Nezha and SiFive Unmatched
  • Fill out this form to apply
    • Need to be RISC-V International member (or part of a member organization),�but remember that individuals can join RISC-V International free of cost
    • Explain why you are interested in RISC-V and what you plan to do with dev board
    • To improve your chances, don’t overestimate your hardware requirements like RAM

64 of 65

No hardware? Try Renode!

  • Emulate physical hardware systems including CPU, peripherals, sensors, and networking
  • Run the same binaries as the real hardware �for over 30 supported dev boards

65 of 65

Let’s talk more at the

RISC-V and Open Hardware BoF

at 3:55 PM this afternoon

Liffey Meeting Room 2 (Level 1)

Drew Fustini <dfustini@baylibre.com>