mold kernel/embedded programming support plan

Rui Ueyama

2022-10-26

The mold linker currently lacks features needed for kernel or embedded programming, and we want to support them. At the same time, we don't want to support GNU ld's linker script language because it is overly complicated. This document explains the options we are planning to add to the linker as an alternative to linker script support.

Edit: All the options below have been implemented. (2022-10-27)

https://github.com/rui314/xv6-riscv/commit/de637da4e2aa6d71b2be258997412898e8a3e2eb

Proposed options

--section-order=<section-order-list>

This option enables users to specify the order of sections in memory and the addresses they will be placed at. The argument is a space-separated list of section specifiers. A section specifier is either a section name or "#" followed by a section group name; either form may be followed by an optional address specifier. Here is an example of the option:

--section-order=".boot=0x10000 #text #rodata #data .bss=0x200000"^[a]^[b]^[c]^[d]^[e]

Sections specified by their names are placed in the same order as they appear in the section specifier list. Sections that are not explicitly specified by name are grouped into "rodata" (read-only data sections), "data" (read-write data sections) and "text" (code sections), which can be referred to by the special names "#rodata", "#data" and "#text", respectively. You can also refer to the ELF header and the program header with "#ehdr" and "#phdr", respectively.

(What about relro?)

You can optionally specify the start address for a section specifier using the notation "=<address>". If no address is specified, the section is placed right after the previous section in the section order list, just as we do in a normal link.
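To make the argument syntax concrete, here is a minimal sketch in C of how such a specifier list could be split into entries. This is not mold's implementation; the struct `section_spec` and the function `parse_section_order` are hypothetical names introduced only for illustration, and the sketch assumes addresses are written in a form that `strtoull` accepts (for example "0x10000").

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* One parsed entry of a --section-order argument (hypothetical type). */
struct section_spec {
    char name[64];   /* section name (".boot") or group name ("#text") */
    bool is_group;   /* true if the specifier starts with '#' */
    bool has_addr;   /* true if "=<address>" was given */
    uint64_t addr;   /* meaningful only if has_addr is true */
};

/* Splits a space-separated specifier list into out[0..max).
   Returns the number of entries, or -1 on malformed input. */
static int parse_section_order(const char *arg, struct section_spec *out, int max) {
    int n = 0;
    while (*arg) {
        while (*arg == ' ') arg++;              /* skip separators */
        if (!*arg) break;
        size_t len = strcspn(arg, " ");         /* length of this specifier */
        const char *eq = memchr(arg, '=', len);
        size_t name_len = eq ? (size_t)(eq - arg) : len;
        if (n == max || name_len == 0 || name_len >= sizeof(out[0].name))
            return -1;

        struct section_spec *s = &out[n];
        memcpy(s->name, arg, name_len);
        s->name[name_len] = '\0';
        s->is_group = (s->name[0] == '#');
        s->has_addr = (eq != NULL);
        s->addr = 0;
        if (eq) {
            char *end;
            s->addr = strtoull(eq + 1, &end, 0); /* base 0 accepts a 0x prefix */
            if (end == eq + 1 || end != arg + len)
                return -1;                       /* empty or trailing garbage */
        }
        n++;
        arg += len;
    }
    return n;
}
```

Given the example argument above, this sketch yields five entries: ".boot" with address 0x10000, the three groups without addresses, and ".bss" with address 0x200000.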

If a section order is specified, the linker automatically creates program headers with appropriate types and attributes to cover all sections with the minimal number of program headers, just as we usually do in a normal link. (Do we need to provide a way to control the program headers?)

It is not an error if a section specified by name does not exist; such a section specifier is simply ignored. However, the linker reports an error if there is a section whose location cannot be determined from the given specifier list (i.e. you didn't use the catch-all names #rodata, #data or #text).

Here are the things you can't do with this option:

You can't control how input sections are mapped to output sections. Input sections are mapped to output sections by the same rules as in a normal link. I don't think that causes trouble: for example, if you want to place ".boot" at the beginning of ".text", you can just place ".boot" and ".text" next to each other in the section order list, instead of embedding ".boot" into the beginning of ".text". Sections are not run-time data structures; they exist only in executable files and aren't even mapped to memory at runtime. So no code should care whether a section is inside another section or not.

You can't specify section alignment with this option. If you want to specify an alignment, you can add "__attribute__((aligned(N)))", "alignas(N)" (C11/C++11) or ".align N" (most assembly) to your code^[f]^[g]^[h]^[i]^[j]^[k]^[l]^[m]^[n]^[o], or use the "--section-align" option described later in this document.^[p]
You can't define symbols. But I believe the existing start/stop symbols combined with the "--start-stop" option described below should work instead. You can also define aliases to give symbols better names. For example, if the ".boot" section is the first text section, "--defsym=__text_start=__start_boot" defines "__text_start" to point to the beginning of the text segment.
You can't control garbage collection with "KEEP()". If you want to keep a section from being garbage-collected, you can simply not enable garbage collection, or specify "__attribute__((retain))" in C or the "R" section flag in assembly (e.g. '.section .boot,"axR"').
(Or, maybe we can just keep all sections whose names are explicitly specified by the --section-order option. Is it desirable?)
You can't discard sections with "/DISCARD/", unlike in a linker script. But I think that's a good thing.^[q]^[r]^[s]^[t]^[u]^[v]^[w] If something exists, the compiler created it for a reason, so just discarding it hides the problem rather than solving it. You should change the compiler flags or the code so that the compiler doesn't create unwanted sections in the first place^[x]^[y]^[z]. If you really want to unconditionally remove a section, objcopy should work as a last resort.
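As an illustration of the source-level alignment workaround mentioned in the list above, here is a small C sketch. It assumes GCC or Clang for the attribute syntax and a C11 compiler for "alignas"; the names `dma_buffer`, `mpu_region` and `is_aligned` are hypothetical and exist only for this example.

```c
#include <stdalign.h>
#include <stdint.h>

/* GCC/Clang attribute syntax: request a 64-byte boundary for this object. */
static unsigned char dma_buffer[256] __attribute__((aligned(64)));

/* C11 syntax for the same kind of request, here with a 32-byte boundary. */
static alignas(32) unsigned char mpu_region[128];

/* Returns nonzero if p is on an `align`-byte boundary.
   Assumes align is a power of two. */
static int is_aligned(const void *p, uintptr_t align) {
    return ((uintptr_t)p & (align - 1)) == 0;
}
```

Note that this aligns individual objects, and the containing output section inherits the largest alignment of its contents. It does not help when only the start of a section needs a boundary and the objects in it do not.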

--start-stop

Define "__start_<sectname>" and "__stop_<sectname>" symbols for all sections^[aa]^[ab]^[ac]^[ad]^[ae]. To construct the start/stop symbol names, the section name is mangled by removing the leading "." and replacing every character other than [a-zA-Z0-9_] with "_". (Maybe we should also define "__<section-name>_size" for section size? Does anyone need it?)
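The mangling rule can be sketched in C as follows. This is only an illustration of the rule as stated above, not mold's code; `mangle_section_name` is a hypothetical helper, and the sketch assumes that exactly one leading "." is removed.

```c
#include <stddef.h>
#include <string.h>

/* True for characters that are kept as-is: [a-zA-Z0-9_]. */
static int is_symbol_char(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9') || c == '_';
}

/* Writes prefix + mangled section name into out (capacity cap bytes).
   Returns 0 on success, -1 if the result does not fit. */
static int mangle_section_name(const char *prefix, const char *sect,
                               char *out, size_t cap) {
    if (*sect == '.')
        sect++;                              /* drop the leading "." */
    size_t plen = strlen(prefix);
    size_t slen = strlen(sect);
    if (plen + slen + 1 > cap)
        return -1;
    memcpy(out, prefix, plen);
    for (size_t i = 0; i < slen; i++)        /* replace other characters with "_" */
        out[plen + i] = is_symbol_char(sect[i]) ? sect[i] : '_';
    out[plen + slen] = '\0';
    return 0;
}
```

With this rule, ".boot" gives "__start_boot" and "__stop_boot", and ".data.rel.ro" gives "__start_data_rel_ro" and "__stop_data_rel_ro".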

Note that by default, the linker defines such start/stop symbols only for sections whose names are valid C identifiers (i.e. names that do not start with "." and do not contain any punctuation).

--section-align=<sectname>=<value>

Align the start address of an output section specified by name to a given value. For example, "--section-align=.bss=4" aligns ".bss" to a 4-byte boundary.^[af]^[ag]^[ah]^[ai]^[aj]
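For reference, aligning a start address means rounding it up to the next multiple of the requested value. A minimal sketch in C, assuming the alignment is a power of two (`align_up` is a hypothetical helper, not a mold function):

```c
#include <stdint.h>

/* Rounds addr up to the next multiple of align.
   Assumes align is a nonzero power of two. */
static uint64_t align_up(uint64_t addr, uint64_t align) {
    return (addr + align - 1) & ~(align - 1);
}
```

For example, with a 4-byte alignment, an address of 0x1001 is rounded up to 0x1004, while 0x1000 stays where it is.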

--physical-image-base=<addr>

ELF segments have two address fields: virtual address (vaddr) and physical address (paddr). In most cases, linkers and loaders use only vaddr; vaddr is the address of the segment in memory and paddr is just ignored. Some linkers like lld don't even bother to set a value to the paddr field.

However, there's one program that uses paddr instead of vaddr. That is the ROM writer for embedded devices. When the ROM writer copies segments from an ELF file to a device's ROM, it uses paddr instead of vaddr to know where to copy data. If paddr is equal to vaddr, it doesn't make any difference, but if they are different, segment data is copied to an address different from what it's supposed to be.

What's the point of doing this? The point is that this mechanism allows us to represent segments that are initially in ROM but copied to RAM on startup. An embedded device's ROM is usually mapped to a fixed location in the address space. When the device is turned on, the CPU starts executing code in the ROM area. The first thing the startup routine does is copy its code and data from ROM to DRAM, because ROM (EEPROM or Flash) is usually much slower than DRAM. In this configuration, paddr specifies the address before the copy, and vaddr specifies the address after the copy. Once the copy is complete, all segments are at their desired addresses (i.e. at their vaddrs), so everything works normally.

If the --physical-image-base option is given, the first loadable segment's paddr is set to the given value, and non-overlapping consecutive physical addresses are given to the following loadable segments. You are supposed to specify the first address of the ROM area as the argument for this option.
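The assignment described above can be sketched in C as follows. This is a simplified model under stated assumptions, not mold's actual algorithm: `struct segment` and `assign_paddrs` are hypothetical names, the base address is assumed to already satisfy the first segment's alignment, and rounding each paddr up to the segment's alignment is an assumption of this sketch (how much alignment paddr should respect is discussed in the comments).

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified model of a loadable segment (hypothetical type). */
struct segment {
    uint64_t vaddr;  /* run-time address */
    uint64_t size;   /* bytes the segment occupies in ROM */
    uint64_t align;  /* power of two; 0 is treated as 1 */
    uint64_t paddr;  /* filled in by assign_paddrs */
};

/* Gives the first segment the paddr `base` and packs the following
   segments after it without overlap, in the given order. */
static void assign_paddrs(struct segment *segs, size_t n, uint64_t base) {
    uint64_t next = base;
    for (size_t i = 0; i < n; i++) {
        uint64_t a = segs[i].align ? segs[i].align : 1;
        next = (next + a - 1) & ~(a - 1);    /* round up (assumption) */
        segs[i].paddr = next;
        next += segs[i].size;                /* next segment starts after this one */
    }
}
```

Note that the vaddrs play no role here: segments that are far apart in the virtual address space still end up next to each other in ROM.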

If this option is given, we define "__phys_start_<sectname>" and "__phys_stop_<sectname>" to mark the beginning and end of the section's physical address.
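A startup routine could use these symbols to copy a section from ROM to RAM. The following C sketch shows the idea; `copy_words` is a hypothetical helper, and the symbol names in the comment assume a section named ".data" linked with both --start-stop and --physical-image-base. It also assumes that both addresses are 4-byte aligned and that the section size is a multiple of 4 bytes.

```c
#include <stdint.h>

/* Copies 32-bit words from [src, src_end) to dst.
   Assumes all pointers are 4-byte aligned and the regions don't overlap. */
static void copy_words(uint32_t *dst, const uint32_t *src,
                       const uint32_t *src_end) {
    while (src < src_end)
        *dst++ = *src++;
}

/* Intended use in a startup routine (symbols provided by the linker,
   assuming a ".data" section; not compiled in this sketch):

     extern uint32_t __phys_start_data[], __phys_stop_data[];
     extern uint32_t __start_data[];

     copy_words(__start_data, __phys_start_data, __phys_stop_data);
*/
```

The copy loop must run before any code touches the copied data, so in practice it is placed at the very beginning of the startup code, which itself executes from ROM.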

Existing options useful for kernel/embedded programming

-z separate-loadable-segments

If you pass this option, all loadable segments are aligned to page boundaries.

--oformat=binary

If this option is given, the linker omits the ELF header, section header and program header from the output. By placing the text section at the beginning of the file (using the --section-order option), you can construct a header-less binary blob that can be directly loaded to a specific location in memory.

--omagic

This option allows the linker to place data and code in the same page. For embedded devices that don't have the notion of paging, you may want to use this option to place data and code without padding.

[a]Are we talking about sections or segments here?

[b]We are talking about sections.

[c]Interesting. The note about `#phdr` would tend to imply a segment-oriented view of the world, and I guess I'm mentally mapping what you refer to as "section groups" a la `#text` et al to loadable segments, but I think I get where you are coming from?

[d]`#phdr` refers to the unnamed memory-mapped chunk in an executable/shared object file that contains the program header. Even though ELF is segment-oriented at runtime, you generally can't think of how to layout segments until you fix section layout.

[e]Yes, but program headers describe segments (at least in the SysV ABI...) is all. :-)

[f]In many cases it's code like the crt0, and not the application, that requires section alignment -- for instance, it's quite common to require 4-byte alignment of things like BSS so that it can be zeroed with word-sized writes. This is true even if 100% of your variables in BSS are byte sized.

[g]Oh, another example -- many memory protection units require particular alignment rules for protection regions, so if some of my sections have separate protection attributes, I may need to control their alignment -- sometimes in ways you might not expect coming from a paged virtual memory background (which is an assumption on my part, I admit). For instance, requiring it to be naturally aligned to the next power of two greater than its size is common on ARM. ld's linker script language may be complex, but things like this are part of the motivation for its complexity -- it grew this feature circa 2013-14 iirc.

[h]Do you have any pointer to the ARM documentation that requires 2^n alignment?

[i]Armv7-M architecture reference, section B3.5.3: "The base address, size and attributes of a region are all configurable, with the general rule that all regions are naturally aligned."

https://developer.arm.com/documentation/ddi0403/latest

Armv8-M removes this requirement but still has 32-byte alignment on start/end of MPU regions. (Section B10.1 of Armv8-M architecture reference)

[j]So we don't need to align a section to 2^n but to the natural alignment of a memory region. How can it be computed in GNU ld? It looks like in order to align a memory region to a natural boundary, we need to know the size of the memory region ahead of time. But when the size is available, GNU ld has already assigned the start address to the memory region, no?

[k]You're right, on v7-M there is usually some handcrafting of the linker script to make sure you end up with a valid MPU configuration. If you are setting alignments by hand then I think with mold you could just as well set the section start addresses by hand instead, though I don't want to speak for Cliff L. Biffle, since there are probably cases I don't know about.

For something like v8-M, it would be useful to have the 32-byte-alignment handled by the linker, because it's not practical to edit every single static variable declaration, and doing so would waste memory by adding internal padding to the section when it is only the start/end that matter. This applies also to non-loadable segments (.bss), so I'm not sure if the existing `-z separate-loadable-segments` covers it, even with configurable page size.

[l]We could add something like `--align-section=.foo=32` to align output section .foo to a 32 bytes boundary. How does it sound like?

[m]Sounds great to me, and would also cover the 4-byte alignment of .data/.bss. Is that alignment applied to both vaddr and paddr? For the .data load case you need both to be 4-byte aligned, for the MPU 32-byte case you probably just care about the vaddr.

[n]That's another question: how much should we respect section alignments for paddr? I originally thought that we should ignore alignments for paddr and just pack segments in the paddr space as tightly as possible, because many code just assumes that `_etext` is the beginning of the data to copy from ROM to RAM. But if we do not copy data byte-by-byte, we need to align it at least to a word size. We can unconditionally align segments to the word size in the vaddr space. But I'm not sure if it's enough.

[o]It would be useful that paddrs retain the same alignment as vaddrs so that they could be loaded at phys addresses and then just remapped to high values

[p]Instead of (or maybe in addition to) the code workarounds, perhaps refer to the, `--section-align` flag?

[q]I disagree. Compilers sometimes insert things that make sense for hosted environments (.got sections, PLTs, etc) that may not make sense for a statically linked standalone binary.

[r]Maybe, but can't you remove them by objcopy? .got and .plt are created by a linker, and I can say that when the linker created them, discarding them will almost always break the binary.

[s]I stumbled over non empty but zeroed .got .plt and vtable sections in a bootloader project. My current rather clumsy solution was to move those sections at the end, so that they do not affect any offsets -- and then discarding them with objcopy - it would be nice if there was an option to silence and ignore the missing section order error message when i only selectively specify an order for parts of TEXT and RODATA

[t]Are you using mold and asking that question? `.got` and `.plt` are created by the linker, so if they are created, there are references to the sections in your program. If there's no reference to external functions, mold at least doesn't create a `.plt` section.

[u]Yes, I am using mold-1.10.1 building mcuboot with cmake gcc-11. I mistyped there is no .plt* but two tiny sections called .got and .got.plt that contain only zeros. The vtable is more problematic as it also should not be there, and is also just zeros..

[v]We always create a `.got` because they are mandatory in some psABIs and also some types of relocations are defined as a relative address from `.got`. Without `.got`, we have no way to compute a value for such relocation. That's just a word or two length, so I hope that's not really a problem in practice.

As to the vtable, the linker doesn't generate a vtable, so it must have been copied from some input files to the output. Did you know where these vtables were copied from?

[w]Yes - I was confused that vtable occurs in c code, but just turns out that it was a very weird way to name a data array: https://github.com/machinaut/msp432-driverlib/blob/master/driverlib/MSP432P4xx/interrupt.c#L149

[x]FWIW, this is overly simplistic and will lead to trouble. It's not practical for most embedded devs to patch their compiler or add an entire new target just to e.g. avoid generating an exception handling table in C++ for a program where exceptions are disabled. To use one example. I think you'll wind up wanting a way to discard sections.

[y]I'm not sure I agree with that, adding a target to GCC is pretty easy. I don't think overloading more meanings on xxxx-elf is a good approach.

[z]objcopy should work as a last resort if you really want to unconditionally remove a section from an executable.

[aa]In many cases (e.g. writing the crt0) it's useful to also be able to get the paddr vs vaddr of sections. This typically isn't used for _all_ sections, but is useful for code that gets moved or data initialization images.

[ab]paddr and vaddr are not notions of sections but segments, so I wonder how crt0 uses paddr/vaddr. I wonder if you have a pointer to actual code. I want to learn.

[ac]Oh sorry, I think I misunderstood your comment. So you want to have a way to get the paddr of a section. But is this doable with the existing GNU ld? We can certainly define "__<section_name>_paddr_start" and "__<section_name>_paddr_end".

[ad]Yes you can provide symbols for paddrs in your GNU linker script either using the LOADADDR builtin (plus some arithmetic) or by relying on the vaddr of a previous section being at the paddr of the next section, which is the case if your .data init immediately follows your .text/.rodata in ROM.

For example this ld script provides symbols for vaddr(start), vaddr(end), and paddr(start) of some output section:

https://github.com/raspberrypi/pico-sdk/blob/2e6142b15b8a75c1227dd3edbe839193b2bf9041/src/rp2_common/pico_standard_link/memmap_default.ld#L182-L189

Those are referred to here in crt0:

https://github.com/raspberrypi/pico-sdk/blob/2e6142b15b8a75c1227dd3edbe839193b2bf9041/src/rp2_common/pico_standard_link/crt0.S#L297-L299

A word copy loop here copies from paddr (flash) to vaddr (RAM):

https://github.com/raspberrypi/pico-sdk/blob/2e6142b15b8a75c1227dd3edbe839193b2bf9041/src/rp2_common/pico_standard_link/crt0.S#L274-L280

[ae]Thanks! I added a sentence for "__<sectname>_paddr_start" and "__<sectname>_paddr_end".

[af]Ah, this would serve, but it begs the question: why give the start address with the section order flag, but alignment with a separate flag?

[ag]Good point. I just didn't want to add too many features to --section-order. Did you have any use case of specifying an alignment with --section-order?

[ah]Rather the inverse, I think: I could imagine a `--section-address` option that allows one to set a section's address independent of the order, in addition to the separate alignment flag.

To address the question more generally, however, I have worked in (at least) one kernel where we wanted to have the text, rodata, and data+bss aligned on 2MiB boundaries so that they could be mapped with 2MiB pages. We also had `etext`, `erodata`, `edata` and `end` symbols inserted by the linker at the end of the (aligned) .text, .rodata, .data. and .bss segments, respectively, and in that order. Here, I cared about the start address of .text, but the rest I didn't care too much _where_ they started (after all, .text may extend beyond the first 2MiB boundary; the linker knows how big it is, but I as the author don't necessarily), as long as they started on an aligned boundary after the previous segment.

So I could imagine specifying something where the .text segment starts at _some virtual address_ and everything else is specified with an alignment constraint after that.

[ai]For that particular case, you could pass `-z max-page-size=0x200000` and `-z separate-loadable-segments` so that the linker automatically aligns segments with different memory attributes to 2 MiB boundaries.

[aj]That's fine, until you want to use a different page size for a particular segment for whatever odd reason. I was thinking you may want to give --section-order and an address for the first (or second, or whatever) segment and then specify alignment with `--section-align` separately for the later segments.