Go 1.3 Native Client Support

Russ Cox

October 2013

Abstract

Go 1.3 will include support for running command-line programs under Native Client, Google’s SFI-based execution sandbox.

Background

Native Client (NaCl) is a restricted execution environment for x86 binaries. The most notable use of NaCl is to allow sandboxed execution of compiled binaries in Google Chrome extensions. However, NaCl also comes with a tool for executing command-line binaries in a sandboxed environment. Go 1.3 targets this command-line tool (sel_ldr_x86_64 or sel_ldr_x86_32), to be used on play.golang.org, so that users can be given access to package unsafe, real threads, and a generally richer execution environment. Support for Google Chrome is not planned but could be done as a follow-up.

NaCl provides a sandbox for each of three architectures: 32-bit x86, 64-bit x86, and 32-bit ARM. The details of the sandbox are different on each architecture; that architecture-specific effort constitutes the bulk of the work. There are no immediate plans to support ARM, although we do not believe it would be too hard.

NaCl also provides a “portable” file format, PNaCl, based on LLVM bytecode. When asked to execute a PNaCl file, NaCl first compiles it to one of the three target architectures and then applies the usual sandboxing. There is no standalone specification for the PNaCl file format, and the file format appears to be much more complex than the sum of the changes required to support each architecture individually, so the Go support will target individual architectures, not PNaCl.

 

Implementation

The core requirement of NaCl is to arrange the instructions in the final executable in a certain, verifiable way. The general implementation strategy is to assign this job to the linker (6l or 8l), so that most of the toolchain can be used unchanged.

This strategy breaks down for 64-bit x86, where NaCl defines that pointers are only 32 bits wide, disallows the use of register R15, and disallows the use of multi-register addressing modes. The compilers and the runtime must be adjusted to respect these restrictions. The use of 32-bit pointers in particular is such a significant change that almost none of the existing amd64 assembly is correct for NaCl (the exception is assembly that does not use any pointers, found only in package math). Primarily for this reason, we define a new GOARCH value for 64-bit NaCl: amd64p32, denoting amd64 with 32-bit pointers. If Go ever needs to be ported to other so-called “ILP32” systems, the same GOARCH can be reused in those contexts.

Because pointers are 32 bits wide, we define that the Go int type is also 32 bits wide: there is not much point in having a slice’s len(x) and cap(x) be 64 bits wide when the base array cannot have more than 2^32 elements.

At a higher level, NaCl is logically a new operating system providing its own execution environment, so all the work required of a new operating system (system calls, memory allocation, thread creation, signal handling, and so on) must be done. NaCl will use GOOS set to nacl.

NaCl is a restricted execution environment. By design it does not provide access to many common operating system features, such as pipes, networking, or a file system. In order to limit the changes needed in higher-level packages, the Go implementation of package syscall will implement a Unix-like simulation of these features. These simulations will also make it possible to use these features on the Go playground. For example, it will be possible to run a program that starts an HTTP server and then connects to it, or to write programs that read from or write to the file system.

The sections that follow discuss the implementation in detail, starting with the NaCl requirements.

NaCl Restrictions: x86-32

NaCl defines certain restrictions on the binaries being executed, mainly having to do with code layout. The restrictions allow NaCl to ensure that the code cannot break out of the sandbox.

On x86-32, the code segment must start at address 0x20000 and contain only instructions. There must be at least 32 unmapped bytes between the code segment and the segment that follows (read-only data or read-write data). An instruction-by-instruction disassembly proceeding linearly (that is, not following jumps) through the code segment must find only allowed instructions, with no instruction spanning a 32-byte boundary. All direct jumps must be to instruction boundaries encountered during the linear scan. All indirect jumps must be to 32-byte-aligned addresses, implemented as the x86 instruction pair

        ANDL $~31, reg
        JMP reg

but that pair is considered a single instruction for the validation (the pair must not cross a 32-byte boundary, and direct jumps cannot skip over the ANDL instruction). Similarly, an indirect CALL must be implemented as:

        ANDL $~31, reg
        CALL reg

and a return must be implemented by popping the return address off the stack (typically using POPL reg) and then executing an indirect jump to that return address. For that sequence to return to the correct address, the mask must not actually change the return address. All return addresses must therefore be 32-byte aligned, which means that every CALL instruction must end at a 32-byte boundary.

Thread-local storage can be used by reference to the TLS block pointer at %gs:0. As on Linux, %gs:0 actually contains a pointer just past the thread-local storage, so Go’s two thread-local pointers g and m are -8(p) and -4(p), where p is the pointer stored at %gs:0.

Privileged instructions are disallowed, as are instructions that interact with the kernel or with hardware, such as INT, SYSCALL, and port I/O instructions. The segment registers are used to limit memory accesses, so direct access to the segment registers and segment overrides other than %gs (for thread-local storage) are disallowed.

For the most part, these restrictions can be satisfied by suitable code layout choices in the linker, by the strategic insertion of no-ops and by rewriting instructions such as JMP and CALL into appropriate instruction sequences.

NaCl Restrictions: x86-64

On x86-64, the restrictions are more complex. The code segment must still start at address 0x20000 and contain only instructions, with 32 unmapped bytes before the next segment. An instruction-by-instruction disassembly proceeding linearly must still find only allowed instructions, with no instruction spanning a 32-byte boundary, direct jumps only to instruction boundaries, and indirect jumps only to 32-byte-aligned addresses. However, there are significant changes around the form of memory references. Specifically, the 4GB address space referred to by a 32-bit pointer does not start at virtual address 0. Instead, it is only 4GB-aligned, such as the range 0x123400000000 to 0x1234FFFFFFFF. The 64-bit base of this space is stored in register R15, which no instruction can modify. To use a register reg as a pointer, the register must have its top 32 bits cleared, and then the addressing mode must add R15 (the base address of the 4GB range) to the effective address. For example, on standard x86-64, this instruction sequence reads two fields from a struct:

        MOVQ 0(AX), BX
        MOVQ 8(AX), CX

The corresponding instruction sequence on x86-64 NaCl is:

        MOVL AX, AX
        MOVQ 0(R15)(AX*1), BX

        MOVL AX, AX
        MOVQ 8(R15)(AX*1), CX

where each pair is a single instruction for boundary validation. NaCl claims to allow elision of the truncating MOVL if the MOVL can be proved redundant within the current 32-byte instruction block, but we have not attempted to take advantage of this.
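The address arithmetic behind this rewrite can be modeled in a few lines of Go. This is an illustrative sketch, not code from 6l or the runtime: the helper name sandboxAddr is hypothetical, and the base 0x123400000000 is the example range from the text. It shows why the truncating MOVL matters: whatever sits in the top 32 bits of the register is discarded before R15 is added.

```go
package main

import "fmt"

// sandboxAddr models the effective address computed by the NaCl pair
//
//	MOVL AX, AX
//	MOVQ disp(R15)(AX*1), BX
//
// r15 is the 4GB-aligned sandbox base and reg is the full 64-bit value of
// the pointer register. Hypothetical helper, for illustration only.
func sandboxAddr(r15, reg uint64, disp int32) uint64 {
	// MOVL AX, AX: a 32-bit move zero-extends, clearing the top 32 bits.
	reg = uint64(uint32(reg))
	// disp(R15)(AX*1): base plus index plus sign-extended displacement.
	return r15 + reg + uint64(int64(disp))
}

func main() {
	const base = 0x123400000000 // example base from the text
	// Junk in the top half of the register cannot redirect the access.
	fmt.Printf("%#x\n", sandboxAddr(base, 0xdeadbeef00001000, 8)) // 0x123400001008
}
```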

The insertion of R15 into memory references means that the compilers must avoid complex addressing such as 0(AX)(BX*2); the slot used by AX in that form is needed for R15.

Instructions that use pointers implicitly, such as the string operations MOVS, SCAS, and STOS, must also be prefixed with pointer adjustments. For example, the NaCl encoding of MOVSB is the five ordinary instructions:

        MOVL SI, SI
        LEAQ (R15)(SI*1), SI
        MOVL DI, DI
        LEAQ (R15)(DI*1), DI
        MOVSB

To reduce the cost of memory references via BP and SP, NaCl requires that the BP and SP registers already contain R15 in their top bits. Reads and writes to memory using SP or BP are not rewritten to mention R15 and do not need the truncating MOVL. Instead, manipulation of BP and SP must ensure that they always have the correct form. Specifically, any store into or modification of SP must be a 32-bit instruction followed by “ADDQ R15, SP”, and that pair is treated as a single instruction for boundary validation. Because both BP and SP already contain R15, simple MOVQ BP, SP or MOVQ SP, BP is allowed without adjustment.
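A short sketch shows why a 32-bit operation followed by “ADDQ R15, SP” keeps SP valid. The helper adjustSP is hypothetical and models only a SUBL on SP; it assumes, as NaCl requires, that SP already holds R15 plus a 32-bit offset.

```go
package main

import "fmt"

// adjustSP models the pair
//
//	SUBL $n, SP
//	ADDQ R15, SP
//
// that NaCl requires for any modification of SP. Hypothetical helper,
// for illustration only.
func adjustSP(r15, sp uint64, n uint32) uint64 {
	// SUBL works on the low 32 bits and zero-extends its result,
	// discarding the copy of R15 that SP carried in its top bits.
	low := uint32(sp) - n
	// ADDQ R15, SP puts the sandbox base back.
	return r15 + uint64(low)
}

func main() {
	const base = 0x123400000000
	sp := uint64(base + 0x1000) // SP always holds base + 32-bit offset
	fmt.Printf("%#x\n", adjustSP(base, sp, 16)) // 0x123400000ff0
}
```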

Indirect jumps and calls are similar to memory references in that they must add R15 to the register. The specific sequence required is thus one instruction longer than for x86-32 NaCl:

        ANDL $~31, reg
        ADDQ R15, reg
        JMP (or CALL) reg

Although pointers are defined to be 32 bits wide, this is only a convention agreed upon by the NaCl host system and the binary being executed. The hardware has no idea of this agreement, so the return address pushed onto the stack by CALL and later popped by the simulated return instruction is 64 bits wide. The simulated return must execute a 64-bit POPQ instruction, not a 32-bit POPL (also, there is no POPL in 64-bit mode).

On x86-64, thread-local storage is accessible only via calls to system functions; making direct use of the %gs segment register is not allowed.

Build System

As mentioned above, the use of 32-bit pointers on a 64-bit machine is sufficiently different from the standard GOARCH=amd64 to warrant a new architecture name. We use GOARCH=amd64p32 for this variant. We define GOOS=nacl for the NaCl “operating system”. Both cmd/dist and cmd/go need to know that amd64p32 is now a valid GOARCH name and that nacl is now a valid GOOS name, so that they understand that a file like asm_amd64p32.s is restricted to use on GOARCH=amd64p32 systems.

Although we are using a new GOARCH value, we are not adding new compilers: the architecture letter is still 6, so 6l, 6a, 6c, and 6g will need to check the GOARCH setting to understand whether they are being invoked to generate amd64 or amd64p32 code.

An alternative would be to make x86-64 NaCl use GOOS=nacl GOARCH=amd64, with the GOOS=nacl serving to identify the use of 32-bit pointers. Within the compiler toolchain neither has compelling benefits over the other. However, when we get to building packages, nearly all *_amd64.s assembly files assume in some way that pointers and Go int values are 64 bits wide, making them incorrect on x86-64 NaCl. Using a different GOARCH name keeps the build system from using those files, sidestepping bugs as well as the need to add “// +build !nacl” to every such file.

Linker

The overall goal is to confine as many code generation changes to the linker as possible.

In 8l, the changes in the linker are almost exclusively limited to code layout in span.c, and they are minimal. An instruction crossing a 32-byte boundary must have no-ops inserted before it, to bump it to start on the next boundary. A CALL instruction not ending at a 32-byte boundary must also have no-ops inserted before it, to bump it to end at the next boundary. By convention, 8l treats prefixes REP, REPN, and LOCK as separate instructions, in a sequence like “REP; MOVSL” or “LOCK; XCHG”. 8l must take care to insert no-ops before the prefix instruction, not between the prefix and the instruction that follows.
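The two layout rules reduce to a small computation on the program counter. The sketch below is a hypothetical helper, not the code in span.c; it assumes instructions are at most 32 bytes long and returns how many bytes of no-ops to insert before an instruction.

```go
package main

import "fmt"

const bundle = 32 // NaCl instruction bundle size in bytes

// padding returns the number of no-op bytes to insert before an
// instruction of the given size that would otherwise start at pc.
// Hypothetical helper; assumes 0 < size <= 32.
func padding(pc, size int64, isCall bool) int64 {
	if isCall {
		// A CALL must end exactly on a 32-byte boundary, so that the
		// return address it pushes is 32-byte aligned.
		return (bundle - (pc+size)%bundle) % bundle
	}
	// Any other instruction must not span a 32-byte boundary.
	if off := pc % bundle; off+size > bundle {
		return bundle - off // bump it to the start of the next bundle
	}
	return 0
}

func main() {
	fmt.Println(padding(30, 5, false)) // 2
	fmt.Println(padding(10, 5, true))  // 17
}
```

Padding a CALL so that it ends on a boundary is what lets the masked return jump leave the return address unchanged.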

The x86 architecture has a variety of possible no-op instructions. http://www.agner.org/optimize/optimizing_assembly.pdf has a useful table. For the purposes of padding to 32-byte boundaries, 8l uses (hexadecimal bytes):

        90
        66 90
        0F 1F 00
        0F 1F 40 00
        0F 1F 44 00 00
        66 0F 1F 44 00 00
        0F 1F 80 00 00 00 00
        0F 1F 84 00 00 00 00 00
        66 0F 1F 84 00 00 00 00 00
        66 66 0F 1F 84 00 00 00 00 00

The final, 10-byte form is suggested by the linked PDF but rejected by NaCl, so 8l does not use it.
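One way to fill a padding run with the encodings above is to pick, at each step, the longest no-op that fits. This Go sketch is an assumption about strategy rather than a description of what 8l actually does. It also stops each no-op at the next 32-byte boundary, since a no-op is itself an instruction that must not span one. The helper name fillNops is hypothetical.

```go
package main

import "fmt"

const bundle = 32

// nops holds the accepted no-op encodings, indexed by length minus one.
// The 10-byte form is omitted because NaCl rejects it.
var nops = [][]byte{
	{0x90},
	{0x66, 0x90},
	{0x0F, 0x1F, 0x00},
	{0x0F, 0x1F, 0x40, 0x00},
	{0x0F, 0x1F, 0x44, 0x00, 0x00},
	{0x66, 0x0F, 0x1F, 0x44, 0x00, 0x00},
	{0x0F, 0x1F, 0x80, 0x00, 0x00, 0x00, 0x00},
	{0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00},
	{0x66, 0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00},
}

// fillNops returns n bytes of no-ops to be placed at address pc, using the
// longest encoding that fits without crossing a 32-byte boundary.
// Hypothetical helper, for illustration only.
func fillNops(pc, n int) []byte {
	var out []byte
	for n > 0 {
		k := n
		if k > len(nops) {
			k = len(nops)
		}
		if room := bundle - pc%bundle; k > room {
			k = room // never let a no-op span a bundle boundary
		}
		out = append(out, nops[k-1]...)
		pc += k
		n -= k
	}
	return out
}

func main() {
	// At pc 30, 2 bytes fit before the boundary, then one 9-byte no-op.
	fmt.Printf("% X\n", fillNops(30, 11)) // 66 90 66 0F 1F 84 00 00 00 00 00
}
```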

In addition to the no-op insertion, 8l treats incoming indirect CALL, indirect JMP, and RET instructions as pseudo-ops that it rewrites to NaCl sequences. Specifically, it uses:

        CALL reg =>
                ANDL $~31, reg
                CALL reg

        JMP reg =>
                ANDL $~31, reg
                JMP reg

        RET =>
                POPL SI
                ANDL $~31, SI
                JMP SI

NaCl disallows indirect CALL and JMP using memory references. 8l does not rewrite them, so assembly authors and compilers must not use them. 8l will create a binary containing the instructions, and then NaCl will reject it. On a related note, some low-level assembly contains the “INT $3” breakpoint instruction. NaCl rejects this instruction, so 8l rewrites it to the more acceptable “HLT” instruction.

There is one very special case in the code layout. Deferred functions are implemented by inserting “CALL deferreturn” before each “RET” instruction in any function F that contains a defer statement. That is, each return looks like:

        CALL deferreturn
        RET

If deferreturn finds no deferred work, it behaves like a no-op and returns to F, causing the following RET instruction to execute, making F return. If deferreturn does find deferred work W, it takes the first function to be run off the list, rewrites the stack as if F called W, and then begins execution of W. In order to handle the possibility of more than one deferred function, deferreturn subtracts from the return address on the stack, so that instead of pointing at the RET, it points at the CALL deferreturn. When W returns, F will call deferreturn again, in effect implementing a loop over all the deferred work. The implementation of this adjustment in deferreturn is to subtract 5 (the width of a CALL instruction) from the return address. In NaCl, because of the masking, subtracting 5 from a return address does not back up exactly one instruction: every return address is 32-byte aligned, so subtracting 5 and then masking is tantamount to subtracting 32. In order to make the subtraction work, the CALL deferreturn must be in its own 32-byte block that contains 27 bytes of no-ops followed by the CALL instruction. When deferreturn subtracts 5, the eventual return to that address will mask off the 27 in the low bits, backing up to the beginning of the no-ops. The return will re-execute those no-ops and then CALL deferreturn as needed.
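The arithmetic is easy to check with a short Go model of the masked return. The block address 0x20020 is a hypothetical example layout, and retTarget models only the ANDL $~31 step of the rewritten RET.

```go
package main

import "fmt"

// retTarget models where the NaCl return sequence lands: the popped
// return address is masked with ANDL $~31 before the indirect jump.
func retTarget(addr uint32) uint32 {
	return addr &^ 31
}

func main() {
	// Hypothetical layout: the 32-byte block at 0x20020 holds 27 bytes of
	// no-ops followed by the 5-byte CALL deferreturn, so the CALL ends,
	// and the RET begins, at 0x20040.
	const block = 0x20020
	ret := uint32(block + 32) // return address pushed by CALL deferreturn

	fmt.Printf("%#x\n", retTarget(ret))   // 0x20040: no deferred work, fall through to RET
	fmt.Printf("%#x\n", retTarget(ret-5)) // 0x20020: back to the no-ops, then CALL deferreturn again
}
```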


In 6l, the translation must perform all the rewrites used in 8l and described above. The rest of this section describes only modifications specific to 6l.

6l rewrites:

        CALL reg =>
                ANDL $~31, reg
                ADDQ R15, reg
                CALL reg

        JMP reg =>
                ANDL $~31, reg
                ADDQ R15, reg
                JMP reg

        RET =>
                POPQ SI
                ANDL $~31, SI
                ADDQ R15, SI
                JMP SI

        INT $x =>
                HLT

        SCASB (or SCAS[WLQ], STOS[BWLQ]) =>
                MOVL DI, DI
                LEAQ (R15)(DI*1), DI
                SCASB

        MOVSB (or MOVS[WLQ]) =>
                MOVL SI, SI
                LEAQ (R15)(SI*1), SI
                MOVL DI, DI
                LEAQ (R15)(DI*1), DI
                MOVSB

        ANY xxx, SP =>
                ANY xxx, SP
                ADDQ R15, SP

In the last form, ANY denotes any 32-bit-sized instruction that writes to SP, which in practice means any instruction with SP as its destination operand other than CMPL, CMPQ, TESTL, and TESTQ, since those only read it.

In addition to these per-instruction rewrites, 6l rewrites any instruction with an argument v(reg*n) referencing memory (that is, any instruction with an argument of that form except LEAL and LEAQ, which compute an address without accessing memory), to use the argument v(R15)(reg*n) and to insert a truncating “MOVL reg, reg” ahead of the main instruction.

Go relies heavily on thread-local storage to hold the per-goroutine and per-thread pointers g and m. Nearly every function refers to g on entry, to check whether it is time to split the stack. The linker is responsible for inserting these checks. x86-32 NaCl does not change the thread-local storage model from standard x86-32 execution, but x86-64 NaCl does: it requires calling a function to obtain thread-local values. It is simply not reasonable to add that cost to every Go function, so we must find a different way to store thread-local values. In 6l, we use the BP register as the thread-local storage base: g is 0(BP) and m is 4(BP). BP is attractive because (1) the Go toolchain does not use it as a frame pointer, only as an ordinary 64-bit register; (2) NaCl requires BP to hold a valid pointer, so on NaCl it cannot be used as an ordinary 64-bit register; (3) because BP is a valid pointer, it is exempt from the R15 manipulation, speeding up accesses using it; and (4) NaCl treats BP as callee-save, so no special arrangement is required to preserve it across calls into the NaCl system routines. If Go on x86-64 NaCl didn’t use BP for thread-local storage, it wouldn’t use it at all, so its use as a thread-local storage base is essentially free.

Both 386 and amd64 assembly refer to thread-local storage using the notation n(GS); GS is the typical segment register defining the thread-local storage base. The linker rewrites n(GS) into the appropriate reference for the given operating system, typically adjusting the constant n. To preserve the abstraction on NaCl, 6l interprets references to GS as really referring to BP. Conversely, 6l rejects any direct references to BP (that is, using the name “BP”), since such references most likely come from buggy code that does not know BP is unavailable as an ordinary 64-bit register. For the same reason, 6l also rejects any direct references to R15.

The 6l linker is the first of many parts of the system that assume that register size and pointer size are the same. For that matter, it is the first of many parts to assume that pointer size is a constant feature of a particular target system. The linker constant PtrSize must be made a variable, with a new constant RegSize denoting the register width. All code referring to PtrSize must be inspected and revised to use RegSize when appropriate. In general, the only code that should use PtrSize is code reading or writing Go data structures, such as the preparation of the symbol table or the garbage collector tables, the code that parses the reflect.Type data structures to generate DWARF debug information, and the code implementing the -X option to define a string value at link time.
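The distinction can be pictured as a per-target table. The archInfo type below is hypothetical, a stand-in for whatever the linker actually uses; the values follow the text, with amd64p32 pairing 4-byte pointers with 8-byte registers while 386 and amd64 keep the two equal.

```go
package main

import "fmt"

// archInfo is a hypothetical per-target description, for illustration only.
type archInfo struct {
	GOARCH  string
	PtrSize int // width of a pointer inside Go data structures
	RegSize int // width of a machine register, and of a PC saved by CALL
}

var arches = []archInfo{
	{"386", 4, 4},
	{"amd64", 8, 8},
	{"amd64p32", 4, 8},
}

func main() {
	for _, a := range arches {
		fmt.Printf("%-9s PtrSize=%d RegSize=%d\n", a.GOARCH, a.PtrSize, a.RegSize)
	}
}
```

Under this split, code that emits symbol tables, garbage collector tables, or DWARF from reflect.Type data would consult PtrSize, while code that reasons about registers or return addresses on the stack would consult RegSize.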

Noteworthy for its absence: the definition of the object code in 8.out.h and 6.out.h requires no changes.

Assembler

The assemblers change to record the current GOARCH value in the binary, instead of hard-coding, say, “amd64”.


C Compiler

The compilers change to record the current GOARCH value in the binary, instead of hard-coding, say, “amd64”.

8c requires no other changes.

6c must change behavior in a few ways.

When GOOS=nacl, 6c must not use the BP and R15 registers, it must not generate complex memory references using two registers, like 8(AX)(BX*4), and it must not generate direct memory references to addresses larger than 2 GB (must assume “large model”). These are all trivial changes.

When GOARCH=amd64p32, 6c must treat pointers as 32-bit values, using MOVL instead of MOVQ, and so on. Many places in the code use 8 as the size of a pointer. They need to be changed to use ewidth[TIND].

A larger question concerns alignment of function arguments passed on the stack. The least disruptive choice seems to be to treat 32-bit pointers the same as 32-bit integers, meaning that given these definitions:

        void f(uint32 x, uint64 y);
        void f1(void *x, uint64 y);
        void g(uint32 x, uint32 y);
        void g1(uint32 x, void *y);

the functions f and f1 have the same argument frame layout (with 4 bytes of padding between x and y), and so do g and g1 (with no padding between x and y). This has the unwelcome implication that “pointer alignment” is no longer the strictest kind of alignment. But things get worse when we consider structs. The Go compilers align structs to the maximum alignment of any field in the struct, while the C compilers always align structs to a fixed alignment, the word size. In practice, the structs shared between the two worlds contain pointers, so until now these two definitions have produced the same result. To keep them matching, however, the C compilers must be changed to align structs to the maximum alignment of any field in the struct, not to the word size. For example:

        void h(uint32 x, String y);

The function h passes a uint32 and, within the String, a pointer and a Go int. The old C rules would word-align the String, inserting 4 bytes of padding before y, while the Go rules do not. The C compilers must be changed to match the Go rules here.
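The difference between the two rules can be computed with a small frame-layout helper. This is a sketch under stated assumptions: on amd64p32 a String is a 4-byte pointer followed by a 4-byte Go int, the word size is 8, and each argument is placed at the next multiple of its alignment. The names field and offsets are hypothetical.

```go
package main

import "fmt"

// field is one argument slot in a frame: its size and required alignment.
// Hypothetical type, for illustration only.
type field struct {
	name        string
	size, align int64
}

func roundUp(x, a int64) int64 { return (x + a - 1) / a * a }

// offsets places each field, in order, at the next multiple of its alignment.
func offsets(fs []field) []int64 {
	out := make([]int64, len(fs))
	var off int64
	for i, f := range fs {
		off = roundUp(off, f.align)
		out[i] = off
		off += f.size
	}
	return out
}

func main() {
	// h(uint32 x, String y) on amd64p32: String is 8 bytes in total.
	goRule := []field{{"x", 4, 4}, {"y", 8, 4}} // struct aligned to its strictest field
	oldC := []field{{"x", 4, 4}, {"y", 8, 8}}   // struct aligned to the 8-byte word
	fmt.Println("Go rule: y at", offsets(goRule)[1]) // 4
	fmt.Println("old C:   y at", offsets(oldC)[1])   // 8
}
```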

Another unfortunate implication concerns variadic functions. Consider:

        void printf(const char *fmt, ...);

The C compiler treats a call to a variadic function the same as a call to a function expecting the particular argument list, meaning that printf("%d", 1) will insert no padding, but printf("%lld", (uint64)1) will need to insert 4 bytes of padding between the format pointer and the uint64 argument. On systems where the pointer size is the maximum alignment, there is never a need for padding after a single pointer argument, so this situation does not arise.

There is no change required in the C compiler for variadic functions like printf, but we will need to keep that complication in mind in the runtime implementation of such functions.

Go Compiler

The compilers change to record the current GOARCH value in the binary, instead of hard-coding, say, “amd64”, and to use that value when resolving imports.

NaCl provides a basic signal handling capability, but it only relays memory faults, not division by zero. Instead, division by zero always terminates the program. Since division by zero must cause a panic in Go, when GOOS=nacl the Go compilers must insert the logical equivalent of

        if d == 0 { runtime.panicdivide() }

before each division by non-constant d. The generated code on all operating systems already includes a check for d == -1, to avoid a fault when n is the most negative number (n == -n).

8g requires no other changes.

6g requires fewer changes than 6c.

When GOOS=nacl, 6g must not use the BP and R15 registers, and large model code generation must be enabled. 6g does not generate multiple-register addressing modes, so that mode need not be disabled.

When GOARCH=amd64p32, 6g must treat pointers and ints as 32-bit values. The backend code was written with the possibility of 32-bit pointers in mind, so most of it uses widthptr instead of a hard-coded 8, and the simtype array is used to map pointer and unsized int types into sized integer equivalents. There are a few lines hard-coding the use of MOVQ, ADDQ, LEAQ, and STOSQ when manipulating pointers. Those need to be changed to use size-specific instructions instead.

Not a change, but worth noting: the Go compilers lay out a function’s input and output parameter lists as two separate word-aligned structs. For example:

        func f1(x byte) (y byte)
        func f2(x uint16) (y uint16)
        func f3(x uint32) (y uint32)
        func f4(x *byte) (y *byte)

On all 64-bit systems, f1, f2, and f3 insert padding between x and y, in order to align y to a 64-bit boundary. When GOARCH=amd64p32, f4 also inserts padding between x and y. This has the unfortunate effect of breaking compatibility with C implementations like:

        void f4(byte *x, byte *y) {
                y = x;
                FLUSH(&y);
        }

The incompatibility between Go and C here requires changes to the runtime. (There is no obvious way to fix the problem in the C compiler, short of special annotations to tell the C compiler where the Go output arguments begin.)

Package runtime

To the extent that x86-64 NaCl is a new architecture, package runtime requires the usual per-architecture assembly and C code. The initial process bootstrap must be written along with functions like memmove and memclr. The x86 stack trace routines, which are shared already between 386 and amd64, require minimal adjustments to account for the fact that each program counter saved on the stack is 64 bits even though other pointers are 32 bits.

To the extent that NaCl is a new operating system, it requires the usual per-operating system assembly and C code to handle operations like allocating memory, creating threads, printing to standard error, installing a trap handler, responding to traps, restarting after traps, and exiting the program.

Most of the operating system-like functionality is straightforward, except that the “system call” interface that Go uses is deprecated. For use in a sandboxed environment where we control the specific version of NaCl being used, the deprecation is not a problem, but an appendix below describes how the implementation might be changed to use the new NaCl IRT.

Perhaps the most significant feature of NaCl as an operating system is that it does not provide a “sigreturn” system call to return from a signal handler. There is a system call to mark the signal handler as having completed, so that the handler can be invoked for future traps, but that call does not restore the execution context. The omission is understandable: in general the execution context is difficult to restore safely, and most systems do not continue execution after memory faults. Luckily, Go only continues execution by setting up a call to a function that will not return (runtime.sigpanic), and it is possible to restore enough state to start that function without help from NaCl.

Both of these sources of changes are fairly limited, much less work than a typical new architecture or new operating system. However, the incompatibility between C and Go frames, mentioned in the previous section, requires much more significant and invasive changes. To take a real example, the builtin function copy compiles into a call to runtime.copy, with the Go signature:

        func copy(dst, src []any, width uintptr) (n int)

The corresponding C function in the runtime package (with a simplified implementation) is:

        void copy(Slice dst, Slice src, uintptr width, intgo n) {
                n = dst.len;
                if(n > src.len)
                        n = src.len;
                memmove(dst.array, src.array, n*width);
                FLUSH(&n);
        }

The FLUSH macro ensures that n is written back to the parameter memory, where the Go caller expects to find the function’s output. Because width is pointer-sized and output results start word-aligned and words and pointers have the same size, the C and Go prototypes match on current systems. On GOARCH=amd64p32, however, the input arguments are 7 uint32s, or 3½ words, meaning that the C parameter intgo n is not in the correct place. Short of introducing new notation in the C compiler, padding must be inserted before n.
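The offsets can be checked with a short calculation. The helper resultOffsets is hypothetical; it assumes that a slice is three pointer-sized fields (array, len, cap, with the Go int the same width as a pointer, as defined above) and that Go results start at the next word boundary.

```go
package main

import "fmt"

func roundUp(x, a int) int { return (x + a - 1) / a * a }

// resultOffsets returns where C and Go each expect runtime.copy's result n,
// given the pointer size and the word (register) size. Hypothetical helper.
func resultOffsets(ptrSize, wordSize int) (cOff, goOff int) {
	slice := 3 * ptrSize          // array pointer, len, cap
	in := slice + slice + ptrSize // dst, src, width
	cOff = in                     // C: n directly follows width
	goOff = roundUp(in, wordSize) // Go: results start word-aligned
	return
}

func main() {
	fmt.Println(resultOffsets(8, 8)) // amd64: 56 56
	fmt.Println(resultOffsets(4, 8)) // amd64p32: 28 32
}
```

On amd64 the two offsets agree, which is why the mismatch went unnoticed; on amd64p32 the 28-byte input list leaves a 4-byte gap that the C prototype does not know about.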

We already have a tool to help with this. Ian’s goc2c, now part of cmd/dist, reads .goc files that define Go function signatures with C bodies and then writes out equivalent C code, inserting padding as appropriate to the target architecture. The required change is to use goc2c in many more places than we do today. Specifically, we’ve implicitly assumed that it was unnecessary to use .goc files for functions with a final pointer-sized input argument or with a leading pointer-sized output argument. That is no longer the case. The files alg.c, chan.c, complex.c, cpuprof.c, export_test.c, hashmap.c, iface.c, lfstack.c, slice.c, and symtab.c all contain runtime functions called from Go and need to be converted to .goc files instead. For the most part this is a matter of “hg mv”, adding a package statement, and then translating the function prototypes from C to Go. The function bodies need not change, so the diffs are not enormous.

Package syscall

NaCl requires the usual per-operating system assembly such as the Syscall function and the set of common functions expected by packages like net and os. However, NaCl is a very minimal operating system. There are many common things that it does not provide, such as file system access (except in a debugging mode), pipes, and networking. It would be possible to use build tags and NaCl-specific source files to limit higher-level packages correspondingly, but that would require significant implementation work in those higher-level packages. Instead, we choose to implement simulations of the missing functionality in package syscall itself, so that to higher-level packages NaCl looks like most of the other Unix systems that Go runs on. That is, there is more code in NaCl’s package syscall than in most systems, but in return there is almost no NaCl-specific code in the higher-level packages.

NaCl has its own concept of file descriptors: it uses them for I/O to standard input, standard output, and standard error, and it also assigns file descriptors to allocated mutexes, condition variables, and other objects. To provide a simulation of full file system access, package syscall must maintain its own file descriptor table that dispatches to multiple implementations, only one of which invokes NaCl calls. The fd table is about 300 lines of Go code.
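The shape of such a table can be sketched in Go. Everything here is hypothetical: the names file, fdTable, and memFile are ours, and a real table would also hold entries for NaCl’s standard input, output, and error. The point is the dispatch: each descriptor refers to whichever implementation owns it, and only some implementations would make NaCl calls.

```go
package main

import (
	"errors"
	"fmt"
)

// file is what each kind of simulated object (NaCl-backed stdio, in-memory
// file, pipe, socket) would implement. Hypothetical interface.
type file interface {
	read(b []byte) (int, error)
	write(b []byte) (int, error)
	close() error
}

var errBadFD = errors.New("bad file descriptor") // stands in for EBADF

// fdTable maps small integers to implementations, handing out the lowest
// free descriptor as Unix does.
type fdTable struct {
	files []file
}

func (t *fdTable) open(f file) int {
	for i, g := range t.files {
		if g == nil {
			t.files[i] = f
			return i
		}
	}
	t.files = append(t.files, f)
	return len(t.files) - 1
}

func (t *fdTable) get(fd int) (file, error) {
	if fd < 0 || fd >= len(t.files) || t.files[fd] == nil {
		return nil, errBadFD
	}
	return t.files[fd], nil
}

// Read dispatches to whichever implementation owns fd.
func (t *fdTable) Read(fd int, b []byte) (int, error) {
	f, err := t.get(fd)
	if err != nil {
		return 0, err
	}
	return f.read(b)
}

// Close frees the slot and then closes the underlying object.
func (t *fdTable) Close(fd int) error {
	f, err := t.get(fd)
	if err != nil {
		return err
	}
	t.files[fd] = nil
	return f.close()
}

// memFile is a toy in-memory file used to exercise the table.
type memFile struct {
	data []byte
	off  int
}

func (m *memFile) read(b []byte) (int, error) {
	n := copy(b, m.data[m.off:])
	m.off += n
	return n, nil // n == 0 signals end of file, as with read(2)
}

func (m *memFile) write(b []byte) (int, error) {
	m.data = append(m.data, b...)
	return len(b), nil
}

func (m *memFile) close() error { return nil }

func main() {
	var t fdTable
	fd := t.open(&memFile{data: []byte("hello")})
	buf := make([]byte, 8)
	n, _ := t.Read(fd, buf)
	fmt.Println(fd, string(buf[:n])) // 0 hello
}
```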

NaCl provides access to the host file system if a debugging flag is given to the sel_ldr binary, but that mode is not considered secure for obvious reasons, and the file system API is substandard. Among other things, it does not have pread and pwrite system calls, so there can be no faithful implementation of Go’s ReadAt or WriteAt methods. Instead, package syscall implements its own in-memory file system with Unix-like semantics, used only during NaCl builds. The file system is initialized at first use from a zip file image linked into the binary. The zip image can be adjusted for different environments. For example, one zip file might provide the test data necessary for Go unit tests, while another might provide useful files for the Go Playground. The file system and the unzip implementation are each about 700 lines of Go code.

NaCl provides no access to traditional networking. In order to allow Go Playground examples with network listeners and dialing, package syscall implements the basic TCP and UDP system calls with an in-memory network implementation. The same implementation provides the basis for simulated Unix pipes. The network implementation is about 800 lines of Go code.

Simulating reads from /dev/random (needed for some crypto packages) requires invoking the NaCl “SecureRandom” function, which, instead of being an ordinary function like every other NaCl entry point, requires speaking NaCl SRPC using a set of custom system calls to send and receive messages. The SRPC protocol itself is subject to backwards-incompatible changes; worse, even though there is an explicit version exchange at the beginning of the protocol, the version is not updated when incompatible changes are made. I do not know how often such changes happen. I do know that the protocol version number is the same today (in 2013) as it was when I implemented SRPC for an earlier NaCl port in 2009, and yet the protocol wire format today is very different from what it was in 2009. The SRPC implementation today is about 800 lines of Go code. The official way to avoid implementing SRPC is to use the NaCl IRT, but that comes with a different set of problems (see the appendix below).

Other packages

Most packages with *_amd64.s source files need to be translated into equivalent *_amd64p32.s source files as well. Typically this is a matter of changing pointer manipulation to use MOVL and adjusting stack and frame pointer offsets. The use of a new GOARCH means that if the packages are written to fall back to Go implementations on architectures without assembly, they will work without changes (perhaps slowly) on amd64p32.

Appendix: Fake Time

The Go Playground provides a simulated clock starting at Unix time 1257894000. Time only steps forward when all goroutines are blocked or sleeping. The clock was originally simulated to avoid exposing the host clock, but the simulation also makes it possible to cache the execution of popular examples. The existing simulation runs in a single thread and avoids creating new threads, but we want the NaCl runtime to be allowed to create real threads. To simulate time steps, we will modify the timer goroutine to park itself rather than execute a real sleep, and then the runtime deadlock detector will advance the clock (if possible) to resolve a deadlock, instead of crashing.

Appendix: NaCl IRT

NaCl provides two different ways to invoke system functionality. There is a set of “system calls”, invoked by an ordinary CALL instruction to a set of fixed addresses (specifically, 0x10000 + 32 * system-call-number). This mechanism is deprecated, and in theory the system call numbers can change from release to release. In practice the system call numbers and interfaces have been stable. The Go support will invoke these system calls directly, possibly tying Go binaries to specific versions of the NaCl runtime.
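The fixed-address scheme is simple arithmetic: each system call gets its own 32-byte slot starting at 0x10000. The helper name trampoline is hypothetical; it merely evaluates the formula above for a given system call number.

```go
package main

import "fmt"

// trampoline returns the fixed address that a binary CALLs to invoke
// NaCl system call number n. Hypothetical helper, for illustration only.
func trampoline(n uint32) uint32 {
	return 0x10000 + 32*n
}

func main() {
	fmt.Printf("%#x\n", trampoline(1)) // 0x10020
}
```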

The replacement for these system calls is to load a second NaCl binary alongside the one containing user code. That second binary, called the IRT, invokes the user binary with a modified auxv vector with AT_SYSINFO set to a pointer to a function called nacl_irt_query:

        size_t nacl_irt_query(const char *name, void *table, size_t table_size);

The first argument to nacl_irt_query is a name like “nacl-irt-basic-0.1”, and the effect of the call is to initialize the table with a list of function pointers corresponding to the implementation of that defined interface. The caller is expected to know the types of the functions being requested. For example, the definition of the table for “nacl-irt-basic-0.1” is:

        struct nacl_irt_basic {
                void (*exit)(int status);
                int (*gettod)(struct timeval *tv);
                int (*clock)(clock_t *ticks);
                int (*nanosleep)(const struct timespec *req, struct timespec *rem);
                int (*sched_yield)(void);
                int (*sysconf)(int name, int *value);
        };

The user binary is expected to use nacl_irt_query to obtain such tables and then call the functions listed in the tables to invoke the corresponding functionality. The IRT itself provides implementations specific to the version of NaCl being used. It is essentially an abstraction layer hiding the system call details, but in code not trusted by NaCl. I tried to make this work and could not figure out how to invoke sel_ldr_x86_64 with an IRT binary. If Go’s support for NaCl ever targets Google Chrome, making IRT work will be a requirement. However, Chrome is not a target now.

There is code commented out in os_nacl.c and sys_nacl_amd64p32.s that attempts to load IRT functions. Making a call to one of them instead of an ordinary system call will require switching from a g stack to a g0 (m) stack, just like a cgo call.