Proposal for a Go Interpreter

The Go ecosystem contains a number of interpreters, for example for Lua and JavaScript, so why not for Go itself?

This document proposes a tentative Objective, Design and Plan for a Go interpreter. Please give feedback in golang-nuts.

Why do it at all? The two collaborators on this document have expressed two different main hopes:

  • Sebastien in Reddit: “I come from the scientific crowd. my main motivation is to allow exploratory work directly in Go, from a Go interpreter. doing exploratory work from a REPL (ie: iterating on a state you modify with new functions you write or new data you load and then plot) is what makes doing science fun. Bonus points would be given to the implementor of the new go interpreter if one could seamlessly load and use symbols from a C library, so you could use all the battletested scientific libraries that are out there. (I didn't manage to do that for https://github.com/sbinet/igo </shamelessplug> )”
  • Elliott in golang-nuts: “Here is the Go as extension-code use case: Please imagine that that you have written a system where you want the users of that system to be able to extend that system's functionality by writing additional Go code of their own. Of course, you can't trust them to write code that is not malicious. So if you are to use a compiled solution, you will need to run their code in some sort of sandbox facility, for example in NaCl or under some operating system sandboxing. But actually, to do anything useful, this extension Go code may need controlled access to some of the state of the running system, and the running system may also want to call functions in the extension Go code. This "trap-door" cross-access between the main system and its extension code needs to be done under strict control. An interpreter provides a way to do this. It would be wonderful if better minds than mine could conceive of a fully-compiled approach that will solve this problem uniformly for every Go target-environment, including Google AppEngine.”

Objective

To provide an interpreter that can embed inside a Go application, in order to provide scriptability to the Go ecosystem at large.

Sub-objectives:

  1. Simplicity – the interpreter should be executable in any Go environment.
  2. Security – the default configuration should provide no opportunities for rogue interpreter code to crash the host program, and minimal opportunities for denial of service attacks.
  3. Separation - the execution environment could be different from that of compilation.
  4. Speed – the Go community will expect nothing less ... but at least initially, the interpreter will require more memory and be slower than compiled code.
  5. Standard - the project will be a success if it becomes normal to use Go (rather than say Lua or JavaScript) where scripting for end-users is required.

Simplicity

Executing in any Go environment, including Google App Engine, requires a pure Go implementation. Meaning that unsafe Pointer and assembler can only be used in optional target-environment-dependent packages.

It must also be simple for the interpreter builder to create “trap-doors” into their compiled Go code, allowing interpreted Go to make calls to externally defined compiled Go functions. Together with the pre-compilation of libraries to be used during interpretation (see below), this provides users a “build your own VM” opportunity.

Also, of course, compiled Go code can run functions in the interpreted Go code.

But  “Here be Dragons” - calling compiled Go code from interpreted code is potentially very dangerous, even calling interpreted code from compiled code should be done with care.

Security

By default the interpreter should be secure, with optional flags to make it insecure.

The default (fake) interpreter OS should be NaCl, an implementation (used by both the Go Playground sandbox and TARDIS go) that provides an in-memory file system and no access to external networking.

To minimize the risk of denial of service attacks and race conditions, the interpreter should not spawn goroutines by default (but rather use a coroutine implementation, as TARDIS go does); should place a programmable limit on the amount of memory and CPU time that interpreted code can consume; and should provide an easy way to allow controlled read-only access to selected state in the compiled host program (reducing the need for “trap-door” functions).

Separation

It would be beneficial if the compilation and execution phases could be run on separate computers, allowing, for example, interpretation of code in environments which did not contain a full set of library source-code, or where it was not desirable for the source code to be seen. This will require the intermediate format between compilation and execution phases to be serialized.

It would also be desirable to serialize and deserialize the current execution state of the interpreter. This would allow hand-off of a running interpreter from one execution machine to another – for example from Android to OSX. This requirement means that the data layouts of all interpreters must be the same (meaning a 64-bit data layout on 32-bit hosts).

If links to packages using cgo were included in the interpreter runtime, say through providing a “trap-door” compiled function, the serialization/deserialization of interpreter state could only be achieved by writing custom code. Also execution of that state could only continue in an interpreter that had the same cgo-linked functionality baked-in.

Speed

For interpreters, the instruction-decoding loop is the pinch point.

Each instruction should be decoded only once and as a first speed-up measure, a closure created to implement it. Then each block of code becomes a list of closures, one per instruction, to be executed in sequence (this has already been tested).  

Potentially a very large number of prototype closure functions will be required, which will necessitate their auto-generation.  

Of course some future version of the interpreter could improve this by implementing per-architecture JIT. This is certainly the approach the core Go team would take if they implemented an interpreter, but it is infeasible without being able to write-to and then call as a function some piece of memory, which go1.5 does not (and should not) allow.

The need for speed will probably necessitate “interpreter pre-compilation” of packages that are to be available from the interpreted code, to enable them to be called efficiently by the interpreter. It is not just that the data formats and calling conventions of these packages must be the same as those of the interpreter, or that their symbols must be known to the compilation phase of the interpreter, they must also have separate global values for each interpreter instance. A simple example of the need for this would be math/rand “that produces a deterministic sequence of values each time a program is run”, which would not do so if the global values were shared. Instead each instance of the interpreter must have it’s own set of global values.

Standard

As Go gains in popularity, so it will not seem strange to use it as a scripting language.  To improve adoption, the interpreter must be very easy to use and as complete as possible.

 

Design

Below are some outline design ideas relating to:

  • SSA as the starting point.
  • Modeling unsafe pointers and memory objects.
  • Pre-compiling packages into the interpreter.
  • Use LLVM?

SSA as the starting point

The Static Single Assignment form of a Go program is currently available using golang.org/x/tools/go/ssa. Once it is stable, the main-compiler SSA form currently under development would be a better-optimized starting point, if it were made available for external use.

Generated SSA can be inefficient, and can benefit greatly from optimization.  TARDIS go does a little peephole optimization; the new SSA branch of the main Go compiler does significantly more; while the LLVM SSA optimizations are the basis of that tool-chain’s success.

SSA form, as its name suggests, requires a large number of temporary variables. In the “is not, and will never be, a production-quality Go interpreter” implementation at golang.org/x/tools/go/ssa/interp, all types are handled as interface{} values, generating significant execution and GC overhead. However there are only around 20 basic types, each of which will need to be explicitly and directly handled by the interpreter (where possible), in order to speed up execution.

These basic types are likely to drive the whole design, for example each interpreted goroutine might have a separate stack for each basic type.

The intermediate serialized SSA form used between compilation and interpretation phases is likely to be binary encoded.

Modeling unsafe Pointers and memory objects

A key implementation issue is how pointers, especially unsafe pointers, are handled. While it is desirable to allow unsafe pointers in the interpreted code, on some platforms, like the Google App Engine, unsafe pointers are not even permitted.

For accurate GC, Go needs to know where the pointers are. A pointer can be modeled as a pointer to a memory object plus an offset into that object (TARDIS go does this). If the objects themselves are then modeled as data and pointer components, no Go unsafe pointers are required, although the memory requirements of the system are increased. For example, a very simplistic memory object in the interpreter might comprise two parallel slices:

         type memoryObject struct {

                []pointers *memoryObject

[]data uint64

        }

All types of data would be packed into the data slice. Pointers would use both slices, with the object pointer in []pointers, and the offset into that object held in the []data slice.

Many variations are possible to this basic plan. For example, to optimize for speed, there can be one []data item per byte offset (rather than per 8 bytes) - unfortunately this speed boost comes at the price of not every possible unsafe.Pointer operation working correctly (although experience with TARDIS go suggests this is not such a problem in practice). Or to optimize for memory size, unused slices can be nil, while used slices can be replaced by a map if that would be more memory efficient (because non-zero data is sparse).

Pre-compiling packages into the interpreter

A pre-built interpreter, with tight security and interpreter-pre-compiled standard packages will cater for most needs.

For those developers with more complex requirements, a process will exist to build the “VM”, by pre-processing selected packages to be available to run as part of the interpreter in compiled rather than interpreted mode, then create a composite package including that generated code to act as the execution engine.

Use LLVM?

As Sebastien has said on Hacker News: “we are still pondering on whether we should target LLVM bitcode so we could easily build upon LLVM toolchain (and cross-pollinate with other LLVM-based interpreters. e.g. seamlessly interoperate with IJulia and CLing interpreters, sharing code and values at runtime thru LLVM.) to me, the main issue with LLVM is that it's a real pain to develop against, I know this for a fact having worked a little bit with CLing, the CLang-based C++ interpreter (the edit-compile-test cycle is that of a set of large C++ libraries, gophers tend to be spoiled in that department).”

Plan

Gather community feedback, to shape the best way to proceed.

The current plan is to evolve from golang.org/x/tools/go/ssa/interp through a number of working versions, benchmarking to improve execution speed at each stage:

  1. Demonstrate that optimizing one function at a time to use closures makes the decode loop faster (done).
  2. Serialize and de-serialize the SSA form, running the interpreter on the de-serialized form (this done early to avoid large code re-writing later).
  3. Implement memoryObjects and pointers, rather than nested ivalue types and *ivalue.
  4. Hold local stack variables in their basic type (e.g. float32), rather than as interface{}.
  5. Write interpreter state serialization and deserialization code.
  6. Implement co-routines and other security features (these facilities may be difficult to retrofit later).
  7. Write a transpiler to generate go code for packages to be called by interpreted code and act as if they are part of the interpreter “VM”.
  8. Test and fix issues to make as many standard packages work as possible, writing runtime functions to replace those in C/ASM as required.
  9. Test for all target compilation and execution environments.
  10. Test the hand-off of interpreter execution between different Go environments.

Clearly these are broad phases, with areas like the mock reflect package requiring constant updating as work progresses.

Related Projects

Sebastien has a curated list of projects to watch or to study: https://github.com/sbinet/notes-and-todos/blob/master/README.md

Elliott Stoneham

(Incorporating ideas from Sebastien Binet)

10 September 2015