Go 1.4: unsafe.Pointer arithmetic
Matthew Dempsky <mdempsky@google.com>
Aug 2014
Go issue tracker: https://golang.org/issue/40481
golang-dev discussion thread: https://groups.google.com/d/msg/golang-dev/bfMdPAQigfM/U3WUy_yU-HMJ
1. Add a new built-in function to package unsafe:
// Add returns the pointer p+v.
func Add(p Pointer, v uintptr) Pointer
(Like other built-in functions, unsafe.Add may only be used in a call expression, not as a function value.)
2. Add a rule to the Go spec that an unsafe.Pointer into a Go variable may only be derived by converting a (normal) pointer into that same variable, or by calling unsafe.Add on another unsafe.Pointer into that same variable. (Note: the definition of “into” does not include C-style “just-past-the-end” pointers.) Additionally, converting a pointer to a zero-size type to unsafe.Pointer is not safe.
3. Extend “go fix” to rewrite expressions like “unsafe.Pointer(uintptr(p) + v)” to “unsafe.Add(p, v)”.
4. Extend “go vet” to warn against converting a uintptr, or a pointer to a zero-size type, to unsafe.Pointer.
5. (Optional) Extend compiler/runtime to provide runtime instrumentation to check that unsafe.Pointers are being used “safely”.
Go would like to support moving GCs. This requires that the runtime transparently rewrite pointer values, but the ability to convert unsafe.Pointer to uintptr and back makes this tricky. In particular, the runtime may not be able to determine whether a uintptr value is actually a pointer, so it can’t safely rewrite it. Additionally, multiple evaluations of uintptr(p) for the same (or related) unsafe.Pointer values might yield inconsistent results if the GC runs in between and moves the pointed-to Go variable.
To address this problem, we propose adding unsafe.Pointer arithmetic primitives, which remove most of the need to convert uintptr values to unsafe.Pointer, along with rules guiding their safe use. The rationale for each point of the proposal follows.
The Go compiler already needs to support limited pointer arithmetic for expressions like “&p.f” (where p is a pointer to a struct with field f). Exposing this primitive as a built-in like unsafe.Add lets user code benefit from whatever compiler changes are needed in the future to support moving GCs.
The “Add” abstraction is already used in package runtime as
// Should be a built-in for unsafe.Pointer?
func add(p unsafe.Pointer, x uintptr) unsafe.Pointer {
	return unsafe.Pointer(uintptr(p) + x)
}
so this proposal is merely to provide the same API via package unsafe.
It might be attractive to instead provide arithmetic methods like those on time.Time:
func (p Pointer) Add(v uintptr) Pointer
but formally unsafe.Pointer is a pointer type, and methods may not be bound to pointer types. It’s possible to special-case unsafe.Pointer, but then it might be desirable to have unsafe.Pointer implement “interface { Add(uintptr) unsafe.Pointer }”, which introduces further complexity.
Defining unsafe.Add as a standalone built-in function sidesteps both of these issues.
The runtime should have leeway to move Go variables around however it wants. In particular, it should not be constrained to keep a fixed distance between arbitrary variables. This makes it unsafe to rely on offsets between objects like:
var x, y, z int
offs := []uintptr{
	uintptr(unsafe.Pointer(&y)) - uintptr(unsafe.Pointer(&x)),
	uintptr(unsafe.Pointer(&z)) - uintptr(unsafe.Pointer(&x)),
}
…
return *(*int)(unsafe.Add(unsafe.Pointer(&x), offs[i]))
as x, y, and z might be moved around, and the offsets within offs could become invalid.
However, it’s reasonable to keep code like this working:
var x struct { y, z int }
offs := []uintptr{
	unsafe.Offsetof(x.y),
	unsafe.Offsetof(x.z),
}
…
return *(*int)(unsafe.Add(unsafe.Pointer(&x), offs[i]))
as the runtime will need to keep the relative offsets between &x, &x.y, and &x.z anyway.
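A self-contained sketch of the safe pattern, assuming Go 1.17+ for unsafe.Add. The elided code above is not reconstructed; the type pair and helper field are hypothetical. Every offset is measured from the start of the same variable, so the result always stays inside it.

```go
package main

import (
	"fmt"
	"unsafe"
)

// pair mirrors struct { y, z int } above: y and z live in one variable,
// so their offsets from its start are fixed by the type.
type pair struct{ y, z int }

// field returns field i of *x (0 for y, 1 for z) by offsetting from x.
func field(x *pair, i int) int {
	offs := [...]uintptr{unsafe.Offsetof(x.y), unsafe.Offsetof(x.z)}
	return *(*int)(unsafe.Add(unsafe.Pointer(x), offs[i]))
}

func main() {
	x := pair{y: 10, z: 20}
	fmt.Println(field(&x, 0), field(&x, 1)) // 10 20
}
```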
Note that this is very similar to the rules for safe char pointer arithmetic in C. One notable difference, however, is that C allows pointer values “just past the end” of an object (though such pointers may not be dereferenced). Consider the following example:
var x, y int
p := unsafe.Add(unsafe.Pointer(&x), unsafe.Sizeof(x))
If x and y are initially allocated consecutively in memory, then p will have the same value as unsafe.Pointer(&y). The GC would then not know whether this is logically a pointer just past the end of x or a pointer to the start of y, which would prove problematic if it wants to move x and y to new non-consecutive addresses.
This could be addressed either by requiring the GC to keep x and y consecutive or by including padding bytes between all objects that might escape to an unsafe.Pointer, but it’s unclear that Go code benefits from this C idiom. Instead, it’s simpler to forbid it for now.
One catch, though, is zero-size types. Any pointer to a zero-size type is inherently just past the end of the object. It’s also desirable for “unsafe.Add(unsafe.Pointer(&s.f), -unsafe.Offsetof(s.f))” to evaluate to the same value as “unsafe.Pointer(&s)”, but that can’t be guaranteed if s has type “struct { x int; f [0]byte }”, as &s.f is then just past the end of s.
Fortunately there’s not much use for zero-size objects in Go code, so again the simplest solution is to ban outright the conversion of pointers to zero-size objects to unsafe.Pointer. It would be possible to relax the rule to allow pointers to zero-size objects *within* a type (e.g., unsafe.Pointer(&s.f) is safe if s instead has type “struct { f [0]byte; x int }”), but describing that rule seems more complex.
Once we have an unsafe.Add function, it’s easy to automatically rewrite existing code that could be updated to make use of it.
Once we’ve established that it’s unsafe in general to convert a uintptr, or a pointer to a zero-size type, to unsafe.Pointer, it makes sense for “go vet” to warn against these practices.
It should be easy for the Go compiler to insert optional runtime instrumentation to verify that code follows the pointer arithmetic rule. For example, given a call to unsafe.Add(p, v), the compiler could insert a call to runtime.checkptradd(p, v) that checks whether p+v points into a GC’d object and, if so, verifies that p points into the same object. If not, it panics or prints a warning.
Beyond unsafe.Add, a few other operations stand out as possibly useful:
// Sub returns the offset p-q.
func Sub(p, q Pointer) uintptr
// Less reports whether p < q.
func Less(p, q Pointer) bool
// IsSameVariable reports whether p and q point into the same variable.
func IsSameVariable(p, q Pointer) bool
// Slice returns a []byte for the n bytes of memory starting at p.
func Slice(p Pointer, n uintptr) []byte
Sub could be useful because uintptr(p) - uintptr(q) might give a bogus value if a GC happens in between the evaluations of uintptr(p) and uintptr(q). Less and IsSameVariable could be similarly useful in some situations.
Slice might be useful for cases like package reflect’s memmove function, which could then be implemented simply as:
func memmove(dst, src unsafe.Pointer, n uintptr) {
	copy(unsafe.Slice(dst, n), unsafe.Slice(src, n))
}
Moreover, an ability to create a slice for an arbitrary type (i.e., not just byte) could be useful.
However, none of these functions seem as immediately needed as unsafe.Add.