Go “Return at End of Function” Requirements
Russ Cox
March 2013
Abstract
I propose that the existing Go compilers adopt a purely syntactic definition for whether a function is missing a return statement. Doing so will remove the need for about half the function-ending panic statements in existing code, or about three quarters of the function-ending panics with arguments containing the substrings reach, never, shouldn’t, should not, or impossible.
I further propose that the definition be added to the spec, which today is silent about when return statements are necessary. Doing so will ensure that current and future compilers agree on which programs are in error.
Fixes issue 65. Discussion on the mailing list.
Background
Issue 65 concerns the current “function ends without a return statement” compiler error. It is the oldest open issue, and it still comes up every few months. (I locked the issue so that new comments are not allowed, but see the mailing list or other discussion forums.)
Roughly speaking, the gc and gccgo compilers require that the last statement of a function be a return, goto, or panic; otherwise they emit a “function ends without a return statement” error. In September 2008, along with the initial gc implementation, Ken wrote this comment:
// can this code branch reach the end
// without an unconditional RETURN
// this is hard, so it is conservative
The most common problem encountered with this definition is that people write min like this:
func min(x, y int) int {
	if x < y {
		return x
	} else {
		return y
	}
}
This implementation does not satisfy the definition, so the compiler rejects the function as ending without a return statement.
Issue 65 was filed objecting to this compile error on November 11, 2009, shortly after the initial open source release. People continue to rediscover and object to this behavior, enough that I made the issue read-only. In 2010 I wrote on the issue, “This is a well-known issue we plan to address, but there are higher-priority issues right now. There's no need to try to persuade us that it's a problem.” This is still true, and there will probably always be higher-priority issues, but I think it has gone on long enough that we should try to do better for Go 1.1.
The case for min is fairly weak, since it can be written without the else, making the code both acceptable to the compiler and shorter. However, a stronger case can be made using functions with for, switch, or select statements, like:
func readNonEmpty(c chan []byte) []byte {
	for {
		if p := <-c; len(p) > 0 {
			return p
		}
	}
}
func readTimeout(c chan []byte, d time.Duration) []byte {
	select {
	case <-time.After(d):
		return nil
	case p := <-c:
		return p
	}
}
Neither can be rewritten to satisfy the compiler while maintaining the same clarity: the solution today is to end both with panic("unreachable"). At least when I do that, I always have the nagging concern that maybe I have made a mistake and the bottom of the function really is reachable. I would rather the compiler confirm my understanding by accepting the code without the panic.
The spec is silent on when a return statement is necessary. Gccgo originally had more sophisticated analysis than gc, but Ian dialed it back to the gc definition so that the two compilers would agree on whether a program was in error. Others are writing compilers now too, so even if this proposal is rejected, something should be written in the spec for Go 1.1.
Proposed Definition
The body of a function or method with return values must end in a terminating statement.
A terminating statement is one of the following:
(The wording of 8c assumes that a fallthrough in a switch’s last case is disallowed by the definition of a switch. Making that clearer in the current spec is issue 4923.)
Properties of the Proposed Definition
The current gc compiler behavior corresponds exactly to proposed rules 1 through 4. The proposed definition is therefore backwards compatible with existing compilers, allowing more programs to compile but not rejecting any existing ones.
The proposed definition is purely syntactic, using only information available from a Go parser such as the “go/parser” package. In particular, the definition does not require constant evaluation or any kind of transitive reachability analysis. The definition does require label resolution for break statements and identifier resolution for determining whether a call to “panic” refers to the built-in. Label resolution is trivial, because within a function all labels must be unique. Identifier resolution is less trivial, but the “panic” rule is a relied-upon property of the current implementations and certainly cannot be removed.
The proposed definition, while an expansion of the current rules, is still fairly simple: given a parsed syntax tree from the “go/ast” package, identifying whether a function ends in a return statement can be done in 180 lines of Go code: 110 for the basic definition and another 70 to match “break” statements to their surrounding statements.
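To give a feel for what “purely syntactic” means in practice, the following is a deliberately reduced sketch. It is not the 180-line checker mentioned above. It assumes a cut-down rule set that covers only return, goto, a call to panic, blocks, and if-else; it omits for, switch, select, and break matching entirely; and it recognizes panic by name alone, without the identifier resolution a real implementation needs. The names terminates and missingReturn are hypothetical.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// terminates reports whether s is a terminating statement under the
// reduced rule set assumed for this sketch.
func terminates(s ast.Stmt) bool {
	switch s := s.(type) {
	case *ast.ReturnStmt:
		return true
	case *ast.BranchStmt:
		return s.Tok == token.GOTO
	case *ast.ExprStmt:
		call, ok := s.X.(*ast.CallExpr)
		if !ok {
			return false
		}
		// Assumption: "panic" is the built-in. A shadowed panic
		// would be misclassified here.
		id, ok := call.Fun.(*ast.Ident)
		return ok && id.Name == "panic"
	case *ast.BlockStmt:
		// Only the final statement of the block is examined.
		return len(s.List) > 0 && terminates(s.List[len(s.List)-1])
	case *ast.IfStmt:
		// Both branches must be present and must terminate.
		return s.Else != nil && terminates(s.Body) && terminates(s.Else)
	}
	return false
}

// missingReturn parses src and returns the names of top-level functions
// that declare results but do not end in a terminating statement.
func missingReturn(src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "src.go", src, 0)
	if err != nil {
		return nil, err
	}
	var names []string
	for _, d := range f.Decls {
		fn, ok := d.(*ast.FuncDecl)
		if !ok || fn.Body == nil || fn.Type.Results.NumFields() == 0 {
			continue
		}
		if !terminates(fn.Body) {
			names = append(names, fn.Name.Name)
		}
	}
	return names, nil
}

const sample = `package p

func min(x, y int) int {
	if x < y {
		return x
	} else {
		return y
	}
}

func bad(x int) int {
	if x > 0 {
		return x
	}
}
`

func main() {
	names, err := missingReturn(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(names)
}
```

Nothing here evaluates constants or follows control flow across statements: the decision for each function comes from inspecting the shape of its last statement, recursively.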
To be clear, the proposed definition only considers whether a statement may end a function. It is not concerned with the more general problem of identifying unreachable code. Even if that were desirable, which is far from clear, it cannot be done until Go 2, because the current implementations accept functions containing unreachable code.
Existing Usage
To gather data about the effect of the proposed definition on real programs, I ran an experiment in late February. The godoc.org web site listed 6,021 known open source Go packages, and I downloaded all the ones I could using “go get.” In total, I downloaded 38,734 Go source files containing 8,330,489 lines.
I modified “go vet” to identify whether a panic found at the end of a function with a return value was required by the proposed definition, along with other interesting facts, and then I ran “go vet” on all the files.
In total, there are 2,112 function-ending panics in the source files. The proposed rules eliminate the need for almost half of these panics, making 1,007 redundant and leaving 1,105 required.
Of the 1,007 redundant panics:
This confirms my claim above that for, switch, and select statements—not if statements—make the most compelling case for the broader definition.
Again considering the 1,007 redundant panics, 919 use argument strings containing reach, never, shouldn’t, should not, or impossible. Another 49 use panic(nil). Even including arguments that appear just once, only 75 distinct arguments are passed in these 1,007 panic calls. Extracting words in the arguments and counting, the top words are:
589 unreachable
192 reached
186 not
71 reach
60 cannot
49 nil
38 code
31 never
24 unreached
19 here
15 should
12 impossible
11 shouldn't
From the fairly uniform characteristics, I conclude that the panics no longer required by the proposed definition are nearly all formulaic panics that would not have been written had the compiler not insisted.
Of the 1,105 panics still required even under the broader rules:
Again considering the 1,105 required panics, 299 use argument strings containing reach, never, shouldn’t, should not, or impossible. Most of these follow switch statements that only handle “expected” cases; given bad data, the panics are neither unreachable nor impossible. The 299 mentioning these terms make up only 27% of the required panics, compared to 91% in the redundant panics.
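The substring test behind both tallies, and the two percentages quoted above, can be sketched as follows. This is an illustration rather than the modified “go vet” itself: the helper names are hypothetical, and lowercasing the argument before matching is an assumption, since the text does not say whether the match was case-sensitive.

```go
package main

import (
	"fmt"
	"strings"
)

// markers are the substrings listed in the text.
var markers = []string{"reach", "never", "shouldn't", "should not", "impossible"}

// mentionsUnreachable reports whether a panic argument contains any marker.
// Assumption: matching is case-insensitive.
func mentionsUnreachable(arg string) bool {
	lower := strings.ToLower(arg)
	for _, m := range markers {
		if strings.Contains(lower, m) {
			return true
		}
	}
	return false
}

// percent returns part/total as a percentage rounded to the nearest integer.
func percent(part, total int) int {
	return (part*100 + total/2) / total
}

func main() {
	fmt.Println(mentionsUnreachable("unreachable"))     // matches "reach"
	fmt.Println(mentionsUnreachable("not implemented")) // matches nothing

	// Counts taken from the text above.
	fmt.Println(percent(919, 1007)) // redundant panics mentioning a marker
	fmt.Println(percent(299, 1105)) // required panics mentioning a marker
}
```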
The required panics are also far more varied in their arguments, with 468 distinct arguments. The most common words are:
236 not
202 unreachable
150 type
108 implemented
82 unknown
68 unexpected
66 reached
53 invalid
47 value
47 string
47 for
45 no
38 kind
The set of panics made redundant by the proposed definition and the set of panics still required have quite different profiles and characteristics. From this, I conclude that the proposed definition does align with most programmers’ expectations of what the compiler might reasonably require.
Alternatives
I think we have to document some definition. I think it makes sense to use this one, but there are alternatives.
One alternative is to use only rules 1, 2, 3, 4; this documents the current behavior.
A second alternative is to use only rules 1, 2, 3, 4, 6, 7, dropping 5 (if-else) and 8 (switch). Both if-else and switch can be handled cleanly already, by moving the else body or default block into its own statement, which becomes the terminating final statement. This appeals to those who do not want to see two ways to write the min function, but it is probably too irregular.
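As an illustration of the restructuring this alternative relies on, here is min with the else body hoisted out, together with a switch whose default block has been moved after the switch. The sign function is a hypothetical example, not taken from the surveyed code.

```go
package main

import "fmt"

// min without the else: the former else body becomes the
// function's final statement.
func min(x, y int) int {
	if x < y {
		return x
	}
	return y
}

// sign with the former default block moved after the switch,
// so the function ends in a plain return.
func sign(x int) int {
	switch {
	case x < 0:
		return -1
	case x > 0:
		return +1
	}
	return 0
}

func main() {
	fmt.Println(min(3, 2), sign(-5), sign(0), sign(7))
}
```

Both forms already compile today, which is the argument for leaving if-else and switch out of the rules under this alternative.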
A third alternative is to expand the rules to require traversing entire statement blocks, not just the final statement in the function or block. All the instances of “ends in a terminating statement” would change to “contains a terminating statement without a subsequent labeled statement”. That would allow a function body like:
func foo() int {
	panic(1)
	f()
	g()
	h()
}
because the code after the panic is dead. This expansion is equivalent to the “falls through” proposal that Ian made in a mailing list discussion on this topic in 2011. In the terms of that discussion, this doc’s (unexpanded) proposal amounts to “the final statement in the function must not fall through.”
A fourth alternative is to stop requiring return statements in functions, defining a behavior for functions that fall off the end. For example, the function might automatically panic, return zero values, or return the current values of the named return parameters on reaching the end. I think this would be a mistake: Go has types for a reason, and it should make sure that if you said the function returns something, it actually does. The original implementation did not require a final return. After Ken added the requirement to the compiler, Rob fixed the tree in two CLs. By my count, there were 11 edits: 6 redundant panics or code restructurings that are no longer necessary using this doc’s proposal, and 5 real bug fixes, both missing return statements and incorrect function signatures.
Other Languages
Untyped languages like JavaScript or Python typically admit functions without return statements, since, in the absence of types, the compiler doesn’t know whether the function intends to return a value. Such functions return a special value like “undefined” or “None.” Typed languages make a more interesting comparison.
The C99 draft specification allows missing return statements. It says only, “If the } that terminates a function is reached, and the value of the function call is used by the caller, the behavior is undefined.”
The Java Language Specification’s section 8.4.7 states that “If a method is declared to have a return type, then a compile-time error occurs if the body of the method can complete normally,” and refers readers to section 14.1 for a definition of normally, which in turn depends on section 14.21’s four-page definition of “unreachable.”
The goal of this doc’s proposal is to strike a reasonable balance between the incompleteness of C’s definition and the weight of Java’s.