Enhanced null handling - Null-safe types

DRAFT - 2007-01-11

Stephen Colebourne

I. Problem

The Java language allows any variable, except primitives, to be null. This leads inevitably to NullPointerExceptions when a developer does not correctly check for null.

One popular solution is to define in documentation which variables, parameters and return types can be null, and which cannot. This is fragile and error-prone. For example, compare a HashMap to a ConcurrentHashMap:

  public class HashMap<K, V> implements Map<K, V> {
    /**
     * @param key  the key to retrieve, may be null
     * @return the value mapped to the key, null if not found
     */
    public V get(K key) { ... }
  }

  public class ConcurrentHashMap<K, V> implements ConcurrentMap<K, V> {
    /**
     * @param key  the key to retrieve, not null
     * @return the value mapped to the key, null if not found
     */
    public V get(K key) { ... }
  }

It is very easy to miss this documentation and pass a null key into the ConcurrentHashMap.
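The difference is easy to see in today's Java, where it only shows up at runtime. The following is a minimal sketch; the class and method names (NullKeyDemo, getNullKeyThrows) are made up for illustration, while the behaviour of the two map classes is as documented in the JDK.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class NullKeyDemo {

    // Returns true if looking up a null key throws NullPointerException.
    static boolean getNullKeyThrows(Map<String, String> map) {
        try {
            map.get(null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // HashMap permits a null key, so the lookup simply returns null.
        System.out.println("HashMap throws: "
                + getNullKeyThrows(new HashMap<String, String>()));
        // ConcurrentHashMap rejects a null key, but only when the code runs.
        System.out.println("ConcurrentHashMap throws: "
                + getNullKeyThrows(new ConcurrentHashMap<String, String>()));
    }
}
```

Both calls compile without complaint; nothing in the method signatures distinguishes them.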

This proposal moves the null-safety from documentation to code, allowing the compiler to check the system for null-safety at compile-time rather than runtime. In our example here, the result would be:

  public class ConcurrentHashMap<K, V> implements ConcurrentMap<K, V> {
    /**
     * @param key  the key to retrieve
     * @return the value mapped to the key, null if not found
     */
    public V get(#K key) { ... }
  }

The hash # symbol indicates that the parameter cannot be null, and this can be checked at compile-time.

II. Null-safe types

The proposal introduces a new modifier - hash # - that can be applied to a type wherever it is declared: in variable declarations, field declarations, method parameters and method return types:

    Address address = null;                       // address variable can be set to null
    #Person person = new Person();                // person variable cannot be set to null
    public #Person lookup(#String personId) {...} // personId and return-type are never null

Rationale: Other languages add a modifier to the nullable type, making that the special case. This seems inappropriate for Java where existing developers, and all existing code, expect to be able to assign null to a type with no special modifier. The # modifier is intended to be read as 'non-null', thus in the example you would describe the method as 'get a non-null person taking a non-null person identifier'.

The type system is altered as follows:

    Person p1 = null;             // allowed, no change from current rules
    Person p1 = new Person();     // allowed, no change from current rules
    #Person p2 = null;            // compile error, as p2 cannot be null
    #Person p2 = new Person();    // allowed, as new Person() is definitely non-null

    Person p1 = null;
    #Person p2 = p1;              // compile error, can't assign null to p2

These rules are refined wherever the compiler can prove that a variable's value is null or non-null. At a minimum, the following should be supported: comparison to null in an if statement, object creation, literal assignment (currently only String and auto-boxed primitives), null assignment, and instanceof checks.

    Person p1 = new Person();
    #Person p2 = p1;              // allowed, because compiler can prove p1 is non-null

    Person p1 = old.getPerson();  // old.getPerson() has no null-status information in the return type
    if (p1 != null) {
      #Person p2 = p1;            // allowed, because compiler can prove p1 is non-null
    }

    Object obj = ...
    if (obj instanceof Person) {
      #Person p2 = (Person) obj;  // allowed, because compiler can prove obj is non-null
    }
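The instanceof rule rests on a guarantee that Java already gives: instanceof evaluates to false for a null reference. A small sketch in current Java follows, using String in place of Person; the class and method names are invented for illustration.

```java
class InstanceofNullDemo {

    static String describe(Object obj) {
        if (obj instanceof String) {
            // Reaching this branch means obj is not null, so neither the
            // cast nor the method call can throw NullPointerException.
            String s = (String) obj;
            return "string of length " + s.length();
        }
        // A null reference always ends up here.
        return "not a string";
    }

    public static void main(String[] args) {
        System.out.println(describe("abc"));
        System.out.println(describe(null));
    }
}
```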

A new cast operator - (#) - is added to allow a variable that cannot be proved to be null or non-null to be cast to a non-null type. This is useful for integration with legacy code that does not define the null status of its method return types and fields.

    Person p1 = old.getPerson();  // old.getPerson() return-type is nullable
    #Person p2 = (#) p1;          // allowed, because compiler does not know status of p1

    Person p1 = null;
    #Person p2 = (#) p1;          // compile error, because compiler can prove p1 is null

    Person p1 = new Person();
    #Person p2 = (#) p1;          // allowed but unnecessary, because compiler can prove p1 is non-null
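The proposal does not say what the (#) cast does at runtime if the value turns out to be null. As a rough analogy only, the closest thing in current Java is an explicit runtime check such as java.util.Objects.requireNonNull, which fails fast at the point of conversion rather than later. In this sketch legacyLookup is a hypothetical stand-in for old.getPerson(), and String stands in for Person.

```java
import java.util.Objects;

class NonNullCastSketch {

    // Hypothetical legacy method: its return type says nothing about null.
    static String legacyLookup(String id) {
        return "42".equals(id) ? "Alice" : null;
    }

    // Runtime approximation of:  #String name = (#) legacyLookup(id);
    // This is an assumption about intent, not behaviour defined by the proposal.
    static String lookupNonNull(String id) {
        return Objects.requireNonNull(legacyLookup(id), "no entry for id " + id);
    }

    public static void main(String[] args) {
        System.out.println(lookupNonNull("42"));
        try {
            lookupNonNull("7");
        } catch (NullPointerException e) {
            System.out.println("rejected at the boundary: " + e.getMessage());
        }
    }
}
```

The difference from the proposal is that this check costs a call at runtime, whereas the # modifier is meant to be checked by the compiler.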

The compiled class file would contain an attribute to store the modifier, whether true or false. This is needed for each field, parameter and method return type. The attribute would exist solely for the use of the compiler and would not be verified at runtime. Classes without the attribute will be assumed to be legacy code during compilation, meaning that the compiler cannot prove whether a value is null or non-null.

Having made these changes, it is now possible for the compiler to be changed to prevent compilation of code that may throw a NullPointerException:

    Person p1 = old.getPerson();  // old.getPerson() return-type is nullable
    p1.getAddress();              // compile error?, as compiler doesn't know it will succeed

There is no difficulty in making the compiler do this. However, it probably isn't reasonable to do so at this point in the life of Java. Doing so would create a barrier between new code and old code that may be too large to cross easily. Thus, this would only be a compile error if specifically switched on.

The Java libraries will be retrofitted to define the null status of each method and field. Since null-checking will still occur as normal in the JVM, code compiled with an earlier javac will run under the new JVM without any problem. It will, however, be impossible to compile existing code that doesn't handle null correctly using the new javac.

To alleviate the javac issue, it is proposed that null checking during compilation can be turned off in various ways. It can be turned off globally, or per class or method using an annotation, as with generics.

III. Further issues

1) Initial assignment. The following (poor) code won't compile, but is a pattern that gets used. How should it be handled?

    #Person p = null;              // compile error
    if (useFile) {
      File file = getPersonFile();
      p = createPersonUsingFile(file);
    } else {
      p = createPersonUsingDatabase();
    }

2) Generics. The interaction between this proposal and generics has not been considered. The Nice language uses an additional operator for interaction with generics.

3) Fields on objects. The Nice language doesn't allow null-inference on fields as the value may be changed by another thread. Since this proposal is not trying to emulate that strength of null-safety, it seems reasonable to allow null-inference on fields.
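Issues 1 and 3 can both be explored in current Java. The sketch below is illustrative only and is not part of the proposal: for issue 1 it relies on Java's existing definite assignment rules, so the initial null is simply left out; for issue 3 it shows that a null check on a field can be invalidated even without a second thread. All names are hypothetical and String stands in for Person.

```java
class FurtherIssuesSketch {

    // Hypothetical factories standing in for createPersonUsingFile/Database.
    static String createPersonUsingFile() { return "person-from-file"; }
    static String createPersonUsingDatabase() { return "person-from-database"; }

    // Issue 1: no initial null is needed, because the compiler already
    // verifies that p is definitely assigned on every path before it is read.
    static String load(boolean useFile) {
        String p;
        if (useFile) {
            p = createPersonUsingFile();
        } else {
            p = createPersonUsingDatabase();
        }
        return p;
    }

    // Issue 3: a field that has been checked for null can still change.
    private String name = "Alice";

    void clear() { name = null; }

    int lengthAfterFieldCheck() {
        if (name != null) {
            clear();               // any intervening call may reassign the field
            return name.length();  // throws NullPointerException despite the check
        }
        return -1;
    }

    int lengthViaLocalCopy() {
        String local = name;       // a local copy cannot be changed by other code
        if (local != null) {
            clear();
            return local.length();
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(load(true));
        System.out.println(new FurtherIssuesSketch().lengthViaLocalCopy());
    }
}
```

If null-inference on fields is allowed, as suggested above, code shaped like lengthAfterFieldCheck would be accepted by the compiler and still fail at runtime, so this is a trade-off rather than a free choice.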

IV. Alternatives

Alternative proposals suggest adding an @NotNull annotation to fields and parameters. Whilst this tackles the same problem area, it suffers from being verbose and intrusive. An annotation is also the wrong device for fixing what is a missing piece at the language level (full handling and control of nulls). Finally, the annotation does not tackle the code clarity issues addressed in section II of this proposal.
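For comparison, the annotation style might look like the sketch below. The NotNull annotation is declared locally and is purely hypothetical; it is not any particular library's annotation. The point illustrated is only that javac on its own attaches no meaning to it, so a separate checking tool would be needed to reject the call in main.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical annotation, declared here only for illustration.
@Retention(RetentionPolicy.CLASS)
@Target({ElementType.FIELD, ElementType.METHOD, ElementType.PARAMETER})
@interface NotNull {
}

class AnnotationAlternativeSketch {

    // Annotation style; compare with:  public #String lookup(#String personId)
    @NotNull
    static String lookup(@NotNull String personId) {
        return "person-" + personId;
    }

    public static void main(String[] args) {
        // javac alone accepts this call, because the annotation is just metadata.
        System.out.println(lookup(null));
    }
}
```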

Proponents of an annotation approach also extend the annotations to cover non-empty lists, minimum and maximum lengths for strings, and so forth. These need to be clearly separated as application-level validation, not language-level syntax. The null keyword is language level, and so it follows that proper control and management of nulls is also a language-level item.

V. Summary

This proposal moves checking of nulls from runtime to compile-time. This will avoid many NullPointerExceptions and increase the robustness of Java programs.

VI. References

Nice option types - http://nice.sourceforge.net/manual.html#optionTypes

Bug database - http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5030232