Concise Instance Creation Expressions: Closures without Complexity

Bob Lee, Doug Lea, and Josh Bloch

I. Introduction

Since release 1.1, Java has had closures in the form of anonymous class instance creation expressions (JLS 15.9.5) that extend a class or interface with a single abstract method. We call such classes and interfaces single-abstract-method types, or SAM types for short. SAM types include Runnable, Callable, Comparator, and TimerTask. The resulting closures are often used to parameterize the behavior of collections (e.g., a Comparator specifies the order of a SortedSet), and to submit code for concurrent execution (e.g., a Runnable may be executed by a Thread or an Executor). There are many other uses, including callbacks, factories, predicates, and strategies.

Unfortunately, the verbose and ungainly syntax of anonymous class instance creation expressions frustrates programmers. The problem has become more acute due to the ongoing concurrency revolution. We therefore propose a more concise syntax to instantiate anonymous classes. The basic idea is to omit the keyword new, the class declaration, and the method name from class instance creation expressions. This may not sound like much, but it results in significantly less boilerplate and enhanced readability.

For example, here is the Java 5 code to start a thread whose run method invokes a method named foo:

    new Thread(new Runnable() {
        public void run() {
            foo();
        }
    }).start();

If we adopt this proposal, the following code would be equivalent:

    new Thread(Runnable(){ foo(); }).start();

Here is the Java 5 code to sort a list of strings by length (from shortest to longest):

    List<String> ls = ... ;
    Collections.sort(ls, new Comparator<String>() {
        public int compare(String s1, String s2) {
            return s1.length() - s2.length();
        }
    });

Here is the same code rewritten using the proposed syntax:

    List<String> ls = ... ;
    Collections.sort(ls,
        Comparator<String>(String s1, String s2){ return s1.length() - s2.length(); });

From the programmer's perspective, that's pretty much all there is to it: no new concepts to learn, just a more concise syntax for something they already do.

II. Syntax and Semantics

We introduce a new kind of expression, called a concise instance creation expression (CICE):

    ConciseInstanceCreationExpression:
        ClassOrInterfaceType ( FormalParameterList_opt ) MethodBody

You can use a concise instance creation expression everywhere that an instance creation expression is currently legal. The construct is legal only if ClassOrInterfaceType represents a class or interface type with a single abstract method (a SAM type). The construct behaves as if replaced by this Java 5 code:

    new ClassOrInterfaceType() {
        AccessModifier ResultType MethodName ( FormalParameterList_opt ) Throws_opt
            MethodBody
    }

The compiler copies the AccessModifier, ResultType, MethodName, and throws clause (if one exists) from the declaration of the sole abstract method in the class or interface type; the formal parameter list and method body come from the CICE itself. The expression generates a compile-time error if the formal parameter list or the method body is inconsistent with the sole abstract method in the type, or if the type is a class type without an accessible parameterless constructor.
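To make the translation rule concrete, here is the expansion it yields for a CICE whose type is an abstract class rather than an interface. The class Task and its method execute are hypothetical, chosen so that the copied access modifier (protected) and throws clause (IOException) are visible. The CICE form appears only in a comment, since it is not legal Java; this is a sketch of the expansion, not compiler output.

```java
import java.io.IOException;
import java.util.Locale;

public class CiceTranslation {
    // Hypothetical SAM class type: a single abstract method and an
    // accessible parameterless (implicit default) constructor.
    public static abstract class Task {
        protected abstract String execute(String input) throws IOException;

        // Concrete helper methods do not affect SAM-ness.
        public String run(String input) {
            try {
                return execute(input);
            } catch (IOException e) {
                return "failed: " + e.getMessage();
            }
        }
    }

    // Proposed CICE:  Task(String input){ return input.toUpperCase(Locale.ROOT); }
    // Expansion: the access modifier (protected), result type (String),
    // method name (execute) and throws clause (IOException) are copied from
    // the sole abstract method; the parameter list and body come from the CICE.
    public static Task upperCaser() {
        return new Task() {
            protected String execute(String input) throws IOException {
                return input.toUpperCase(Locale.ROOT);
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(upperCaser().run("cice")); // CICE
    }
}
```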

III. Local Variables From the Enclosing Scope

Programmers often complain that they must declare local variables final before an instance creation expression's method body may access them. We propose making such variables final by default. Programmers also complain that instance creation expression method bodies may not share mutable local variables with the enclosing scope. We therefore further propose allowing such sharing if the local variables in question are explicitly declared public. (If the instance executes in a different thread, it is up to the programmer to manage concurrent access.)

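More specifically:

1. A local variable or parameter that is accessed from the method body of an instance creation expression, and is not declared public, is implicitly final.

2. A local variable declared public may be both read and written from the method body of an instance creation expression; the method body and the enclosing scope share a single variable.

3. It is a compile-time error to declare a for-loop index public.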

Here's an example of the "annoying final" in Java 5.  This method takes a comparator and returns a comparator that induces the reverse ordering:

    static <T> Comparator<T> reverseOrder(final Comparator<T> cmp) {
        return new Comparator<T>() {
            public int compare(T t1, T t2) {
                return cmp.compare(t2, t1);
            }
        };
    }

Here's how the method would look if this proposal were adopted:

    static <T> Comparator<T> reverseOrder(Comparator<T> cmp) {
        return Comparator<T>(T t1, T t2){ return cmp.compare(t2, t1); };
    }

Here's an example of the contortions required to get around the Java 5 requirement that local variables accessed by inner classes must be final.  This snippet sorts an array and prints out how many element comparisons were performed in the process:

    final int[] numCompares = new int[1];
    Arrays.sort(a, new Comparator<Integer>() {
        public int compare(Integer i1, Integer i2) {
            numCompares[0]++;
            return i1.compareTo(i2);
        }
    });
    System.out.println(numCompares[0]);

Note the use of the single-element array (numCompares) to pass a value back from the closure to the surrounding scope. This is necessary because any local variable accessed from an inner class must be final. If this proposal were adopted, the above example could be replaced by this:

    public int numCompares = 0;
    Arrays.sort(a, Comparator<Integer>(Integer i1, Integer i2) {
        numCompares++;
        return i1.compareTo(i2);
    });
    System.out.println(numCompares);
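The proposal does not say how public locals would be compiled. One plausible translation, offered here only as an assumption, is for the compiler to apply the single-element-array workaround automatically: the variable lives in a heap-allocated cell, and every read or write of numCompares, in the enclosing scope or in the closure, becomes an access to that cell. The class name and the countComparisons method are illustrative.

```java
import java.util.Arrays;
import java.util.Comparator;

public class PublicLocalDesugaring {
    // Sorts a copy of the input and returns the number of comparisons performed.
    public static int countComparisons(Integer[] input) {
        Integer[] a = input.clone();
        // Source (proposed):   public int numCompares = 0;
        // Assumed translation: a one-element array shared by both scopes.
        final int[] numCompares = { 0 };
        Arrays.sort(a, new Comparator<Integer>() {
            public int compare(Integer i1, Integer i2) {
                numCompares[0]++;          // source: numCompares++;
                return i1.compareTo(i2);
            }
        });
        return numCompares[0];             // source: numCompares
    }

    public static void main(String[] args) {
        System.out.println(countComparisons(new Integer[] { 3, 1, 2 }));
    }
}
```

Under this reading, declaring a local public changes only where the variable is stored; the sharing semantics fall out of both scopes referring to the same cell.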

Why should it be illegal to declare a for-loop index public? Because any code that does so is almost certainly broken. For example, consider this loop:

    for (public int taskId = 0; taskId < NUM_TASKS; taskId++) {
        executor.execute(Runnable(){ newTask(taskId); });
    }

The author almost certainly intends each Runnable to get its own taskId, from 0 to NUM_TASKS - 1. If the above code were legal, all of the tasks would share a single taskId variable, so each task would see whatever value that variable held when the task ran: most likely NUM_TASKS. Here is a fixed version of the loop:

    for (int i = 0; i < NUM_TASKS; i++) {
        int taskId = i;
        executor.execute(Runnable(){ newTask(taskId); });
    }

Note that the loop index (i) need not be made public.  Each Runnable gets its own taskId, which is implicitly final.
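The fixed loop can be checked in today's Java, where the per-iteration copy must be declared final explicitly. This sketch assumes a fixed thread pool as the executor and, because newTask is not defined in the text, records each task's id in a set instead; the class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PerIterationCopy {
    public static final int NUM_TASKS = 8;

    // Submits NUM_TASKS tasks and returns the set of taskId values they saw.
    public static Set<Integer> runTasks() {
        final Set<Integer> seen = Collections.synchronizedSet(new TreeSet<Integer>());
        ExecutorService executor = Executors.newFixedThreadPool(4);
        for (int i = 0; i < NUM_TASKS; i++) {
            // A fresh variable per iteration. Java 5 needs the explicit final;
            // under the proposal it would be implicitly final.
            final int taskId = i;
            executor.execute(new Runnable() {
                public void run() {
                    seen.add(taskId); // stands in for newTask(taskId)
                }
            });
        }
        executor.shutdown();
        try {
            executor.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(runTasks());
    }
}
```

Each task captures a distinct variable, so every id from 0 to NUM_TASKS - 1 is seen exactly once, regardless of scheduling.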

IV. Library Support

At the same time as this facility is added to the language, it would make sense to add a few more single-abstract-method interface types, such as Predicate, Function, and Builder. Perhaps a few utility methods in java.util.Collections would not be amiss.
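As a rough sketch of what such library additions might look like, the interfaces below and the filter and map utilities are hypothetical: the proposal names Predicate, Function, and Builder but gives no signatures, so the method names and type parameters here are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class SamLibrary {
    // Hypothetical SAM interfaces; names and signatures are assumptions.
    public interface Predicate<T> {
        boolean apply(T t);
    }

    public interface Function<F, T> {
        T apply(F from);
    }

    public interface Builder<T> {
        T build();
    }

    // Hypothetical Collections-style utility: keeps elements matching p.
    public static <T> List<T> filter(Collection<? extends T> c, Predicate<? super T> p) {
        List<T> result = new ArrayList<T>();
        for (T t : c) {
            if (p.apply(t)) {
                result.add(t);
            }
        }
        return result;
    }

    // Hypothetical Collections-style utility: applies f to each element.
    public static <F, T> List<T> map(Collection<? extends F> c, Function<? super F, ? extends T> f) {
        List<T> result = new ArrayList<T>();
        for (F x : c) {
            result.add(f.apply(x));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "bb", "ccc");
        // With a CICE: filter(words, Predicate<String>(String s){ return s.length() > 1; })
        List<String> longWords = filter(words, new Predicate<String>() {
            public boolean apply(String s) {
                return s.length() > 1;
            }
        });
        System.out.println(longWords); // [bb, ccc]
    }
}
```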

We can also introduce alternatives to existing concrete classes with methods that are meant to be overridden. For example, consider the following typical Java 5 use of ThreadLocal:

    private static final AtomicInteger nextId = new AtomicInteger(0);
    private static final ThreadLocal<Integer> threadId =
        new ThreadLocal<Integer>() {
            @Override protected Integer initialValue() {
                return nextId.getAndIncrement();
            }
        };

Because ThreadLocal is a concrete class rather than a SAM type, it cannot be instantiated with a CICE. If we introduce AbstractThreadLocal, a subclass of ThreadLocal with an abstract initialValue method, the following code would be equivalent:

    private static final AtomicInteger nextId = new AtomicInteger(0);
    private static final ThreadLocal<Integer> threadId =
        AbstractThreadLocal<Integer>(){ return nextId.getAndIncrement(); };
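AbstractThreadLocal can be written in ordinary Java, because an abstract class may redeclare an inherited concrete method as abstract. The sketch below is one possible definition, not a committed API; the enclosing class ThreadIds and the helper methods currentThreadId and idInNewThread are hypothetical. Until CICEs exist, the anonymous-class form is still needed to instantiate it.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadIds {
    // Possible AbstractThreadLocal: redeclaring initialValue as abstract
    // turns ThreadLocal into a SAM type.
    public static abstract class AbstractThreadLocal<T> extends ThreadLocal<T> {
        @Override
        protected abstract T initialValue();
    }

    private static final AtomicInteger nextId = new AtomicInteger(0);

    // Proposed: AbstractThreadLocal<Integer>(){ return nextId.getAndIncrement(); }
    private static final ThreadLocal<Integer> threadId =
        new AbstractThreadLocal<Integer>() {
            @Override
            protected Integer initialValue() {
                return nextId.getAndIncrement();
            }
        };

    public static int currentThreadId() {
        return threadId.get();
    }

    // Returns the id observed by a freshly started thread.
    public static int idInNewThread() {
        final int[] result = new int[1];
        Thread t = new Thread(new Runnable() {
            public void run() {
                result[0] = currentThreadId();
            }
        });
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(currentThreadId() == currentThreadId()); // true: stable per thread
        System.out.println(currentThreadId() != idInNewThread());   // true: distinct across threads
    }
}
```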

A similar treatment might be desirable for LinkedHashMap and its removeEldestEntry method.
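For comparison, here is the Java 5 idiom such a treatment would shorten: a size-bounded, access-ordered cache. The three-argument LinkedHashMap constructor and removeEldestEntry are existing JDK APIs; the class LruCache and the method newLruCache are illustrative. Note that because a CICE requires a parameterless constructor, a hypothetical abstract LinkedHashMap subclass would need some other way to request access order.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruCache {
    // Returns a map that evicts its least recently accessed entry once it
    // holds more than maxEntries mappings.
    public static <K, V> Map<K, V> newLruCache(final int maxEntries) {
        // Third constructor argument true = iterate in access order.
        return new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    public static void main(String[] args) {
        Map<String, Integer> cache = newLruCache(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");      // "a" becomes the most recently accessed entry
        cache.put("c", 3);   // evicts "b", the least recently accessed entry
        System.out.println(cache.keySet()); // [a, c]
    }
}
```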

V. Further Ideas

It is, in many cases, technically feasible to infer types from the formal parameter list and method body of a CICE.  It is worth exploring the pros and cons of doing so.

If we do end up allowing public local variables, we should consider allowing them to be made volatile.  Consistency dictates that this should be legal, as all other variables that are accessible by multiple threads can be made volatile.  On the other hand, we don't necessarily want to encourage multiple threads to access local variables without proper synchronization.