GWT Benchmarking Design

Overview

GWT touts its capability to produce extremely optimized JavaScript applications as one of its main strengths. To keep ourselves true to that goal, we need a suite of benchmarks which can help us continuously drive GWT optimizations and spot performance regressions that otherwise might leak through.


Goals

  1. Support micro-benchmarks (for example, similar to speed-tests).
  2. Support benchmarking both Java and JSNI.
  3. Automatically avoid slow script warnings in benchmarks.
  4. Be as accurate as reasonably possible. Don't include overhead from JUnit infrastructure or setup and teardown functionality.
  5. Make simple benchmarks simple to write and complex benchmarks as simple as possible to write.
  6. Provide complete report data files. The reports should contain configuration information (host machine, version of browser, version of GWT), full benchmark results, and source code. These can be saved and analyzed offline to compare results over time (e.g. regressions), shared with others for comparison, etc...
  7. Provide a reporting application. The application should provide useful visualizations of the data in the reports.
  8. Support benchmarking on any browser and platform that GWT apps can be deployed to. Specifically, don't limit support to just the hosted browsers.
  9. Support distributed benchmarking. We should be able to run benchmarks across several different browsers in parallel, automatically.

Non-Goals

  1. Compilation-speed benchmarks (i.e. how long it takes to compile an application).
  2. Load testing. This can be performed with existing web-application load testing tools.
  3. Hosted mode benchmarking. Benchmarking should operate in hosted mode, but only so that benchmarks can be debugged. Hosted mode benchmarking results will not be useful, because they include the wrong information: execution of Java rather than JavaScript, JSNI boundary crossing, etc...

For future consideration

  1. Memory usage - There isn't a standard method for retrieving memory usage from JavaScript. After some investigation, it appears that we'd only able to measure memory coarsely for the entire browser process without resorting to intrusive techniques (e.g. hosted browser, special browser plugins, etc...).

  2. Code size (e.g. how large is the generated JavaScript for a function)


Features

    We add a new Benchmark class as a subclass of GWTTestCase. Users extend this class when creating their own Benchmark. Benchmarks differ from regular test cases in the following ways:





Design

Extending GWT's JUnit support


GWT has existing support for executing JUnit tests within a browser in a reasonably transparent fashion. In general, the requirements around executing performance tests are very similar to the requirements for a general testing mechanism: We create suites of tests which we can execute both locally and remotely in different environments in an automated fashion. Projects such as JUnitPerf demonstrate the common desire to extend testing frameworks to support performance testing. Based on the similarity between functional unit testing and performance testing, it makes the most sense to modify GWT's existing JUnit infrastructure to support performance testing's special needs (rather than implementing entirely separate support from scratch).

GWT's existing support for JUnit involves several different working pieces:

  1. Translatable JUnit classes (found in com.google.gwt.junit.translatable.junit.framework)
  2. GWTTestCase - a special base class for test cases which also has a special "translatable" version for the browser.
  3. JUnitShell, which is a specialization of GWTShell. This class generally drives the unit tests and is invoked through GWTTestCase.
  4. The JUnitHost service which supports GWTTestCase by answering requests for test cases and accepting test results.
  5. JUnitMessageQueue, which acts as a mediator between JUnitShell and JUnitHost.
  6. JUnitTestCaseStubGenerator, which is a Generator that provides some extra support for running test cases in a browser.

Benchmarking extends JUnit support by:

  1. Introducing a new Benchmark class as a subclass of GWTTestCase. Benchmarks have special behavior in regards to test method setup and teardown, timing, and parameters. These features of the Benchmark class are implemented in a new subclass of JUnitTestCaseStubGenerator, BenchmarkGenerator.
  2. Introducing a new TestResults class which contains trial, timing, and configuration information. Generated subclasses of Benchmarks populate these objects and send them via the JUnitHost RPC service, where they are eventually received by a BenchmarkReport.
  3. Modifying JUnitShell so that it creates a BenchmarkReport and adds the TestResults of each Benchmark execution to the report.
  4. Modifying GWTTestCaseImpl to accept TestResults from Benchmarks.
  5. Creating a ReportViewer - A GWT application that displays the generated XML benchmark reports. The ReportViewer depends on two services:
    • ReportServer RPC service - Serves up reports from a folder.


Benchmark Generation

Benchmark

This base class doesn't contain any new functionality in and of itself. It's simply a marker subclass for the BenchmarkGenerator.

BenchmarkGenerator

The BenchmarkGenerator is the heart of the benchmarking subsystem. It generates the logic which provides the benchmark execution functionality. Rather than dive through the implementation details of the generator, it's more instructive to examine the code it generates. Given the following example test method:

  /**
   * Performs <code>size</code> removes at position, <code>where</code>, on an
   * ArrayList of size, <code>size</code>.
   * @gwt.benchmark.param where = Position.positions2
   * @gwt.benchmark.param size -limit = insertRemoveRange
   */
  public void testArrayListRemoves(Position where, Integer size) {
    removeFromCollection(size, where, list);
  }

The generator would create the following code:

  public void testArrayListRemoves() {
    privateDelayTestFinish(2000);

    final List ranges = Arrays.asList( new Range[]{Position.positions2, insertRemoveRange});
    final PermutationIterator permutationIt = new PermutationIterator(ranges);

    DeferredCommand.addCommand(new IncrementalCommand() {
          public boolean execute() {
            privateDelayTestFinish(10000);
            if (permutationIt.hasNext()) {

              PermutationIterator.Permutation permutation =
                (PermutationIterator.Permutation) permutationIt.next();

              ArrayListAndVectorBenchmark.Position where =
                (ArrayListAndVectorBenchmark.Position) permutation.getValues().get(0);

              Integer size = (Integer) permutation.getValues().get(1);

              double __setupTiming = 0;
              double __testTiming = 0;

              for (int numLoops = 1; true;) {
                long start = System.currentTimeMillis();
                for (int i = 0; i < numLoops; ++i) {
                  beginArrayListRemoves(where, size);
                  __emptyFunc();
                  endArrayListRemoves(where,size);
                }

                long duration = System.currentTimeMillis() - start;
               
                if (duration < 150) {
                  numLoops += numLoops;
                  continue;
                }

                double durationMillis = duration * 1.0;
                double numLoopsAsDouble = numLoops * 1.0;
                __setupTiming = durationMillis / numLoopsAsDouble;
                break;
              }

              for (int numLoops = 1; true;) {
                long start = System.currentTimeMillis();
                for (int i = 0; i < numLoops; ++i) {
                  beginArrayListRemoves(where, size);
                  testArrayListRemoves(where, size);
                  endArrayListRemoves(where,size);
                }

                long duration = System.currentTimeMillis() - start;

                if (duration < 150) {
                  numLoops += numLoops;
                  continue;
                }

                double durationMillis = duration * 1.0;
                double numLoopsAsDouble = numLoops * 1.0;
                __testTiming = durationMillis / numLoopsAsDouble;
                TestResults results = getTestResults();
                Trial trial = new com.google.gwt.junit.client.Trial();
                trial.setRunTimeMillis(__testTiming - __setupTiming);
                java.util.Map variables = trial.getVariables();
                variables.put("where", where.toString());
                variables.put("size", size.toString());
                results.getTrials().add(trial);
                if (numLoops == 1 && duration > 1000) {
                  permutationIt.skipCurrentRange();
                }
                break;
              }
             
              return true;
            }
            privateFinishTest();
            return false;
          }
        });
  }

This code implements the following functionality:

  1. Parametric test methods.

    The generated benchmark executes all permutations of the values provided for each parameter. The values are specified as Ranges in the @gwt.benchmark.param annotation.

  2. Slow script warning avoidance.

    The generator ensures that only one test permutation is executed before yielding back to the event loop. The implementation of this involves GWT's asynchronous test support, the new IncrementalCommand functionality, and a specialized PermutationIterator which turns iterations of the permutations inside-out into an Iterator. Because this functionality uses GWT's asynchronous test support, users can not currently benchmark asynchronous operations.

  3. Removal of lifecycle overhead.

    The generated benchmark determines how much overhead the begin and end methods are introducing and removes it from the total execution time. Lifecycle methods are optional and are only invoked if the user provided them in their Benchmark.

  4. Accuracy in results.

    The generated benchmark iterates over both the lifecycle and test methods in a tight loop until a minimum of 150ms has passed. This ensures that enough time has elapsed that the coarse JavaScript timer (typically 15ms) doesn't have a large impact upon the final results.

  5. Time bounded Ranges.

    Specifying -limit in the metadata for the last parameter causes the BenchmarkGenerator to introduce a time limit on the range of values for that last parameter:

    if (numLoops == 1 && duration > 1000) {
      permutationIt.skipCurrentRange();
    }


    If any particular permutation takes longer than 1000ms to execute, the benchmark skips the rest of the values in the last range.

  6. Results reporting. The results of each permutation are added as a trial on the test results:

    TestResults results = getTestResults();
    Trial trial = new com.google.gwt.junit.client.Trial();
    trial.setRunTimeMillis(__testTiming - __setupTiming);
    java.util.Map variables = trial.getVariables();
    variables.put("where", where.toString());
    variables.put("size", size.toString());
    results.getTrials().add(trial);

Report Viewing

The ReportViewer application is composed of the following pieces:

  1. The ReportViewer GWT client application
  2. The ReportDatabase
  3. The ReportServer RPC service
  4. The ReportImageServer



ReportViewer

This GWT application presents a summary list of all available reports in a folder, and allows the user to select one for detailed viewing. In the detailed view, the application presents the code for each benchmark method along with its results on each browser. The results are displayed in a graph and also in tabular format.

ReportDatabase

The ReportDatabase is the source of benchmark report data. It scans a folder periodically and provides an API to retrieve the available reports:

public List<ReportSummary> getReportSummaries();
public Report getReport(String reportId);

ReportServer

The ReportServer provides an RPC interface for the ReportViewer to retrieve benchmark data. It is mostly a remote wrapper around the ReportDatabase.

public List<ReportSummary> getReportSummaries();
public Report getReport(String reportId);


ReportImageServer

The ReportImageServer provides the graphs for a report. Given a URL identifying a specific benchmark result, the ReportImageServer reads the ReportDatabase, constructs a graph, and writes it to the HTTP stream as a formatted image.

The ReportImageServer accepts URLs in the following format:

/com.google.gwt.junit.viewer.ReportViewer
  /test_images
  /report-12345.xml
  /RemoveCategory
  /com.google.gwt.junit.client.ArrayListAndVectorBenchmark
  /testArrayListRemoves
  /Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20050920/

It indicates the report id, the category of the benchmark, the benchmark class, the benchmark method, and the agent for the results.