1 of 37

VizGen:

Accelerating Visual Computing Prototypes in Dynamic Languages

Yuting Yang, Sam Prestwood, Connelly Barnes�

University of Virginia

2 of 37

Motivation

3 of 37

Motivation

  • Fast programs are hard to write.

    • Programs can be optimized manually by a domain expert (hand tuned C++ or assembler).
    • Domain-specific language such as Halide can be used for certain purposes.

4 of 37

Motivation

  • Easy-to-write programs are usually slow.

    • “Dynamic” languages are those which have a flexible runtime model, dynamic typing and flexible array operations.

5 of 37

Motivation

6 of 37

Motivation

  • Python code

7 of 37

Motivation

  • C code

8 of 37

Motivation

  • Easy-to-write programs are usually slow.

    • “Dynamic” languages are those which have a flexible runtime model, dynamic typing, and flexible array operations.
    • Resulting programs are typically slow even with the help of just-in-time (JIT) compilers.

9 of 37

Motivation

  • Our goal:

    • Automatically translate visual computing prototypes in dynamic languages (e.g. Python) into highly-performant code.

10 of 37

Architecture

11 of 37

Architecture

Static Analysis:

Collect program information

Program transformations:

Apply optimizations

Autotuner:

Choose best variant

Python

C

12 of 37

Static analysis

13 of 37

Static analysis

Type and size inference

Parallelism analysis

Preallocation analysis

14 of 37

Type and size inference

  • Dependent type

    • A type whose definition depends on a value.
    • A “pair of integers” is a type.
    • A “pair of integers where the second is greater than the first” is a dependent type.

15 of 37

Type and size inference

  • Problem:

    • Dynamic languages tend to represent arrays using a generic multidimensional array type.
    • Generic array type is flexible and easy to generalize, but is usually slow in run-time.

16 of 37

Type and size inference

  • Method:

    • Analyze dependent types throughout the program, and find constant size arrays.
    • Tracks both the type and value for each function call arguments.

17 of 37

Type and size inference

  • Example

out_img = numpy.ones([3, 3])

Known value

Dependent type with�known shape (3x3)

18 of 37

Type and size inference

  • Other related optimizations

    • Create function variants with specific type signature.

19 of 37

Type and size inference

  • Other related optimizations

    • Create function variants with specific type signature.
    • Rewrite common API calls for arguments of known type and size (e.g. dot product of length 3 vectors) into efficient C implementations.

20 of 37

Program transformations

21 of 37

Program transformations

Type specialization

Parallelize loop

Array storage alteration

Loop implicit variables

Preallocate arrays

API call rewriting

Remove loop conditionals

Vectorize innermost

22 of 37

Loop implicit variables

X

X*X

Y

X

Y

Y = X * X + 1

Normal computation order

Loop implicit variables

23 of 37

Vectorize innermost

X

Y

Z

Vectorize Innermost

X

Y

Z

Z = X + Y

Normal computation order

24 of 37

Array storage alteration

Unused padding

Float 64

Float 32

25 of 37

Remove loop conditionals

SLOW

FAST

26 of 37

Autotuner

27 of 37

Autotuner

  • Challenge:
    • It is hard to decide if a certain transformation will result actual speedup (e.g. parallelism).

  • Method
    • Start with good initial guesses.
    • Use genetic algorithms to search for the best variant.

28 of 37

Autotuner

  • Example on good initial guesses

    • Parallelize all the outermost for loops.
    • Apply type specialization everywhere.
    • Vectorize innermost loops if applicable.

29 of 37

Evaluation

30 of 37

Test suite

Bilateral grid

Camera pipeline

Interpolate

Composite

Harris corner

Local Laplacian

Mandelbrot

One stage blur

Optical flow

Pac-man

Raytracer

Two stage blur

31 of 37

Evaluation

  • VizGen’s median speedup compared to Python, and Python compilers / JITs

Python

Numba

PyPy

Pythran

unPython*

1816x

38x

937x

7.1x

354x

32 of 37

Evaluation

  • VizGen’s median comparison with C

Speedup with C

Code shorter than C

2.3x

2.3x

33 of 37

34 of 37

35 of 37

36 of 37

Conclusion

  • For 12 applications, we showed our compiler can translate naively written visual programs in Python to be faster (2x) and shorter (2x) on average than naively written C equivalents.
  • Open source project (MIT license):�http://goo.gl/iXk08A
  • We plan to continue development and welcome contributors and users.

37 of 37

THANK YOU.

Yuting Yang

yyuting@virginia.edu

Sam Prestwood

swp2sf@virginia.edu

Connelly Barnes

connelly@cs.virginia.edu