1 of 36

Shipping Tiny WebAssembly Builds

Alon Zakai / @kripken

Google

2 of 36

Code Size Matters!

In particular on the Web, but also in some other places.

Smaller code means faster download, faster startup, and less mobile data usage.

3 of 36

(But Not Always!)

Sometimes code size is negligible compared to other factors, like asset sizes.

Sometimes the magic ability to run an app on the Web at all is worth a large code size (ship a framework, VM, etc.).

4 of 36

WebAssembly: An Opportunity For Small Code!

WebAssembly is a binary format and can be smaller than JavaScript.

WebAssembly is often compiled from statically-typed languages where powerful dead code elimination (DCE) is possible, like C, C++, Rust, and Go.

5 of 36

...And A Risk

WebAssembly is often compiled from languages that were not designed for small binaries (they even have FAQ entries on that), nor for the Web, like… those same C, C++, Rust, and Go.

  • Standard libraries are assumed to exist anyhow
  • Various idiomatic code patterns not designed for size
  • Not designed to take advantage of Web APIs

This talk will describe how to benefit from wasm’s strengths, and try to avoid those risks.

6 of 36

Advice For All Toolchains

7 of 36

1 Slide Of Obvious Stuff

Enable compression on the server!

  • At minimum, use gzip
  • Even better, use Brotli!
  • Supported in essentially all browsers with wasm

Minify your JavaScript too!
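To get a feel for what server-side compression buys, here is a minimal sketch that compares raw and gzip-compressed sizes using only Python's standard library. The sample payload is made up for illustration; real wasm and JavaScript files compress differently, and Brotli (which is not in the standard library) is not measured here.

```python
import gzip


def gzip_ratio(data: bytes, level: int = 9) -> float:
    """Return compressed size divided by original size (lower is better)."""
    if not data:
        raise ValueError("data must be non-empty")
    return len(gzip.compress(data, compresslevel=level)) / len(data)


if __name__ == "__main__":
    # Hypothetical stand-in for a build artifact. For a real measurement,
    # read your own file instead: open(path, "rb").read()
    sample = b"(func $f (param i32) (result i32) (local.get 0))\n" * 200
    print(f"gzip ratio: {gzip_ratio(sample):.3f}")
```

Running this on the files you actually ship is the number that matters, since that is what users download.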

8 of 36

General Advice: Run Binaryen’s wasm-opt

wasm-opt optimizes WebAssembly files:

No matter what toolchain emitted your wasm, wasm-opt can help!

large wasm → wasm-opt → small wasm

9 of 36

wasm-opt: Expected Benefit

Shrinks LLVM wasm backend output by around 20% for C, C++, Rust, etc.

Varies by compiler:

  • On Go with gc the benefit is usually smaller
  • On Go with TinyGo it’s usually larger

(Aside from size, wasm-opt often improves speed too.)

10 of 36

wasm-opt: What It Does (1)

One set of optimizations consists of standard compiler passes, for example:

  • Dead code elimination
  • Constant propagation
  • Inlining
  • etc.

These can help since:

  • They run on the final linked wasm, like Link Time Optimization (LTO)
  • They interact with non-standard optimizations (on next slide)

11 of 36

wasm-opt: What It Does (2)

Another set of optimizations is WebAssembly-specific, for example:

  • Local optimizations (CoalesceLocals, SimplifyLocals, etc.)
  • Memory segment optimizations (MemoryPacking)
  • Structured control flow (ReReloop, RemoveUnusedBrs; example on next slide)
  • etc. etc. (68 different passes as of Feb 6 2020)

12 of 36

An example optimization from RemoveUnusedBrs, saves one byte:

A lot of other small things in that pass add up to saving 1.5% on e.g. zlib. And a lot of passes add up to that 20%!

Before:

(block $x
  (br_if $x (X))
  (Y))

After:

(if
  (i32.eqz (X))
  (Y))
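The one-byte saving can be checked by counting the bytes each form needs in the binary format. The sketch below is a hand-rolled model, not Binaryen code: it assumes the block produces no value (empty block type 0x40), and X and Y are placeholder byte strings standing in for arbitrary encoded expressions.

```python
# Opcodes from the WebAssembly binary format.
BLOCK, IF, END, BR_IF, I32_EQZ = 0x02, 0x04, 0x0B, 0x0D, 0x45
VOID = 0x40  # empty block type: the block or if produces no value


def encode_before(x: bytes, y: bytes) -> bytes:
    """Encode (block $x (br_if $x (X)) (Y))."""
    # block, block type, X, br_if to label depth 0, Y, end
    return bytes([BLOCK, VOID]) + x + bytes([BR_IF, 0x00]) + y + bytes([END])


def encode_after(x: bytes, y: bytes) -> bytes:
    """Encode (if (i32.eqz (X)) (Y))."""
    # X, i32.eqz, if, block type, Y, end
    return x + bytes([I32_EQZ, IF, VOID]) + y + bytes([END])


if __name__ == "__main__":
    x = bytes([0x20, 0x00])  # local.get 0, an illustrative condition
    y = bytes([0x01])        # nop, an illustrative body
    before, after = encode_before(x, y), encode_after(x, y)
    print(len(before), "bytes before,", len(after), "bytes after")
```

The block form spends five bytes on structure and the if form spends four, regardless of how large X and Y are.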

13 of 36

Using wasm-opt

Some tools run wasm-opt for you, like Emscripten, wasm-pack, and AssemblyScript.

If running manually, use something like:

wasm-opt input.wasm -o output.wasm -O

(can also try -Oz, -O3, -O4)

To install it yourself, get binary releases from the WebAssembly/binaryen repo, or JS builds via npm install binaryen or from the binaryen.js buildbot.
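When running wasm-opt manually, one way to compare the optimization levels is to run each and keep the smallest output. This is a minimal sketch that assumes wasm-opt is on your PATH; the helper names and output file naming are hypothetical, and only the command-line shape shown on this slide is used.

```python
import os
import shutil
import subprocess
from typing import List, Tuple

# The levels mentioned on this slide.
LEVELS = ["-O", "-Oz", "-O3", "-O4"]


def wasm_opt_command(src: str, dst: str, level: str) -> List[str]:
    """Build the argument list for: wasm-opt SRC -o DST LEVEL."""
    if level not in LEVELS:
        raise ValueError(f"unexpected level: {level}")
    return ["wasm-opt", src, "-o", dst, level]


def smallest_build(src: str, out_dir: str) -> Tuple[str, int]:
    """Run wasm-opt at each level and return (level, size) of the smallest output."""
    if shutil.which("wasm-opt") is None:
        raise RuntimeError("wasm-opt not found on PATH")
    results = []
    for level in LEVELS:
        dst = os.path.join(out_dir, f"out{level}.wasm")
        subprocess.run(wasm_opt_command(src, dst, level), check=True)
        results.append((os.path.getsize(dst), level))
    size, level = min(results)
    return level, size


if __name__ == "__main__":
    print(" ".join(wasm_opt_command("input.wasm", "output.wasm", "-O")))
```

Smallest is not always best: -O3 and -O4 aim more at speed, so measure both size and runtime before choosing.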

14 of 36

Using wasm-opt On The Web

A work-in-progress 100% clientside way to run wasm-opt is up at

http://wasm-shr.ink

15 of 36

Advanced wasm-opt Usage

Some flags you may want to set:

  • --converge
  • --always-inline-max-function-size & other inlining flags

Some optimizations that cannot be run by default:

  • --ignore-implicit-traps
  • --low-memory-unused (on next 2 slides)

16 of 36

Optimizing A Load Offset..?

This is not safe in general: the i32.add can wrap around on overflow, but the address computed from an offset cannot!

Before:

(i32.load
  (i32.add
    (X)
    (i32.const 16)))

After:

(i32.load offset=16
  (X))
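The difference between the two forms can be modeled with plain integer arithmetic. In this sketch (function names are illustrative, and memory is assumed to be a single 64 KiB page), the i32.add wraps modulo 2^32, while the offset form computes base plus offset without wrapping and traps if the result is out of bounds.

```python
MASK32 = 0xFFFFFFFF


def address_via_add(x: int, k: int) -> int:
    """(i32.load (i32.add (X) (i32.const k))): the add wraps modulo 2**32."""
    return (x + k) & MASK32


def address_via_offset(x: int, k: int) -> int:
    """(i32.load offset=k (X)): base + offset is computed without wraparound."""
    return (x & MASK32) + k


def load_traps(effective_address: int, memory_size: int, width: int = 4) -> bool:
    """A load traps if any accessed byte lies past the end of memory."""
    return effective_address + width > memory_size


if __name__ == "__main__":
    memory_size = 65536  # one wasm page, assumed for illustration
    x = 0xFFFFFFF8       # a pointer near the top of the 32-bit range
    for name, addr in [("add", address_via_add(x, 16)),
                       ("offset", address_via_offset(x, 16))]:
        print(f"{name}: address {addr:#x}, traps: {load_traps(addr, memory_size)}")
```

With this input the add form wraps to a small address and loads from low memory, while the offset form traps, so the rewrite changes behavior unless such low addresses are known to be unused.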

17 of 36

--low-memory-unused

If a low region of memory is never used, then a small constant offset that overflows to low memory would be invalid anyhow.

Saves 1.8% on e.g. Poppler! (Helps with speed too.)

(Diagram labels: pointer, offset, invalid address.)

18 of 36

General Advice: Investigate Your Code

Various tools focus on size profiling of wasm binaries:

  • Bloaty McBloatface
  • Twiggy
  • wasm-opt’s --func-metrics
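Before reaching for those tools, a coarse first look is the size of each section in the binary. The sketch below is a hand-written parser for the top-level section layout of a wasm file (magic, version, then id / LEB128 size / payload per section); it is not how Bloaty or Twiggy work internally, and the function names are illustrative.

```python
from typing import Dict, Tuple

SECTION_NAMES = {
    0: "custom", 1: "type", 2: "import", 3: "function", 4: "table",
    5: "memory", 6: "global", 7: "export", 8: "start", 9: "element",
    10: "code", 11: "data", 12: "datacount",
}


def read_u32_leb(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode an unsigned LEB128 integer; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def section_sizes(module: bytes) -> Dict[str, int]:
    """Return total payload bytes per section kind."""
    if module[:4] != b"\x00asm":
        raise ValueError("not a WebAssembly binary")
    sizes: Dict[str, int] = {}
    pos = 8  # skip the 4-byte magic and 4-byte version
    while pos < len(module):
        section_id = module[pos]
        size, body_start = read_u32_leb(module, pos + 1)
        name = SECTION_NAMES.get(section_id, f"unknown({section_id})")
        sizes[name] = sizes.get(name, 0) + size
        pos = body_start + size
    return sizes


if __name__ == "__main__":
    # A tiny hand-built module: header plus a type section with one
    # function type taking and returning nothing.
    tiny = b"\x00asm\x01\x00\x00\x00" + b"\x01\x04\x01\x60\x00\x00"
    print(section_sizes(tiny))
```

If the code section dominates, per-function tools are the next step; if the data section dominates, look at embedded assets and static tables instead.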

19 of 36

Advice For Specific Languages & Toolchains

20 of 36

General C/C++

  • If you don’t use C++ exceptions, build with -fno-exceptions
    • Native wasm exceptions will help significantly here!
  • Avoid RTTI if you don’t need it, build with -fno-rtti
  • Careful with templates (but wasm-opt will merge template code if possible)
  • virtual calls may inhibit DCE (but are a tradeoff vs templates)
  • Things like std::vector are fine (with exceptions off), but in general, prefer simple C over C++ standard library
    • E.g. std::iostream uses a lot of filesystem code; printf is better

21 of 36

Use Web APIs directly!

Even better than printf, call a Web API, e.g. using EM_JS:

#include <emscripten.h>

// EM_JS defines a C-callable function whose body is JavaScript.
EM_JS(void, log_int, (int value), {
  console.log("log_int:", value);
});

int main() {
  log_int(42);
}

22 of 36

Build with -O3, -Os, or -Oz. Those levels enable the maximum optimizations for size.

  • Meaning is similar to clang: -O3 focuses more on speed, -Os on size, -Oz trades off a lot of speed for smaller size
  • Use them both during compile and link

23 of 36

Emscripten/Binaryen Integration

emcc drives wasm-opt for you, doing all the useful stuff mentioned earlier, automatically!

While doing so it does things like use --low-memory-unused.

Emscripten emits an optimized combination of WebAssembly and JavaScript, which lets it do things like meta-DCE (DCE JavaScript and WebAssembly as a whole).

24 of 36

Example: Minifying Import Names

As with meta-DCE, this requires close coordination between the WebAssembly binary and the JavaScript that provides those imports.

This optimization saves space in both those files!

Before:

(import "env" "glDrawArrays" ..)

After:

(import "a" "q" ..)
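The idea can be sketched as a renaming table shared by both sides. This is an illustrative model, not Emscripten's implementation: the naming scheme (a, b, ..., z, aa, ...) and the helper names are assumptions.

```python
import itertools
import string
from typing import Dict, Iterable, Iterator


def short_names() -> Iterator[str]:
    """Yield a, b, ..., z, aa, ab, ... (a hypothetical naming scheme)."""
    for length in itertools.count(1):
        for letters in itertools.product(string.ascii_lowercase, repeat=length):
            yield "".join(letters)


def minify_imports(names: Iterable[str]) -> Dict[str, str]:
    """Map each distinct import name to the next unused short name."""
    gen = short_names()
    return {name: next(gen) for name in dict.fromkeys(names)}


def bytes_saved(mapping: Dict[str, str]) -> int:
    """Bytes saved in one file by replacing each name once."""
    return sum(len(old.encode()) - len(new.encode()) for old, new in mapping.items())


if __name__ == "__main__":
    mapping = minify_imports(["glDrawArrays", "glClear"])
    print(mapping, "saves", bytes_saved(mapping), "bytes per file")
```

The same mapping has to be applied to the wasm import section and to the JavaScript object that supplies the imports, which is why the toolchain must control both files.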

25 of 36

malloc() / free()

Emscripten by default uses dlmalloc for malloc/free, which is quite fast.

emmalloc is a compact alternative - about the size - which is often fast enough (unless your app stresses lots of small variable-sized allocations).

-s MALLOC=emmalloc

26 of 36

Emscripten: Miscellaneous

LLVM Link Time Optimizations (LTO) including system libraries: compile with -s WASM_OBJECT_FILES=0 and link with --llvm-lto 1

JavaScript tips:

  • --closure 1
  • If you only run on the Web, do -s ENVIRONMENT=web
  • Look into: MINIMAL_RUNTIME, STANDALONE_WASM, INCOMING_MODULE_JS_API, NO_FILESYSTEM

27 of 36

Rust

Consider optimizing for size and using LLVM LTO:

[profile.release]
opt-level = 's'
lto = true

Consider using wee_alloc (similar concept as emmalloc).

28 of 36

Rust

Like C++, common things can increase code size due to library support (e.g. format! and to_string), generics may duplicate code, dynamic dispatch may inhibit DCE, etc.

Consider no_std. As the FAQ says,

Using #![no_std] can result in smaller binaries, but will also usually result in substantial changes to the sort of Rust code you’re writing.

29 of 36

Rust: wasm-pack

Helps with Rust => WebAssembly workflows.

Version 0.9.0 has wasm-opt integration!

30 of 36

Go (gc) ships a full runtime. Very powerful when you need 100% compatibility! But it’s around 2MB (in Go 1.13).

TinyGo is a relatively new Go compiler, “for small places” - microcontrollers and WebAssembly. “Hello world” is less than 1K!

31 of 36

As with C++ and Rust, various things may increase code size, e.g., interfaces may inhibit DCE, as may packages like reflect, etc.

Be careful and do size profiling!

32 of 36

AssemblyScript: a new language designed with WebAssembly and code size in mind! (just look at the name and logo ;)

  • Has great Binaryen wasm-opt integration, and makes it easy to get tiny code, e.g.: -O3z --converge --noAssert
  • Pick the right runtime depending on if you need memory management and GC (full / half / stub / none, largest to smallest)
  • --use Math=JSMath uses JavaScript imports for math support instead of compiled libm (but is a bit slower)

33 of 36

The Future

34 of 36

The Big Thing: Indirect Calls

virtual, dynamic dispatch, interfaces, etc.: indirect calls hurt DCE :(

LLVM’s Devirtualization and Control Flow Integrity data can perhaps be encoded with wasm multi-table. Help wanted!

Single table:

(table 100 funcref)
(elem (i32.const 0)
  $a $b $c $d ..)

Multiple typed tables:

(table $foo 2 (type $X))
(elem $foo (i32.const 0)
  $a $b)
..
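Why indirect calls hurt DCE, and why splitting tables might help, can be shown with a toy reachability model. This is a deliberately simplified sketch under stated assumptions: an indirect call through a table is treated as possibly reaching every function in that table, and all function and table names are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def reachable(
    roots: Iterable[str],
    direct_calls: Dict[str, List[str]],
    indirect_tables: Dict[str, List[str]],
    tables: Dict[str, List[str]],
) -> Set[str]:
    """Return every function that may run, starting from the roots."""
    live: Set[str] = set()
    work = list(roots)
    while work:
        func = work.pop()
        if func in live:
            continue
        live.add(func)
        # Direct calls keep exactly their targets alive.
        work.extend(direct_calls.get(func, []))
        # An indirect call keeps every entry of the table it uses alive.
        for table in indirect_tables.get(func, []):
            work.extend(tables[table])
    return live


if __name__ == "__main__":
    calls: Dict[str, List[str]] = {}
    one_table = {"t": ["a", "b", "c", "d"]}
    split_tables = {"foo": ["a", "b"], "bar": ["c", "d"]}
    print(sorted(reachable(["main"], calls, {"main": ["t"]}, one_table)))
    print(sorted(reachable(["main"], calls, {"main": ["foo"]}, split_tables)))
```

With one shared table, a single indirect call keeps a, b, c, and d alive; with split tables, only the entries of the table actually used survive.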

35 of 36

Many Small Things: More Optimizations!

Binaryen has a lot of infrastructure to make it easy to write optimizations, like control flow graph analysis, local analysis, etc. Contributions welcome!

I suspect a lot more can be done with toolchain-specific optimizations. Binaryen currently has PostEmscripten and PostAssemblyScript passes that help a lot. Let’s add more!

36 of 36

Thank you!

http://wasm-shr.ink

has a link to these slides