Web Assembly @ BlinkOn 5
Nick Bray
ncbray@google
Native code on the web - today
Apps
Game engines: Unity / Unreal
Emulators: Dosbox, JS Linux, NaCl Development Environment
Extensible Web
Languages: repl.it
PDF Viewing
Media decoding
Web != native
Asynchronous
No threads with shared state
Inconsistent performance
Porting code to the web is painful!
Technology | Year | Secure | Portable | Ephemeral | Cross Browser | Shared Memory |
JavaScript | 1995 | ✓ | ✓ | ✓ | ✓ | x |
NPAPI | 1995 | x | x | x | x | ✓ |
ActiveX | 1996 | x | x | x | x | ✓ |
Flash | 1996 | ~ | ✓ | ✓ | x | x |
Java Applets | 1996 | ~ / x | ✓ | ✓ | x | ✓ |
Native Client | 2008 | ✓ | ~ | ✓ | x | ✓ |
Emscripten | 2010 | ✓ | ✓ | ✓ | ✓ | x |
asm.js | 2013 | ✓ | ✓ | ✓ | ~ | x |
PNaCl | 2013 | ✓ | ✓ | ✓ | x | ✓ |
Web Assembly | 2016 ? | ✓ | ✓ | ✓ | ✓ | ✓ |
WASM
C sources
Compiler
WASM binary
HTML
JavaScript sources
Data files
Browser
DOM
JS VM
WASM VM
… but actually…
C sources
Compiler
WASM binary
HTML
JavaScript sources
Data files
Browser
DOM
JS+WASM VM
WASM Ops
foo(a+7, b)
call foo ● ●
add.int32 ● ●
getlocal a
const.int32 7
getlocal b
Note: can statically infer all types
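A minimal sketch of the static typing note, in JavaScript. The node shapes, the `signatures` table, and the assumption that `foo` takes two int32 arguments and returns int32 are hypothetical; the slide only gives the op names.

```javascript
// Hypothetical type environment: the slide does not give foo's signature.
const signatures = { foo: { params: ["int32", "int32"], result: "int32" } };
const localTypes = { a: "int32", b: "int32" };

// Returns the static type of an expression node, or throws on a mismatch.
function inferType(node) {
  switch (node.op) {
    case "const.int32":
      return "int32";
    case "getlocal":
      return localTypes[node.name];
    case "add.int32":
      for (const arg of node.args) {
        if (inferType(arg) !== "int32") {
          throw new TypeError("add.int32 needs int32 operands");
        }
      }
      return "int32";
    case "call": {
      const sig = signatures[node.target];
      if (node.args.length !== sig.params.length) {
        throw new TypeError("wrong argument count for " + node.target);
      }
      node.args.forEach((arg, i) => {
        if (inferType(arg) !== sig.params[i]) {
          throw new TypeError("argument " + i + " has the wrong type");
        }
      });
      return sig.result;
    }
    default:
      throw new Error("unknown op: " + node.op);
  }
}

// The tree for foo(a+7, b).
const callFoo = {
  op: "call",
  target: "foo",
  args: [
    {
      op: "add.int32",
      args: [
        { op: "getlocal", name: "a" },
        { op: "const.int32", value: 7 },
      ],
    },
    { op: "getlocal", name: "b" },
  ],
};

const resultType = inferType(callFoo);
```

Every op names its operand type, so one pass over the tree is enough to type the whole expression without running it.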
WASM prototype vs Minified JS
66% smaller uncompressed
26% smaller compressed
23x faster to parse
Early estimates from Mozilla
WASM Memory
0
address_space_max
Load
Store
int8
int16
int32
int64
float32
float64
ptr = malloc(100);
100 bytes
Objects
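A rough JavaScript model of the memory slide, assuming the address space behaves like one flat byte array. The 64 KiB size, the bump-allocator `malloc`, and the choice to reserve address 0 are illustrative assumptions, not part of the WASM design shown here.

```javascript
// Hypothetical address space: addresses run from 0 to memory.byteLength.
const memory = new ArrayBuffer(64 * 1024);
const view = new DataView(memory);

// Toy bump allocator. Address 0 is left unused so it can stand in for NULL.
let heapTop = 8;
function malloc(size) {
  const aligned = (size + 7) & ~7; // round up to a multiple of 8 bytes
  if (heapTop + aligned > memory.byteLength) return 0; // out of memory
  const ptr = heapTop;
  heapTop += aligned;
  return ptr;
}

const ptr = malloc(100); // 100 bytes, as on the slide

// Typed stores, one per width on the slide (little-endian where it matters).
view.setInt8(ptr, -1);
view.setInt16(ptr + 2, -2, true);
view.setInt32(ptr + 4, -42, true);
view.setBigInt64(ptr + 8, 2n ** 40n, true);
view.setFloat32(ptr + 16, 0.5, true);
view.setFloat64(ptr + 24, 1.5, true);

// Typed loads read the same bytes back.
const loadedInt32 = view.getInt32(ptr + 4, true);
const loadedFloat64 = view.getFloat64(ptr + 24, true);
```

A load or store past the end of the buffer throws a `RangeError` in this model, which is one way to picture the bounds check at `address_space_max`.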
WASM FFI
// FFI Interface
export int foo(int);
import int bar();
// C compiled into WASM
int foo(int num) {
return num * bar(); // Call into JS
}
// JavaScript
function bar() {
return 3;
}
function run(module) {
var instance = WASM.createInstance(
module,
{bar: bar}
);
return instance.foo(7); // Call into WASM
}
foo (WASM)
run (JS)
bar (JS)
bar()
return 3
return 21
foo(7)
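The call sequence above can be traced with a plain JavaScript stand-in. `WASM.createInstance` is the API as written on the slide; the `WASM` object and `fakeModule` below are hypothetical mocks, with `foo` written in JavaScript only so the sketch runs without a real WASM engine.

```javascript
const trace = [];

// Hypothetical mock of the compiled module: foo multiplies by the imported bar().
const fakeModule = {
  instantiate(imports) {
    return {
      foo(num) {
        trace.push("foo(" + num + ")");
        return (num * imports.bar()) | 0; // call out to JS, keep int32 semantics
      },
    };
  },
};

// Hypothetical mock of the slide's WASM.createInstance(module, imports).
const WASM = {
  createInstance(module, imports) {
    return module.instantiate(imports);
  },
};

function bar() {
  trace.push("bar()");
  return 3;
}

function run(module) {
  const instance = WASM.createInstance(module, { bar: bar });
  return instance.foo(7); // call into the (mock) WASM export
}

const result = run(fakeModule);
```

Here `result` is 21 and `trace` records `foo(7)` followed by `bar()`, matching the run, foo, bar ordering in the diagram.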
WASM Polyfill
C sources
Compiler
WASM binary
HTML
JavaScript sources
Data files
Browser
DOM
JS VM
Translator
asm.js
JavaScript sources
Optimistic/Imaginary Roadmap
v1.0 (2016?)
Single thread w/ event loop. Loads fast, runs fast.
v1.1 (2016?)
Threads! Blocking!
v1.2+ (2017?)
Exceptions, SIMD, dynamic linking, debugging, APIs
Things that keep me up at night.
(Help?)
(And please don’t freak out?)
(Seriously, this is looking far in the future and we’re planning for incremental evolution rather than a “big bang.” Don’t freak out.)
Portable native code
Not worried
But integrating with the browser… oh my.
P0: sync / async mismatch
Main Thread
Event Loop
JS Code
WASM Code
Sync calls
WASM Thread
WASM Code
WASM Thread
WASM Code
Atomics and Futexes
postMessage?
Atomics and Futexes?
Shared Memory
Deadlock when thunking?
~0.25 ms latency
40k messages / sec
Multiplexing between events and futex wake?
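For a concrete picture of atomics and futexes over shared memory, here is a sketch using the `SharedArrayBuffer` and `Atomics` API that JavaScript later standardized. It is not the design under discussion in these slides, and the lock layout (`LOCK`, `COUNTER`) is a hypothetical example.

```javascript
// Two int32 cells in memory that could be shared with workers.
const shared = new SharedArrayBuffer(8);
const cells = new Int32Array(shared);
const LOCK = 0;    // 0 = free, 1 = held
const COUNTER = 1; // data protected by the lock

// Non-blocking attempt: atomically swap 0 -> 1 and report success.
function tryLock() {
  return Atomics.compareExchange(cells, LOCK, 0, 1) === 0;
}

// Blocking acquire, futex style. Atomics.wait sleeps the calling thread
// while the lock word is still 1. Browsers refuse this call on the main
// thread, which is the sync / async mismatch in miniature.
function lock() {
  while (Atomics.compareExchange(cells, LOCK, 0, 1) !== 0) {
    Atomics.wait(cells, LOCK, 1);
  }
}

// Release the lock and wake at most one waiter. Returns the number woken.
function unlock() {
  Atomics.store(cells, LOCK, 0);
  return Atomics.notify(cells, LOCK, 1);
}

lock(); // uncontended here, so it returns without waiting
Atomics.add(cells, COUNTER, 1);
const secondAttempt = tryLock(); // false: the lock is already held
const wokenWaiters = unlock();   // 0: nobody was waiting
```

A thread that must stay on the event loop can only use `tryLock`, which is why the slide asks how events and futex wakes would be multiplexed.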
P0: sync / async mismatch
Main Thread
Event Loop
JS Code
WASM Code
Sync calls
postMessage
Atomics and Futexes?
Shared Memory
Atomics and Futexes
Worker
Event Loop
JS Code
WASM Code
Sync calls
How do we stop the world?
P0: memory limits on mobile
OS + Browser + Web content + JS + WASM
Large contiguous chunk vs. address space fragmentation?
Will realloc work?
Physical memory limits
Simple experiments: 256 MB on 32-bit / 512 MB on 64-bit
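One way to see why growing the heap is a concern: if the address space is a single contiguous buffer, growing by copy needs the old and new buffers alive at the same time. The sketch below assumes that copy-based strategy, which is only one possible implementation, and uses small sizes so it runs anywhere; `growHeap` is a hypothetical helper.

```javascript
// Grow a contiguous heap by allocating a larger buffer and copying into it.
function growHeap(oldBuffer, newByteLength) {
  if (newByteLength < oldBuffer.byteLength) {
    throw new RangeError("cannot shrink the heap");
  }
  const grown = new ArrayBuffer(newByteLength); // needs a new contiguous range
  new Uint8Array(grown).set(new Uint8Array(oldBuffer)); // copy existing contents
  return grown;
}

let heap = new ArrayBuffer(1024);
new Uint8Array(heap)[10] = 99;

// During the copy both buffers exist, so peak usage is old size + new size.
const peakBytes = heap.byteLength + 4096;
heap = growHeap(heap, 4096);
```

Scaled up to the 256 MB figure from the experiments, doubling this way would briefly need 256 MB plus 512 MB of contiguous allocations.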
P1: exposing memory
WASM address space can move and have holes
How is it exposed to…
… JavaScript?
… Blink?
P2: caching behavior
Big binaries
Diffs and patches?
Caching compiled code
Asset groups / coordinated eviction
P1: WASM APIs
“Web APIs”
Thread safe
Zero copy
mmap
Idiomatic / blocking
No GC or JS needed
… but Web IDL is not language neutral
any getParameter(GLenum pname);
interface WebGLRenderingContextBase {
...
const GLenum TRIANGLES = 0x0004;
...
};
interface ImageData {
readonly attribute unsigned long width;
readonly attribute unsigned long height;
readonly attribute Uint8ClampedArray data;
};
… and callbacks are (very) problematic
callback EventHandlerNonNull = any (Event event);
So... how does WASM get the APIs it needs?
Blink 5 years from now (if WASM succeeds)
Threading and blocking
Non-JS language bindings
Larger content
Extensible web: implement features in WASM?
Questions?
Backup Slides
WASM Polyfill
function _add($a, $b) {
$a = $a | 0;
$b = $b | 0;
return (HEAP32[$b >> 2] | 0) + (HEAP32[$a >> 2] | 0) | 0;
}
int add(int *a, int *b) {
return *a + *b;
}
int32 add(int32 a, int32 b) {
return(add.int32(load.i32(getlocal(a)), load.i32(getlocal(b))))
}
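To try the asm.js form of `add`, it needs a heap to read from. This sketch supplies a plain `Int32Array` named `HEAP32` and runs the function as ordinary JavaScript; it is not wrapped in a validated `"use asm"` module, and the addresses 16 and 20 are arbitrary choices.

```javascript
// Backing store that plays the role of the C program's memory.
const buffer = new ArrayBuffer(4096);
const HEAP32 = new Int32Array(buffer);

// The polyfill output from the slide: pointers are byte addresses,
// so they are shifted right by 2 to index 4-byte elements.
function _add($a, $b) {
  $a = $a | 0;
  $b = $b | 0;
  return (HEAP32[$b >> 2] | 0) + (HEAP32[$a >> 2] | 0) | 0;
}

// Equivalent of: int x = 40, y = 2; add(&x, &y);
HEAP32[16 >> 2] = 40;
HEAP32[20 >> 2] = 2;
const sum = _add(16, 20);

// The trailing "| 0" wraps the result to int32.
HEAP32[16 >> 2] = 0x7fffffff;
HEAP32[20 >> 2] = 1;
const wrapped = _add(16, 20);
```

`sum` is 42 and `wrapped` is -2147483648, since the result is truncated to 32 bits.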
… as opposed to the status quo.
foo(a+7, b)
~ is simplified ~
t0 = 7
t1 = a+t0
foo(t1, b)
const.int32 7 t0
add.int32 a t0 t1
call foo t1 b _
… but this is really about control flow.
while (a < 10) {
a = a + 1;
}
loop ●
block 2 ● ●
if ● ●
ge.int32 ● ●
getlocal a
const.int32 10
break 1
setlocal a ●
add.int32 ● ●
getlocal a
const.int32 1
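A small interpreter sketch for the structured form above. It assumes `break N` counts enclosing `block` / `loop` constructs outward from 0, so `break 1` inside the block leaves the loop; that reading, and the node shapes, are assumptions rather than anything the slide specifies.

```javascript
function evalExpr(node, locals) {
  switch (node.op) {
    case "const.int32": return node.value | 0;
    case "getlocal": return locals[node.name] | 0;
    case "add.int32":
      return (evalExpr(node.left, locals) + evalExpr(node.right, locals)) | 0;
    case "ge.int32":
      return evalExpr(node.left, locals) >= evalExpr(node.right, locals) ? 1 : 0;
    default: throw new Error("unknown expression op: " + node.op);
  }
}

// Runs a statement. Returns -1 on normal completion, otherwise the number
// of enclosing constructs a pending break still has to skip (0 = this one).
function exec(node, locals) {
  switch (node.op) {
    case "setlocal":
      locals[node.name] = evalExpr(node.value, locals);
      return -1;
    case "break":
      return node.depth;
    case "if":
      return evalExpr(node.cond, locals) ? exec(node.then, locals) : -1;
    case "block":
      for (const child of node.body) {
        const pending = exec(child, locals);
        if (pending === 0) return -1;        // break targeted this block
        if (pending > 0) return pending - 1; // pass it to the parent
      }
      return -1;
    case "loop":
      for (;;) {
        const pending = exec(node.body, locals);
        if (pending === 0) return -1;        // break targeted this loop
        if (pending > 0) return pending - 1;
      }
    default: throw new Error("unknown statement op: " + node.op);
  }
}

// while (a < 10) { a = a + 1; }
const getA = { op: "getlocal", name: "a" };
const whileLoop = {
  op: "loop",
  body: {
    op: "block",
    body: [
      {
        op: "if",
        cond: { op: "ge.int32", left: getA, right: { op: "const.int32", value: 10 } },
        then: { op: "break", depth: 1 },
      },
      {
        op: "setlocal",
        name: "a",
        value: { op: "add.int32", left: getA, right: { op: "const.int32", value: 1 } },
      },
    ],
  },
};

const locals = { a: 3 };
exec(whileLoop, locals);
```

Because a break can only target an enclosing construct, the control flow stays structured and no arbitrary jump targets are needed.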
… as opposed to the status quo.
while (a < 10) {
a = a + 1;
}
const.int32 10 t0
lt.int32 a t0 t1
t1: T / F
const.int32 1 t0
add.int32 a t0 a