WebNN Interop Investigation
Ningxin Hu
9/17/2019
Recap: custom op (#6)
WebNN-WebGPU Interop
Example: Conv + Add + Relu with TF.js WebGPU
tf.setBackend('webgpu');
// Prepare tensors
const convInput = tf.tensor(inputData, inputDims);
const filter = tf.tensor(filterData, filterDims);
const bias = tf.tensor(biasData, biasDims);
// Execute conv, bias add, and relu with the TF.js WebGPU backend
const convOutput = tf.conv2d(convInput, filter, 1, 'same');
const addOutput = tf.add(convOutput, bias);
const reluOutput = tf.relu(addOutput);
const resultData = await reluOutput.data();
Example: compile WebNN op for WebGPU device
// Create a Model object that contains a Conv op
const conv = createWebNNConvOp(inputDims, filterData, filterDims);
// Use TF.js WebGPU backend
tf.setBackend('webgpu');
// Create a Compilation object for the constructed model that contains the built-in op.
const conv_compilation = await conv.createCompilation();
// Get the GPUDevice of the WebGPU backend and set it as the WebNN compilation target.
conv_compilation.setGPUDevice(tf.backend().device);
// Finish the compilation.
await conv_compilation.finish();
// Create an Execution object for the compiled model.
const conv_execution = await conv_compilation.createExecution();
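For reference, a minimal sketch of the createWebNNConvOp helper used above, assuming the NNAPI-style POC API (navigator.ml.getNeuralNetworkContext(), addOperand/addOperation with NNAPI CONV_2D operand order and constants); these names are assumptions, not confirmed by this deck. Optional biasData/fuseCode parameters are included so the same helper can also build the fused variant shown later.
async function createWebNNConvOp(inputDims, filterData, filterDims,
                                 biasData = null, fuseCode = null) {
  // Assumed NNAPI-style entry point of the WebNN POC
  const nn = navigator.ml.getNeuralNetworkContext();
  const model = await nn.createModel();
  const outChannels = filterDims[0]; // assumes OHWI filter layout
  model.addOperand({type: nn.TENSOR_FLOAT32, dimensions: inputDims});  // 0: input
  model.addOperand({type: nn.TENSOR_FLOAT32, dimensions: filterDims}); // 1: filter
  model.setOperandValue(1, filterData);
  // 2: bias; NNAPI-style CONV_2D requires one, so use zeros when the real
  // bias is applied by a separate op, as in the interop example
  model.addOperand({type: nn.TENSOR_FLOAT32, dimensions: [outChannels]});
  model.setOperandValue(2, biasData || new Float32Array(outChannels));
  // 3: implicit SAME padding, 4/5: strides, 6: fuse code (INT32 scalars)
  const scalars = [nn.PADDING_SAME, 1, 1,
                   fuseCode === null ? nn.FUSED_NONE : fuseCode];
  scalars.forEach((value, i) => {
    model.addOperand({type: nn.INT32});
    model.setOperandValue(3 + i, new Int32Array([value]));
  });
  // 7: output; SAME padding with stride 1 keeps the NHWC spatial dims
  model.addOperand({type: nn.TENSOR_FLOAT32,
    dimensions: [inputDims[0], inputDims[1], inputDims[2], outChannels]});
  model.addOperation(nn.CONV_2D, [0, 1, 2, 3, 4, 5, 6], [7]);
  model.identifyInputsAndOutputs([0], [7]);
  await model.finish();
  return model;
}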
Example: execute WebNN op with WebGPU ops
// Create input and output tensors with the TF.js WebGPU backend
const convInput = tf.tensor(inputData, inputDims);
const convOutput = tf.zeros(outputDims);
const bias = tf.tensor(biasData, biasDims);
// Execute WebNN conv op
conv_execution.setInput(0, tensorToGPUBuffer(convInput));
conv_execution.setOutput(0, tensorToGPUBuffer(convOutput));
conv_execution.startCompute();
// Execute bias add and relu with the TF.js WebGPU backend
const addOutput = tf.add(convOutput, bias);
const reluOutput = tf.relu(addOutput);
const resultData = await reluOutput.data();
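The tensorToGPUBuffer helper above is not defined in the deck. A minimal sketch, assuming the TF.js WebGPU backend can hand out the GPUBuffer backing a tensor; the getBuffer accessor below is an assumption about backend internals, not a public TF.js API.
function tensorToGPUBuffer(tensor) {
  const backend = tf.backend(); // the active WebGPUBackend
  // Assumed internal accessor: look up the GPUBuffer that backs this tensor
  return backend.getBuffer(tensor.dataId);
}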
Example: execute WebNN’s fused op
// Create WebNN Conv op with bias and fused relu (creation elided; see the sketch after this example)
// Create input and output tensors with the TF.js WebGPU backend
const convInput = tf.tensor(inputData, inputDims);
const convOutput = tf.zeros(outputDims);
// Execute WebNN conv op
fused_conv_execution.setInput(0, tensorToGPUBuffer(convInput));
fused_conv_execution.setOutput(0, tensorToGPUBuffer(convOutput));
fused_conv_execution.startCompute();
const resultData = await convOutput.data();
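The creation of fused_conv_execution is elided on the slide. A sketch under the same assumptions as the createWebNNConvOp sketch above: pass the real bias and an NNAPI-style FUSED_RELU fuse code so the bias add and relu fold into the conv kernel, then compile for the same GPUDevice as before.
const nn = navigator.ml.getNeuralNetworkContext();
const fusedConv = await createWebNNConvOp(inputDims, filterData, filterDims,
                                          biasData, nn.FUSED_RELU);
const fused_conv_compilation = await fusedConv.createCompilation();
fused_conv_compilation.setGPUDevice(tf.backend().device);
await fused_conv_compilation.finish();
const fused_conv_execution = await fused_conv_compilation.createExecution();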
Demo
POC Implementation on MPS (Metal Performance Shaders)
Performance Summary
Test                                                    | Inference time (ms)
WebGPU conv/add/relu                                    | 61.31
WebNN conv interop with WebGPU add/relu via ArrayBuffer | 43.42
WebNN conv interop with WebGPU add/relu via GPUBuffer   | 23.06
WebNN conv with fused add/relu                          | 21.25
Copying/Reordering Optimization
Test                             | Inference time (ms)
WebGPU conv x2                   | 112.96
WebNN conv + WebGPU conv         | 67.33
WebNN conv x2 with reordering    | 24.53
WebNN conv x2 without reordering | 23.01
WebNN-WASM Interop
WASM ops and WebNN graph execution
[Diagram: a WASM op, a group of WebNN ops, and another WASM op run in sequence; Tensor 0 through Tensor 3 live in the WASM heap and are shared with WebNN as ArrayBufferViews.]
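The zero-copy idea in the diagram can be sketched with the Execution API from the WebGPU examples; wasmMemory, the tensor offsets/lengths, and webnn_execution are hypothetical and would come from the WASM module and a compiled WebNN graph.
// Tensors live in the WASM heap; WebNN accesses them as ArrayBufferViews
const heap = wasmMemory.buffer;
const tensor1 = new Float32Array(heap, tensor1ByteOffset, tensor1Length);
const tensor2 = new Float32Array(heap, tensor2ByteOffset, tensor2Length);
// Tensor 1 was produced by the preceding WASM op; WebNN reads it in place
webnn_execution.setInput(0, tensor1);
// WebNN writes Tensor 2 in place for the following WASM op to consume
webnn_execution.setOutput(0, tensor2);
await webnn_execution.startCompute();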
Workload: MobileNet V1
[Chart: per-op inference time of MobileNet V1, measured with WASM ops execution; the model contains a 12x repeated [Depthwise+Conv2D] block.]
Graph partition configurations
Ops offloaded into WebNN graphs under each configuration:
- Conv2D
- Conv2D + DepthwiseConv2D
- All (one graph per op)
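A hypothetical sketch of how these configurations could be expressed: a predicate per configuration decides which MobileNet V1 ops leave WASM for WebNN. Everything here beyond the three configuration names from the slide is an assumption.
const PARTITION_CONFIGS = {
  'Conv2D':                 op => op.type === 'Conv2D',
  'Conv2D+DepthwiseConv2D': op => op.type === 'Conv2D' || op.type === 'DepthwiseConv2D',
  'All (one graph per op)': op => true,
};
// Per the slide, the "All" configuration compiles one WebNN graph per op;
// how the other configurations group ops into graphs is not shown here
function selectWebNNOps(ops, configName) {
  return ops.filter(PARTITION_CONFIGS[configName]);
}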
Results: performance
Device: Pixel 3, Android 9 (updated 12/2018), Chromium 70.0.3503
Device: XPS 13 laptop, Intel Core i5-8250U CPU, Ubuntu Linux 16.04, Chromium 70.0.3503
Summary of WebNN-WASM interop
Proposal