1 of 15

Firefox AI Runtime


2 of 15

Tarek Ziadé

Creator of the French Python User Group (Afpy)

Part of the Firefox AI/ML team at Mozilla

Author of some books about Python

Gawel and Tarek - PyCon FR 2009


3 of 15

Firefox AI Runtime Goal

Provide an inference API that runs offline, that we can use for our own internal use cases, and that we can surface to web extension developers.


4 of 15

Firefox Translations

  • Offline translation in Firefox
  • Started in 2019
  • Based on Bergamot and Marian NMT - https://browser.mt
  • RNN models trained for language pairs


5 of 15

Firefox Translations / Architecture

  • Forks a dedicated inference process
  • Runs Bergamot as WASM
  • Stores the runtime and the models in Remote Settings (~10 to 20 MB each)
  • Leverages Gemmology for fast inference

[Architecture diagram: the web page talks to the inference process, which runs bergamot.wasm with Gemmology (avx-vnni / neon i8mm) and pulls models, e.g. Eng -> FR, from Remote Settings]


6 of 15

Beyond translation

How can we support more inference tasks?

  • Describing images → image-to-text
  • Recognizing entities → named-entity recognition
  • Classifying text → text-classification / sentiment-analysis
  • Semantic search → feature-extraction
  • Summarizing → summarization
  • Text to speech → text-to-audio
  • Speech to text → automatic-speech-recognition
  • etc.

Can’t use Bergamot


7 of 15

🤗 Transformers.js

  • JavaScript port of Hugging Face’s Transformers (Python)
  • Built on top of Microsoft’s ONNX Runtime (WASM and WebGPU)
  • Enables using 1000+ models from the Hugging Face hub
  • Provides high-level APIs for pre- and post-processing of data


8 of 15

Example

import { pipeline } from '@xenova/transformers';

const captioner = await pipeline('image-to-text',
  'Xenova/vit-gpt2-image-captioning');
const url = 'https://example.com/cats.jpg';
const output = await captioner(url);
// [{ generated_text: 'a cat laying on a couch with another cat' }]

  1. Implements a set of classes per inference type
  2. Crawls the Hugging Face model hub
  3. Downloads and caches models on disk
  4. Runs an ONNX inference session using onnxruntime-web (WASM/WebGPU)


9 of 15

Transformers.js @ Firefox 133+

  • Added onnxruntime-web as a backend, like Bergamot
  • Vendored Transformers.js
  • Custom model cache in IndexedDB (cross-origin)
  • Can download models from our hub or Hugging Face’s
  • Uses the inference process too

[Architecture diagram: the web page talks to the inference process, which runs onnx.wasm alongside bergamot.wasm and downloads models from https://model-hub.mozilla.org or Remote Settings]


10 of 15

PDF.js alt-text


11 of 15

Enabling the Mozilla community

to build AI/ML features in Firefox

Do not zoom in on people’s faces, it’s scary


12 of 15

WebExtensions AI API

  • Available in Nightly
  • Preffed off in Firefox 134
  • Wraps our Firefox AI Runtime
  • Offers a high-level API to run inference in the browser with low friction
  • Enables the web dev community to experiment with inference easily
  • Gives us a way to iterate on an API design for the web


13 of 15

WebExtensions AI API

// SVE - Smallest Viable Example

// 1. Create the ML engine.
await browser.trial.ml.createEngine({ taskName: "summarization" });

// 2. Call it.
const res = await browser.trial.ml.runEngine({
  args: ["This is the text to summarize"],
});

// 3. Display the results.
console.log(res);


14 of 15

Demo. Let’s build a web extension.

The Internet has always been about cats.
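Before an extension can call `browser.trial.ml`, it has to declare and request the trial ML permission. A minimal manifest for the demo could look like the sketch below; the `"trialML"` optional permission matches Mozilla's documentation for the trial API, but treat the exact fields as something to check against the current docs, and the extension name is made up.

```json
{
  "manifest_version": 3,
  "name": "cat-captioner",
  "version": "1.0",
  "optional_permissions": ["trialML"]
}
```

The extension then calls `await browser.permissions.request({ permissions: ["trialML"] })` from a user gesture before creating an engine, so the user explicitly opts in to model downloads.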


15 of 15

Thanks!
