1 of 17

Faster, easier 2D vector rendering

RustWeek • 2025-05-13

Raph Levien • Google Fonts

2 of 17

Limitations of existing Vello

Requires reasonably modern GPU

doesn’t work at all on WebGL

High and unpredictable memory usage

buffers of adequate size must be allocated in advance

Not easy to integrate into existing renderers

rendering done by compute shader
can’t integrate with fragment shaders / existing render pass

Some performance cliffs

GPU hotspot in very zoomed-out case

Compute shader logic is complex

not everyone is a rocket scientist

3 of 17

Sparse strips

4 of 17

Sparse strips

variable width, fixed height (4 or 8 pixels)
efficient representation of rendered path:

modest memory usage
minimal number of primitives to render

efficient computation

areas without coverage aren’t touched
solid interior regions have only per-strip setup cost, no alpha
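
A minimal sketch of what a strip record might look like, to make the memory argument concrete. The `Strip` struct, its field names, and the two helper functions are hypothetical and are not taken from the Vello code base; only the fixed height (4 or 8 pixels) and the "no alpha for solid interiors" idea come from the slide.

```rust
/// Strip height in pixels; the slide mentions 4 or 8.
const STRIP_HEIGHT: u32 = 4;

/// Hypothetical strip record: a run of partially covered pixels,
/// optionally followed by a fully covered (solid) span.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct Strip {
    x: u32,             // left edge, in pixels
    y: u32,             // top edge, in pixels (a multiple of STRIP_HEIGHT)
    width: u32,         // variable width, in pixels
    alpha_start: usize, // offset of this strip's alpha values in a shared buffer
    solid_after: u32,   // width of the solid span after the strip (stores no alpha)
}

/// Alpha bytes stored: one byte per pixel, for strip pixels only.
fn alpha_bytes(strips: &[Strip]) -> usize {
    strips.iter().map(|s| (s.width * STRIP_HEIGHT) as usize).sum()
}

/// Pixels actually painted: strip pixels plus the solid spans.
fn covered_pixels(strips: &[Strip]) -> usize {
    strips
        .iter()
        .map(|s| ((s.width + s.solid_after) * STRIP_HEIGHT) as usize)
        .sum()
}
```

In this model a wide solid interior adds to `covered_pixels` but not to `alpha_bytes`, which is the per-strip setup cost the slide refers to.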

5 of 17

Sparse strips

Pipeline

stroke expansion (strokes only) -> filled shapes
flattening -> lines
tiling
sort tiles
organize tiles into strips
merge tiles w/ same (x, y) coordinates and render to alpha values
coarse rasterization - generate sequence of drawing commands per wide tile
render sparse strip representation
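
The "sort tiles" and "organize tiles into strips" steps above can be sketched roughly as follows. This is a simplified illustration, not the Vello implementation: `Tile`, `StripRun`, and `tiles_to_strips` are hypothetical names, and duplicate tiles are simply collapsed here, whereas the real merge step accumulates coverage from every line segment that touches a tile.

```rust
/// Tile location in tile coordinates. Field order matters:
/// the derived `Ord` sorts by row (y) first, then column (x).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Tile {
    y: u32,
    x: u32,
}

/// A horizontal run of adjacent tiles in one row; `x_end` is exclusive.
#[derive(Debug, PartialEq, Eq)]
struct StripRun {
    y: u32,
    x_start: u32,
    x_end: u32,
}

fn tiles_to_strips(mut tiles: Vec<Tile>) -> Vec<StripRun> {
    // Sort tiles into row-major order.
    tiles.sort_unstable();
    // Collapse tiles that share the same (x, y) location.
    tiles.dedup();

    let mut runs: Vec<StripRun> = Vec::new();
    for t in tiles {
        // Extend the current run when the tile is its right-hand neighbor.
        if let Some(run) = runs.last_mut() {
            if run.y == t.y && run.x_end == t.x {
                run.x_end += 1;
                continue;
            }
        }
        // Otherwise start a new run.
        runs.push(StripRun { y: t.y, x_start: t.x, x_end: t.x + 1 });
    }
    runs
}
```

Gaps between runs in the same row are where the sparse representation saves work: they are either untouched or filled as solid spans.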

6 of 17

CPU-driven to GPU-driven spectrum

Fully CPU driven (vello_cpu)

very portable, small simple codebase
superpower: rendering emoji
decent performance through SIMD optimization

CPU/GPU hybrid (vello_hybrid)

geometry & scheduling done on CPU
painting of pixels done in GPU rasterization pipeline
rendering of alpha values is currently done on the CPU with SIMD; will move to a compute shader

Open research topic: GPU-driven rendering

requires advanced GPU execution model: indirect command encoding? work graphs?

7 of 17

Performance philosophy

Move allocation & scheduling work to CPU

Fully GPU-driven is still a goal but has many practical challenges

Do all per-pixel calculation on GPU
SIMD and multithreading for CPU work
Cache paths & avoid needless work
Schedule GPU work as efficiently as possible

Sparse (avoid unneeded work)
Minimize barriers (exploit maximal parallelism)

Use bounded resources on GPU

8 of 17

Path caching

Previous Vello: dynamic rendering of all paths

but many (most) UI workloads benefit from caching

Generalizes glyph caching
Memory footprint is O(n) in glyph size n (pixels per side); a dense glyph atlas is O(n^2)

[visual: sparse representation of a glyph at different sizes]

Avoids need for 2D atlas allocation
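
A toy calculation of the O(n) versus O(n^2) claim, under stated assumptions: the shape is a filled square inset by half a pixel in an n x n box, tiles are 4x4 pixels, alpha is one byte per pixel, and only tiles crossed by the outline store alpha. The function names are made up for this sketch, and real glyph outlines will differ by a constant factor.

```rust
/// Tile side in pixels (assumed for this sketch).
const TILE: u32 = 4;

/// Dense atlas cost: one alpha byte for every pixel of the n x n box.
fn dense_bytes(n: u32) -> u32 {
    n * n
}

/// Sparse cost: alpha bytes only for tiles that the square's outline crosses,
/// which in this model are the tiles on the border of the tile grid.
fn sparse_bytes(n: u32) -> u32 {
    let tiles_per_side = (n + TILE - 1) / TILE;
    let mut edge_tiles = 0;
    for ty in 0..tiles_per_side {
        for tx in 0..tiles_per_side {
            let on_edge = tx == 0
                || ty == 0
                || tx == tiles_per_side - 1
                || ty == tiles_per_side - 1;
            if on_edge {
                edge_tiles += 1;
            }
        }
    }
    edge_tiles * TILE * TILE
}
```

Doubling the size from 32 to 64 pixels quadruples the dense cost (1024 to 4096 bytes) but only roughly doubles the sparse cost (448 to 960 bytes) in this model.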

9 of 17

Sparse strips scale efficiently

10 of 17

Clip optimization

11 of 17

Clip optimization

[Figure labels: composite · just draw · no GPU work]
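
One possible reading of the three labels on this slide, sketched as code. This is an interpretation rather than something the slide spells out: the assumption is that each region is classified by the clip's coverage, so that zero coverage needs no GPU work, full coverage can draw directly, and only partial coverage needs compositing against the clip mask. `ClipAction` and `classify` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ClipAction {
    /// Clip coverage is zero everywhere: content is invisible, skip it.
    NoGpuWork,
    /// Clip coverage is full everywhere: draw as if there were no clip.
    JustDraw,
    /// Clip coverage is partial: draw to a layer and composite with the clip.
    Composite,
}

/// Classify a region from the clip's 8-bit coverage values for its pixels.
fn classify(clip_coverage: &[u8]) -> ClipAction {
    if clip_coverage.iter().all(|&a| a == 0) {
        ClipAction::NoGpuWork
    } else if clip_coverage.iter().all(|&a| a == 255) {
        ClipAction::JustDraw
    } else {
        ClipAction::Composite
    }
}
```

In a sparse representation this check would not need to scan pixels: regions with no strips and no solid span have zero coverage, and solid spans have full coverage.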

12 of 17

Spatio-temporal allocation

[Figure labels: abundant memory · tight memory]

13 of 17

Exploiting parallelism

Early pipeline stages (up to coarse rasterization)

Fully parallel by draw object

Coarse rasterization

(currently) serial but fast - simple calculation per wide tile

Fine rasterization

Fully parallel by wide tile (CPU rasterization)
GPU accelerated

All CPU stages: lots of SIMD
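
A sketch of "fully parallel by wide tile" on the CPU, using only the standard library. It assumes each wide tile can be rendered independently into its own buffer; the wide-tile size of 256x4 pixels, the single fill value standing in for a tile's command list, and the function name are all assumptions made for illustration.

```rust
use std::thread;

/// Assumed wide-tile size in pixels (256 wide, 4 high).
const WIDE_TILE_PIXELS: usize = 256 * 4;

/// Renders one buffer per wide tile, splitting the tiles across `threads` workers.
/// `fills[i]` is a stand-in for the drawing commands of tile `i`.
fn render_tiles_parallel(fills: &[u8], threads: usize) -> Vec<Vec<u8>> {
    let mut out: Vec<Vec<u8>> = vec![Vec::new(); fills.len()];
    let threads = threads.max(1);
    let chunk = ((fills.len() + threads - 1) / threads).max(1);

    // Scoped threads may borrow `fills` and disjoint slices of `out`.
    thread::scope(|s| {
        for (tile_fills, bufs) in fills.chunks(chunk).zip(out.chunks_mut(chunk)) {
            s.spawn(move || {
                for (&fill, buf) in tile_fills.iter().zip(bufs.iter_mut()) {
                    // "Fine rasterization" placeholder: fill the tile's pixels.
                    *buf = vec![fill; WIDE_TILE_PIXELS];
                }
            });
        }
    });
    out
}
```

Because every tile writes only to its own buffer, the workers need no locks or barriers; the scope joins them all before `out` is returned.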

14 of 17

SIMD

Single Instruction Multiple Data
One CPU instruction handles a vector
Significant speedups from fine-grained parallelism
Speedups for many different operations:

flattening, tiling, alpha rendering, fine rasterization

Neon fp16 is hot

but currently need to write asm

We need better infrastructure in Rust!

https://linebender.org/blog/towards-fearless-simd/
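
To illustrate the kind of per-pixel loop that benefits from SIMD, here is a sketch that scales 8-bit alpha values by an opacity. It uses no explicit intrinsics and is not taken from Vello: it processes fixed 16-element chunks so that the compiler has a chance to auto-vectorize, which is not guaranteed. The function names are hypothetical.

```rust
/// Rounded `a * o / 255` using only 16-bit integer arithmetic.
#[inline]
fn mul_div_255(a: u8, o: u8) -> u8 {
    let p = a as u16 * o as u16 + 128;
    ((p + (p >> 8)) >> 8) as u8
}

/// Scales every alpha value by `opacity` (255 = unchanged, 0 = transparent).
fn scale_alpha(alphas: &mut [u8], opacity: u8) {
    // Fixed-size chunks give the optimizer a constant trip count to vectorize.
    let mut chunks = alphas.chunks_exact_mut(16);
    for lane in &mut chunks {
        for a in lane.iter_mut() {
            *a = mul_div_255(*a, opacity);
        }
    }
    // Handle the tail that does not fill a whole chunk.
    for a in chunks.into_remainder() {
        *a = mul_div_255(*a, opacity);
    }
}
```

Relying on auto-vectorization like this is fragile across compilers and targets, which is part of the motivation for the better SIMD infrastructure the slide asks for.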

15 of 17

Community

Code base is simpler and more modular than most renderers

more amenable to community contributions

The team:

Taj Pereira, Canva
Alex Gemberg, Canva
Andrew Jakubowicz, Canva
Laurenz Stampfl, ETH Zurich
Tom Churchman
Daniel McNab, Linebender, funded by Google Fonts
Nico Burns, Dioxus/Blitz

Renderer office hours every Wednesday
Linebender is a great community for learning & building

16 of 17

Current status

vello_cpu 0.0.1 on crates.io

imaging model includes gradients, images, clips, blends, blurred rounded rectangles
text: variable fonts, hinting, and both COLRv1 and bitmap emoji
no_std
AnyRender abstraction back-end (used by Blitz)

vello_hybrid in active development

imaging model includes images & clips
also contains WebGL2 back-end (no wgpu dependency)

17 of 17

Roadmap

See roadmap doc
Full imaging model for CPU and hybrid
Set of image filters for both CPU and GPU
Glyph caching
Continued performance work

lots of SIMD - could use better Rust infrastructure

Future work

HDR color
conflation-free compositing