A vector format for Flutter (and beyond)

2019-2021 - Ian Hickson - Flutter team - flutter.dev/go/vector-graphics

This document is publicly shared. Please feel free to add comments.

Current status: No work is currently underway. Until compute shaders are available more broadly, it would not make sense to deploy a format that is disproportionately slower on slower devices, and existing ideas for designs that don't use compute shaders aren't sufficiently better than existing deployed formats to be worth the added complexity. That said, please don't hesitate to leave a comment if you have new ideas that are consistent with the priorities below; it's important to keep an open mind when designing new formats! -Hixie, August 2023.

Introduction

Supporting a vector graphics format is a popular and long-standing request from the Flutter community[1]. There are no good formats available. This document discusses the requirements and priorities for creating new formats, and ends with some straw-man proposals for a potential new industry-wide vector graphics standard.

Table of contents

Introduction
Table of contents
Terminology and typographic conventions
Scope
Target audience
Use cases
Use cases deemed out of scope
User stories
User interface components (especially icons)
Featured graphics
Geographic maps
Existing formats
SVG
PDF
PostScript
Lottie
Open Font Format (OFF)
VectorDrawable
Rive
HVIF
IconVG
TinyVG
Priorities
Optimizing for the authoring experience
Hand authoring of images
Tool-driven authoring of images (round-tripping in editors)
Optimizing for external constraints
Security
Accessibility
Indexability
Disk and network footprint (compressed size)
Flutter suitability
Backwards compatibility
Forwards compatibility
Optimizing for the renderer
Memory footprint
Footprint of the renderer itself
Speed of rendering the full image
Speed of rendering subsets of the image (cropping at rendering time)
Speed of rendering images at small sizes
Power requirements of rendering the image
Optimizing for quality of rendering
Hinting
Level of detail
Summary 🧚
Features
Metadata
Image dimensions
Baseline
Text 🚫
Text styling features
Span styling
Font fallback
Paragraph layout
Internationalization
Vertical text
Text along a path or along a mesh
Localization
Parameters ✅
Constants
Predefined parameters 🚫
Animation
Multi-stage animations 🚫
Versatility of rendering at different sizes
Types of parameters
Expressions ✅
Arithmetic
Conditional
Properties of values
Interpolation
Shapes
Types of paths
Fills ✅
Strokes 🚫
Animations
Hairline strokes
Pixel snapping, shape alignment 🚫
Paints
Colors ✅
Color spaces
High dynamic range
Gradients ✅
Textures 🚫
Shaders 🚫
Transforms at the shape level ✅
Group transforms, group opacity, effect layers, and clips 🚫
Injected widgets 🚫
Back references ✅
Hit testing ✅
Turing completeness
Clipping at the image edge
Design ideas
Culling
Designing for GPUs
Traditional GPUs
Animations
General-purpose GPUs
Future GPUs
Discussion
Ideas from other formats
Avoiding antialiasing seams 🚫
Multiple formats
Using fixed-point coordinates 🚫
Design decision summary
Evaluating existing designs
Evaluating new designs
Compatibility with authoring tools
Strawman designs
Icon VG
Open Font Format
A fixed-alignment binary format
Introduction
Conventions
Structure
Header block
Metadata blocks
Parameter blocks
Expression blocks
Matrix blocks
Shapes
Cubic Béziers
Rational Quadratic Béziers
Curve blocks
Shape blocks
Gradient blocks
Paint blocks
Flat color
Linear gradient
Radial gradient
Flags
Paint codes
Composition blocks
APIs
Updating parameters
Hit testing
Bounds introspection
Metadata APIs
Other APIs
Compressibility
Summary of internal references
Future extensions
New block types
Bitmap images and other attachments
Extending packed blocks
More parameters, expressions, and paints
More kinds of references

Terminology and typographic conventions

Terms in this document are generally used according to their usual meaning in the industry. The following terms, however, are used herein with specific meanings; to avoid ambiguity with other uses of these terms in other contexts, they are defined explicitly here.

shape. a piece of geometry or other rendered component of the graphic, which could be as simple as a circle or path, but could also be complex, such as a paragraph of styled text or a composite layer (consisting of other shapes and instructions for blending them as a whole into the scene).

🚫  indicates a feature that is currently not planned for inclusion.

✅  indicates a feature that is intended to be supported.

🧚  indicates a decision (more subtle than just a "yes" or "no").

🤔  indicates a feature where more thought is required to make a determination.

Scope

Target audience

While Flutter's use cases (as described below) are motivating, a format will be more useful if it has broad appeal. As such, it is the goal of this document to describe a format suitable for implementation in web browsers and other user interface frameworks. A format will also be significantly more useful if it is implemented as an export format in common vector graphic editors, so that should also be a goal.

Use cases

These are the use cases that have been discussed for which vector graphics would be useful in Flutter:

  • Icons. Static images that may need to be recoloured dynamically (e.g. grayed out when disabled), and may need to have varying levels of detail depending on the target render size. Unlikely to have text.
  • Animated icons. Could be two icons as described above, and a description of how to seamlessly transition from one to another; or it could be a single icon that has a way to react to being touched. The Material Design guidelines have a variety of examples[2]. These are unlikely to have text.
  • Backdrops. Images that may be much larger than the screen, maybe shown with parallax; generally simple scenes with shapes, gradients, and shadows, unlikely to have text.
  • Featured graphics. Images that form the foreground of the application, for example for tutorials, or to report state or suggest the user perform some action. Typically animated. For example, an image showing how to plug a peripheral into the host device. May have text.
  • Skeuomorphic UI widgets. For example, for knobs or sliders. Some variants are unlikely to have text (e.g. knobs), some are likely to have text (e.g. push buttons with labels integrated into the picture).
  • Skeuomorphic UI regions. For example, a tutorial for some machinery's control panel. This would involve being able to hit-test on components. May have text.

Anecdotally, a very common source of animations is Adobe After Effects.

Use cases deemed out of scope

The following use case has also been discussed, but is not something for which we are going to actively optimize:

  • Maps. Images that can be panned and zoomed several orders of magnitude, with varying detail at different zoom levels. Likely to have text. This is considered out of scope because in practice applications (or libraries) that show maps are likely to be focused on this topic enough that they can afford to provide bespoke solutions.

User stories

Let us study some of these use cases in more detail.

User interface components (especially icons)

It is common for visual user interfaces to contain graphical components (indeed, that is pretty much all that they contain other than text). These tend to fall into one of two categories: graphics that are drawn by the application (or its framework) directly, such as window outlines, background colors and gradients, even checkboxes or radio buttons; and more elaborate images as used in icons.

Icons can be represented by bitmaps, but this tends to fail on modern hardware if one is looking for high-quality imagery as different devices have different device pixel ratios (also known as resolution or pixel density). One could downsample from a much larger image, but this requires a large image (affecting app size) and lots of memory or computation time during decoding. One could ship many different versions, but this also ends up having an unfortunate impact on app size and not every possible resolution can be provided (since the user can, on some systems, select an arbitrary value).

Thus developers tend to gravitate towards vectors as a solution for showing icons. Vectors tend to be a good choice since icons typically have clean lines and a low level of detail.

Developers tend to very quickly find that static icons are insufficient. Modern user experiences are rife with transitions and animations. It is no longer sufficient for a "play" icon to change from a triangle to a square when tapped; instead it must morph from one to the other. It is no longer sufficient for a dial to merely rotate when its value is changed; instead the reflection and shadows represented in the image must maintain a consistent illusion throughout.

As such, whatever format applications use needs to support some level of animation.

Featured graphics

Some applications are focused around specific images. For example, an education application could show a diagram of an animal's anatomy or a photograph of a historical event.

These graphics tend to fall into two categories: the photorealistic, for which bitmaps are the only practical solution, and the more abstract, such as diagrams, for which vectors are the preferred solution due to their smaller size and resolution independence.

As with user interface components, one could use bitmaps for the diagram case. However, doing so quickly runs into memory and disk consumption issues (these images must by necessity be large to look good even at high pixel densities on large displays).

Geographic maps[3]

Applications that show geographic maps, such as Google Maps, operate in a vector space. Roads are lines that intersect, labels are placed upon those lines, buildings may be represented by paths. However, a vector image is insufficient to properly represent this data. The actual map data is not Cartesian (it's the surface of a sphere, or potentially an even more complicated shape), the layers are too numerous (and include petabytes of satellite image data), and the logic for showing or hiding labels depends on factors such as the identity of the user viewing the map (e.g. consider "home" and "work" labels, or highlighting recent destinations), not to mention the need to directly transition from 2D views to 3D views (zooming out to see the whole planet, zooming in to see a 3D view of a building interior).

These requirements all lead map application creators to want careful control over their rendering surface, so providing these features in a vector graphics format would not actually help their use case.

Existing formats

The de-facto standards for vector graphics are SVG[4] and PDF[5]. Most other formats are either proprietary[6], or hail from a different era (e.g. designed in the 1990s) and thus not well-suited to modern needs.

SVG

SVG is really an application SDK that happens to include vector graphics (e.g. fully supporting SVG involves supporting XML, JS, DOM, SMIL, HTML, audio playback, keyboard, mouse, and touch input, form controls, HTTP submission, video conferencing, etc). Additionally, there is no clearly defined subset of SVG to target if one only wants "vector graphics": the modern version of SVG Tiny involves supporting JavaScript and video playback, while the original version of SVG Tiny requires an unusual subset of features, for example it does not support gradients, but does support custom fonts.

PDF

Similarly, PDF has developed into an electronic document exchange format that happens to include vector graphics (e.g. fully supporting PDF involves supporting multipage documents, form controls, video, 3D, digital signatures, etc).

PostScript

PDF is built on PostScript, which is also the basis of EPS. PostScript is a programming language designed in 1984; vector graphics are the output of the program. EPS could be used as a vector graphics format more easily than PDF; implementing support for EPS does not, for instance, involve supporting digital signatures. However, PostScript, and thus EPS, are intended primarily for printing. A variant of PostScript called Display PostScript (DPS) was designed to make PostScript more suitable for use in user interfaces[7]. DPS is the most plausible existing format that could be used to address the use cases described above. Even DPS, however, comes with significant baggage, for example a garbage collection model, the ability to specify the halftone phase, and a specific set of fonts.

Lottie

There is one recent addition to this space that has gained some traction, namely Lottie[8]. This is a format created by Airbnb specifically for the purpose of allowing Adobe After Effects assets to be rendered on the Web, Android, iOS, and (increasingly) other platforms. It is based on JSON. There is currently no first-class Lottie support in Flutter, although adding such support has been discussed[9] and there are packages that allow Lottie to be used with Flutter in various ways.

Lottie suffers from being very specific to After Effects. Many of its design decisions are, in the abstract, esoteric, and not what you would want from a format designed from the ground up. There is also currently no specification for the format[10].

Open Font Format (OFF)

OFF (also known as OpenType), as a font format, is effectively a vector graphics format. It may be one of the vector graphics formats most used by the general population, in fact. Supporting Emoji has led to this format adding support for color, and there are proposals to extend it to support gradients[11].

VectorDrawable

Android's Vector Drawable is spiritually a simpler version of SVG (XML-based, similar path data format).

Rive

Rive's format[12] is designed specifically for Flutter but is also proprietary and is largely undocumented[13].

HVIF

The Haiku open source project's vector format, HVIF, is optimized for icons. Notable features include a level-of-detail system that is extensively used by Haiku's icon set. While it would be an interesting choice for a subset of the use cases described above, some of the design choices limit its use. For example, there is a small maximum number of styles and paths per file. It also seems to lack a formal specification.

IconVG

On the simpler end of the spectrum, IconVG[14] is an experimental format that could address some of the needs described in this document. Currently its focus is a little unclear[15]. Similar to HVIF, its design places a small file size higher in the list of priorities than this document argues is appropriate (as discussed below).

TinyVG

Similar to IconVG, TinyVG[16] is a recent contender in the vector graphics space. TinyVG's focus, as its name might suggest, is file size; it is a binary format that results in significant reductions over SVG (comparable to using compression algorithms on SVG source files). As with HVIF and IconVG, TinyVG's priorities don't align with those described below.

Priorities

In creating a new format, one must first decide what one is optimizing for. Here are some options, many of which are, to some extent or another, mutually exclusive, and all of which are relevant to today's market:

Optimizing for the authoring experience

Naturally, any format, to be useful, must be supported as an export format from major authoring tools such as Adobe Illustrator[17]. Similarly, any successful format is going to need tools to convert to and from the format and today's widely used formats like SVG.

Hand authoring of images

One could imagine creating a graphics format optimized for hand-authoring. While graphics are usually edited using a WYSIWYG graphical editor, hand authoring is useful when creating a series of similar images[18], or when creating diagrams or other images where precision is more important than aesthetics.

A format designed for hand authoring is also easier to test than a binary format, since it is easier to create test content for it.

Such a format would be text-based, maybe based on XML, JSON, or some other commonly-understood metalanguage, and would probably focus on features to allow styles to be reused, coordinates to be given relative to other coordinates, and generally may support many ways to express the same core concepts, such as having colors expressible either by name, or by decimal RGB values, or hex RGB values.

In many ways, SVG fits this description[19]. It isn't clear what new value would be brought to the table by creating a new format that is so close to an existing one.

Furthermore, if we build a format that is not optimized for authoring, it would be a simple matter to create a hand-authoring-optimized variant of that format along with a tool that converts files from one format to the other.

The reverse is not necessarily true: a format optimized for hand authoring may not be easily converted into a format optimized for other concerns, such as low memory usage at render time or fast rendering. For example, a format optimized for low memory usage would probably avoid creating an object model that can be manipulated by a script during rendering; if we create a vector format that handles animation by running a script each frame that can manipulate an object model, then it may be difficult to faithfully "compile" it to a memory-efficient format[20].

For these reasons, we will not focus on a format optimized for hand authoring, though we will keep in mind the ability to provide a corresponding hand-authoring format and a tool to convert between the two formats. 🧚

Tool-driven authoring of images (round-tripping in editors)

Vector graphics editors can typically export to SVG or PDF, but even those that use SVG or PDF as their native format require extensions to exactly represent their internal state, and these extensions vary from editor to editor[21]. This is natural, as different editors have different UIs and thus different state. Indeed, many vector graphic editors have a dedicated format that they use to represent their internal state[22].

This also means, however, that there is little sense in creating a general format for editors: each editor has its own needs. 🧚

Optimizing for external constraints

If one does not optimize for authoring (whether by hand or by tool), one can optimize instead for aspects of the file itself.

Security

A format can be optimized for security, meaning both that it avoids features that are inherently insecure by design, and that it is structured so that renderers are unlikely to accidentally implement its features in an insecure way.

For example, a format not optimized for security could contain x86 code that is intended to run directly on the CPU. This may be good for performance, but would be terrible for security.

A format not optimized for security could also do things like have two ways to mark the sizes of data buffers, e.g. having a length field as well as a redundant end-of-block sentinel. This could lead to renderers allocating a buffer using the length field but then writing data to the buffer until the end-of-block sentinel is reached, allowing for buffer overrun situations when the file is maliciously crafted.
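
As a sketch of the safer alternative (the length-prefixed block layout here is hypothetical), a parser should treat the declared length as the single source of truth, validate it against the bytes actually present, and never scan for a terminator beyond it:

```dart
import 'dart:typed_data';

/// Reads one hypothetical length-prefixed block and returns its payload.
/// The declared length is the only size that is trusted; it is validated
/// against the bytes actually available, and no sentinel is ever scanned for.
Uint8List readBlock(ByteData data, int offset) {
  if (offset + 4 > data.lengthInBytes) {
    throw const FormatException('Truncated block header');
  }
  final length = data.getUint32(offset, Endian.little);
  final start = offset + 4;
  if (length > data.lengthInBytes - start) {
    throw const FormatException('Block length exceeds file size');
  }
  // The returned view cannot grow past the validated length, so a malicious
  // sentinel (or its absence) later in the file cannot cause an overrun.
  return Uint8List.view(data.buffer, data.offsetInBytes + start, length);
}
```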

For obvious reasons, we should optimize for security.

The threat model used to evaluate this proposal assumes that graphics are represented as a single file, which is entirely under the control of an attacker, and which is being rendered by software running with the privileges of a potential victim user. All of the following are considered security flaws:

  • Exfiltrating data back to the attacker.
  • Effecting a change to the system configuration or any user data.
  • Affecting the display in any manner outside of the region to which the graphic image is being rendered (including violating expectations regarding the order of painting, e.g. being able to paint over a window that itself should be covering the image).
  • Consuming hardware resources disproportionate to the complexity of the image being rendered.

Accessibility

A vector format could be optimized for accessibility above all else: for example, every shape could be required to have a description, images could be required to have multiple palettes to handle color blindness, and text in images could be required to be at a minimum font size.

In practice, we know from experience (e.g. with SVG, which has the <title> and <desc> elements to describe any arbitrary shape) that authors typically do not make any attempt to make their images accessible even when text description features are available[23], relying instead on the host environment to provide accessibility affordances (e.g. alt="" attributes on <img> elements in HTML, when they point to SVG images).

Flutter provides text description accessibility affordances for bitmap images already; we can rely on those for vector graphics as well, and therefore ignore that issue for the vector graphics format itself.

The other issues are harder to ignore. Text size scaling may make sense, for instance, as might allowing control over colors to adjust for color blindness. That said, it's not critical for the format itself to have this built in. While we may wish to allow it, these use cases could equally be handled by merely providing multiple images. For this reason, this is probably a low priority for the format (while it naturally remains a high priority for the platform as a whole).

Indexability

The ability for search engines to find images. With modern search engines able to evaluate code, use ML models to recognize images, and perform OCR to find text, it's hard to imagine a format that would make this especially difficult, but it's worth considering.

One step in this direction might be to ensure that text is available in an easily accessible form, for example having a string resources section if the overall format is binary.

Disk and network footprint (compressed size)

One obvious factor to optimize for is the size of the file.

In practice, even a format like SVG, which makes very little attempt to optimize for size, can describe images of modest complexity in relatively modest file sizes[24], and that's before compression.

For this reason, it's not clear that optimizing for disk or network footprint first is especially valuable. Naturally, once a focus is established, decisions can be made with a bias towards minimizing the compressed size footprint.

Flutter suitability

We can optimize for use in Flutter, or we can make the format more general.

For example, we can design the format to fit the instantiateImageCodec API[25] rather than requiring that vector graphics in this format use an entirely different codepath than bitmap images (as is currently required for SVG in Flutter). We can use straight colors rather than using colors with a premultiplied alpha channel.
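
For context, this is roughly how bitmap bytes flow through that API today (a sketch using dart:ui); routing a vector format through the same entry point would mean call sites like this would not need to change:

```dart
import 'dart:typed_data';
import 'dart:ui' as ui;

// Sketch: how encoded image bytes are decoded in Flutter today via dart:ui.
// If a vector format were routed through the same API, the bytes here could
// just as well be a vector file, with the target dimensions driving
// rasterization instead of bitmap scaling.
Future<ui.Image> decode(Uint8List bytes, {int? targetWidth}) async {
  final ui.Codec codec =
      await ui.instantiateImageCodec(bytes, targetWidth: targetWidth);
  final ui.FrameInfo frame = await codec.getNextFrame();
  return frame.image;
}
```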

In practice, there is a limit to how much we can really optimize for Flutter above other potential hosts of vector graphics, because Flutter is really just a Dart binding of the Skia API. Anything we do to optimize for Flutter is really optimizing for Skia, which is also used by, e.g., Android, Chrome and Firefox.

There's also the possibility that optimizing for Flutter may involve changing Flutter, e.g. if our vector format supports adjusting parameters on the fly, or animation along multiple axes, etc, then targeting the instantiateImageCodec API may not be desirable.

Overall, optimizing for Flutter is a logical choice with little likely downside.

Backwards compatibility

A format can be optimized for backwards compatibility, that is, the ability for a renderer of a later version of the format to render images that were written for an earlier version of the format.

Lacking backwards compatibility is a non-starter. If we try to release a renderer that cannot render existing files that are supposedly of the same format, people will describe that as a serious regression. Therefore, this is key to any design.

In practice, backwards compatibility at the format level is easy to achieve. It merely requires that revisions not involve renaming or renumbering features from earlier versions, or otherwise causing the semantics of existing files to change.

Forwards compatibility

A format can also be optimized for forwards compatibility, that is, the ability to render images that use a later version of the format using a renderer written for an earlier version of the format.

This is less critical than backwards compatibility.

Forwards compatibility is also easy to achieve. It requires defining error handling behavior (which should be done anyway for security), and making enough of that error handling non-fatal that extensions (features in newer versions of the format) can be "smuggled" into files from the perspective of older renderers.

It makes sense to design for forwards compatibility where this does not conflict with more important priorities. (An example of where we would forego forward compatibility could be if we found a security problem in the format itself, which required a change that older renderers could not handle.)
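
A minimal sketch of the resulting parsing pattern, assuming a hypothetical block layout of a type tag followed by a length: an older renderer simply skips block types it does not recognize, which is how newer features pass through harmlessly.

```dart
import 'dart:typed_data';

// Hypothetical block layout: [uint16 type][uint32 length][payload...].
// Types known to this (older) renderer; anything else is skipped by length.
const Set<int> knownBlockTypes = {0x0001, 0x0002, 0x0003};

void walkBlocks(
    ByteData data, void Function(int type, int offset, int length) handle) {
  var offset = 0;
  while (offset + 6 <= data.lengthInBytes) {
    final type = data.getUint16(offset, Endian.little);
    final length = data.getUint32(offset + 2, Endian.little);
    final payload = offset + 6;
    if (length > data.lengthInBytes - payload) break; // truncated: stop, don't crash
    if (knownBlockTypes.contains(type)) {
      handle(type, payload, length);
    }
    // Unknown types fall through and are skipped: forwards compatibility.
    offset = payload + length;
  }
}
```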

Optimizing for the renderer

Here we face the usual time/space tradeoffs, in several variants.

Memory footprint

The first possible aspect to optimize for is memory usage during rendering. This comes in various forms, including the steady-state cost once the image is loaded, the peak cost as the image is being parsed, and the cost of merely loading the raw data into memory to parse it in the first place.

In practice, memory is limited but not exiguous in the environments that Flutter is used in. Even the smallest devices we might one day target (e.g. an Android Wear watch) have at least 512MB of RAM[26]. We generally consider a modestly bigger memory (or disk) footprint an acceptable price to pay for improved performance[27].

Therefore, as with optimizing disk or network footprint, this is probably something that is best considered as a secondary concern: something that we bias towards, but only after having first optimized for something more important.

Footprint of the renderer itself

The design of the language impacts the code size of the renderer itself. For example, an SVG renderer must include an XML parser, which is a significant amount of code in its own right.

Flutter has footprint constraints in various environments (for example, there is a fervent desire for Flutter's overhead on Android to be an order of magnitude smaller), so we should attempt to minimize the disk footprint of the implementation of any vector format that we eventually hope will be implemented by Flutter's runtime.

That said, Flutter will usually accept a greater renderer footprint if it allows greater rendering speed. Processing cycles are much more scarce than disk and network bandwidth.

Speed of rendering the full image

Flutter optimizes heavily for rendering performance in other aspects of its design, because rendering performance is one of the corollaries of our main value[28]. Optimizing for rendering performance in the context of vector graphics is a logical continuation of this.

It is common for animated images to be used in large numbers[29]. We should make sure that any format we design can handle animating many images simultaneously without skipping frames.

Speed of rendering subsets of the image (cropping at rendering time)

A specific aspect of rendering performance that we can optimize for, given the use cases described above, is that of rendering subsets of an image, as in when the image is being cropped (e.g. due to it being panned and zoomed).

As only a subset of the use cases require this feature, it makes sense to correspondingly prioritize this aspect below some of the others. (Anecdotally, this does not seem like a widely-needed feature.)

Speed of rendering images at small sizes

The more images are on the screen, the smaller they typically are. If an image with a lot of complexity is being drawn at a small size, we may be able to get away with only rendering the larger shapes -- for example, if a shape is less than a tenth of a hardware pixel in size, then it isn't likely to really matter. This can help with the total cost of rendering all the images, as adding more images (and correspondingly shrinking them) could end up increasing the overall cost sub-linearly.

By carefully designing the format to allow us to skip small parts when they are so small that they don't matter, we could reduce the cost of rendering images at small sizes. Anecdotally, this feature does seem to have some use, especially in icons, where small details are omitted entirely at smaller sizes in order to keep the icons looking simple and recognizable rather than cluttering them with detail that may be desired at higher sizes to give images more texture.
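
For illustration, here is a sketch of the kind of culling test this enables, assuming the format records (or allows cheap computation of) per-shape bounds in the image's coordinate space:

```dart
import 'dart:ui';

/// Returns whether a shape with the given bounds (in the image's coordinate
/// space) is worth drawing at all when the image is rendered with the given
/// scale factor (device pixels per image unit). Shapes whose projected size
/// falls below the threshold are skipped entirely.
bool worthDrawing(Rect bounds, double scale, {double thresholdPx = 0.1}) {
  return bounds.width * scale >= thresholdPx ||
      bounds.height * scale >= thresholdPx;
}
```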

Power requirements of rendering the image

This is essentially the same as the speed of rendering the full image, especially in the context of multiple parallel animations of many images (where optimizing for reducing the total incremental cost on each frame is as important as the cost of rendering the image in the first place). On mobile devices battery usage is sometimes the driving motivation behind the same optimizations that would be made for improving the overall rendering speed.

Optimizing for quality of rendering

Hinting

Fonts have historically supported means to adjust glyph shapes based on size, a process known as hinting[30], to improve legibility (especially at small font sizes). Icons are often rasterized for use in applications to enable designers to "touch up" the images in a way that they could not achieve with pure vector graphics (even if the originals are vector graphics).

This suggests that the format could provide features for such adjustments. Such features vary from the relatively straightforward (such as pixel snapping) to the relatively complicated (e.g. allowing shape positioning to be relative to other shapes, for example to force at least one device pixel to exist between two shapes regardless of render size).

The trade-off here is between renderer complexity and rendering quality.

Level of detail

Similar to hinting, but more coarse, is the option to entirely omit sections at certain sizes. This is closely related to features for optimizing the rendering speed at small sizes.

This would be simpler than the more elaborate hinting features. The trade-off here is between format complexity and rendering quality.

Summary 🧚

Based on the discussion above, our priorities for this format are, in order of importance:

  1. Can be supported as an export format from authoring tools (most important).
  2. Security.
  3. Backwards compatibility.
  4. Rendering speed of the full image (and power requirements).
  5. Flutter suitability.
  6. Forwards compatibility.
  7. Rendering speed of the image when rendered at small sizes.
  8. Rendering quality.
  9. Rendering speed of subparts of the image.
  10. Disk footprint of the renderer.
  11. Disk footprint of the image.
  12. Memory footprint.
  13. Ability to create a corresponding hand-editable format.
  14. Accessibility[31] and indexability.

Features

To handle the use cases listed earlier in this document, there are some features that would be particularly helpful. This section discusses possible features that we could include in the format beyond the obvious ones such as "circle" or "fill path with color".

Metadata

Beyond the pixels, there is information that describes how the image can be used.

Image dimensions

Images typically have a width and height. This can be expressed in various ways:

  • 4 values: minX, maxX, minY, maxY
  • 2 values: width, height
  • 1 value: aspect ratio

Giving just the aspect ratio means that images don't have an intrinsic size, which may or may not be a good thing: intrinsic sizes in images are what cause images to "pop in" in incremental environments like the web. Without intrinsic sizes, but only an intrinsic ratio, one dimension would need to be provided. With neither, both dimensions would need to be provided (but then the aspect ratio might be lost, which leads to poor rendering quality).

Baseline

Images are often embedded within text, in which case aligning the image in an aesthetically pleasing manner is non-trivial. For example, the 🐑 symbol on this line is aligned neither with the bottom of the line nor with the baseline of the line; it's aligned so that it sits pleasantly relative to the baseline. To achieve this effect, the image needs to have an intrinsic baseline position. (In the case of the 🐑 symbol, the picture is actually a character in a font, so baseline information comes from the font.)

Text 🚫

A common feature of graphics is embedded text. We could require that all text be vectorized before being embedded. This would guarantee that the results are the same on all platforms, and would side-step the need to deal with fonts, which are a serious source of difficulty with vector graphics (generally one does not want to embed every font in every vector image, but if the font is not embedded there is the risk that it isn't available at render time).

One reason to support text as a primitive shape, though, is that it would allow for reflowing of text when rendering at different sizes (for example to honor font size scaling done for accessibility), without the image having to contain all the precomputed shapes. It would also allow for text to be "late-bound", provided as a parameter during rendering (see also the next few sections).

Looking at our use cases and the priorities listed above, the arguments pro and con are somewhat limited. Assuming fonts are carried out of band anyway to show text in the app, an argument could be made that including the text verbatim would result in much smaller images than if every glyph had to be expressed as a path. On the other hand, if the font must be embedded then the footprint argument swings the other way.

One could also argue that it would make hand-editing easier if text was supported (so that people didn't have to find a way to vectorize text when hand-editing files).

The main argument against is the cost of implementation. Text is not a trivial problem[32], and so many features would need to be added to support it; for example, to style the text or spans of the text, as well as those features listed in the following subsections. Even the simplest of text features presents a very large implementation burden, and the potential scope is even larger. Text rendering involves executing code (fonts are Turing-complete), which has security implications. Text is out of reach of trivial implementations, which could limit the potential reach of the format. Text may also be too complicated to reasonably implement purely on GPUs (e.g. for a shader-based implementation, filling paths is fine but implementing hyphenation, line wrapping, the bidi algorithm, shaping, etc, may be a step too far).

Text styling features

There are a number of features that could be considered when implementing text in a vector graphics format:

  • Alignment to a side or to a center.
  • Fonts and font selection; embedded fonts, system fonts, referencing fonts in remote resources (e.g. over HTTP by URL).
  • Font variants, font features.
  • Font size.
  • Paint style for text (e.g. color, gradients, blend modes...).
  • For consistency, the same styles as can be applied to any shape should apply to text.

In addition there are more complicated features, listed in the subsequent sections.

Span styling

A line of text could be styled uniformly, or support could exist for styling subspans of the text with different styles. This introduces new difficulties such as baseline alignment and text decoration spanning (e.g. do underlines span across subspans or can they be turned off). Spanning itself can be described either as overlapping regions or as a tree structure.

Font fallback

A subset of span styling is support for font fallback, where a glyph that is absent in one font is obtained from another font during rendering. For example, text might use a basic Latin1-only font but include Emojis and Fraktur mathematical symbols that are obtained from two other dedicated fonts. Support for this is effectively a form of implicit span styling and suffers from many of the same complications.

Paragraph layout

Supporting text could mean supporting a single line of text, or supporting flowing text into multiple lines. In the latter case there are a number of potential complications:

  • Defining line breaking opportunities.
  • Hyphenation.
  • Justification.
  • Line spacing (half-leading, struts, etc).
  • Irregular wrapping shapes (flowing around an image).

Internationalization

If we support text, we must support Unicode, bidirectional text, labeling text as LTR vs RTL, aligning to "start" and "end", providing the text's locale for font selection, and so forth. Flutter already bears the cost of supporting this, so the impact on the implementation in Flutter would be small. The cost in the format itself should also be relatively low. Unicode is the standard way of encoding text and is quite efficient; it also supports expressing bidirectional text formatting. Labeling the overall direction takes one bit per text shape, the text alignment a few more bits, and so forth.

Vertical text

Flutter explicitly does not support vertical text. We could support vertical text in this format, since many of the constraints don't apply if there's no layout mechanism (presuming, for instance, that we decide to position shapes absolutely rather than computing their layout at runtime). This could also be a direction to expand in later, should there be demand.

In general, even if we support text, we should probably initially not support vertical text, so as to minimize the overall scope of the initial effort. The priorities described above argue for this too (minimizing the disk footprint of the renderer).

Text along a path or along a mesh

A pair of common effects in formats that have a text primitive is the ability to draw the text along an irregular baseline (placing glyphs tangential to a path) or warped to a mesh.

Localization

We could allow images to include tables of strings, and then have the strings be looked up based on a locale parameter (see below).

With this feature, individual images may be bigger (containing text for every supported locale). Without this feature, we would either need multiple images (one per locale), or need to make text parameterizable.

Looking at the priority list described earlier, there is a push towards not supporting this feature in the format itself but instead putting the burden on the application that uses the image (Flutter suitability arguing to just rely on Flutter's existing mechanisms, and disk footprint of the image arguing against tables of strings).

Combining localization and parameterization features (for example allowing numbers to be inserted into text) would dramatically increase the cost of localization, since it would require supporting numeric, time, and date formats, pluralization, and other localization features which are significantly more work than merely picking a string from a table. Similarly, localization combined with paragraph-wrapping and hyphenation suddenly extends the scope of both localization and wrapping to include locale-specific hyphenation dictionaries.

Parameters ✅

Values within the image, such as coordinates for geometry, colors, or the size of text (or maybe even the contents of text), could be driven by input from outside the image.

Some of the parameters described below would be very useful for addressing some of the priorities and use cases listed above (e.g. improving the rendering speed at small sizes, or animations). Once parameters are supported in any form, supporting them in general adds only minimal further cost. The precise extent to which they should be supported can be decided based on the detailed constraints of the format when it is designed.
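
To make the idea concrete, a hypothetical host-side API might look something like the following sketch; neither the class nor its members exist today, and the actual API surface would be decided when the format is designed:

```dart
import 'dart:ui';

// Hypothetical API sketch: how a host application might drive parameters.
// `VectorImage` and its members do not exist; they only illustrate the feature.
abstract class VectorImage {
  /// Sets a named parameter declared by the file (e.g. a color or a double).
  void setParameter(String name, Object value);

  /// Renders the image with the current parameter values.
  void paint(Canvas canvas, Size size);
}

void example(VectorImage icon, Canvas canvas) {
  icon
    ..setParameter('tint', const Color(0xFF2196F3))
    ..setParameter('progress', 0.25);
  icon.paint(canvas, const Size(24, 24));
}
```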

Constants

Some values are known when the image is generated (for example, the image's intrinsic dimensions). These do not need to be exposed to the image, since they can be hard-coded into the file by the generator.

Predefined parameters 🚫

Certain values that could impact the rendering and that are not known at the time the image is generated include:

  • The total width and height of the render surface in physical pixels (possibly an approximation[33]).
  • The time according to the system clock.
  • The user's preferred locale(s).
  • The user's preferred font size scaling factor.
  • The ambient text directionality (RTL vs LTR).

Exposing these to all files may have downsides, however. For example, testing is harder if the file can determine the time independent of the test. If, instead, all parameters must be explicitly passed in, then a test would have full control over the output. Another example would be potential privacy implications: a website that shows user-provided images would unwittingly allow a user to upload a file that always matched the other users' locales (e.g. imagine a file that shows a different flag based on the locale), which could be used in malicious ways.

Animation

For animated icons that transition from one state to another, or that react to a state, a parameter could be provided from the host that drives a clock from 0.0 to 1.0. This would fit in well with the animation APIs in Flutter already.
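
As a sketch of the intended fit, Flutter's existing AnimationController already produces exactly this kind of 0.0 to 1.0 clock; only the parameter-setting hook (here the onTick callback) is hypothetical:

```dart
import 'package:flutter/widgets.dart';

// Sketch: Flutter's animation machinery already produces the 0.0 to 1.0 clock
// described above; onTick stands in for a hypothetical setParameter call.
class IconAnimator {
  IconAnimator(TickerProvider vsync, this.onTick)
      : controller = AnimationController(
          vsync: vsync,
          duration: const Duration(milliseconds: 300),
        ) {
    controller.addListener(() => onTick(controller.value)); // runs 0.0 to 1.0
  }

  final AnimationController controller;
  final void Function(double t) onTick;

  void play() => controller.forward(from: 0.0);
  void dispose() => controller.dispose();
}
```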

Multi-stage animations 🚫

A feature that isn't handled by merely having one parameter for the animation clock is an animation that transitions between states that each themselves loop. For example, consider a vector graphic that describes a train spinning on a loop of track with a switch that leads to a second loop. One clock is required to describe the looping train, and if the train is also to be able to switch to the other loop, a second parameter is needed to indicate which loop the train should be on. However, merely those two parameters are insufficient to have the train remain on the first loop until it reaches the switch.

To achieve this kind of animation control, either the application would need to manage multiple parameters, or the animation would need built-in logic to make decisions about its animations, or the format would need some mechanism to support such animations. The solutions are not fantastically attractive. The first requires artists to get engineers to write bespoke code for their animations. The second would lead to Turing-completeness, which is discussed elsewhere in this document. The third option would open the format to a potentially unlimited set of features to handle compound animations.

Versatility of rendering at different sizes

Being able to turn on or turn off certain shapes based on the zoom level would be very useful for several of the use cases, most obviously icons, which are often shown at wildly varying sizes (e.g. on macOS icons are rendered at sizes from 1024x1024 to 16x16 depending on the UI mode).

In practice, it's more than just "on" vs "off". When rescaling an icon, for instance, features in the image that are turned off at one level should probably fade out rather than simply snapping out of existence. This suggests that the detail level should be a parameter, possibly corresponding to some approximation of the number of physical pixels per coordinate system pixel[34], that can be used to drive the level-of-detail feature. 🧚

There is a lot of prior art in this area, especially relating to fonts, which try to optimize shapes for different sizes to maintain consistent stroke widths, improve contrast, and maximize legibility ("optical sizing"). In some cases, entirely different glyphs are used at different sizes.

Types of parameters

There's a variety of types of data in a vector graphic. Most of them are numeric, or can be trivially interpreted as numeric values; some others have more elaborate representations.

  • doubles, e.g. for coordinates, for sizes, stroke widths
  • colors (8 bits per channel, more than 8 bits per channel)
  • strings, e.g. for text being rendered in the image 🚫
  • booleans
  • various enums, or custom enums (or integers)
  • transformation matrices
  • points (offsets), sizes (i.e. pairs of doubles)
  • rectangles (four doubles)
  • paths and components of paths 🚫
  • paints 🚫
  • bitmap images 🚫
  • locales 🚫

One can imagine allowing any of these to be used as parameters.

Expressions ✅

If one exposes parameters, as discussed above, then one quickly finds the need to derive values from those parameters: for example, darkening a color so that an icon can be colorized with a single parameter but still retain multiple shades. For some computations, workarounds could be found, e.g. changing colors by blending the given color with transparent black or white. However, expressiveness is increased if we allow arbitrary expressions.
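
As an illustration of the kind of derivation we have in mind, the computation is a one-liner in Dart today; the point of expressions is to let the file itself encode it, so that a single color parameter can drive several shades (the expression syntax in the file is not yet defined):

```dart
import 'dart:ui';

// Derives a darker shade from a single "tint" parameter by blending it 30%
// of the way towards opaque black; a file-level expression could encode the
// same computation once an expression syntax is defined.
Color darken(Color tint) => Color.lerp(tint, const Color(0xFF000000), 0.3)!;
```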

To make this useful at all, some built-in operators and functions are necessary. Different types have different needs. There is a question about how much to allow types to be converted between each other; for example, should it be possible to take four doubles and create a color? Should it be possible to cast a double to an integer, and an integer to an enum? Answering these questions will likely require a study of the use cases and of available features in hardware (see the GPU section below). 🧚 The sections below cover some of the possibilities for the various types discussed above.

There is a tradeoff to be made between expressiveness and complexity (and thus footprint) of the implementations. Assuming a forward-compatible strategy is used, it is likely best to start with a minimal set of features here and then extend them in response to market needs.

Arithmetic

Arithmetic would allow parameters to be used for controlling the positions and other details of shapes. For example, having one shape move at twice the speed of another in an animation, or having three shapes staggered one after the other in an animation.

Precisely how much to build in is unclear. Presumably the basics, addition and subtraction, multiplication and division, are uncontroversial. Beyond this, however, a wide variety of operators and functions could be provided, for example:

  • exponentiation (powers, roots).
  • logarithms, e.g. log₂, logₑ.
  • trigonometry, e.g. sin, cos, tan.
  • rounding, e.g. round, ceil, floor.

Some types are numeric in nature, but may need more operators. For example, colors may need bitwise operators. We could also expose arithmetic on individual components of a color rather than the whole, or conversion between RGB and HSL/HSV color spaces, or between degrees, turns, and radians.

Conditional

We could include a manner in which to optionally include a shape. We could also include a mechanism to select between two values in an expression (as in the "?:" operator). In either case, we would be dealing with booleans, and usually this implies needing a way to compare values to each other.

For numeric types, the basic operators (<, >, <=, >=, ==, !=) seem uncontroversial. For other types, equality seems obviously valuable. However, it is easy to see more elaborate options, e.g. point-in-path, pattern matching for strings, or measuring the "darkness" of a color (e.g. so that an image can automatically adjust its colors to remain high-contrast regardless of parameter values).
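
To make the scope concrete, here is a minimal sketch of an evaluator for such expressions, assuming the format encoded them as trees over doubles with parameters, arithmetic, a comparison, and a conditional; the node set shown is illustrative rather than a proposal:

```dart
// Illustrative expression tree over doubles, with parameters, arithmetic,
// a comparison, and a conditional; the actual encoding and operator set for
// the format are undecided.
abstract class Expr {
  double eval(Map<String, double> p);
}

class Param implements Expr {
  Param(this.name);
  final String name;
  @override
  double eval(Map<String, double> p) => p[name] ?? 0.0;
}

class Literal implements Expr {
  Literal(this.value);
  final double value;
  @override
  double eval(Map<String, double> p) => value;
}

class BinOp implements Expr {
  BinOp(this.op, this.a, this.b);
  final String op; // '+', '-', '*', '/'
  final Expr a, b;
  @override
  double eval(Map<String, double> p) {
    final x = a.eval(p), y = b.eval(p);
    switch (op) {
      case '+': return x + y;
      case '-': return x - y;
      case '*': return x * y;
      default: return y == 0 ? 0 : x / y; // avoid infinities on divide-by-zero
    }
  }
}

class IfLess implements Expr {
  IfLess(this.a, this.b, this.then, this.otherwise);
  final Expr a, b, then, otherwise;
  @override
  double eval(Map<String, double> p) =>
      a.eval(p) < b.eval(p) ? then.eval(p) : otherwise.eval(p);
}
```

For example, BinOp('*', Param('t'), Literal(2.0)) would drive a shape at twice the speed of an animation clock parameter t, covering the staggered-animation cases described above.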

Properties of values

Some values are multidimensional and it makes sense to inspect different aspects of them. For example, the red, green, and blue components of a color, or the length of a string, a particular property of a paint, or the bounding box of a path.

Interpolation

A common pattern in Flutter code around animations is the "lerp" method (short for "linear interpolation"). Many types have defined interpolations. One could provide such a mechanism in expressions in this format, e.g. to allow easily computing the color during a fade between two colors given in parameters.

One could go even further and define interpolation between shapes (especially between paths).
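
These interpolation primitives already exist in dart:ui and the Flutter framework; a lerp expression in the format would presumably mirror their behavior:

```dart
import 'dart:ui';

// Existing dart:ui interpolation helpers that a "lerp" expression in the
// format would presumably mirror.
void main() {
  final Color? mid =
      Color.lerp(const Color(0xFFFF0000), const Color(0xFF0000FF), 0.5);
  final double? width = lerpDouble(1.0, 4.0, 0.25); // 1.75
  final Offset where = Offset.lerp(Offset.zero, const Offset(10, 10), 0.5)!;
  print('$mid $width $where');
}
```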

Shapes

A path primitive is fundamental to a vector format, as it allows for the ultimate flexibility in drawing vector images. There is the issue of what path primitives to expose, and whether to support relative and absolute coordinates. These issues depend on implementation details that will be discussed below.

There are other possible primitives. Text has been mentioned already. One could imagine providing primitives similar to Canvas.drawAtlas and Canvas.drawPoints. Which primitives should be included in the format depends on implementation details.

Types of paths

Shape paths can be described in a number of ways. It is common in formats like SVG to provide an expressive vocabulary with arcs, straight lines, Bézier curves of various orders, etc, potentially with different variants such as absolute coordinates, relative coordinates, and chaining curves (e.g. SVG's "T" command).

One could also imagine a format that supports only a single command, if that path type is sufficiently expressive (e.g. rational cubic Bézier curves).

The primary factors to consider for each particular type of curve, and variants of that type, are:

  • How much data does this particular type of curve need?
  • How expensive is it to implement?
  • How expressive is it?
  • How redundant is it with other types of curves?

In addition, one must consider the cumulative cost of each supported type.

For example, straight, axis-aligned lines require very little data, and are cheap to implement. On the other hand, they are entirely redundant with arbitrary straight lines (those not necessarily axis-aligned), as well as with Bézier curves (a Bézier curve can describe any straight line). So when deciding whether to support dedicated axis-aligned-line and arbitrary-straight-line features, one must compare the complexity of a format with N curve types against one with N+1 curve types, and one with N+2 curve types.

In general, the following feedback is pretty compelling:

  • Cubic Bézier curves are able to express most shapes, including straight lines, but they cannot strictly express true circular arcs.
  • There is a desire to be able to express true circular arcs and straight lines.
  • Rational cubic Bézier curves (which could handle circles) are expensive to compute.
  • Rational quadratic Bézier curves (which could handle circles) are less expensive.
  • Cubic Bézier curves are less expensive.
  • Minimizing the number of kinds of curves supported is desirable to minimize implementation complexity.

From these points, it follows that one could consider a format with only two kinds of curves (rational quadratic Bézier curves and cubic Bézier curves). The main downside is that images with lots of straight lines would be bigger and slower to render than if the format was optimized for straight lines. 🧚
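
For reference, here is why rational quadratic Bézier curves suffice for true circular arcs: a 90° arc of the unit circle is exactly the rational quadratic curve with endpoints (1,0) and (0,1), middle control point (1,1), and a weight of cos 45° = √2/2 on the middle point. A quick numerical check in plain Dart:

```dart
import 'dart:math' as math;

// Evaluates a rational quadratic Bézier with control points p0, p1, p2 and a
// weight w on the middle control point (the end weights are 1).
List<double> rationalQuad(List<double> p0, List<double> p1, List<double> p2,
    double w, double t) {
  final b0 = (1 - t) * (1 - t);
  final b1 = 2 * t * (1 - t) * w;
  final b2 = t * t;
  final denom = b0 + b1 + b2;
  return [
    (b0 * p0[0] + b1 * p1[0] + b2 * p2[0]) / denom,
    (b0 * p0[1] + b1 * p1[1] + b2 * p2[1]) / denom,
  ];
}

void main() {
  final w = math.sqrt(2) / 2; // cos(45°): the weight that makes the arc exact
  for (var i = 0; i <= 4; i++) {
    final p = rationalQuad([1.0, 0.0], [1.0, 1.0], [0.0, 1.0], w, i / 4);
    // Every sample lies on the unit circle: x² + y² == 1 (up to rounding).
    print('${p[0]}, ${p[1]} -> r² = ${p[0] * p[0] + p[1] * p[1]}');
  }
}
```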

Similarly when considering relative coordinates vs absolute coordinates vs supporting both, one must consider the redundancy of having both (with the commensurate implementation cost) as well as the convenience of having both (e.g. making paths easier to handcraft). In practice, since hand-authoring this format is not a priority, there seems little need for supporting redundant features like relative coordinates. 🧚

Fills ✅

Filling the inside of a path is probably the most basic feature of a vector format. How to describe the path is an issue that will be discussed in more detail below, but it is worth noting in passing that a fill can be described either by a sequential set of steps in a path, or an unordered set of path segments that, together, describe an outline. This latter approach can allow for more parallelism in the implementation.

Strokes 🚫

Stroking a path is a common feature in vector formats but for static images it does not add more expressiveness as any stroke can be converted to a fill of a more elaborate shape[35]. For example, the stroke of a circle is the fill of two nested circles with the right winding rule.
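
As a concrete example of such a conversion (a sketch using dart:ui; even-odd is used here, though opposite winding directions under the nonzero rule would work equally well):

```dart
import 'dart:ui';

// The stroke of a circle of radius r with stroke width w, expressed as a fill:
// the region between concentric circles of radii r + w/2 and r - w/2. The
// even-odd rule makes the inner circle punch a hole in the outer one.
Path strokedCircleAsFill(Offset center, double radius, double strokeWidth) {
  return Path()
    ..fillType = PathFillType.evenOdd
    ..addOval(Rect.fromCircle(center: center, radius: radius + strokeWidth / 2))
    ..addOval(Rect.fromCircle(center: center, radius: radius - strokeWidth / 2));
}
```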

Strokes do require more implementation complexity than fills, however. Corners can have different joins, and some joins may need miter limits specified; the ends of paths might need special caps; the precise order of points in the path is important. All these additional complexities make it tempting to push the problem of strokes to the encoder, thus simplifying the format and its implementations.

Animations

One use case that would suffer if strokes had to be pre-converted to fills by the encoder is animated strokes, especially animated dashed strokes (e.g. a crawling ants effect). In some cases, animating a fill by rotating a gradient's transform could achieve a similar effect but this is not a general solution, especially for curved strokes.

Hairline strokes

A feature that cannot be implemented using fills alone is hairline strokes (a stroke that is exactly one device pixel wide, regardless of the image size).

Pixel snapping, shape alignment 🚫

Shapes could be annotated to indicate that their coordinates (especially start and end coordinates) should be snapped to device pixel boundaries.

Shape coordinates could be defined relative to each other, e.g. starting a line at an offset to another shape's coordinates, maybe with the offset being influenced by the pixel density.

These features could interact with level-of-detail features, or parameters that expose the device pixel ratio, the physical image size, the absolute device image alignment offset, etc.

Paints

Shapes can be styled in various ways: solid colors, gradients, textures that are repeated with particular transforms and generated from bitmaps or nested vector graphics, programmatically generated patterns, blend modes, filters...

The precise set that a format should support is probably best determined by considering the use cases, priorities, and implementation needs. For example, programmatically generated patterns would involve embedding a programming language in the format; this is something we will probably want to avoid (see "Turing completeness" below). Solid colors are very common in icons; this is certainly something we will want to include.

Colors ✅

The most obvious stylistic option is flat color. There are questions that would have to be answered even here: is alpha supported, is it premultiplied, what is the color space, etc.

Color spaces

While a first version of a format might be able to get away with only supporting sRGB, subsequent versions of the format will surely find the need to support more elaborate color spaces to take advantage of the greater expressivity of newer display hardware.

High dynamic range

In addition to supporting broader color spaces than sRGB, a format may need to support a greater color depth than 8 bits per channel: 48 bit color (16 bits per channel) is becoming more widely available today and will surely become commonplace in the decades to come.

There is a question of the cost of reserving 8 bytes each time a color is expressed. A vector format will rarely contain millions of uniquely specified colors (gradients result in many colors in the output but a much smaller number appear in the input). This leaves open the possibility of having a palette that describes 64 bit colors but using 32 bit, or even 16 bit, indices into that palette in the file itself.
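
As a purely illustrative sketch (in Dart; the class and method names are invented for this example), a palette-based encoder might intern 16-bit-per-channel colors like this, with shapes then storing a 16 bit index rather than the full 64 bit value:

    import 'dart:typed_data';
    
    /// Hypothetical palette encoding (illustrative only): each unique color is
    /// stored once with 16 bits per channel (64 bits total), and shapes refer
    /// to colors through a 16-bit palette index instead of repeating the value.
    class Palette64 {
      final List<(int, int, int, int)> _entries = [];
      final Map<(int, int, int, int), int> _indexOf = {};
    
      /// Returns a 16-bit index for the given RGBA color (0..65535 per channel),
      /// adding the color to the palette the first time it is seen.
      int intern(int r, int g, int b, int a) =>
          _indexOf.putIfAbsent((r, g, b, a), () {
            _entries.add((r, g, b, a));
            return _entries.length - 1;
          });
    
      /// Serializes the palette as four 16-bit words per entry.
      Uint16List encode() {
        final out = Uint16List(_entries.length * 4);
        for (var i = 0; i < _entries.length; i++) {
          final (r, g, b, a) = _entries[i];
          out[i * 4] = r;
          out[i * 4 + 1] = g;
          out[i * 4 + 2] = b;
          out[i * 4 + 3] = a;
        }
        return out;
      }
    }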

Gradients ✅

In addition to a single flat color, one could provide the option of styling with more colors, in the form of a gradient. There are many questions one would have to answer here: how many colors, what stop points are they at, what is the interpolation function, are the gradients linear, radial, swept, or of some other shape; can the gradients be transformed, what tiling mode do they use...

Textures 🚫

It is common for vector graphics to embed bitmap images.

This feature has several tradeoffs. Supporting this feature requires supporting a form of image reference, either inlining a bitmap format, supporting arbitrary attachments, or allowing external references (e.g. URLs). For example, SVG supports the latter[36], while PDFs support inline bitmap images.

These features all come with complexity. For example, inlining another file requires defining an envelope format, and augments the conformance requirements of the format to include the entirety of the conformance space of the adopted bitmap format. Testing implementations for conformance expands to include the entirety of the conformance testing of the adopted format. Implementations can end up supporting different kinds of bitmap formats, which fragments the format's ecosystem.

If a format can be designed without support for these features in the initial version, in a way that does not preclude adding them later, this would allow the basic features of the format to be solidified before having to attend to these additional complexities.

Shaders 🚫

Beyond gradients and textures, there is no limit to what could be provided: a static set of predefined shaders (e.g. color filters, blurs), or even an open-ended space (e.g. inline SPIR-V code). These could be provided inline in the file, or could be configurable at runtime.

Transforms at the shape level ✅

For some shapes (e.g. paths), transforms (and clips) can be baked in. However, it would allow for simpler animations if transforms could be separately encoded and manipulated at the path level rather than requiring each point to be manipulated during an animation.

Group transforms, group opacity, effect layers, and clips 🚫

Applying effects at the shape level allows for many images to be expressed. However, it may be simpler to reason about the image if groups of paths could be collected and treated as a unit, which could then itself be transformed, clipped, or otherwise painted (e.g. blended). This also allows for certain effects that are not otherwise possible to express, for example, applying a shadow to a group of differently-painted shapes.

There are trade-offs involved in offering group effects, notably around performance, since each group typically requires separate rasterization, and render target switching is expensive.

Injected widgets 🚫

If the vector graphic format is integrated tightly with Flutter's rendering pipeline, it becomes possible (possibly even easy) to support injecting content from outside the vector graphic into the image. By specifying a placeholder rectangle in the image, the renderer can be told to pause rendering the image, call back out to Flutter's framework and ask for a Picture to be rendered at the given location, with the given size. This is similar to how widgets can be embedded inline into text rendering in Flutter.

Painting a widget can involve pushing layers. For example, a TextureLayer for hardware-accelerated video or for a Web view. To support these along with special blend modes in the vector graphic would require deeply integrating the graphics rendering with Flutter's rendering pipeline. For embedding widgets with inline text we avoid this complexity by only allowing widgets to be layered atop the text.

Back references ✅

For images that contain many copies of the same shapes, paints, or other effects, it may be useful to offer a way to define such objects and allow them to be referenced later. For example, defining a particular path that is then reused for a clip mask in one location and a stroke in another, or defining a gradient paint that is then reused in multiple shapes.

This would especially help with the disk and memory footprints of the image.

Hit testing

For some use cases, the ability to hit-test the image would be useful. It would be relatively straightforward to provide a kind of shape that does not render but that has an identifier; the renderer could then report all such shapes (or the topmost such shape) that intersect a given point. This would integrate well with Flutter's framework.
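
A non-normative sketch (in Dart) of the idea, using rectangles for brevity where a real format would allow arbitrary shapes; all names are illustrative:

    /// Illustrative only: a non-rendering hit region with an identifier.
    class HitRegion {
      HitRegion(this.id, this.left, this.top, this.right, this.bottom);
      final int id;
      final double left, top, right, bottom;
    
      bool contains(double x, double y) =>
          x >= left && x < right && y >= top && y < bottom;
    }
    
    /// Returns the identifier of the topmost region containing (x, y), or null.
    /// Regions are assumed to be listed in painting order (last = topmost).
    int? hitTest(List<HitRegion> regions, double x, double y) {
      for (final region in regions.reversed) {
        if (region.contains(x, y)) return region.id;
      }
      return null;
    }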

Turing completeness

Some formats, notably PDF/EPS/DPS (via PostScript) and SVG (via JavaScript), are literally Turing-complete, in that they can run programs (or in the case of PostScript, are programs) to compute the graphical output.

There are advantages to such an approach, in particular, expressiveness. There are also disadvantages, prime among which is that it makes a comprehensive static analysis of the image essentially impossible (due to the halting problem). For some purposes, e.g. determining ahead of time what parts of the image will be rendered based on the given inputs (see the "culling" section below), static analysis is very important.

For this reason, it is probably valuable if this format eschews Turing completeness. 🧚 Depending on what features we introduce (especially around expressions), this may be tricky, and care will need to be taken to keep from accidentally falling into this trap[37].

Clipping at the image edge

It's usually assumed that images will be clipped at their edge, so shapes that extend beyond the edge of the image are not drawn outside the image bounds. In principle one could require that encoders never go outside the bounds, but this opens the door to some interesting security issues, e.g. images on web sites that render over adjacent content from other security domains (origins).

In general, high-quality clips are expensive. A compromise requiring low-quality (non-antialiased) clips at the image edge may be sufficient to address the security needs without a performance hit.

Design ideas

This section lists some design ideas that may or may not make it into the final proposal.

Culling

Shapes in the vector image could be stored so that those that are needed to render a subscene can be quickly found.

There are multiple dimensions that are relevant:

  • two or more dimensions to describe the region being drawn and the region covered by the shape (in the simplest case, the region and shape can be described as axis-aligned bounding rectangles, which only requires two dimensions).
  • for variable detail, one dimension for the current level of detail to show (see discussion above).
  • for images that depend on parameters that may themselves vary, e.g. a clock parameter, one dimension per parameter.

If the detail level and other parameters are treated uniformly, then this simplifies to two dimensions for the bounding box, plus one dimension per parameter. However, if these parameters are themselves capable of being used to affect the geometry, the bounding box would have to be the bounding box over every possible combination of values for the parameters, which may be prohibitively expensive to compute.

Several data structures are candidates for this culling mechanism, including multidimensional interval trees, and multidimensional R-trees. For geometry-based filtering in particular, the scene could be stored in a data structure similar to a quad tree.
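
Before weighing the tradeoffs, a deliberately naive sketch (in Dart) may help make the idea concrete; a real implementation would index these records in one of the tree structures mentioned above rather than scanning linearly, and the names here are illustrative:

    /// Naive culling sketch: each shape records an axis-aligned bounding box
    /// plus the range of level-of-detail values at which it is visible.
    class ShapeRecord {
      ShapeRecord(this.shapeIndex, this.left, this.top, this.right, this.bottom,
          this.minDetail, this.maxDetail);
      final int shapeIndex;
      final double left, top, right, bottom;
      final double minDetail, maxDetail;
    }
    
    /// Yields the indices of shapes that overlap the crop region and are
    /// visible at the requested level of detail.
    Iterable<int> visibleShapes(List<ShapeRecord> records,
        {required double cropLeft,
        required double cropTop,
        required double cropRight,
        required double cropBottom,
        required double detail}) sync* {
      for (final r in records) {
        final overlaps = r.left < cropRight &&
            r.right > cropLeft &&
            r.top < cropBottom &&
            r.bottom > cropTop;
        if (overlaps && detail >= r.minDetail && detail <= r.maxDetail) {
          yield r.shapeIndex;
        }
      }
    }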

The tradeoffs involved here:

  • This would likely improve performance for complex scenes that use parameters or that are rendered cropped. Without a culling algorithm, the entire scene has to be processed in every frame, regardless of how much of the scene is to be rendered.
  • Implementation complexity of the renderer is increased somewhat.
  • Implementation complexity of the encoder is increased; the magnitude of the complexity depends on the design. (Generally, creating balanced trees is more complicated than querying pre-balanced trees, so the cost on the generation side is likely higher than the cost on the renderer side.)
  • Disk footprint of the image is increased since it has to hold these tables.
  • Depending on the design, there is the potential for redundancy in the format (e.g. if the shape is of a size that exceeds the size implied by its position in the quad tree). This would be a source of bugs.
  • Memory footprint for rendering is increased, as these tables must be kept in memory. This could be somewhat mitigated by careful design of the on-disk format so that it can be efficiently processed in its raw form.

In practice, the burden of implementation here falls primarily on the generator, in determining which shapes are visible for which combinations of parameters. That said, the degenerate case where the generator assumes every shape is always visible would function correctly; it would just be less optimal.

Conclusion: This feature should be included in the format if a data structure can be found that has a reasonable level of implementation complexity. 🧚

Designing for GPUs[38]

One question to be asked is how much the format can be optimized for implementation using GPUs (e.g. using shaders). Maximizing the level to which specialized hardware is used to render the vector graphics, moving as much work as possible from the CPU to other hardware like the GPU, is in line with our desire to prioritize rendering speed.

Traditional GPUs

In practice, to truly optimize a format for the traditional GPU hardware, one would need to consider a very basic format, primarily focused around drawing triangles (something equivalent to different calls to "Canvas.drawVertices" in the Flutter API). As soon as the format is in any way more complicated than that, the implementation on traditional GPUs becomes non-trivial and there is little sense in trying to optimize for the ability to implement it efficiently in hardware.

Animations

At a higher level, there are format decisions that can be made that can dramatically help with performance. The main one is being able to report ahead of time if a particular shape or set of shapes will be changing, vs whether it will remain static. If a shape remains static, a higher up-front cost to "compile" it into a form that is more efficient to paint will be worth paying, as the initial cost is amortized over subsequent frames (to put it another way, "GPUs like things that never change"). On the other hand, if a shape will change dynamically every frame, it is more efficient to use more expensive paint operations to paint the shape, but avoid the much more expensive cost to set up the drawing in the first place.

For example, consider drawing a circle[39]. There are two approaches one could take on a traditional GPU. The first is to create a shader that solves the equation of the ellipse for each pixel, to determine if the pixel should be rendered as opaque or transparent. This approach is expensive on a per-pixel basis, and has a constant cost; subsequent frames will cost the same to render as the first frame. The price is almost entirely borne by the GPU. The second approach is to convert the circle into a batch of triangles. This has a high upfront cost (borne by the CPU), but actually painting the circle is absurdly fast as it leverages the GPU's innate affinity to drawing triangles. If the circle is drawn twice, the second time will be very quick, much quicker than the approach with the shader and the equation of the ellipse. On the other hand, if the circle changes radius every frame, then the triangle approach would be much more expensive as it would need to be converted to triangles afresh every frame.
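
As a rough sketch (in Dart) of the "pay once, draw cheaply" strategy, a circle might be tessellated into a triangle fan whose vertices could then be handed to a vertex-drawing API such as Canvas.drawVertices; the segment count is an arbitrary illustrative choice:

    import 'dart:math' as math;
    import 'dart:typed_data';
    
    /// Approximates a circle with a triangle fan. The returned list is
    /// interleaved x,y positions, three vertices per triangle, ready to hand
    /// to a vertex-based drawing API.
    Float32List tessellateCircle(double cx, double cy, double radius,
        {int segments = 64}) {
      final positions = Float32List(segments * 3 * 2);
      for (var i = 0; i < segments; i++) {
        final a0 = 2 * math.pi * i / segments;
        final a1 = 2 * math.pi * (i + 1) / segments;
        var j = i * 6;
        positions[j++] = cx; // fan center
        positions[j++] = cy;
        positions[j++] = cx + radius * math.cos(a0);
        positions[j++] = cy + radius * math.sin(a0);
        positions[j++] = cx + radius * math.cos(a1);
        positions[j++] = cy + radius * math.sin(a1);
      }
      return positions;
    }

If the circle is static, this list is built once and reused every frame; if the radius animates, it must be rebuilt each frame, which is precisely the cost described above.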

An approach that can help with performance for animations in particular is a mutable scene graph, where animations result in updates to the scene graph rather than an entirely new display list each frame. The update approach allows a rendering engine to re-use substantial portions of the computation from previous frames. For this reason, we should ensure that any format we develop is designed to support being implemented as a scene graph.

General-purpose GPUs

For more modern GPUs (General Purpose GPUs, supporting Vulkan), an approach based on rendering paths in bulk with unordered segments is dramatically more efficient[40]. For such an approach to be maximally effective, path data must be provided in a form that can be consumed by a shader efficiently, rather than being expressed as an imperative set of operations. For example, rather than passing the set of commands "moveTo x0,y0, lineTo x1,y1, lineTo x2,y2, lineTo x3,y3, close", one might pass an array of path segments of type "line" consisting of "x0,y0,x1,y1;x0,y0,x3,y3;x1,y1,x2,y2;x2,y2,x3,y3". In such an approach, strokes are not a supported primitive; instead, strokes would be pre-converted to fills. Taking this further, one can imagine bulk-uploading not just paths, but also transforms, style information, and blend information (each being a separate step in a parallel pipeline for rendering the paths).
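
A minimal sketch (in Dart) of this re-encoding for a closed polygon; curves would be handled the same way with more coordinates per segment, and the function name is illustrative:

    import 'dart:typed_data';
    
    /// Re-encodes a closed polygon, given as its vertices in drawing order,
    /// into an unordered array of line segments (x0, y0, x1, y1 per segment):
    /// the bulk, order-independent form a shader could consume directly.
    Float32List polygonToSegments(List<double> xs, List<double> ys) {
      assert(xs.length == ys.length && xs.length >= 3);
      final n = xs.length;
      final segments = Float32List(n * 4);
      for (var i = 0; i < n; i++) {
        final j = (i + 1) % n; // wrap around to close the shape
        segments[i * 4] = xs[i];
        segments[i * 4 + 1] = ys[i];
        segments[i * 4 + 2] = xs[j];
        segments[i * 4 + 3] = ys[j];
      }
      return segments;
    }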

Future GPUs

Looking forward to hardware that could be expected in coming decades, the highest importance is to enable parallelism, and thus avoid features that are inherently ordered in their processing. For example, a fill style that was defined as an iterative function where a user-provided expression was computed for each pixel based on the result of that expression applied to the previous pixel would be a worst-case scenario: only one pixel can be computed at a time. A fill style that was defined as a user-provided expression whose parameters are only the coordinates of the pixel would, on the other hand, allow every pixel to be computed in parallel.

Discussion

Another minor factor is how much effort is needed to convert the data in the file into a form usable by the rendering logic. A format that must be converted into data by mapping commands in the file to an imperative immediate-mode drawing API may not achieve as fast a rendering performance as one where the format can be mapped directly into data structures that can be used to drive the rendering. One way to achieve this would be to provide path descriptions in one part of the file, paint (styling) descriptions in another part of the file, and so forth, with those sections quickly parsed (or even directly mapped) into corresponding data structures.
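
For example (a non-normative Dart sketch, assuming a hypothetical section layout of consecutive little-endian float32 coordinates and a little-endian host, which covers Flutter's supported targets), the renderer could view such a section in place rather than parsing it:

    import 'dart:typed_data';
    
    /// Sketch of "direct mapping": if a (hypothetical) file section is laid
    /// out as consecutive float32 coordinates, the renderer can view the
    /// bytes in place instead of parsing commands one by one. The offset must
    /// be 4-byte aligned, which word-aligned sections satisfy by construction.
    Float32List viewCoordinateSection(
        ByteBuffer fileBytes, int sectionOffsetInBytes, int coordinateCount) {
      // No copy, no per-command dispatch: the raw bytes are the data structure.
      return fileBytes.asFloat32List(sectionOffsetInBytes, coordinateCount);
    }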

Another approach is to avoid features that indirectly cause changes to the scene graph where significant work must be done to determine what will change. (An example of such a feature is SVG's use of CSS rules and inheritance, where many changes (e.g. to one element's attributes, or even mouse movements via ":hover" rules) can have knock-on effects on other parts of the scene graph -- computing what is affected by any particular change, and computing what might be affected by any hypothetical future change, is non-trivial.)

There are specific features that can be expensive, especially on traditional GPUs. Blurs and some other image filter effects are one obvious example (blurs in particular are expensive because they require multiple passes and are not a good fit for implementation on the GPU). Compositing layers is another (equivalent to "saveLayer" in the Flutter API).

In general, any time the GPU has to switch configurations there is a cost; the render target switch of compositing layers is merely the most expensive example. Another would be alternating between drawing rectangles and drawing text. This particular cost can be avoided in many cases by reordering draw operations so that similar operations are done together (drawing multiple rectangles then drawing multiple segments of text).

Ideas from other formats

Avoiding antialiasing seams 🚫

Consider two adjacent rectangles of one color, composited over another solid color. At the point where the two shapes touch, there is typically a seam because the colors of the rectangles are anti-aliased with the background.

Paths in Adobe Flash could be given a separate "left side color" and "right side color", so that two adjacent shapes being composited over a shape of another color could be antialiased without leaving this seam[41]. Essentially, the "outside color" would override the background color when computing the antialiasing of the shape's edge.

Multiple formats

It is possible that rather than solving all the use cases listed above with one format, we should consider different formats. For example, one for static images and one for animated images; or one for small images (icons, skeuomorphic widgets) and one for large images (backdrops, skeuomorphic screens); or one for simple graphics and one for graphics with parameters and hit testing.

The tradeoff here is on the implementation complexity front and on the matter of how easy it would be for us to convince people to adopt multiple formats rather than one.

Using fixed-point coordinates 🚫

Most graphics formats use floating-point coordinates. There are benefits to this; for example, it allows for arbitrary levels of detail; a map could be expressed using a coordinate system in kilometers and yet still allow the user to zoom into the image and show a virus in detail at the true scale, while simultaneously allowing the user to zoom out of the image and show the entire solar system in the same image also at true scale.

However, maps are explicitly excluded from our use cases, and the use cases that are listed really do not need this level of expressivity.

Floating point numbers have some issues.

Most ARM GPUs today don't support 64 bit floats, so we would have to consider using 32 bit floats or requiring some preprocessing for today's slowest hardware. (In contrast, 64 bit integers, even where not supported, could be implemented relatively easily in software using instructions such as ADDC.)

Errors tend to creep into arithmetic involving floating point numbers in unintuitive ways. If we take an approach where fill paths are expressed using disjoint path segments, then code that attempts to correlate points may find that the encoders failed to compute the coordinates consistently and that the path does not in fact exactly line up in the least-significant-bits. This problem is so pervasive in Flutter's layout code that Flutter allows floating point numbers to be considered equal even if they are only mostly equal. (In contrast, integers do not have this issue.)

This all suggests considering using integers, potentially combined with some defined or dynamic scale factor, as the basic data type for expressing coordinates. On the other hand, graphics are commonly done using floating point, and forcing all computation to be done in the integer domain may be sufficiently non-idiomatic to be worth avoiding.
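
A small sketch (in Dart) of the fixed-point alternative; the scale factor of 256 is an arbitrary illustrative choice:

    /// Fixed-point coordinates: store 1/256ths of a unit in an integer, so
    /// equality is exact and arithmetic never accumulates rounding error.
    const int kFixedPointScale = 256;
    
    int toFixed(double value) => (value * kFixedPointScale).round();
    
    double fromFixed(int fixed) => fixed / kFixedPointScale;
    
    void main() {
      final a = toFixed(0.1) + toFixed(0.2); // exact integer arithmetic
      print(fromFixed(a)); // 0.30078125: quantized once at encode time, but
      // two encoders quantizing the same inputs agree bit-for-bit.
    }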

Design decision summary

  • Format will be optimized for machine readability, not hand-authoring.
  • A human-optimized format will exist that can be compiled to the machine format.
  • A tool will be created to convert from human format to the machine format.
  • The format will not be optimized for editors.
  • The format will not be Turing complete[42].
  • Design will prioritize concerns as follows:
  1. Security (most important).
  2. Backwards compatibility.
  3. Rendering speed of the full image (and power requirements).
  4. Flutter suitability.
  5. Forwards compatibility.
  6. Rendering speed of the image when rendered at small sizes.
  7. Rendering quality.
  8. Rendering speed of subparts of the image.
  9. Disk footprint of the renderer.
  10. Disk footprint of the image.
  11. Memory footprint.
  12. Ability to create a corresponding hand-editable format.
  13. Accessibility and indexability.
  • Features will include only:
     • parameters (allowing runtime manipulation of numbers, animation)
        • only explicitly defined parameters, no implicit parameters
        • only parameters expressible as numeric values
        • usable as replacements for colors/gradients, coordinates
     • expressions to manipulate parameters
        • operators limited to what is commonly available in GPUs
     • filling shapes (not strokes, not text, not textures) with specific styles
        • paths described using only cubic Béziers
        • styles described using only:
           • flat straight colors
           • gradients
     • hit testing
     • parameter-driven image composition (e.g. including or excluding shapes based on level of detail, timeline)
  • The format will be designed so that primitives (e.g. shapes) can be referenced multiple times.

Evaluating existing designs

The focus on rendering speed as a very high priority pulls away from formats such as SVG and VectorDrawable, which require, at a minimum, an XML parser, and to a lesser degree formats like Lottie, which require a JSON parser. For optimal performance one is pushed towards binary formats and, ideally, formats that can be interpreted and rendered natively in GPU hardware (e.g. using compute shaders). Parsing text formats does not lend itself to this implementation strategy.

Evaluating new designs

Compatibility with authoring tools

As part of reviewing new designs (such as those below), we should consider how well the proposed formats fit in with existing tools. For example, verifying that gradients are defined in a manner compatible with the conventions used in Adobe Illustrator or SVG.

Strawman designs

If you have any proposals, please do not hesitate to describe them here. Proposals should have sample implementations and sample images (ideally derived from the sample images of existing formats, so that they can be compared more easily), as well as documentation describing the format.

Icon VG

Developed in the google/iconvg GitHub repository, IconVG is a vector graphics format whose design constraints differ from those described in this document, but which addresses a similar set of needs.

Primary designer: Nigel Tao

Dart implementation: https://github.com/google/iconvg/tree/main/src/dart

Test images: https://github.com/google/iconvg/tree/main/test/data

Open Font Format

One hypothetical proposal could be to extract the vector graphics parts of the OFF into an independent format.

A fixed-alignment binary format


This section describes a file format known here as Web Vector Graphics, or WVG.

This is presented as a proof of concept, not a formal proposal. It is intended to encourage a review of the priorities presented earlier in this document, to verify that the specified features are indeed a suitable set of features and that none of the omitted features are important enough to warrant reconsideration.

Primary designer: Ian Hickson

Dart implementation: https://github.com/google/ui-exp-dg/blob/master/wvg/rendering/lib/wvg.dart

Test images: https://github.com/google/ui-exp-dg/tree/master/wvg/handcrafting/samples

Introduction

This section is non-normative.

The WVG format is a binary vector graphics format.

While it is intentionally quite extensible and therefore could host many more features in the future, currently this format supports only painting stacks of paths, each one filled with either a solid color, a linear gradient, or a radial gradient.

A path is described as one or more shapes, shapes consist of one or more curves, and curves are either cubic Béziers or rational quadratic Béziers.

The format consists of blocks of 64 words, and every word is 32 bits. There are various block types, such as matrix blocks, curve blocks, or gradient blocks. Data in these blocks is aligned in a regular fashion; for example, a matrix block consists of four sets of 16 words giving the 16 values of a 4x4 matrix. Blocks have no framing.

Here is a sample file:

    0: 0a475657 00000001 00000000 00000000 00000000 00000000 00000000 00000000 00000001 00000000 00000000 00000000 00000000 00000000 00000000 00000000

       00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000001 00000000 00000000 00000000 00000000 00000000 00000000 00000000

       00000006 00000000 00000000 00000000 00000001 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

       00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000001 00000000 00000000 00000000 00000000 00000000 00000000 00000000

    1: 42400000 42400000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

       00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

       00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

       00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

    2: 000000ff 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

       00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

       00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

       00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

    3: 3f800000 00000000 00000000 00000000 00000000 3f800000 00000000 00000000 00000000 00000000 3f800000 00000000 41c00000 40800000 00000000 3f800000

       3f800000 00000000 00000000 00000000 00000000 3f800000 00000000 00000000 00000000 00000000 3f800000 00000000 41d00000 42080000 00000000 3f800000

       3f800000 00000000 00000000 00000000 00000000 3f800000 00000000 00000000 00000000 00000000 3f800000 00000000 41d00000 41900000 00000000 3f800000

       00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

    4: c1a00000 00000000 41a00000 00000000 c0800000 c0800000 00000000 00000000 c0800000 c0800000 00000000 00000000 ffffffff ffffffff ffffffff ffffffff

       ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff

       ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff

       ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff

    5: 41a00000 42200000 41a00000 00000000 00000000 c1400000 c1400000 00000000 00000000 c0800000 c0800000 00000000 ffffffff ffffffff ffffffff ffffffff

       ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff

       ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff

       ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff

    6: c130cccd c1a00000 4130cccd 41a00000 00000000 c0800000 c0800000 00000000 00000000 c0800000 c0800000 00000000 ffffffff ffffffff ffffffff ffffffff

       ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff

       ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff

       ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff

    7: 00000000 41f86666 42200000 410f3333 00000000 00000000 c1400000 c1400000 00000000 00000000 c0800000 c0800000 ffffffff ffffffff ffffffff ffffffff

       ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff

       ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff

       ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff

    8: c1a00000 c130cccd 41a00000 4130cccd c0800000 c0800000 00000000 00000000 c0800000 c0800000 00000000 00000000 ffffffff ffffffff ffffffff ffffffff

       ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff

       ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff

       ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff

    9: 410f3333 42200000 41f86666 00000000 00000000 c1400000 c1400000 00000000 00000000 c0800000 c0800000 00000000 ffffffff ffffffff ffffffff ffffffff

       ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff

       ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff

       ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff

   10: 00000000 00000000 00000004 00000006 00000000 00000004 00000004 00000006 00000000 00000008 00000004 00000006 00000000 00000000 00000000 00000000

       00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

       00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

       00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

   11: 00000000 00000000 00000002 ffd00000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

       00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

       00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

       00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

This file has 12 blocks (numbered along the left margin). The first block is the header, and specifies how many blocks of each type are in the file:

   0: 0a475657 00000001 00000000 00000000 00000000 00000000 00000000...

The first word is the signature. The second word says that there is one block of type 0 (metadata blocks); the next few words are all zero, indicating that there are no blocks of type 1, 2, 3, etc.

Examining the first block carefully indicates that there are the following blocks:

  • 1 header block (not indicated in the header itself)
  • 1 block of type 0
  • 1 block of type 7
  • 1 block of type 23
  • 6 blocks of type 31
  • 1 block of type 35
  • 1 block of type 55

This adds to a total of 12 blocks, as expected.

Most block types aren't defined in this specification, which is why most of these counts are zero; this allows for future expansion in a forward- and backward-compatible manner (renderers ignore unknown block types and can skip them easily).

Each of the block types that are present has a particular meaning. For example, block type 0 is the metadata block. The metadata block starts as follows:

   1: 42400000 42400000 00000000 00000000 00000000 00000000 00000000...

The word 0x42400000 is an IEEE 754-encoded floating point number (binary32) representing 48.0. The first occurrence is the width of the image, the second is the height. The remaining 62 words of the metadata block are zero; again, future versions of the format may define meanings for those values, but for now they are skipped.
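
(As a non-normative aside, the bit reinterpretation can be checked in a few lines of Dart:)

    import 'dart:typed_data';
    
    void main() {
      // Reinterpreting the word 0x42400000 as an IEEE 754 binary32 value:
      final word = ByteData(4)..setUint32(0, 0x42400000);
      print(word.getFloat32(0)); // 48.0 (the image width and height above)
    }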

As per the header, the next block is of type 7, which corresponds to a parameter block. The parameter blocks define blocks of 64 values that can be configured at "runtime" (when the image is being displayed). The values in the file represent the default values for the parameters. In this file, it turns out that only the first parameter is actually used; the other 63 are ignored. (There is no way to know this directly from examining the parameter block.)

Here is the parameter block:

   2: 000000ff 00000000 00000000 00000000 00000000 00000000 00000000...

The default value of the first parameter in this file is 0x000000FF, which is either the number 255, the color "black", or roughly 3.57e-43, depending on whether it represents an integer, a color, or a floating point number. We will see how the parameter is used later (spoiler: in this file, it's interpreted as a color). A file could have more than one block of parameters; for example, a file with 3 blocks of parameters would have 3*64 = 192 configurable parameters.

The next block is of type 23. (This implies that there are no expression blocks in this file; those are of type 15, and the 17th word in the file, which gives the number of blocks of type 15, is zero.)

That block has a lot more non-zero data than the others:

   3: 3f800000 00000000 00000000 00000000 00000000 3f800000 00000000 00000000
      00000000 00000000 3f800000 00000000 41c00000 40800000 00000000 3f800000
      3f800000 00000000 00000000 00000000 00000000 3f800000 00000000 00000000
      00000000 00000000 3f800000 00000000 41d00000 42080000 00000000 3f800000
      3f800000 00000000 00000000 00000000 00000000 3f800000 00000000 00000000
      00000000 00000000 3f800000 00000000 41d00000 41900000 00000000 3f800000
      00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
      00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

This block is a set of four matrices, in column-major order, with each word representing a binary32 floating-point number. In this case the first matrix is[43]:

    1.0    0.0    0.0    24.0
    0.0    1.0    0.0     4.0
    0.0    0.0    1.0     0.0
    0.0    0.0    0.0     1.0

...which is a translation matrix applying an offset of (+24.0, +4.0). The second matrix is almost identical, but applies an offset of (+26.0,+32.0), the third applies an offset of (+26.0,+18.0), and the fourth is all zeroes (and, it will transpire, is not used in this file).

Next we have 6 blocks of type 31, curve blocks:

   4: c1a00000 00000000 41a00000 00000000 c0800000 c0800000 00000000...

   5: 41a00000 42200000 41a00000 00000000 00000000 c1400000 c1400000...

   6: c130cccd c1a00000 4130cccd 41a00000 00000000 c0800000 c0800000...

   7: 00000000 41f86666 42200000 410f3333 00000000 00000000 c1400000...

   8: c1a00000 c130cccd 41a00000 4130cccd c0800000 c0800000 00000000...

   9: 410f3333 42200000 41f86666 00000000 00000000 c1400000 c1400000...

These are read "vertically": each curve has one coordinate in each block, so here we see 7 curves (out of the 64 curves that these 6 blocks represent). The data in these blocks is in binary32 format (floating point numbers). In this case the curves are all cubics, and the coordinates in block 4 are the x3 coordinates, block 5 has the y3 coordinates, block 6 has the x1 coordinates, and so on with y1, x2, and y2. (The x0 and y0 coordinates of each curve are implied by the previous curve; and each set of curves begins at the origin.) Curve blocks come in groups, in this case 6 blocks form the group.

Looking at the blocks carefully will show that many of these words are 0xFFFFFFFF. This is a NaN in the floating point binary32 format. Of the 64 curves, all but 12 are entirely formed of NaNs. These are, unsurprisingly, unused in the file. So really, there are 12 curves in these 6 blocks.
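
A non-normative Dart sketch of reading such a group, assuming the blocks have already been split into 64-word Float32Lists; the function name is illustrative:

    import 'dart:typed_data';
    
    /// Extracts the curves stored "vertically" across a group of curve blocks.
    /// Curve i's coordinates are word i of each block in the group; curves
    /// whose words are all NaN are unused and skipped.
    List<List<double>> readCurveGroup(List<Float32List> group) {
      final curves = <List<double>>[];
      for (var i = 0; i < 64; i++) {
        final coords = [for (final block in group) block[i]];
        if (coords.every((c) => c.isNaN)) continue; // unused slot
        curves.add(coords);
      }
      return curves;
    }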

There are only two more blocks in this file. The first of these, block 10, is of type 35, a shape block:

  10: 00000000 00000000 00000004 00000006 00000000 00000004 00000004...

It indicates how to combine the curves into a shape. In the shape blocks, each block holds up to 16 shapes (four words each). The words are integers. The first two words of the shape identify the first curve of the shape (so for the first shape, 0, 0, the first curve of the shape starts at block 0 of the curves, word 0 of that block). The third word is the number of curves in the shape (for the first shape here, that's 4 curves), and the fourth word is the number of blocks per group for these curves (in this case, 6). The second shape's numbers are 0, 4, 4, 6, indicating that the second shape has four curves, starting at curve 4 in block 0. It turns out there is one more shape, whose numbers are 0, 8, 4, 6. (The rest of the block is all zeroes.) So in total we have three shapes, each formed of four curves, all in the same group described by the 6 blocks of type 31 discussed above.

Finally we have one more block, the composition block, of type 55. Each composition takes an entire block (allowing for significant expansion in the future). Here is the one composition in this file:

  11: 00000000 00000000 00000002 ffd00000 00000000 00000000 00000000...

Compositions specify groups of shapes and matrices to form together into a single path, which is then filled by a specified paint (gradient) or color. Currently each composition consists of five numbers (and 59 zeroes). The first word specifies the index of the matrix that is used for the first shape, the second specifies the index of the first shape itself, the third is the number of extra shapes to add, and the fourth and fifth specify the paint style.

So in this case we specify that the first matrix is matrix 0, the first shape is shape 0, and that there's a total of 3 shapes. (This is why there are three matrices. Each one specifies how to position one of these shapes to form the actual path.)

The fourth word is 0xFFD00000 which is a special value indicating parameter 0 is to be used as a color to paint the path. 0xFFD0 indicates a parameter reference, and 0x0000 indicates the first parameter.

Parameters can actually be referenced in many places, e.g. in curves, using this same form. (0xFFD00000 is a NaN value in binary32, so this does not reduce the expressivity of the format.)
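
A non-normative Dart sketch of resolving such a paint word; treating non-reference words as literal color32 values is an assumption made for this example:

    /// Resolves a paint word from a composition. Words whose high 16 bits are
    /// 0xFFD0 name a parameter (the low 16 bits are the parameter index);
    /// other words are treated here as literal color32 values for illustration.
    int resolvePaintWord(int word, List<int> parameters) {
      if (((word >> 16) & 0xFFFF) == 0xFFD0) {
        final index = word & 0xFFFF;
        return index < parameters.length ? parameters[index] : 0;
      }
      return word;
    }
    
    void main() {
      // Block 11's fourth word, with parameter 0 defaulting to 0x000000FF:
      print(resolvePaintWord(0xFFD00000, [0x000000FF]).toRadixString(16)); // ff
    }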

Conventions

All assertions, diagrams, examples, and notes are non-normative, as are all sections explicitly marked non-normative. Everything else is normative.

The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" are to be interpreted as described in RFC 2119.

These keywords have the same meaning when written in lowercase and cannot appear in non-normative content.

This section describes two conformance classes: WVG files (and by implication, WVG file generators), and WVG renderers. Conformance requirements for each class are entirely independent. WVG file generators must not generate non-conforming WVG files.

In this section, where algorithms are described as sequences of steps, the construction "Assert: Condition" indicates that if this specification has no errors, the condition specified should invariably be true at that point in the algorithm.

Expressions used in the definitions in this section operate in an unlimited domain. (As opposed to, say, the 32 bit domain. For example, addition never overflows.) Implementations are not expected to use these expressions literally, but must implement equivalent logic.

Numbers with a 0x prefix are hexadecimal. Numbers with a 0b prefix are binary. Numbers without a prefix are decimal. For example, 0xC0, 0b11000000, and 192 are equivalent.

Ranges of numbers are inclusive (i.e. the range 0..3 is the four numbers 0, 1, 2, 3).

Structure

WVG files consist of sequences of blocks, each of which is 64 words long. A word is 32 bits long and can be interpreted, depending on context, as any of:

  • uint32: a little-endian unsigned integer.
  • int32: a little-endian two's-complement signed integer.
  • float32: a little-endian IEEE 754 32 bit single floating point binary32 number.
  • color32: a little-endian unsigned integer representing straight (not premultiplied) RGBA quads, with the high 8 bits representing the red channel, the next 8 bits representing the green channel, the next 8 bits representing the blue channel, and the low 8 bits representing the alpha channel.

When this section talks about comparing a word to an integer, it must be interpreted as a uint32. When this section talks about the bits of a word, those bits are interpreted as an unsigned integer of the given size. (For example, "the high 16 bits" of a word implies interpreting the word as a uint32, then shifting that integer right by sixteen bits.) The number zero is represented identically in the uint32, int32, and float32 representations and so is sometimes referenced without specifying its interpretation. Zero and fully-transparent black are equivalent.

WVG files must be at least one block long. The first block, the header, consists of a signature word followed by 63 words describing the number of subsequent blocks of each type.

All blocks of a particular type are present in the file contiguously.

Header block

The first word of a WVG file must be the WVG signature. The WVG signature is a uint32 with value 0x0A475657. (Recall that WVG files are little-endian.)

If the file is less than 64 words long, or if the signature word interpreted as uint32 is not 0x0A475657, the remainder of the file must be ignored.

Words 2-64 form an array of 63 uint32s known as BLOCK_SIZES[i] where i has the values 0..62, with the second word in the file being BLOCK_SIZES[0], the third word being BLOCK_SIZES[1], and so forth up to the 64th word being BLOCK_SIZES[62].

The sum of all the numbers in BLOCK_SIZES plus 1 must correspond to the exact size of the file in blocks (a block being 64 words or 256 bytes). If it does not, then the file is invalid and the remainder of the file must be ignored.

For convenience, the following indices in BLOCK_SIZES are named, and the respective entries in BLOCK_SIZES give the number of blocks of each known type in the file:

  • METADATA_BLOCKS = 0 (metadata blocks, with information like width/height)
  • PARAM_BLOCKS = 7 (parameter blocks)
  • EXPR_BLOCKS = 15 (expression blocks)
  • MATRIX_BLOCKS = 23 (matrix blocks, where raw data for matrices is given)
  • CURVE_BLOCKS = 31 (curve blocks, where the raw data for paths is described)
  • SHAPE_BLOCKS = 35 (shape blocks, where curves are collected into paths)
  • GRADIENT_BLOCKS = 43 (gradient blocks, where raw data for gradients is given)
  • PAINT_BLOCKS = 47 (paint blocks, where styles are described)
  • COMP_BLOCKS = 55 (composition blocks)

The value BLOCK_OFFSETS[i] is defined to be 1 plus the sum of all values in BLOCK_SIZES with indices less than i. (BLOCK_OFFSETS[63] is therefore the size of the file in blocks.) It specifies the offset of the first block of a particular type.

The values of BLOCK_SIZES[i] for values of i other than those with named indices above must be zero.
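
A non-normative Dart sketch of the header processing described above; the function shape and constant name are illustrative:

    import 'dart:typed_data';
    
    const int kSignature = 0x0A475657;
    
    /// Validates the signature, reads BLOCK_SIZES, and derives BLOCK_OFFSETS
    /// as defined above. Returns null if the file must be ignored (too short,
    /// bad signature, or inconsistent block count).
    ({List<int> sizes, List<int> offsets})? parseHeader(Uint8List bytes) {
      if (bytes.lengthInBytes < 256) return null; // less than one block
      final words = ByteData.sublistView(bytes);
      if (words.getUint32(0, Endian.little) != kSignature) return null;
      final sizes = [
        for (var i = 0; i < 63; i++) words.getUint32((i + 1) * 4, Endian.little)
      ];
      // BLOCK_OFFSETS[i] = 1 + sum of BLOCK_SIZES with indices less than i.
      final offsets = List<int>.filled(64, 1);
      for (var i = 1; i < 64; i++) {
        offsets[i] = offsets[i - 1] + sizes[i - 1];
      }
      final totalBlocks = bytes.lengthInBytes ~/ 256;
      if (bytes.lengthInBytes % 256 != 0 || offsets[63] != totalBlocks) {
        return null;
      }
      return (sizes: sizes, offsets: offsets);
    }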

Metadata blocks

If BLOCK_SIZES[METADATA_BLOCKS] is greater than zero, then the first word of the block starting at BLOCK_OFFSETS[METADATA_BLOCKS], as float32, gives IMAGE_WIDTH, the width of the image, and the second word of that block, as float32, gives IMAGE_HEIGHT, the height of the image. If BLOCK_SIZES[METADATA_BLOCKS] is zero then the width and height are both 1.0.

The units of IMAGE_WIDTH and IMAGE_HEIGHT are arbitrary but the coordinate space they define is the one used by all other coordinates in the file, modulo transforms.

All other words in metadata blocks must be zero and must be ignored by renderers.

WVG files should not contain any compositions that would render pixels outside of the rectangle whose top left is at the origin and whose bottom right is at IMAGE_WIDTH,IMAGE_HEIGHT.

Renderers should clip images to the rectangle whose top left is at the origin and whose bottom right is at IMAGE_WIDTH,IMAGE_HEIGHT. Renderers may perform this clip at low quality, because for conforming images this clip will never be necessary. Renderers may skip the clip entirely, especially for content that is known to not extend outside the image rectangle (e.g. because it is not arbitrary user-generated content but is instead content selected by the same team as that invoking the renderer).

Parameter blocks

WVG files can be adjusted at runtime by varying their parameters. For example, a value in the parameter block could represent the time component of an animation, and paint blocks can refer to parameters when defining flat color paints and colors in gradients.

Parameters are words. A WVG file has zero or more parameter blocks, each of which introduces 64 parameters. For convenience, we define PARAM_COUNT as BLOCK_SIZES[PARAM_BLOCKS] * 64.

If PARAM_COUNT is zero, then the file has no parameters.

Otherwise, the BLOCK_SIZES[PARAM_BLOCKS] blocks starting at BLOCK_OFFSETS[PARAM_BLOCKS] are parameter blocks and represent default parameter data. Each word represents one parameter's default value. Parameters are numbered consecutively and default to the value given in the file, with the first word of the first parameter block being the default value of parameter zero, and the last word of the last parameter block being the default value of the parameter with number PARAM_COUNT - 1.

Later references to parameters with indices PARAM_COUNT or greater are interpreted as references to the number zero.

PARAM_COUNT can in theory be as large as 2^38, but due to other limitations of this format, only 65536 possible parameters are ever actually accessible (parameters are referenced using 16 bit values). For this reason, files should not specify a BLOCK_SIZES[PARAM_BLOCKS] value greater than 1024, and renderers can treat PARAM_COUNT values greater than 65536 as 65536 without loss of generality (skipping over the "unreachable" parameter blocks if BLOCK_SIZES[PARAM_BLOCKS] is greater than 1024).

At runtime, before or after rendering a file (but not during the rendering of a file), implementations may replace the value of any parameter in the range 0..PARAM_COUNT-1 in an implementation-defined manner (typically, as a result of some API call, as described below). When a parameter is changed, the image should be rerendered at the next available opportunity.

Expression blocks

Parameters are converted to values used in shape and paint definitions using expressions.

Expression blocks are the BLOCK_SIZES[EXPR_BLOCKS] blocks starting with the block at BLOCK_OFFSETS[EXPR_BLOCKS].

Each expression block represents one expression. Expressions are evaluated as per the steps described below. Expressions can refer to earlier expressions (this is necessary in some cases to express complicated expressions since the definition of each expression must fit in 64 words). An evaluated expression has a value which is a word, the interpretation of which is determined when it is used (so e.g. an expression could describe the addition of two int32s, which itself results in an int32, but which is later interpreted as a float32 as part of a coordinate).

For convenience, we define EXPR_COUNT as BLOCK_SIZES[EXPR_BLOCKS].

EXPR_COUNT can in theory be as large as 2^32, but due to other limitations of this format, only 65536 possible expressions are ever actually accessible (expressions are referenced using 16 bit values). For this reason, files should not specify a BLOCK_SIZES[EXPR_BLOCKS] value greater than 65536, and renderers can treat EXPR_COUNT values greater than 65536 as 65536 without loss of generality (skipping over the "unreachable" expression blocks if BLOCK_SIZES[EXPR_BLOCKS] is greater than 65536).

If EXPR_COUNT is greater than zero, then, for each expression from zero to EXPR_COUNT-1, the value of the expression is computed as follows. These computations must be done sequentially since later expressions may refer to earlier expressions.

  1. Let CURRENT_EXPR be the number of the expression being evaluated, where the first expression has number zero, and the last expression has number EXPR_COUNT-1.
  2. Let EXPR[i] be the ith word of the 64 words in the block of the expression being evaluated, where i is zero for the first word of the block and 63 for the last word of the block. The block in question is the one at block offset BLOCK_OFFSETS[EXPR_BLOCKS] + CURRENT_EXPR.
  3. Let EXPR_INDEX be zero.
  4. Let STACK[i] be storage space for 64 words, where i is zero for the first stored word and 63 for the 64th stored word. The algorithm below is designed to allow, though not require, that STACK and EXPR share the same memory (the observable behaviour is intended to be identical either way).
  5. Let STACK_INDEX be zero.
  6. Let ERROR_COUNT be zero. (ERROR_COUNT is only used to determine the validity of the file, it does not affect renderer semantics.)
  7. Loop:
     1. Assert: STACK_INDEX is less than 64, because each loop iteration increases EXPR_INDEX by one and STACK_INDEX by no more than one, and the loop ends if EXPR_INDEX reaches 64.
     2. If the high bit of EXPR[EXPR_INDEX] is zero:
        1. Let STACK[STACK_INDEX] equal EXPR[EXPR_INDEX]. This allows any positive number (whether int32 or float32) to be encoded verbatim in the expression. Negative numbers can be encoded as their positive value followed by the integer negate or float negate operator (see below).
        2. Increment STACK_INDEX.
        3. Skip to loop increment below.
     3. Otherwise, the high bit of EXPR[EXPR_INDEX] is one. If the two highest bits of EXPR[EXPR_INDEX] are 0b10, this is a one-argument operator:
        1. If STACK_INDEX is less than one, increment ERROR_COUNT and skip to the loop increment step below. The operator has no effect.
        2. If EXPR[EXPR_INDEX] is 0x80000000 (integer negate):
           1. Let STACK[STACK_INDEX-1] equal the int32 negation of STACK[STACK_INDEX-1] as int32.
           2. Skip to loop increment below.
        3. If EXPR[EXPR_INDEX] is 0x80010000 (float negate):
           1. Let STACK[STACK_INDEX-1] equal the float32 negation of STACK[STACK_INDEX-1] as float32.
           2. Skip to loop increment below.
        4. If EXPR[EXPR_INDEX] is 0x80008000 (integer cast):
           1. Let ARG be STACK[STACK_INDEX-1] as float32.
           2. If ARG is not a finite number, or if ARG is greater than 2^31-1, or if ARG is less than -2^31, let ARG be zero.
           3. Let STACK[STACK_INDEX-1] equal the int32 nearest integer representation of ARG using odd-even rounding.
           4. Skip to loop increment below.
        5. If EXPR[EXPR_INDEX] is 0x80018000 (float cast):
           1. Let STACK[STACK_INDEX-1] equal the nearest float32 representation of STACK[STACK_INDEX-1] as int32.
           2. Skip to loop increment below.
        6. If EXPR[EXPR_INDEX] is 0x80020000 (duplicate):
           1. Let STACK[STACK_INDEX] equal STACK[STACK_INDEX-1].
           2. Increment STACK_INDEX.
           3. Skip to loop increment below.
        7. Otherwise, the operator has no effect; increment ERROR_COUNT and skip to loop increment below.
     4. If the three highest bits of EXPR[EXPR_INDEX] are 0b110, this is a two-argument operator:
        1. If STACK_INDEX is less than two, increment ERROR_COUNT and skip to the loop increment step below. The operator has no effect.
        2. If EXPR[EXPR_INDEX] is 0xC0000001 (integer add):
           1. Let STACK[STACK_INDEX-2] equal the lower 32 bits of the int64 sum of STACK[STACK_INDEX-2] as int32 and STACK[STACK_INDEX-1] as int32.
           2. Decrement STACK_INDEX.
           3. Skip to loop increment below.
        3. If EXPR[EXPR_INDEX] is 0xC0000002 (integer subtract):
           1. Let STACK[STACK_INDEX-2] equal the lower 32 bits of the int64 difference of STACK[STACK_INDEX-2] as int32 as the minuend and STACK[STACK_INDEX-1] as int32 as the subtrahend.
           2. Decrement STACK_INDEX.
           3. Skip to loop increment below.
        4. If EXPR[EXPR_INDEX] is 0xC0000003 (integer multiply):
           1. Let STACK[STACK_INDEX-2] equal the lower 32 bits of the int64 product of STACK[STACK_INDEX-2] as int32 and STACK[STACK_INDEX-1] as int32.
           2. Decrement STACK_INDEX.
           3. Skip to loop increment below.
        5. If EXPR[EXPR_INDEX] is 0xC0000004 (integer divide):
           1. Let STACK[STACK_INDEX-2] equal the lower 32 bits of the int64 integer quotient with STACK[STACK_INDEX-2] as int32 as the dividend and STACK[STACK_INDEX-1] as int32 as the divisor[44]. (This being the "integer quotient" means the result is an integer, e.g. "0x07 0x04 /" leaves 0x01 on the stack.) If the divisor is zero, the result must be zero.
           2. Decrement STACK_INDEX.
           3. Skip to loop increment below.
        6. If EXPR[EXPR_INDEX] is 0xC0010001 (float add):
           1. Let STACK[STACK_INDEX-2] equal the float32 sum of STACK[STACK_INDEX-2] as float32 and STACK[STACK_INDEX-1] as float32.
           2. Decrement STACK_INDEX.
           3. Skip to loop increment below.
        7. If EXPR[EXPR_INDEX] is 0xC0010002 (float subtract):
           1. Let STACK[STACK_INDEX-2] equal the float32 difference of STACK[STACK_INDEX-2] as float32 as the minuend and STACK[STACK_INDEX-1] as float32 as the subtrahend.
           2. Decrement STACK_INDEX.
           3. Skip to loop increment below.
        8. If EXPR[EXPR_INDEX] is 0xC0010003 (float multiply):
           1. Let STACK[STACK_INDEX-2] equal the float32 product of STACK[STACK_INDEX-2] as float32 and STACK[STACK_INDEX-1] as float32.
           2. Decrement STACK_INDEX.
           3. Skip to loop increment below.
        9. If EXPR[EXPR_INDEX] is 0xC0010004 (float divide):
           1. Let STACK[STACK_INDEX-2] equal the float32 quotient with STACK[STACK_INDEX-2] as float32 as the dividend and STACK[STACK_INDEX-1] as float32 as the divisor. If the divisor is zero, the result must be infinity (with the sign being positive if the dividend and divisor have the same sign, otherwise negative).
           2. Decrement STACK_INDEX.
           3. Skip to loop increment below.
        10. Otherwise, the operator has no effect; increment ERROR_COUNT and skip to loop increment below.
     5. If the 10 highest bits of EXPR[EXPR_INDEX] are all set, this is a zero-argument operator:
        1. If EXPR[EXPR_INDEX] is 0xFFC00000 (terminate):
           1. If there exists a value of i, where i is greater than EXPR_INDEX but less than 64, for which EXPR[i] is not 0xFFFFFFFF, then increment ERROR_COUNT.
           2. End the loop (skip to the after loop step).
        2. If the high 16 bits of EXPR[EXPR_INDEX] are 0xFFD0 (parameter reference):
           1. Let ARG be the lower 16 bits of EXPR[EXPR_INDEX].
           2. If ARG is greater than or equal to PARAM_COUNT, increment ERROR_COUNT.
           3. Let STACK[STACK_INDEX] equal the parameter with index ARG. (If ARG is greater than or equal to PARAM_COUNT, then this is zero.)
           4. Increment STACK_INDEX.
           5. Skip to loop increment below.
        3. If the high 16 bits of EXPR[EXPR_INDEX] are 0xFFE0 (expression reference):
           1. Let ARG be the lower 16 bits of EXPR[EXPR_INDEX].
           2. If ARG is greater than or equal to CURRENT_EXPR, increment ERROR_COUNT.
           3. Let STACK[STACK_INDEX] equal the value of the expression numbered ARG. (If ARG is greater than or equal to CURRENT_EXPR, then this is zero.)
           4. Increment STACK_INDEX.
           5. Skip to loop increment below.
        4. Otherwise, the operator has no effect; increment ERROR_COUNT and skip to loop increment below.
     6. Otherwise, the operator has no effect; increment ERROR_COUNT and skip to loop increment below.
     7. Loop increment: Increase EXPR_INDEX by one.
     8. If EXPR_INDEX equals 64, end the loop (skip to the after loop step).
     9. Jump to the top of the loop.
  8. After loop: If STACK_INDEX is zero:
     1. Let STACK[STACK_INDEX] equal zero.
     2. Increment STACK_INDEX.
  9. The expression's value is STACK[STACK_INDEX-1]. If ERROR_COUNT is non-zero, then the expression is invalid.

A WVG file must not contain any expression blocks which, when computed according to the steps above, are determined to be invalid. This primarily means that operators in valid files will always have the right number of values on the stack, that references will be to valid parameters and expressions, and that values after the terminate operator will all be 0xFFFFFFFF NaNs.

Later references to expressions numbered EXPR_COUNT or greater evaluate to the value zero.
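
As a non-normative illustration of the loop above, here is a minimal sketch of an evaluator in Dart. It covers only the float operators, the terminate operator, and the parameter and expression references shown in this section; the integer operators, the ERROR_COUNT bookkeeping, and the validity rules are omitted, and any word not recognized here is simply pushed onto the stack as a constant. The helper names are hypothetical:

import 'dart:typed_data';

double asFloat32(int word) =>
    (ByteData(4)..setUint32(0, word)).getFloat32(0);

int asWord(double value) =>
    (ByteData(4)..setFloat32(0, value)).getUint32(0);

// Evaluates one 64-word expression block. The stack holds raw 32-bit words;
// the float operators reinterpret them as float32. `params` and
// `earlierExpressions` hold the current parameter words and the values of
// previously evaluated expressions.
int evaluateExpression(
    Uint32List expr, Uint32List params, Uint32List earlierExpressions) {
  final stack = <int>[];
  for (var i = 0; i < 64; i++) {
    final word = expr[i];
    if (word == 0xFFC00000) break; // terminate
    final hi = word >> 16;
    if (hi == 0xFFD0) {
      final arg = word & 0xFFFF; // parameter reference
      stack.add(arg < params.length ? params[arg] : 0);
    } else if (hi == 0xFFE0) {
      final arg = word & 0xFFFF; // expression reference
      stack.add(arg < earlierExpressions.length ? earlierExpressions[arg] : 0);
    } else if (word >= 0xC0010001 && word <= 0xC0010004 && stack.length >= 2) {
      final b = asFloat32(stack.removeLast());
      final a = asFloat32(stack.removeLast());
      double r;
      switch (word) {
        case 0xC0010001: r = a + b; break; // float add
        case 0xC0010002: r = a - b; break; // float subtract
        case 0xC0010003: r = a * b; break; // float multiply
        default: r = a / b; break;         // float divide (0xC0010004)
      }
      stack.add(asWord(r));
    } else {
      stack.add(word); // anything else is treated as a pushed constant here
    }
  }
  return stack.isEmpty ? 0 : stack.last;
}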

Matrix blocks

Matrices are used by paints and compositions. In WVG all matrices are 4x4[bu].

Matrix blocks are the BLOCK_SIZES[MATRIX_BLOCKS] blocks starting with the block at BLOCK_OFFSETS[MATRIX_BLOCKS].

Each matrix block contains 4 matrices.

For convenience we define MATRIX_COUNT as BLOCK_SIZES[MATRIX_BLOCKS] * 4.

MATRIX_COUNT can in theory be as large as 2^34, but due to other limitations of this format, only 2^32 possible matrices are ever actually accessible (matrices are referenced using 32 bit values). For this reason, files should not specify a BLOCK_SIZES[MATRIX_BLOCKS] value greater than 2^30, and renderers can treat MATRIX_COUNT values greater than 2^32 as 2^32 without loss of generality (skipping over the "unreachable" matrix blocks if BLOCK_SIZES[MATRIX_BLOCKS] is greater than 2^30). Of course a file with that many matrices would be over 17 gigabytes so this may be academic for a while yet.

We additionally define MATRIX[i] as the first matrix returned when following these steps:

  1. If i is greater than or equal to MATRIX_COUNT, return the identity matrix. It is static and valid.
  2. Let STATIC be true.
  3. Let VALID be true.
  4. Let CELL[j] be the word that is i*16+j words after the start of the BLOCK_OFFSETS[MATRIX_BLOCKS] block.
  5. Let RESULT be an empty 4-by-4 matrix with rows and columns numbered 0 to 3.
  6. For values of x from 0 to 3:
  1. For values of y from 0 to 3:
  1. Let VALUE be CELL[x * 4 + y]. (The matrix is stored in column-major order.)
  2. Let the value of cell of the RESULT matrix in column x and row y be the first value returned when following these substeps:
  1. If VALUE as float32 is a non-NaN value, then return VALUE as float32.
  2. Let ARG be the low 16 bits of VALUE.
  3. Let MODE be the high 16 bits of VALUE.
  4. If MODE is 0xFFD0, then let STATIC be false, let VALID be false if ARG is greater than or equal to PARAM_COUNT, and return the parameter with index ARG.
  5. If MODE is 0xFFE0, then let STATIC be false, let VALID be false if ARG is greater than or equal to EXPR_COUNT, and return the value of the expression numbered ARG[bv][bw][bx][by].
  6. If no value has yet been returned by these substeps, let VALID be false.
  7. Return VALUE as float32.
  1. Return the RESULT matrix. If STATIC is true, the matrix is static, otherwise it is dynamic. If VALID is true, the matrix is valid, otherwise it is invalid.

A matrix is either static or dynamic as defined by the steps above. (This indicates whether it depends on the parameters, and therefore whether an implementation might need to recompute it when the parameters are changed.)

A WVG file must not contain any matrix blocks such that the first matrix returned by MATRIX[i] is defined as invalid, for any value of i. As a result of these definitions, WVG file generators may substitute references to the identity matrix with references to matrices with an index greater than or equal to MATRIX_COUNT. It is suggested that for files whose MATRIX_COUNT is less than 232, the index 0xFFFFFFFF be used for the identity matrix.
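
To illustrate the cell-decoding rules above, here is a minimal non-normative sketch in Dart that resolves the 16 cells of one matrix. The STATIC and VALID bookkeeping is omitted, and the parameter and expression values are assumed to already be available as floats (a simplification):

import 'dart:typed_data';

double asFloat32(int word) =>
    (ByteData(4)..setUint32(0, word)).getFloat32(0);

// `cells` holds the 16 words of one matrix in column-major order, as stored
// in the file; `params` and `expressions` hold the current parameter and
// expression values.
List<double> resolveMatrix(
    Uint32List cells, List<double> params, List<double> expressions) {
  final result = List<double>.filled(16, 0.0);
  for (var x = 0; x < 4; x++) {
    for (var y = 0; y < 4; y++) {
      final value = cells[x * 4 + y];
      final f = asFloat32(value);
      double cell;
      if (!f.isNaN) {
        cell = f; // a literal float32 value
      } else {
        final mode = value >> 16;
        final arg = value & 0xFFFF;
        if (mode == 0xFFD0 && arg < params.length) {
          cell = params[arg]; // parameter reference
        } else if (mode == 0xFFE0 && arg < expressions.length) {
          cell = expressions[arg]; // expression reference
        } else {
          cell = f; // would make the matrix invalid in a conforming file
        }
      }
      result[x * 4 + y] = cell; // keep column-major order
    }
  }
  return result;
}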

Shapes

Shapes consist of a series of one or more curves. Each curve in a shape is anchored at (and leading away from) the end of the previous curve. (The first curve is implicitly anchored at, and leading away from, the origin.) A straight line called the closing line (not to be confused with a clothing line) is implied leading from the end of the last curve in a shape back to the origin.

WVG supports the expression of two kinds of curves: Cubic Béziers and Rational Quadratic Béziers. Cubic Béziers are defined by six numbers (the coordinates of two control points, plus the coordinate of the end point). Rational Quadratic Béziers use five numbers (the coordinates of the control point, the control point's weight, and the coordinates of the end point).

Cubic Béziers

Cubic Béziers are third-order Bézier curves where:

  • P0 = the end point of the previous curve, if any, or the origin otherwise
  • P1 = the point x1,y1 (first control point)
  • P2 = the point x2,y2 (second control point)
  • P3 = the point x3,y3 (end point of this curve)

Rational Quadratic Béziers

Rational Quadratic Béziers are second-order rational Bézier curves where:

  • P0 = the end point of the previous curve, if any, or the origin otherwise
  • P1 = the point x1,y1 (control point)
  • P2 = the point x2,y2 (end point of this curve)
  • w0 = the value w, the weight
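
For reference (non-normative), a point on a rational quadratic Bézier at parameter t can be computed with the standard basis-function form; a sketch in Dart, using the names above:

// (p0x, p0y) is P0, (x1, y1) is the control point P1 with weight w, and
// (x2, y2) is the end point P2; t is in the range 0..1.
List<double> rationalQuadraticPoint(double t, double p0x, double p0y,
    double x1, double y1, double w, double x2, double y2) {
  final u = 1 - t;
  final b0 = u * u;
  final b1 = 2 * w * u * t;
  final b2 = t * t;
  final d = b0 + b1 + b2;
  return [
    (b0 * p0x + b1 * x1 + b2 * x2) / d,
    (b0 * p0y + b1 * y1 + b2 * y2) / d,
  ];
}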

Curve blocks[bz][ca][cb]

The numbers used for representing curves are stored in contiguous groups of contiguous blocks. Each group provides the data for up to 64 curves. The number of blocks per group (the group size) depends on the needs of the curves in that group.

Curve blocks are those starting with the block at BLOCK_OFFSETS[CURVE_BLOCKS].

Data is striped within a group so that each curve has data at the same index of each block in the contiguous blocks of that group. At each index there is either data for a cubic Bézier, a rational quadratic Bézier, or no curve at all. Within each group, the blocks have the following semantics:

Block        Cubic Béziers               Rational Quadratic Béziers   None

block 0:     x3 (end point)              x2 (end point)               0xFFFFFFFF
block 1:     y3 (end point)              y2 (end point)               0xFFFFFFFF
block 2:     x1 (first control point)    x1 (first control point)     0xFFFFFFFF
block 3:     y1 (first control point)    y1 (first control point)     0xFFFFFFFF
block 4:     x2 (second control point)   w (weight)                   0xFFFFFFFF
block 5:     y2 (second control point)   0xFFFFFFFF                   0xFFFFFFFF
blocks 6+:   0xFFFFFFFF                  0xFFFFFFFF                   0xFFFFFFFF

Cells in the table above labeled 0xFFFFFFFF indicate that the relevant word must have all 32 bits set, to indicate that the data is unused. Other cells indicate the semantics of the relevant word; those words must be stored and interpreted as float32 values. (0xFFFFFFFF, when interpreted as a float32, is a NaN.)

In other words, the first block of a group contains 64 x-coordinates of the end points of 64 curves, then the second block contains 64 y-coordinates of the end points of those 64 curves, and so forth. Curve types can be mixed, for example in the fifth block (block 4) of a group that has just two curves, one cubic and one quadratic, the first word would be the x-coordinate of the second control point of the cubic curve, the second word would be the weight of the quadratic, and the other 62 words would all be set to 0xFFFFFFFF.

In this version of WVG, a 0xFFFFFFFF value in block 6 (or a group with only 6 blocks) with a non-NaN value in block 5 indicates a cubic Bézier, and a 0xFFFFFFFF in block 5 (or a group with only 5 blocks) indicates that the curve is a rational quadratic Bézier. There is never a need for a renderer to recognize the "no curve" case. Future versions of this format may introduce other sentinel values to indicate other kinds of curves.

Blocks that are not present in a group (e.g. the seventh block, block 6, in a 6-block group) are implicitly full of 0xFFFFFFFF values (this is implemented in step 4 of the algorithm below). So for example if a shape refers to curves with a group size of 5, then implicitly all the curves will be rational quadratics (not cubics) because block 5 is implicitly always 0xFFFFFFFF in that set of curves.

While it would be highly unusual, there is nothing in this format that prevents blocks from being part of more than one group (for example, with one group using curve blocks 0 to 6 and another using curve blocks 3 to 9).

The curve with index i, group offset b, and group size g is defined as follows:

  1. Let GROUP be the integer component of i / 64.
  2. Let GROUP_OFFSET be BLOCK_OFFSETS[CURVE_BLOCKS] + b + GROUP * g.
  3. Assert: GROUP_OFFSET < BLOCK_OFFSETS[CURVE_BLOCKS+1]
  4. Let RAWCELL[j] be defined as 0xFFFFFFFF if j is greater than or equal to g, and the word with offset i % 64 in the block with offset GROUP_OFFSET + j otherwise. (Word offsets are measured in words from the start of their block, and block offsets are measured in blocks from the start of the file.)
  5. Let STATIC be true.
  6. Let VALID be true.
  7. Let CELL[j] be defined as the first value returned from following these substeps:
  1. If RAWCELL[j] is 0xFFFFFFFF, let VALID be false and return RAWCELL[j].
  2. If RAWCELL[j] is a non-NaN value when interpreted as float32, return RAWCELL[j].
  3. Let ARG be the low 16 bits of RAWCELL[j].
  4. Let MODE be the high 16 bits of RAWCELL[j].
  5. If MODE is 0xFFD0, then let STATIC be false, let VALID be false if ARG is greater than or equal to PARAM_COUNT, and return the parameter with index ARG.
  6. If MODE is 0xFFE0, then let STATIC be false, let VALID be false if ARG is greater than or equal to EXPR_COUNT, and return the value of the expression numbered ARG.
  7. If no value has yet been returned by these substeps, let VALID be false.
  8. Return RAWCELL[j] (a NaN value that is not 0xFFFFFFFF).
  1. If CELL[6] is 0xFFFFFFFF and CELL[5] is not a NaN value, then the curve is a cubic Bézier:
  1. x1 and y1 are CELL[2] and CELL[3] respectively, as float32.
  2. x2 and y2 are CELL[4] and CELL[5] respectively, as float32.
  3. x3 and y3 are CELL[0] and CELL[1] respectively, as float32.
  1. Otherwise if CELL[5] is 0xFFFFFFFF, then the curve is a rational quadratic Bézier:
  1. x1 and y1 are CELL[2] and CELL[3] respectively, as float32.
  2. x2 and y2 are CELL[0] and CELL[1] respectively, as float32.
  3. w is CELL[4], as float32.
  1. Otherwise, the curve is nothing. Let VALID be false.
  2. If STATIC is true, then this curve is static. Otherwise, it is dynamic.
  3. If VALID is true, then this curve is valid. Otherwise, it is invalid.

A curve is either static or dynamic as defined by the steps above. (This indicates whether it depends on the parameters, and therefore whether an implementation might need to recompute it when the parameters are changed.)

A curve is either valid or invalid as defined by the steps above. A WVG file must not contain any invalid curves.

A curve is either drawable or not. A curve is drawable if it is a cubic Bézier or a rational quadratic Bézier, and all of its parameters are finite (not NaN and not infinite). Whether a curve is drawable or not can vary based on the value of the parameters.

Assertion: If a curve is valid and static, it is drawable.
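
Non-normative sketch of steps 1 through 4 above, showing where a curve's raw words live; wordAt is a hypothetical helper that reads a given word of a given block, with both offsets measured relative to BLOCK_OFFSETS[CURVE_BLOCKS]:

// Returns RAWCELL[j] for the curve with index i, group offset b, and group
// size g. Blocks that are not present in the group are implicitly full of
// 0xFFFFFFFF.
int rawCell(int i, int b, int g, int j,
    int Function(int block, int word) wordAt) {
  if (j >= g) return 0xFFFFFFFF;
  final group = i ~/ 64;
  final block = b + group * g + j; // block offset within the curve blocks
  final word = i % 64;             // word offset within that block
  return wordAt(block, word);
}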

Shape blocks

Shape blocks are those starting with the block at BLOCK_OFFSETS[SHAPE_BLOCKS].

Each shape block describes 16 shapes, using four words each. Shapes are numbered. The shape with index SHAPE_INDEX starts at the word that is SHAPE_INDEX * 4 words after the start of the BLOCK_OFFSETS[SHAPE_BLOCKS] block.

For each shape:

  1. The first word is SHAPE_GROUP_OFFSET, a uint32 indicating the index of the first curve block for this shape.
  2. The second word is SHAPE_START_CURVE_INDEX, a uint32 indicating the index within that block for the first curve of this shape.
  3. The third word is SHAPE_CURVE_COUNT, a uint32 indicating the number of curves in this shape.[cc]
  4. The fourth word is SHAPE_GROUP_SIZE, a uint32 indicating the number of blocks per group for this shape.

A shape with no curves must have all four values set to zero.

All the shapes in shape blocks in a WVG file must meet all of the following conditions. If any of the following conditions are not met, then the shape is considered invalid.

  • SHAPE_GROUP_OFFSET must be less than BLOCK_SIZES[CURVE_BLOCKS].
  • SHAPE_START_CURVE_INDEX must be less than 64.
  • Let LENGTH be (SHAPE_START_CURVE_INDEX + SHAPE_CURVE_COUNT)/64, rounding up to the nearest integer if it is not an integral number. SHAPE_GROUP_OFFSET + LENGTH * SHAPE_GROUP_SIZE must be less than or equal to BLOCK_SIZES[CURVE_BLOCKS].
  • SHAPE_GROUP_SIZE must be greater than or equal to 5 and less than or equal to 64.

These conditions imply that all but the low six bits of SHAPE_START_CURVE_INDEX and SHAPE_GROUP_SIZE are unused (and will be zero)[45]. These bits could be used for future extensions of this format.

A renderer must treat an invalid shape as if it had no curves.

A shape's curves are the curves with index i, group offset SHAPE_GROUP_OFFSET, and group size SHAPE_GROUP_SIZE, where i takes each integer value greater than or equal to SHAPE_START_CURVE_INDEX and less than SHAPE_START_CURVE_INDEX+SHAPE_CURVE_COUNT, in ascending numerical order of i.

Every curve of a shape must be a valid curve.

A renderer must treat curves that are not drawable as if they were straight lines with zero length.

If a shape has one or more valid dynamic curves, then the shape itself is dynamic. Otherwise, the shape is static. An implementation can precompute static shapes; they will not change even when the file's parameters are changed at runtime. On the other hand, a dynamic shape might change when the parameters are updated.

For convenience, we define SHAPE_COUNT as BLOCK_SIZES[SHAPE_BLOCKS] * 16.

SHAPE_COUNT can in theory be as large as 2^36, but due to other limitations of this format, only 2^32 possible shapes are ever actually accessible (shapes are referenced using 32 bit values). For this reason, files should not specify a BLOCK_SIZES[SHAPE_BLOCKS] value greater than 2^28, and renderers can treat SHAPE_COUNT values greater than 2^32 as 2^32 without loss of generality (skipping over the "unreachable" shape blocks if BLOCK_SIZES[SHAPE_BLOCKS] is greater than 2^28).

A renderer must treat a reference to a shape with an index greater than or equal to SHAPE_COUNT as if it had no curves.
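
A non-normative sketch of the validity conditions listed above, where curveBlockCount stands for BLOCK_SIZES[CURVE_BLOCKS] (the special all-zero "no curves" shape is not treated separately here):

bool shapeIsValid(int groupOffset, int startCurveIndex, int curveCount,
    int groupSize, int curveBlockCount) {
  if (groupOffset >= curveBlockCount) return false;
  if (startCurveIndex >= 64) return false;
  // LENGTH is (SHAPE_START_CURVE_INDEX + SHAPE_CURVE_COUNT) / 64, rounded up.
  final length = (startCurveIndex + curveCount + 63) ~/ 64;
  if (groupOffset + length * groupSize > curveBlockCount) return false;
  if (groupSize < 5 || groupSize > 64) return false;
  return true;
}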

Gradient[cd][ce][cf][cg] blocks

Gradient blocks describe the stop points and their colors for gradients defined in paint blocks.

Gradient blocks are the BLOCK_SIZES[GRADIENT_BLOCKS] blocks starting with the block at BLOCK_OFFSETS[GRADIENT_BLOCKS].

For convenience we define GRADIENT_COUNT as BLOCK_SIZES[GRADIENT_BLOCKS] / 2, rounded down if it is not an integer. (In other words, gradient blocks always come in pairs.) The number of gradient blocks in a file must be even. Renderers must ignore the last gradient block if the number of gradient blocks is odd.

Each pair of gradient blocks defines 2 to 64 stops and a matching number of colors. Stops are float32 numbers in the range 0..1. The first stop is always 0.0, the last stop is always 1.0, and each stop is greater than or equal to the previous stop. Colors are always references to parameters or expressions[ch][ci][cj].

We define GRADIENT[i] as the set of stops and corresponding colors yielded from following these steps until they are terminated:

  1. If i is greater than or equal to GRADIENT_COUNT, then:
  1. Yield a 0.0 stop with the color 0x00000000 (fully transparent black).
  2. Yield a 1.0 stop with the color 0x00000000 (fully transparent black).
  3. Terminate these steps.
  1. Let STOP[j] be the word that is i*128+j words after the start of the BLOCK_OFFSETS[GRADIENT_BLOCKS] block.
  2. Let COLOR[j] be the word that is i*128+64+j words after the start of the BLOCK_OFFSETS[GRADIENT_BLOCKS] block.
  3. Let LAST_STOP be 0.0.
  4. Let COUNT be 0.
  5. Loop:
  1. If COUNT is 0, then let NEXT_STOP be 0.0.
  2. Otherwise:
  1. Let VALUE be STOP[COUNT].
  2. Let NEXT_STOP be the first value returned from following these steps, as float32:
  1. If VALUE is a non-NaN value when interpreted as float32, return VALUE.
  2. Let ARG be the low 16 bits of VALUE.
  3. Let MODE be the high 16 bits of VALUE.
  4. If MODE is 0xFFD0, then return the parameter with index ARG.
  5. If MODE is 0xFFE0, then return the value of the expression numbered ARG.
  6. Return VALUE.
  1. If LAST_STOP is less than 1.0 and NEXT_STOP is NaN or greater than 1.0, or, if COUNT is 63, then let NEXT_STOP be 1.0.
  1. If NEXT_STOP is less than LAST_STOP, greater than 1.0, or NaN, then terminate these steps.
  2. Let LAST_STOP be NEXT_STOP.
  3. Let VALUE be COLOR[COUNT].
  4. Let ARG be the low 16 bits of VALUE.
  5. Let MODE be the high 16 bits of VALUE.
  6. If MODE is 0xFFD0, then let NEXT_COLOR be the parameter with index ARG as color32.
  7. Otherwise, if MODE is 0xFFE0, then let NEXT_COLOR be the expression numbered ARG as color32.
  8. Otherwise, let NEXT_COLOR be zero as color32 (fully transparent black).
  9. Yield NEXT_STOP as float32 and NEXT_COLOR as color32.
  10. Increment COUNT.
  11. If COUNT is 64, terminate these steps.

Every even gradient block must fulfill the following criteria:

  • The first word in the block is zero.
  • There is exactly one n such that:
  • n is greater than or equal to 2.
  • n is not greater than 64.
  • The nth word as float32 is 1.0.
  • None of the first n words in the block are 0xFFFFFFFF.
  • All words in the block after the nth word (if any) are 0xFFFFFFFF.
  • Ignoring any words which, when interpreted as float32, are NaN values, none of the words in the block, when interpreted as float32, represent a value that is less than an earlier value in the block.
  • Each word that is not 0xFFFFFFFF but is a NaN when interpreted as float32 in the even gradient block must fulfill the following criteria:
  • The high sixteen bits of the word are either 0xFFD0 or 0xFFE0.
  • The low sixteen bits of a word whose high sixteen bits are 0xFFD0 are a number less than PARAM_COUNT.
  • The low sixteen bits of a word whose high sixteen bits are 0xFFE0 are a number less than EXPR_COUNT.
  • The word 64 words after each word that is not 0xFFFFFFFF in the even gradient block must fulfill the following criteria:
  • The high sixteen bits of the word are either 0xFFD0 or 0xFFE0.
  • The low sixteen bits of a word whose high sixteen bits are 0xFFD0 are a number less than PARAM_COUNT.
  • The low sixteen bits of a word whose high sixteen bits are 0xFFE0 are a number less than EXPR_COUNT.
  • Each word that is 0xFFFFFFFF must have a corresponding zero as the word 64 words later in the file (the corresponding color in the odd gradient block).
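
As a non-normative aid, the word offsets used by the steps above can be written down directly; for gradient i, the stops live in the even block of the pair and the colors in the odd block:

// Word offsets, relative to the start of the first gradient block, of the
// j-th stop and its corresponding color for the gradient with index i.
int stopWordOffset(int i, int j) => i * 128 + j;
int colorWordOffset(int i, int j) => i * 128 + 64 + j;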

Paint blocks

Paint blocks are the BLOCK_SIZES[PAINT_BLOCKS] blocks starting with the block at BLOCK_OFFSETS[PAINT_BLOCKS].

Paint blocks use a varying number of words to describe the paint effect they represent. To allow for future expansion, each block represents a single effect[ck]. Composition blocks refer to paint blocks to describe how they should be styled.

The first word in each paint block describes the kind of effect represented by the block; the remaining words describe the parameters of the effect.

We define PAINT_COUNT as BLOCK_SIZES[PAINT_BLOCKS].

PAINT_COUNT can in theory be as large as 2^32, but due to other limitations of this format, only 65536 possible paints are ever actually accessible (paints are referenced using 16 bit values). For this reason, files should not specify a BLOCK_SIZES[PAINT_BLOCKS] value greater than 65536, and renderers can treat PAINT_COUNT values greater than 65536 as 65536 without loss of generality (skipping over the "unreachable" paint blocks if BLOCK_SIZES[PAINT_BLOCKS] is greater than 65536).

The paint described by an index i is a paint that draws nothing if i is greater than or equal to PAINT_COUNT; otherwise, it is the paint described by the block with offset BLOCK_OFFSETS[PAINT_BLOCKS] + i, as defined by the section that corresponds to the first word of that block, as per the following table and the following subsections.

First word                      Effect

0x00000010                      Linear gradient
0x00000014                      Radial gradient
Anything else[cl][cm][cn][co]   A paint that paints nothing

Paint blocks must not start with a word that does not have a corresponding section below[cp].

Flat color

No paint blocks describe a flat color paint; to describe a flat color, a paint code is used instead (either one that gives the color directly, or one that references a parameter or expression).

Linear gradient

A paint block whose first word is 0x00000010 is a linear gradient.

The words of such a block must be interpreted as follows:

Word        Interpretation

word 0      must be 0x00000010, signature for linear gradient
word 1      index of gradient to use, GRADIENT_INDEX, as uint32
word 2      flags, FLAGS, as uint32
word 3      index of matrix to use, MATRIX_INDEX, as uint32
words 4+    must be zero, must be ignored

The block represents a paint that draws a linear gradient that interpolates using the stops and the colors of GRADIENT[GRADIENT_INDEX] from the origin to the coordinate 1.0,0.0, with the flags FLAGS, transformed by MATRIX[MATRIX_INDEX].

A matrix is used for linear gradients (rather than just specifying two coordinates, which for linear gradients is equivalent and would be simpler) so that gradients can be adjusted by parameters without requiring implementations to have logic for expanding paints.
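
For example (non-normative), a generator that wants a gradient running from a point A to a point B could emit a matrix that maps (0,0) to A and (1,0) to B; a sketch, producing the 16 cells in the column-major order used by matrix blocks (the second column is an arbitrary perpendicular axis, chosen here for convenience):

List<double> linearGradientMatrix(double ax, double ay, double bx, double by) {
  final dx = bx - ax, dy = by - ay;
  return [
    dx, dy, 0.0, 0.0,   // column 0: maps the gradient's x axis onto A->B
    -dy, dx, 0.0, 0.0,  // column 1: a perpendicular axis
    0.0, 0.0, 1.0, 0.0, // column 2: z axis, unused for 2D shapes
    ax, ay, 0.0, 1.0,   // column 3: translation, so (0,0) lands on A
  ];
}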

Radial gradient

A paint block whose first word is 0x00000014 is a radial gradient.

The words of such a block must be interpreted as follows:

Word        Interpretation

word 0      must be 0x00000014, signature for radial gradient
word 1      index of gradient to use, GRADIENT_INDEX, as uint32
word 2      flags, FLAGS, as uint32
word 3      index of matrix to use, MATRIX_INDEX, as uint32
words 4+    must be zero, must be ignored

The block represents a paint that draws a radial gradient that interpolates using the stops and the colors of GRADIENT[GRADIENT_INDEX] from the origin to the unit circle, with the flags FLAGS, transformed by MATRIX[MATRIX_INDEX].

Flags

The flags of a gradient are bits in a uint32. The bottom two bits must be interpreted as follows:

Bits   Interpretation

0x0    Samples beyond the edge must be clamped to the nearest color in the defined inner area.
0x1    Samples beyond the edge must be repeated from the far end of the defined area.
0x2    Samples beyond the edge must be mirrored back and forth across the defined area.
0x3    Samples beyond the edge must be treated as transparent black.
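
A non-normative sketch of how these four modes could be applied to a gradient coordinate t (the position along the gradient before sampling the stops); returning null stands for transparent black:

double? applyTileMode(double t, int flags) {
  switch (flags & 0x3) {
    case 0x0: // clamp to the nearest color in the defined area
      return t < 0.0 ? 0.0 : (t > 1.0 ? 1.0 : t);
    case 0x1: // repeat from the far end of the defined area
      return t - t.floorToDouble();
    case 0x2: { // mirror back and forth across the defined area
      final f = t - 2 * (t / 2).floorToDouble();
      return f <= 1.0 ? f : 2.0 - f;
    }
    default: // 0x3: transparent black outside the defined area
      return (t >= 0.0 && t <= 1.0) ? t : null;
  }
}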

Paint codes[cq]

A paint code consists of two words. The paint for a paint code whose two words are OPERATOR and COLOR is defined as the first paint that is returned from the following steps:

  1. If OPERATOR is 0xFFFFFFFF, return a paint that draws with the flat color COLOR interpreted as color32. The paint code is valid.
  2. Let ARG be the low 16 bits of OPERATOR.
  3. Let MODE be the high 16 bits of OPERATOR.
  4. If MODE is 0xFFD0, then return a paint that draws with a flat color, that color being the parameter with index ARG as color32. In this case, if ARG is less than PARAM_COUNT and COLOR is zero then the paint code is valid, otherwise it is not.
  5. If MODE is 0xFFE0, then return a paint that draws with a flat color, that color being expression numbered ARG as color32. In this case, if ARG is less than EXPR_COUNT and COLOR is zero then the paint code is valid, otherwise it is not.
  6. If MODE is 0xFFF0, then return the paint described by the paint block with index ARG. In this case, if ARG is less than PAINT_COUNT and COLOR is zero then the paint code is valid, otherwise it is not.
  7. Return a paint that draws nothing. In this case, the paint code is not valid.

A paint code can be valid or not, as determined by these steps.
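
A non-normative sketch of these steps in Dart; the callbacks are hypothetical hooks into the rest of a renderer, resolving a literal color, a parameter, an expression, or a paint block to whatever paint representation (T) the renderer uses:

T resolvePaintCode<T>(
  int operatorWord,
  int color, {
  required T Function(int color32) flatColor,
  required T Function(int paramIndex) paintFromParameter,
  required T Function(int exprIndex) paintFromExpression,
  required T Function(int paintIndex) paintFromBlock,
  required T Function() nothing,
}) {
  if (operatorWord == 0xFFFFFFFF) return flatColor(color); // literal color32
  final mode = operatorWord >> 16;
  final arg = operatorWord & 0xFFFF;
  if (mode == 0xFFD0) return paintFromParameter(arg); // flat color from parameter
  if (mode == 0xFFE0) return paintFromExpression(arg); // flat color from expression
  if (mode == 0xFFF0) return paintFromBlock(arg);      // paint block reference
  return nothing(); // not valid; draws nothing
}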

Composition blocks[cr]

Composition blocks are the BLOCK_SIZES[COMP_BLOCKS] blocks starting with the block at BLOCK_OFFSETS[COMP_BLOCKS].

Composition blocks represent actual rendering. Each block specifies a group of matrices and shapes, and a paint. [cs][ct][cu][cv][cw]Specifically, the words in a composition block are as follows:

  1. The MATRIX_INDEX, as uint32.
  2. The SHAPE_INDEX, as uint32.
  3. The SEQUENCE_LENGTH, as uint32.
  4. The OPERATOR, as uint32.
  5. The COLOR, as color32.[cx]

All remaining words must be zero.

The SEQUENCE_LENGTH is biased by one, meaning that a value of zero indicates there is one shape in the composition, a value of one indicates two shapes, etc.

For each composition block, a path must be created as per the following steps:

  1. Let i be zero.
  2. Loop:
  1. Let SHAPE be the shape with index SHAPE_INDEX + i.[cy][cz][da]
  2. Let MATRIX be the matrix with index MATRIX_INDEX + i.
  3. Transform SHAPE by MATRIX to form PATH_COMPONENT.
  4. Add PATH_COMPONENT to the path being created.
  5. Increment i.
  6. If i is greater than SEQUENCE_LENGTH, terminate these steps.

The interior of the path (which is the area that is painted, as described below) is defined by a non-zero sum of signed edge crossings: for a given point, the point is considered to be on the inside of the path if a line drawn from the point to infinity crosses curves going clockwise around the point a different number of times than it crosses curves going counter-clockwise around that point.

Composition blocks must be composited[db][dc], in the order specified in the file. For each composition, the path must be filled as specified by the paint with the paint code formed by OPERATOR and COLOR, into the rectangle whose top left is at 0,0 and whose bottom right is at IMAGE_WIDTH,IMAGE_HEIGHT. Each composition block's pixels must be combined with the previous composition's using the "over" Porter-Duff operator[dd].

Composition blocks must not have a MATRIX_INDEX greater than or equal to MATRIX_COUNT, a SHAPE_INDEX greater than or equal to SHAPE_COUNT, a SEQUENCE_LENGTH greater than or equal to SHAPE_COUNT-SHAPE_INDEX or MATRIX_COUNT-MATRIX_INDEX, or a paint code (consisting of OPERATOR and COLOR) that is not valid.
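
A non-normative sketch of the path-building loop above; addTransformedShape is a hypothetical hook that transforms the shape with the given index by the matrix with the given index and appends it to the path being built. Because SEQUENCE_LENGTH is biased by one, SEQUENCE_LENGTH + 1 shape/matrix pairs are used:

void buildCompositionPath(int matrixIndex, int shapeIndex, int sequenceLength,
    void Function(int shapeIndex, int matrixIndex) addTransformedShape) {
  for (var i = 0; i <= sequenceLength; i++) {
    addTransformedShape(shapeIndex + i, matrixIndex + i);
  }
}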

APIs

Implementations should offer the following APIs for introspecting images.

Updating parameters

Given a parameter index that is equal to or greater than zero and less than PARAM_COUNT, as well as a parameter value in the form of a 32 bit integer (signed or unsigned), 32 bit color value, or 32 bit floating point value (in binary32 format), the parameter updating API must update the value of the parameter specified by that index to the given new parameter value, and then schedule the image to be rerendered at the earliest available and appropriate opportunity.
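
For example, a suitable Dart API could have a signature along these lines (non-normative; Object stands in here for the three accepted 32 bit value kinds):

void updateParameter(int index, Object value);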

Hit testing

Given a point in the image's coordinate space (as given by the width and height in the metadata block, or implied by such a block's absence), the hit testing API must return the index of the top-most composition that describes a path that considers the given point to be within its interior, or a sentinel value (such as -1 or null) if the point is not within the interior of any of the paths.

For example, a suitable Dart API could have the following signature:

int? hitTest(Offset position);

Bounds introspection

Given an index that specifies a composition block (the index being greater than or equal to zero, and less than BLOCK_SIZES[COMP_BLOCKS]), the bounds introspection API must return an axis-aligned rectangle (aligned to the x and y axes of the image) giving the smallest rectangle that contains all points that are considered to be in the interior of the path of that composition block (the bounding box of that path).

For example, a suitable Dart API could have the following signature:

Rect bounds(int composition);

Implementations should fail (e.g. throw an exception) if the specified composition index is out of range.

Implementations should also fail (e.g. throw an exception) if the specified composition block has no curves.

Metadata APIs

The width API must return the image's width as specified by the metadata block, or 1.0 if there is no metadata block.

The height API must return the image's height as specified by the metadata block, or 1.0 if there is no metadata block.

Other APIs

Implementations may offer other affordances, e.g. providing a count of parameters or composition blocks, exposing the default or current values of parameters, or offering APIs to update parameters continually (e.g. specifying that a particular parameter's value should be increased by a specific amount every 16ms).

Compressibility

This section is non-normative.

WVG files are somewhat sparse, have a lot of redundancy, and are extremely regular, which makes them interesting targets for compressibility.

In practice, simple WVG files compress by a factor of 10, in some cases a factor of 20. Anecdotally, based on the very few sample files at this early stage, xz (which uses LZMA2) performs best among commonly-available compression tools, compressing the 35,328 byte test data to 2,220 bytes.

Summary of internal references

This section is non-normative.

Expression blocks, matrix blocks, curve blocks, and gradient blocks can contain words of the form 0xFFD0XXXX to refer to parameters and words of the form 0xFFE0YYYY to refer to earlier expressions, where XXXX is the parameter index (the XXXXth word of the file starting from the first parameter), and YYYY is the expression index (the YYYYth block of the file starting from the first expression block).

Shape blocks specify curves by giving the number of the curve block that contains the first coordinate of the curve in question, and the index of that coordinate's word in that block (as well as the number of blocks in the groups).

Paint blocks refer to gradients by specifying the gradient index (which is half the number of the block that the gradient starts from, relative to the first gradient block) and matrices by specifying the matrix index (which is the number of words from the first word of the first matrix block to the first word of the matrix being specified, divided by sixteen).

Composition blocks refer to matrices in the same manner as paint blocks, to shapes by the number of words from the first word of the first shape block to the first word of the shape being specified divided by four, and to colors by using 0xFFD0XXXX to refer to parameters, 0xFFE0YYYY to refer to expressions, or 0xFFF0ZZZZ to refer to paints, where XXXX and YYYY are interpreted as above and ZZZZ is the number of the paint block being referenced relative to the first paint block.

In summary, parameters, expressions, curves, gradients, matrices, shapes, and paints can be referenced. The method for referencing a feature of the format is the same any time that it can be referenced (so e.g. every reference to a parameter is always of the form 0xFFD0XXXX).
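
A non-normative helper that classifies a reference word as described above (using a Dart record for the result; null means the word is not a reference):

(String, int)? classifyReference(int word) {
  final index = word & 0xFFFF;
  switch (word >> 16) {
    case 0xFFD0: return ('parameter', index);
    case 0xFFE0: return ('expression', index);
    case 0xFFF0: return ('paint', index);
    default: return null;
  }
}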

Future extensions[de]

When adding features to this format in the future, various options are available. Here are some thoughts that may help.

New block types

Obviously the simplest extension mechanism is adding new block types. To make these block types relevant, they would need to be referenced from somewhere; some possibilities are described in the subsections below.

Bitmap images and other attachments

Arbitrary data from other formats can be embedded in a new block type without additional internal structure, with parts identified by start offset and length. Alternatively, two block types could be used, one containing raw unstructured binary data (PNGs, JPEGs, etc), and the other providing a directory index, or manifest, of the data in the other blocks, identifying entries by name, type, offset, and length (and potentially including even more data such as modification times). Each file would use one block of the manifest, and specific files could then be identified in other parts of the format (e.g. in paints) by indexing into the manifest.

Extending packed blocks

Some block types, e.g. matrices and parameters, are packed tightly, with no room for expansion. To add new information to such blocks, a new parallel block type can be minted, and indices into the original type can simultaneously refer to the additional data in the new block. Thus for example a reference to matrix 4 would refer simultaneously to the fourth group of 16 words in the first block of the MATRIX_BLOCKS blocks, as well as the fourth group of 16 words in the first block of the MATRIX2_BLOCKS blocks, where MATRIX2_BLOCKS is a different block type (e.g. 24, one more than MATRIX_BLOCKS itself).

More parameters, expressions, and paints

The format currently limits the number of parameters, expressions, and paints to 65,536 (2^16), because it uses 16 bits to specify the index. This makes the implementations very slightly simpler by splitting the 32 bit words that reference parameters, expressions, and paints into two 16 bit parts that together always form a floating point NaN value. This allows the high 16 bits to always be compared literally (and always to the same magic constants, even though the context doesn't always require NaN-boxing), and the low 16 bits to be used literally as the index.

However, the low 4 bits of the high 16 bits are always zero in the current scheme. These bits could be used to extend the references with only a slight increase in complexity, allowing up to 1,048,576 (2^20) parameters, expressions, and paints per file.

More kinds of references

The remaining 12 bits are the NaN sign bit, the eight exponent bits which must all be set to indicate a NaN value, and three bits of the mantissa (the NaN payload). The mantissa cannot be all zeroes. This leaves the following ways to bundle data into the NaNs:

High 12 bits     Hex     Range of low 20 bits[46]   Current assigned meaning

0b011111111000   0x7F8   0x00001-0xFFFFF            Not currently assigned
0b011111111001   0x7F9   0x00000-0xFFFFF            Not currently assigned
0b011111111010   0x7FA   0x00000-0xFFFFF            Not currently assigned
0b011111111011   0x7FB   0x00000-0xFFFFF            Not currently assigned
0b011111111100   0x7FC   0x00000-0xFFFFF            Not currently assigned
0b011111111101   0x7FD   0x00000-0xFFFFF            Not currently assigned
0b011111111110   0x7FE   0x00000-0xFFFFF            Not currently assigned
0b011111111111   0x7FF   0x00000-0xFFFFF            Not currently assigned
0b111111111000   0xFF8   0x00001-0xFFFFF            Not currently assigned
0b111111111001   0xFF9   0x00000-0xFFFFF            Not currently assigned
0b111111111010   0xFFA   0x00000-0xFFFFF            Not currently assigned
0b111111111011   0xFFB   0x00000-0xFFFFF            Not currently assigned
0b111111111100   0xFFC   0x00000-0xFFFFF            Not currently assigned
0b111111111101   0xFFD   0x00000-0xFFFFF            Parameter reference
0b111111111110   0xFFE   0x00000-0xFFFFF            Expression reference
0b111111111111   0xFFF   0x00000-0xFFFFF            Paint reference

Ranges could be combined if more bits are needed, for example rows 0x7FC to 0x7FF could be combined to store a 22 bit number in the remaining bits, if 20 bits is insufficient for some payload.

In expression blocks, references must start with a leading one bit (so that all positive integers can be pushed onto the stack), so the rows above starting with 0x7F are only useful for references that are not meaningful in expressions.[df]
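
A non-normative helper splitting a word into the 12-bit NaN prefix and 20-bit payload discussed in the table above:

// Returns (high 12 bits, low 20 bits) of a 32-bit word, e.g. a parameter
// reference 0xFFD00007 splits into (0xFFD, 0x00007).
(int, int) splitNanBoxedWord(int word) => (word >> 20, word & 0xFFFFF);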


[1] https://github.com/flutter/flutter/issues/1831

[2] https://material.io/design/iconography/animated-icons.html

[3] This section does not apply to simpler kinds of maps like floor plans and topological maps.

[4] Specification: https://www.w3.org/TR/SVG/

[5] Specification: https://www.iso.org/obp/ui/#iso:std:63534:en

[6] There are many such formats, for example Skia has the SKP format. Like other proprietary formats, it has limitations that make it unsuitable for this document's purposes.

[7] NeXT used DPS for its vector graphics. Mac OS X, now macOS, originally a fork of NeXT, switched to a subset of PDF for its vector graphics.

[8] https://airbnb.io/lottie/

[9] Usually in the context of exposing Skia's Skottie module.

[10] This may eventually change, q.v. https://github.com/lottie-animation-community

[11] https://github.com/googlefonts/colr-gradients-spec/blob/main/OFF_AMD2_WD.md

[12] https://rive.app/

[13] This may eventually change, q.v. https://help.rive.app/runtimes/advanced_topics/format

[14] https://github.com/google/iconvg

[15] q.v. https://github.com/google/iconvg/issues/4#issuecomment-860649783

[16] https://tinyvg.tech/

[17] It's tempting to list other authoring tools here, such as Inkscape or Affinity Designer, but the reality appears to be that this market has only one major player, with other vector graphic authoring tools having minimal usage in comparison. If export from Adobe Illustrator is supported, it is probably sufficient to ensure the format's viability from a designer perspective; on the other hand, even if ten other tools were to support export to this format, it may not be enough to matter.

[18] For example, the ability to hand-author SVG was key to creating the original set of consistent icons on the Flutter widget catalog page: https://flutter.dev/docs/reference/widgets

[19] That said, it doesn't appear that SVG's original design was intended to be optimized for hand-authoring, and hand-authoring SVG is not an overly pleasant experience. See also: https://www.w3.org/Graphics/SVG/WG/wiki/Secret_Origin_of_SVG

[20] Tools that convert SVG to other formats suffer from this issue, and therefore uniformly only support a subset of SVG's features, though the precise subset varies from tool to tool and can be hard to precisely describe.

[21] For example, Inkscape uses proprietary extensions to SVG to describe editing state (see https://inkscape.org/learn/faq/#what-inkscape-svg-opposed-plain-svg), and Adobe Illustrator uses a variant of EPS.

[22] For example, Corel Draw has had a variety of file formats over the years, all proprietary.

[23] Even those who consider using these features to make SVG-based apps accessible often find it difficult. A full solution really requires combining ARIA and SVG and scripts dedicated to updating the ARIA attributes, but that's for "images" beyond the scope of this document (applications, really, for which we would propose using Flutter itself, not whatever format is designed for this document).

[24] Consider, e.g., the sample images for SVG provided by the W3C (warning, some sexist content): https://dev.w3.org/SVG/tools/svgweb/samples/svg-files/?C=S;O=D

[25] https://master-api.flutter.dev/flutter/dart-ui/instantiateImageCodec.html

[26] Obviously not all of this would be accessible to the vector graphics renderer...

[27] See also: A strategy for making judgements regarding space/time trade-offs (PUBLICLY SHARED)

[28] "Build the best way to develop user interfaces", with the corollary being "The best way to develop user interfaces creates fast applications". See: https://github.com/flutter/flutter/wiki/Values

[29] For example, galleries of animated images.

[30] See also https://en.wikipedia.org/wiki/Font_hinting

[31] As noted earlier, accessibility in the platform as a whole is critical. Accessibility being low on this list reflects that the needs can be met outside the format as well, and supporting them inside the format would be beneficial only to the extent that it provides greater flexibility to designers.

[32] Much has been written on the complexities of text layout, shaping, et al. This blog post provides an interesting introduction to the topic.

[33] An approximation because this measurement cannot be exact if the image is transformed in any way beyond a scale transform.

[34] An approximation because if the image is being rendered with a non-uniform scale or non-affine transform, the precise number of hardware pixels per coordinate system pixel may vary based on the axis or location in the image.

[35] This is not strictly speaking true; a stroke may require a curve of higher order to be precisely represented. But in practice you can always approximate it with sufficient precision for this to be true enough.

[36] Implementations typically support data: URLs too, enabling a super-inefficient inline encoding of bitmap data.

[37] Humans have a long history of accidentally making things Turing complete, e.g. with C++ templates and other type systems, various card games and video games, even the x86 MOV instruction alone.

[38] This section is based on discussions with the Skia and Spinel teams.

[39] This is a vastly simplified example that is intended to convey the general truth rather than conveying an accurate reality. In practice, there are many ways to draw circles, and the details may vary from GPU to GPU, from circle to circle, and over time as new algorithms are discovered.[dg][dh][di][dj][dk]

[40] See also GPU-Centered Font Rendering Directly from Glyph Outlines (Lengyel 2017), Resolution Independent Curve Rendering using Programmable Graphics Hardware (Loop, Blinn; 2005), and piet-gpu.

[41] More or less. This is documented in the SWF spec on pages 128-129, and in some blog posts.

[42] Taking bets on how long it'll take to accidentally violate this design goal. Anyone? Anyone?

[43] Recall that the format is column-major, which is why the 24.0 value, 0x41c00000, is the 13th entry in the matrix data, not the 4th.

[44] Performing this operation entirely in the 32 bit domain would allow overflow in the case where STACK[STACK_INDEX-2] is -231 and STACK[STACK_INDEX-1] is -1.

[45] Really all but one bit of SHAPE_GROUP_SIZE is unused right now, if we're honest.

[46] The two rows where the other bits of the mantissa are zeroes must have a non-zero payload, so the low 20 bits cannot represent the number 0x00000. Every other row can encode any 20 bit number.

[a]Being able to compile from Lottie => IconVG with animation would be a very attractive feature

[b]Agreed

[c]I wonder if file-wide fixed-alignment is as important as it used to be? - https://github.com/be-fonts/boring-expansion-spec/issues/1

[d]The goal of this alignment is to enable direct memory copy from mmap'ed files into GPU memory and to make the format simple to process, not so much to address the kinds of performance issues that led to alignment in the past.

I don't have good objective data on the tradeoffs here.

[e]domenic suggests having a high-level description of the abstract model, too (something that would cover wvg and wvgtxt equally well)

[f]This seems wasteful both in terms of space and processing power. This means that a straight line will need additional unnecessary control points, and either the parser or the renderer will have to do extra calculations when drawing a curve that's really not curved.

[g]I certainly would want to confirm my hypothesis with numbers, but my theory here is that when you implement this in a shader, it's more efficient to have more uniform logic (fewer branches). Ideally I would like to just implement everything using rational cubics, but the math there is a lot more complicated than non-rational cubics and rational quadratics.

[h]I bet @csmartdalton@google.com  has some experience in this domain :) Chris, what do you think?

[i]>> I certainly would want to confirm my hypothesis with numbers, but my theory here is that when you implement this in a shader, it's more efficient to have more uniform logic (fewer branches).

Skia tessellation is an existence proof of your proposal. We convert lines to cubics and it's very fast. The tessellator ends up giving them one segment, so the GPU just sees a single triangle anyway.

On the CPU side we still branch based on whether it's a line, quad, cubic, or conic though.

[j]@csmartdalton@google.com Do you have any thoughts on the format proposed here? (Using cubics and conics exclusively, for example; as well as the actual storage format: does it seem reasonable if the goal is to maximize the efficiency of rendering these images purely in shaders?)

[k]I'm a bit curious about defining blocks up front.

It seems like at best it will require array lookups that don't benefit from cache locality. 

Protos and Flatbuffers, for example, define the types as they go, so parsers can just stream.

Another thing to consider: should the format just be loadable in memory as its own C struct, similar to how FlatBuffers deals with loading things?

[l]My goal was to have the data be in a form that could be memcpy'ed straight to the GPU, allowing random-access to the file without parsing it (e.g. if you want the curves you can efficiently compute exactly where they are from the header, without having to examine any of the bytes that aren't relevant)

[m]Ahh I see, so you can just read the header and then bulk-send blocks over to a shader program that will use them?

[n]that's the theory, but it remains to be proved.

[o]This sample suggests the format is very space inefficient. Even if it can be compressed in transit, if the idea is to mmap the whole thing and copy it to the GPU, that's presumably the decompressed form, and using lots of memory (even mmaped memory) is not good. It's not obvious that the dramatic space inefficiency is necessary given the format's stated goals.

[p]Yeah the current version is definitely less space efficient than it could be without sacrificing the goals. There's some notes below about improvements we should make if we move forward with this basic design (making the paints and compositions take 8 and 4 words respectively instead of 64, for example).

[q](That said, the memory footprint is not high on the list of priorities; I'm curious to hear if you would rearrange that list to move it higher.)

[r]I don't know about the Flutter use case, but for the web, memory use would matter a great deal.

[s]Well the question isn't whether it's important, it's whether it's more or less important than other considerations. For example, would you put memory usage above or below rendering performance in terms of what to prioritize? If we could make a vector graphics format that cost 12KB of RAM max, but images took 500ms to render, would that be an interesting format? What about one where each file always took ten megabytes of main memory and ten more megabytes in GPU RAM to render, but images could always reliably be rendered in 0.1ms, would that be an interesting format?

[t]Efficiency and memory use are both important. I think you presented two extremes but it's not obvious to me that the tradeoff is this stark.

Indeed, since modern memory architectures are highly dependent on cache, it's likely that gratuitous memory inefficiency also leads to speed inefficiency. And at the extreme (e.g. a document containing many images) this could lead to swapping (bad for system efficiency) or even impossibility of rendering.

[u]A significant portion of this file is just zero padding to satisfy the way the blocks are defined and laid out - what does that end up looking like on a large file with lots of painting instructions that are mostly relatively simple?

[v]Looks like lots of zeroes. It compresses well. :-)

File size is not a high priority (see the design decision summary above).

[w]But then CPU time has to be spent decompressing... And the decompressed data takes a lot of memory.

[x]Well the data is going to be compressed anyway (e.g. in an APK). I wouldn't compress individual files.

[y]It will have to get decompressed though, and particularly at runtime it will occupy valuable memory.

 

This can easily become a problem if you have a lot of VGs to draw in a list that you want to pre-load for smooth scrolling, and all of them end up with lots of extra padding.

[z]I'd like to experiment with this some. Do you have any images that would be good test cases here? (Ideally something that is defined using only filled SVG paths, since that's what my current toolset most conveniently allows me to convert to WVG heh.)

[aa]I don't have a specific drawing in mind, but I'd say testing with lots of drawings that use the least amount of space possible for each type but actually use each type - e.g. have one transform or five transforms, etc.

[ab]Certainly if we go anywhere with this we will have to test it comprehensively.

[ac]> the data is going to be compressed anyway

Is something like `<img src="data:image/wvg;base64,blahblahblah" width="64" height="64">` out of scope?

[ad]You could probably achieve what you want if you included section lengths along with types. You could then avoid all the zero padding, and if your structures are aligned anyway it should be pretty trivial to memcpy them using the given length rather than an implied one.

[ae]nigel: it's certainly not something this is optimized for. i think if you're going to inline images into HTML, then you might as well use SVG: it's more idiomatic, it's already supported, and clearly performance is not the overriding concern if you're inlining base64 into HTML.

dan: you mean, dropping the 64-word alignment? that's certainly possible. it loses some of the simplicity. if the only goal is avoiding the padding, i'm not sure it's worth it. we can remove most of the padding by re-structuring some of the blocks (e.g. making PAINT and COMP blocks 8 or 4 words long instead of 64), if the padding is an issue.

[af]IME the vast majority of transformations done are simple affine transformations. I know file size is not a goal of this format, but if that changes it would be an easy place to save some size, especially in places where the renderer would drop this down to a 3x3 matrix anyway.

[ag]Yeah if we want to optimize for file size there's a ton of things I'd do differently here. I think the approach used by IconVG is more along the lines of what you'd want to do if that was the goal.

[ah]I get that 4x4 is a nice power of 2 but just noting that while 3x3 doesn't save anything (if you're aligning to 16 words instead of 8), 3x2 affine matrices can pack in 8 word alignment (with some padding).

[ai]While I agree that egregiously wasteful blocks are worth optimizing (e.g. COMP blocks today, which only use 4 or 5 of the 64 words they consume, and which are expected to be present in large numbers in even mildly complicated images), if we are at the point of optimizing 4x4 matrices into 3x2 matrices then we are really saying that file size is a much higher priority and in that case the entire format should be redesigned.

 

I have no problem with file size being a high priority, but that's a different format.

[aj](or more to the point, it's a discussion we should have in the section higher up on what the priorities of the format should be.)

[ak]sRGB

[al]Is there currently data on whether RGBA or BGRA (or other) formats are better supported natively? (i.e. more efficient) I have found multiple pointers online that iOS seems to prefer BGRA for example.

[am]my impression was that the convention in shaders is to use RGBA, but I don't have a strong opinion one way or the other here. I suspect we'll just change the spec to match whatever the first shader implementation finds is most convenient.

[an]Make the signature non-ASCII (and non-UTF-8) so that a 512-byte text file coincidentally starting with "WVG\n" isn't misinterpreted as a WVG file?

[ao]the requirement that the next 63 words exactly sum to the length of the file is intended to provide that safety.

It's true that that would be a bad signature in the mimesniff.spec.whatwg.org sense, though (too expensive to compute quickly). Maybe rather than 0x0A in the fourth byte I can put something like 0xFF (non-ASCII and non-UTF8).

[ap]Consider re-numbering the block types to be 1..64 instead of 0..63:

METADATA_BLOCKS=1

PARAM_BLOCKS=8

etc

so that the i'th word in the file (starting counting from 0) corresponds to block type i.

0 is an invalid block type.

[aq]yeah that would probably simplify things significantly. (type 0 can be thought of as the header in that world). Good idea.

[ar]Consider putting this prefix sum as header block wire format value? The i'th word (in the file) contains the number of blocks whose type is *less than i*, not *equal to i*. Make the 63rd word the overall file size (measured in blocks).

It does mean that validity checking requires checking that the words are non-decreasing. But validity checking might otherwise involve adding the words anyway, to check for uint32 overflow.

[as]I prefer the current approach because it is less prone to misimplementation. In the current setup, an implementation that can render files is highly unlikely to have a bug in the header parser (the one exception being ignoring unknown blocks, but that's why check.wvg has unknown blocks in it). In the version with the sums in the header, an implementation that doesn't verify the values will render valid files correctly, but may also render misgenerated (invalid) files in a way that is intended, which would eventually lead to the format needing to take those bugs into account, which would lead to the format being more complex.

[at]Why floats for width/height? 

It seems more natural to think of these as ints, and then scale if needed. It may also get confusing if subpixel rendering is not possible on a given platform.

[au]I think good arguments can be made in either direction. I think in practice we could get away with making both of these just be a single number, the aspect ratio. I went with floats because all the other coordinates are floats, but I don't feel strongly.

[av]I was just thinking about this... and to illustrate just how much I'm not sure it actually matters: I was actually thinking that I had specified this as ints and that you had suggested it should be floats, and it seemed to make perfect sense to me that I would have specified this as ints but also seemed reasonable to want to change it to floats.

I wonder how much we lose if we specify just an aspect ratio and not a width or height.

[aw]You lose the ability for the graphic to say what its intended dimensions were at design time. For example, if you want to replace an image with one of these, and that image used to know it was 52x64, it doesn't know that anymore.

[ax]Yeah but that's almost as much an advantage as a disadvantage, right. I mean, how many times has the fact that an image has a built-in size been used as a crutch and then the user experience is bad? e.g. because images pop in and resize everything around them.

[ay]Agreed - but that's all I can think of for what gets lost

[az]what if it is zero or negative

[ba]baseline alignment?

[bb]IMHO this shouldn't be supported. This ends up allowing graphics to use more memory than it seems like they should on rasterization, and it ends up being confusing - particularly when the same graphic shows up different ways on renderers that decide to support or not support overflow.

[bc]But I'm confused about how this doesn't contradict the next paragraph :)

[bd]To your second point: This paragraph is a conformance requirement on the renderers, the next one is a conformance requirement on the files. They are different conformance classes.

To your first point: Are you saying we should require that renderers _not_ clip?

[be]I think they should just clip.

[bf]That's probably the right choice, I just wish that we could avoid even that cost heh.

[bg]Maybe make it an error for the format to draw outside the clip and expect encoders to check it?

[bh]That's what "WVG files must not contain any compositions that would render pixels outside of the rectangle whose top left is at the origin and whose bottom right is at IMAGE_WIDTH,IMAGE_HEIGHT" means. :-)

[bi]I would find it helpful if these paragraphs were reversed, or maybe just the current second one said something about it being undefined behavior as far as the renderer is concerned.

[bj]alright i tried to improve this.

it's possible i should just bite the bullet and make it required to clip and not worry about the few cycles we're talking about here.

[bk]_Marked as resolved_

[bl]_Re-opened_

I would strongly suggest that. Not allowing geometry outside the implicit clip of the image puts even more burden on the creation tool, and in some scenarios it's going to be really complex. Consider the trivial animation of a circle expanding to fill a rectangular region... Actually - is scale even allowed as a parameter? Is it possible to animate a growing circle?

[bm]I've loosened the text to be a SHOULD-level requirement for avoiding it in encoders, and SHOULD-level requirement for clipping in decoders.

Shapes are always combined with an arbitrary 4x4 transform (which itself can reference expressions and parameters) before rendering, so yes.

[bn]I don't think this can be supported performantly.

This means it will be super easy to not be able to pre-compile shaders for these graphics, since these values may be dynamically derived at runtime. This will mean shader compilation overhead no matter what.

Unless the renderer is itself implemented as a shader program that can interpret these things quickly.

[bo]Implementing the renderer as a (set of) shader(s) is the long-term plan, yes (and is why the format is largely blocks of words).

 

That said, I agree, I think this is the weakest part of the format. I'm not sure how else to do multiple-axis animations, custom palettes, etc, though, which are features I really would like to support.

[bp](It's worth noting that you can statically determine which matrices, gradients, etc, are precomputable, and thus cacheable, and which require recomputation in the face of parameter changes)

[bq]Should this be upgraded to a "must"? Should it be an error to have a file with BLOCK_SIZES[PARAM_COUNT] > 1024?

If there's any context at all where it's possible to express a parameter reference with an index of more than 16 bits (now or in the future), this "should" will open the door for interoperability issues where an image will have a 65537th parameter and it works in some renderers but not others - and those files won't even be "invalid" according to the spec.

[br]In practice "should" or "must" here doesn't make a difference. A file in today's format can't have a 65537th parameter referenced in a future renderer, since there's no mechanism to reference that parameter in today's format. A future file with a 65537th parameter that is referenced by a feature in a future version of the format would have that parameter and that reference regardless of whether we make it invalid in this version of the format. As you say, that file would work in future renderers and not earlier renderers, but that's going to be the case for any feature we add later.

[bs]You might want some trigonometric functions at some point (e.g. convert a time parameter into a rotation angle into an affine transform matrix) but it looks straightforward to add new opcodes.

[bt]definitely, this set of opcodes is just a placeholder.
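
As an illustration of what such opcodes might be used for, here is a sketch (plain C rather than opcodes; sin/cos are exactly the operations that would have to be added) of turning a time parameter into one of the 4x4 per-shape transforms mentioned elsewhere in this thread:

```c
/* A sketch of building a rotation from a time parameter, were trigonometric
 * opcodes available. Nothing here is in the current opcode set; the row-major
 * 4x4 layout just mirrors the per-shape transforms discussed above. */
#include <math.h>

static void rotation_from_time(float t /* 0..1 = one full turn */, float m[16]) {
  const float kTau = 6.2831853f;                 /* 2*pi */
  const float angle = kTau * t;                  /* sin/cos would be the new opcodes */
  const float c = cosf(angle), s = sinf(angle);
  for (int i = 0; i < 16; i++) m[i] = (i % 5 == 0) ? 1.0f : 0.0f;  /* identity */
  m[0] = c;  m[1] = -s;                          /* 2D rotation in the x/y cells */
  m[4] = s;  m[5] = c;
}
```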

[bu]Since curves are described by 2D points, it doesn't make sense to store 4x4 matrices, since they contain redundant information. A 4x4 transformation matrix can be reduced to a 3x3 transformation matrix by solving a system of equations in which the z axis becomes irrelevant.
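
A sketch of that reduction, assuming row-major float storage: for points with z = 0 and w = 1, only the x, y, and w rows and columns of the 4x4 matrix ever contribute, so the same mapping fits in a 3x3 matrix.

```c
/* Illustrative only: drop the z row and column of a row-major 4x4 matrix to
 * get the equivalent 3x3 mapping for 2D points (z = 0, w = 1). */
static void reduce_4x4_to_3x3(const float m4[16], float m3[9]) {
  const int keep[3] = {0, 1, 3};   /* keep the x, y, and w rows/columns */
  for (int r = 0; r < 3; r++)
    for (int c = 0; c < 3; c++)
      m3[r * 3 + c] = m4[keep[r] * 4 + keep[c]];
}

static void apply_3x3(const float m3[9], float x, float y, float *ox, float *oy) {
  const float w = m3[6] * x + m3[7] * y + m3[8];   /* 1 for affine transforms */
  *ox = (m3[0] * x + m3[1] * y + m3[2]) / w;
  *oy = (m3[3] * x + m3[4] * y + m3[5]) / w;
}
```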

[bv]You might also want "matrix multiply" functions at some point, which could be tricky if expressions work in terms of single words. You can, of course, explicitly spell out "dst_ij = (a_i0 * b_0j) + ... + (a_i3 * b_3j)" 16 times, but that's laborious and possibly noticeable in terms of file size cost.

[bw]interesting.

a future extension might be a new kind of expression block that works in terms of matrices instead of words. Then, places that refer to matrices could refer to those kinds of expressions.

[bx]we'd need to reduce MATRIX_COUNT to 2^31 to gain a bit that specifies "is an expression"...

[by](the other option, as you say, is that a matrix can be defined where each word points to an expression and that expression is the computed value for that cell of the matrix... that might just be enough in general?).
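
For reference, this is the computation that would otherwise be spelled out cell by cell in word-level expressions; a matrix-valued expression block would express the whole thing in one step. (Plain C here, purely illustrative.)

```c
/* Each destination cell is an independent word-level expression:
 * dst_ij = a_i0*b_0j + a_i1*b_1j + a_i2*b_2j + a_i3*b_3j, 16 times over. */
static void mat4_multiply(const float a[16], const float b[16], float dst[16]) {
  for (int i = 0; i < 4; i++) {
    for (int j = 0; j < 4; j++) {
      dst[i * 4 + j] = a[i * 4 + 0] * b[0 * 4 + j]
                     + a[i * 4 + 1] * b[1 * 4 + j]
                     + a[i * 4 + 2] * b[2 * 4 + j]
                     + a[i * 4 + 3] * b[3 * 4 + j];
    }
  }
}
```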

[bz]I have been recently refactoring the way we store curves in mold (CPU renderer) in order to improve performance and one interesting factor is separating point data from curve-type data. Also, separating x and y coordinates opens the door to efficient per-shape transforms: for all x/y coordinates of a shape execute 2 SIMD-wide FMAs, potentially in parallel on multiple cores. This is mostly relevant in cases where you have many points in one path (one can imagine particles exported from a tool, all with the same colours, being aggregated into the same shape/path).
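
A sketch of the structure-of-arrays idea, with illustrative names rather than anything from the format: once x and y live in separate contiguous arrays, a per-shape affine transform is just two multiply-add passes that vectorize trivially.

```c
/* Illustrative structure-of-arrays transform: two fused multiply-adds per
 * coordinate over contiguous floats, which SIMD units and compilers handle
 * well. The layout and field names are not part of the format. */
typedef struct {
  float m00, m01, m10, m11, tx, ty;   /* 2D affine transform */
} Affine2D;

static void transform_shape(const Affine2D *t,
                            const float *xs, const float *ys,
                            float *out_xs, float *out_ys, int count) {
  for (int i = 0; i < count; i++) {   /* trivially vectorizable loop */
    out_xs[i] = t->m00 * xs[i] + t->m01 * ys[i] + t->tx;
    out_ys[i] = t->m10 * xs[i] + t->m11 * ys[i] + t->ty;
  }
}
```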

[ca]How would you apply this feedback to this format?

[cb]I'll want to think a little bit more about this and reflect what algorithms would look like both on CPU and GPU. I'll be on vacation for two weeks; I'll try to get back to this after I'm back.

[cc]should bias this by one so that zero isn't an invalid value

[cd]I might be missing something, but it looks like this only supports linear gradients. Is that right? What about various tile modes? Is this meant only to support clamping?

Other gradients people might expect to see: radial, sweep, two point radial, mesh. Other tile modes: mirror, decal, or repeated.

[ce]I see, this gets expanded in paint.

It might be good to just rename this to a Stops/Colors block or something. It won't really capture all the information about a gradient.

[cf]yeah, a better name would be good.

[cg]"Color Stops Block" maybe

[ch]We could use a bit in the stop to mark that the color was a color32 value instead.

[ci]For example, the sign bit, since stops are always positive and the NaN/non-NaN distinction doesn't care about the sign bit.

Or the lowest bit, which would not interfere with the 0xFFD0/0xFFE0 stuff, leaving it consistent with the rest of the format.

[cj]lowest bit might be good since implementations could just ignore the bit when dealing with the stop
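
A sketch of the lowest-bit scheme being discussed, assuming the stop is stored as a float32 word; the flag name and the color32-versus-reference meaning are this thread's suggestion, not the current spec.

```c
/* Hypothetical: steal the least significant bit of the float32 stop word to
 * flag how the paired color word should be interpreted. Renderers can simply
 * ignore (mask off) the bit when reading the stop itself. */
#include <stdint.h>
#include <string.h>

#define STOP_COLOR_IS_COLOR32 0x1u   /* hypothetical flag bit */

static uint32_t encode_stop(float stop, int color_is_color32) {
  uint32_t bits;
  memcpy(&bits, &stop, sizeof bits);          /* reinterpret float as a word */
  bits &= ~STOP_COLOR_IS_COLOR32;
  if (color_is_color32) bits |= STOP_COLOR_IS_COLOR32;
  return bits;
}

static float decode_stop(uint32_t word) {
  float stop;
  word &= ~STOP_COLOR_IS_COLOR32;             /* ignore the flag when reading */
  memcpy(&stop, &word, sizeof stop);
  return stop;
}

static int stop_color_is_color32(uint32_t word) {
  return (word & STOP_COLOR_IS_COLOR32) != 0;
}
```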

[ck]We can probably reduce that to 8 words per paint pretty safely

[cl]Conical gradient: focal distance, focal radius, list of uint32 colors, list of float32 stops, tile mode, matrix4 (implicitly unit gradient at origin with secondary focal point on x axis)

Sweep gradient: list of uint32 colors, list of float32 stops, tile mode, matrix4 (implicitly unit gradient at origin, 0.0 and 1.0 stops at positive x axis, use matrix and stops to adjust)
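
One possible (purely illustrative) packing of those two paint kinds; the field names and ordering here are a sketch of the comment above, not part of the format.

```c
/* Illustrative structs mirroring the suggested conical and sweep gradient
 * paints. "matrix4" is a row-major float32 4x4, as elsewhere in this thread. */
#include <stdint.h>

typedef struct {
  float    focal_distance;
  float    focal_radius;
  uint32_t stop_count;     /* stop_count (uint32 color, float32 stop) pairs follow */
  uint32_t tile_mode;
  float    matrix4[16];    /* maps the implicit unit gradient at the origin */
} ConicalGradientPaint;

typedef struct {
  uint32_t stop_count;     /* stop_count (uint32 color, float32 stop) pairs follow */
  uint32_t tile_mode;
  float    matrix4[16];    /* 0.0 and 1.0 stops on the positive x axis */
} SweepGradientPaint;
```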

[cm]Mesh gradient is an SVG 2 thing - I'm under the impression it won't play nicely with this structure, unless it treats the stops/colors specially as applying to specific mesh patches.  See https://svgwg.org/svg-next/pservers.html#MeshGradients

[cn]And I'm missing where tile mode is allowed for a Linear or Radial gradient - I would have thought flags but it looks like not?

[co]the two least significant bits of FLAGS are the tile mode, yes (same mapping as dart:ui TileMode, actually). The other bits of FLAGS are currently unused.

 

I haven't looked at Mesh gradients yet (from the SVG2 spec it's not clear to me exactly what you need to define a mesh gradient?) but in general this format is wide open for expansion. For example, we could add a new block type that describes mesh gradients, and then have a paint block type that just points to one of those.
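
A sketch of reading the tile mode as described above, assuming the dart:ui TileMode ordering (clamp, repeated, mirror, decal):

```c
/* The two least significant bits of FLAGS are the tile mode; the remaining
 * bits are currently unused. Enum values follow dart:ui's TileMode order. */
#include <stdint.h>

typedef enum {
  TILE_MODE_CLAMP    = 0,
  TILE_MODE_REPEATED = 1,
  TILE_MODE_MIRROR   = 2,
  TILE_MODE_DECAL    = 3,
} TileMode;

static TileMode tile_mode_from_flags(uint32_t flags) {
  return (TileMode)(flags & 0x3u);
}
```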

[cp]these sections should be elaborated

[cq]should add logic to track if a paint code is static

[cr]should add logic here to mention if a composition is static

[cs]I might be missing a subtlety here (it's hard to mentally turn the stepwise algorithms into high-level semantics). But on the face of it this seems to make it impossible to do arbitrarily nested group opacity, since a compositing block can only reference shape blocks and not other compositing blocks. Is this possible in some less obvious way?

[ct]No, you are correct. There's no group opacity (or group anything) in this format as designed. See the "Group transforms, group opacity, effect layers, and clips" section above. In principle the encoder could work around this in many cases, though whether it could always do so without leaving ugly seams is a less comfortable question.

I'm curious about your thoughts on this. We could certainly expand the format to include group opacity (and other group effects), at the cost of worse worst-case performance.

[cu]_Marked as resolved_

[cv]_Re-opened_

[cw]Group opacity (and group transforms) are supported by most vector graphics apps and formats. Not having them would significantly limit the utility.

[cx]we could remove the operator and steal a bit from the SEQUENCE_LENGTH to define whether OPERATOR is color32 or a reference. That would let us collapse the shape to 4 words, 16 per block, and we'd use the paint to store additional information if necessary.

[cy]This design makes it difficult to reuse shapes in distinct compositions (maybe the intent is to clone, since the format is clearly not designed for space efficiency).

[cz]Yeah, if you want to reuse different subsets of shapes to make different compositions, this doesn't work well. (If you want the exact same set but with different matrices, you can reuse the list of shapes with a different list of matrices).

This is one of the parts of the format I'm least happy with. It's a compromise intended to make it easier to stream bunches of shapes and matrices at once into a shader program, but the cost may be too high (and I haven't yet proved that it actually works in the shader situation).

Would you vote more for an explicit list of shapes somehow, e.g. a block that lists shape IDs? composition block -> list of shapes block -> shapes block -> curve blocks?

[da]I don't know enough about shaders to know what they could process efficiently, but that sounds like it would make shape reuse more practical.

[db]future versions could have a word that controls if a composition is to be included or not

[dc](or this could be in paint)

[dd]we could add other blend modes in the paint

[de]i should talk about how the format can be extended to support 48 bits per channel color (by having a color space block)

[df]I should move paint references to the 0x7F range since those aren't useful in expressions.

[dg]Yes - and although it's just an example ... it's likely that the pixel shader version is at least as fast as, if not faster than, the triangulated circle, due to over-shading.

[dh]Can you talk more about "over-shading"? I tried to Google the term but had little luck.

[di]Sure. (I tried to find a good reference, but most things were pretty terse). The GPU always ends up rasterizing things in 2x2 "quads" of pixels. This is true for desktop and mobile, from every vendor. Even though no API mandates it, it's functionally impossible to build a working GPU without doing this. Running the shader in lock-step on all four pixels in a quad is how the GPU computes derivatives of values in the shader - particularly derivatives of UVs, which determine if textures are being minified or magnified, and which mipmap to use, etc. When a triangle touches any portion of a particular quad, the entire quad needs to be shaded (so those derivatives can be calculated via differencing). For dense meshes with lots of edges, the diagonals tend to only "use" about half of the pixel work being done in each quad. Then, some other triangle touches the other side of that same diagonal, and may end up re-shading the *same* four pixels, again discarding about half of the work.

If the pixel shader is simple enough (so that it isn't the bottleneck), this may not matter. But the effect is large enough to be measurable - for expensive full-screen shaders (blurs, tone-mapping, etc...), it's faster to draw a single giant triangle that extends beyond the screen than a screen-sized rectangle made of two triangles, just because every pixel along the diagonal is shaded twice.
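
For reference, the "single giant triangle" trick mentioned above boils down to three clip-space vertices like these (illustrative C data; the same positions are often generated directly in a vertex shader from the vertex index, with no vertex buffer at all):

```c
/* Three clip-space vertices covering the whole screen. Everything outside
 * [-1, 1] is clipped away, but unlike a two-triangle quad there is no interior
 * diagonal, so no 2x2 pixel quad along a seam is shaded twice. */
static const float kFullscreenTriangle[3][2] = {
  {-1.0f, -1.0f},   /* one corner of clip space */
  { 3.0f, -1.0f},   /* far past the right edge */
  {-1.0f,  3.0f},   /* far past the top edge */
};
```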

[dj]Wow, fascinating. I had no idea. Thanks!

[dk]Casually dropping into this discussion years later.

One advantage of tessellating a circle instead of running a faster specialized vertex shader is that now you can easily combine this geometry with other parts of your picture. For example, to draw a square, a circle, and a triangle, you could either execute three different shaders or transform the geometry into a standardized form and execute a single shader. As far as I know (and I don't know much, to be fair), these sorts of state transitions are still relatively expensive.

Of course, if the different shapes have dramatically different color sources, such as three different kinds of gradients, then it's less clear to me whether it would still make sense to combine them, or if you'd end up falling back to using three different fragment shaders anyway.