Number Formatting Skeleton Syntax

Note: this document is visible to the public

Executive Summary

This document proposes a new syntax for specifying number formatting behavior by means of a locale-independent text string.  The syntax can be used in ICU MessageFormat strings and can express most of the functionality of NumberFormatter.

Background

ICU DecimalFormat instances are backed by a pattern, as defined in Unicode Technical Standard #35 (UTS 35, LDML).  The syntax for DecimalFormat patterns is based on a syntax used by early versions of Microsoft Excel.  Examples of patterns:

#,##0.###

¤#,##0.00

#,##,@@#%

00E+0**

Unfortunately, DecimalFormat patterns have downsides, which have led us to expressly discourage their use unless users know what they are doing:

  1. They mix locale-sensitive information, such as the grouping size, with declarative formatting information.
  2. They cover only a subset of the features available in DecimalFormat.

Ticket #8610 suggested the idea of adding "number formatting skeletons" to fix some of these issues.  In conjunction with the new NumberFormatter API that was introduced in ICU 60, I am designing a syntax for these number formatting skeletons.  A skeleton could be used to initialize a NumberFormatter similar to how patterns can be applied to DecimalFormat.  Example:

// Old API
DecimalFormat df = //…
df.applyPattern("<pattern>");

// New API
UnlocalizedNumberFormatter unf = NumberFormatter.forSkeleton("<skeleton>");

Perhaps the biggest single use case for number formatting skeleton strings is in MessageFormat (see ticket #12552).  Currently, users of MessageFormat are limited to using either high-level style arguments or pattern strings:

{name/number, type, style} →

{0,number,percent}

{0,number,integer}

{0,number,<pattern>}

A skeleton string could be introduced in a backwards-compatible way by using a special prefix string that is unlikely to occur in an Excel-style pattern.  The proposal for this "skeleton signal" is two colons, '::', as shown below:

{0,number,::<number-skeleton>}

The same syntax will also work for date skeletons:

{0,date,::<date-skeleton>}

If a user wants the literal '::' in their old-style message pattern, they can enclose it in single quotation marks, as in this example:

{0,number,'::'0.00}
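The dispatch on the skeleton signal can be sketched as follows.  This is a minimal illustration under stated assumptions, not the ICU implementation: the class StyleArgument and its method names are hypothetical, leading whitespace is not trimmed, and only the '::' prefix rule comes from the text above.

```java
/** Hypothetical helper: classifies the style part of a {0,number,<style>} argument. */
final class StyleArgument {
    /** The "skeleton signal" proposed above. */
    static final String SKELETON_SIGNAL = "::";

    /** Returns true if the style text should be read as a skeleton rather than a pattern. */
    static boolean isSkeleton(String style) {
        return style.startsWith(SKELETON_SIGNAL);
    }

    /** Strips the signal; only meaningful when isSkeleton(style) is true. */
    static String skeletonBody(String style) {
        if (!isSkeleton(style)) {
            throw new IllegalArgumentException("Not a skeleton style: " + style);
        }
        return style.substring(SKELETON_SIGNAL.length());
    }
}
```

Because a quoted '::' begins with an apostrophe rather than a colon, the style text in the example above is still treated as an old-style pattern by this check.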

Goals and Non-Goals

Goals

Non-Goals

Proposed Syntax

The skeleton consists of a space-separated list of parameters.  Any character in [:Pattern_White_Space:] is accepted as a separating space.  Parameters are composed of a stem followed by zero or more options, separated from the stem and from one another by a forward slash (U+002F).  Stems and options may not contain [:Pattern_White_Space:] or U+002F.
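The stem/option structure can be sketched with a small tokenizer.  This is an illustration only: the class SkeletonTokenizer, its nested Parameter type, and the choice to reject empty stems or options are assumptions, not part of the proposal.  The Pattern_White_Space test lists the code points that carry that Unicode property.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical tokenizer for the stem/option structure described above. */
final class SkeletonTokenizer {

    /** One parameter: a stem plus its slash-separated options (possibly none). */
    static final class Parameter {
        final String stem;
        final List<String> options;

        Parameter(String stem, List<String> options) {
            this.stem = stem;
            this.options = options;
        }
    }

    /** Code points with the Unicode Pattern_White_Space property (all are in the BMP). */
    static boolean isPatternWhiteSpace(int c) {
        return (c >= 0x0009 && c <= 0x000D) || c == 0x0020 || c == 0x0085
                || c == 0x200E || c == 0x200F || c == 0x2028 || c == 0x2029;
    }

    /** Splits a skeleton into parameters, then each parameter into stem and options. */
    static List<Parameter> tokenize(String skeleton) {
        List<Parameter> result = new ArrayList<>();
        int i = 0;
        int n = skeleton.length();
        while (i < n) {
            // Skip the run of separators between parameters.
            while (i < n && isPatternWhiteSpace(skeleton.charAt(i))) {
                i++;
            }
            int start = i;
            while (i < n && !isPatternWhiteSpace(skeleton.charAt(i))) {
                i++;
            }
            if (i > start) {
                result.add(parseParameter(skeleton.substring(start, i)));
            }
        }
        return result;
    }

    private static Parameter parseParameter(String text) {
        // A limit of -1 keeps trailing empty strings, so "stem/" is detected as malformed.
        String[] parts = text.split("/", -1);
        List<String> options = new ArrayList<>();
        for (int k = 0; k < parts.length; k++) {
            if (parts[k].isEmpty()) {
                // Assumption: an empty stem or option is treated as a syntax error.
                throw new IllegalArgumentException("Empty stem or option in: " + text);
            }
            if (k > 0) {
                options.add(parts[k]);
            }
        }
        return new Parameter(parts[0], options);
    }
}
```

With this sketch, the input "percent .00/@##" yields two parameters: the stem "percent" with no options, and the stem ".00" with the single option "@##".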

Stems are often word-based, but they can also be an arbitrary blueprint with custom syntax.  For example, significant digit rounding has a blueprint starting with "@", and fraction rounding has a blueprint starting with ".".  Further, all possible stems are guaranteed to be unique and unambiguous.  The number of possible stems may grow over time, and when a new stem or class of stems is added, it must be proven to be unique from all possible pre-existing stems.  The proof of uniqueness for word-based stems is usually trivial; the proof of uniqueness for blueprint-based stems can be, but does not necessarily need to be, derived from the uniqueness of its leading character or sequence of characters.

Each parameter corresponds to one of the macro-settings in the NumberFormatter API.  It is an error for a skeleton string to contain more than one value for a particular macro-setting.

If a user attempts to parse a string that is not a well-formed number skeleton, or if a user attempts to generate one from a NumberFormatter containing options not supported by skeleton string syntax, an exception is thrown (Java) or an error code is set (C++).

Skeletons have a normalized form: the parameters are in a deterministic order, and default values are not included.  However, skeletons need not be in the normalized form to be considered valid.  The order of parameters, by macro-setting, is defined to be:

  1. Notation
  2. Unit
  3. Per-Unit
  4. Rounding
  5. Grouping
  6. Integer Width
  7. Symbols (numbering system)
  8. Unit Width
  9. Sign Display
  10. Decimal Separator Display
  11. Multiplier

The relative order of the existing entries will not change in the future, but new entries may be inserted in the middle or added at the end.
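The ordering rule can be sketched with an enum whose declaration order mirrors the list above.  The class SkeletonNormalizer, the enum constant names, and the single-space separator are assumptions made for illustration; omission of default values is not shown, because the defaults are not listed in this document.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical serializer that emits parameters in the normalized order. */
final class SkeletonNormalizer {

    /** Macro-settings, declared in the normalized order listed above. */
    enum MacroSetting {
        NOTATION, UNIT, PER_UNIT, ROUNDING, GROUPING, INTEGER_WIDTH,
        SYMBOLS, UNIT_WIDTH, SIGN_DISPLAY, DECIMAL_SEPARATOR_DISPLAY, MULTIPLIER
    }

    /** Joins already-classified parameters in normalized order, separated by single spaces. */
    static String normalize(Map<MacroSetting, String> parameters) {
        // EnumMap iterates in enum declaration order regardless of insertion order.
        EnumMap<MacroSetting, String> ordered = new EnumMap<>(MacroSetting.class);
        ordered.putAll(parameters);
        StringJoiner joiner = new StringJoiner(" ");
        for (String parameter : ordered.values()) {
            joiner.add(parameter);
        }
        return joiner.toString();
    }
}
```

Keying the map by macro-setting also makes it structurally impossible to hold two values for the same setting, and inserting a new enum constant in the middle leaves the relative order of the existing ones unchanged.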

All parameters are case-sensitive.

Examples

round-integer  ⇒ round to the nearest integer

round-unlimited  ⇒ display all digits without rounding

percent  ⇒ display with percent sign

percent round-integer  ⇒ display with percent sign and round to nearest integer

percent sign-always  ⇒ display with percent and show plus sign on positive numbers

group-never .##  ⇒ hide grouping and round to the hundredth

scientific @@#  ⇒ scientific notation with 2-3 significant figures

round-integer .##  ⇒ ERROR: conflicting values for the setting "rounding"
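The conflict rule in the last example can be sketched as a duplicate check over macro-settings.  This is a hedged illustration: the class SkeletonConflictCheck is hypothetical, the stem-to-setting mapping covers only the stems used in the examples above (treating percent as a unit is an assumption), and splitting on the regular expression \s+ is a simplification of [:Pattern_White_Space:].

```java
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

/** Hypothetical check for skeletons that set the same macro-setting twice. */
final class SkeletonConflictCheck {

    /** Classifies only the stems used in the examples above; anything else is rejected. */
    static String settingFor(String stem) {
        switch (stem) {
            case "round-integer":
            case "round-unlimited":
                return "rounding";
            case "group-never":
                return "grouping";
            case "sign-always":
                return "sign display";
            case "percent":
                return "unit"; // assumption: percent is modeled as a unit
            case "scientific":
                return "notation";
            default:
                // Blueprint stems: "@" starts significant-digit rounding, "." fraction rounding.
                if (stem.startsWith("@") || stem.startsWith(".")) {
                    return "rounding";
                }
                throw new IllegalArgumentException("Unknown stem: " + stem);
        }
    }

    /** Returns the first macro-setting that is specified more than once, if any. */
    static Optional<String> findConflict(String skeleton) {
        Set<String> seen = new HashSet<>();
        for (String parameter : skeleton.trim().split("\\s+")) {
            if (parameter.isEmpty()) {
                continue; // empty skeleton
            }
            String stem = parameter.split("/", 2)[0];
            String setting = settingFor(stem);
            if (!seen.add(setting)) {
                return Optional.of(setting);
            }
        }
        return Optional.empty();
    }
}
```

Under these assumptions, "round-integer .##" reports a conflict on "rounding", while "group-never .##" and "scientific @@#" report none.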

Details

Here is a list of all possible macro-settings and their corresponding stems.

Proposed API

ICU4J

public final class NumberFormatter {
    /**
     * Call this method at the beginning of a NumberFormatter fluent chain to create an instance based
     * on a given number skeleton string.
     *
     * @param skeleton
     *            The skeleton string off of which to base this NumberFormatter.
     * @return An {@link UnlocalizedNumberFormatter}, to be used for chaining.
     * @throws SkeletonSyntaxException If the given string is not a valid number formatting skeleton.
     * @draft ICU 62
     * @provisional This API might change or be removed in a future release.
     */
    public static UnlocalizedNumberFormatter forSkeleton(String skeleton) {
        return NumberSkeletonImpl.getOrCreate(skeleton);
    }
}

public abstract class NumberFormatterSettings<T extends NumberFormatterSettings<?>> {
    /**
     * Creates a skeleton string representation of this number formatter. A skeleton string is a
     * locale-agnostic serialized form of a number formatter.
     * <p>
     * Not all options are capable of being represented in the skeleton string; for example, a
     * DecimalFormatSymbols object. If any such option is encountered, an
     * {@link UnsupportedOperationException} is thrown.
     * <p>
     * The returned skeleton is in normalized form, such that two number formatters with equivalent
     * behavior should produce the same skeleton.
     *
     * @return A number skeleton string with behavior corresponding to this number formatter.
     * @throws UnsupportedOperationException
     *             If the number formatter has an option that cannot be represented in a skeleton string.
     * @draft ICU 62
     * @provisional This API might change or be removed in a future release.
     */
    public String toSkeleton() {
        return NumberSkeletonImpl.generate(resolve());
    }
}

/**
 * Exception used for illegal number skeleton strings.
 */
public class SkeletonSyntaxException extends IllegalArgumentException {
    private static final long serialVersionUID = 7733971331648360554L;

    public SkeletonSyntaxException(String message, CharSequence token) {
        super("Syntax error in skeleton string: " + message + ": " + token);
    }

    public SkeletonSyntaxException(String message, CharSequence token, Throwable cause) {
        super("Syntax error in skeleton string: " + message + ": " + token, cause);
    }
}

ICU4C

See numberformatter.h and unumberformatter.h

FAQ

How will the syntax be documented?

A document with examples will be added to the ICU User Guide, and the syntax itself may be proposed for inclusion in UTS 35.  In the meantime, this document can serve as the reference.

What are the reasons behind the syntax for fraction-significant rounding?

The proposed syntax is .00/@##, meaning exactly 2 fraction digits, but rounded to at most 3 significant digits ⇒ Rounder.fixedFraction(2).withMaxDigits(3).

Markus asked whether the order could instead be [sig digits][fractions], as in @##.00.

The reasons for going with the proposed syntax include:

  1. The sig-digit modifier for fraction rounding has different semantics and behavior from the standalone sig-digit setting. For example, trailing zeros are not affected when using the option, but they are when using the standalone setting.
  2. Fraction rounding is the primary strategy, and the sig-digit option should be seen as a customization on fraction rounding, not vice-versa.
  3. The @##.00 syntax seems to imply that you are specifying behavior on something that comes before the decimal separator, but this is not the case; the option can also (and frequently does) affect digits after the decimal separator.
  4. Putting it as an option on the fraction stem is more consistent with the API.
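To make the shape of the proposed syntax concrete, here is a minimal parsing sketch for a fraction stem with an optional sig-digit option.  The class FractionRounding and its field names are hypothetical.  The reading of the blueprint is an assumption inferred from the examples in this document: each '0' is a required fraction digit, each '#' an optional one, and the option's length gives the maximum number of significant digits.

```java
/** Hypothetical parsed form of a fraction-rounding parameter such as ".00/@##". */
final class FractionRounding {
    final int minFraction;
    final int maxFraction;
    /** Maximum significant digits from the option, or -1 if no option was given. */
    final int maxSignificant;

    private FractionRounding(int minFraction, int maxFraction, int maxSignificant) {
        this.minFraction = minFraction;
        this.maxFraction = maxFraction;
        this.maxSignificant = maxSignificant;
    }

    static FractionRounding parse(String parameter) {
        String[] parts = parameter.split("/", -1);
        String stem = parts[0];
        if (parts.length > 2 || !stem.startsWith(".")) {
            throw new IllegalArgumentException("Not a fraction-rounding parameter: " + parameter);
        }
        int i = 1;
        int zeros = 0;
        while (i < stem.length() && stem.charAt(i) == '0') {
            zeros++;
            i++;
        }
        int hashes = 0;
        while (i < stem.length() && stem.charAt(i) == '#') {
            hashes++;
            i++;
        }
        // Assumption: a bare "." and any characters other than 0s followed by #s are rejected.
        if (i != stem.length() || zeros + hashes == 0) {
            throw new IllegalArgumentException("Malformed fraction stem: " + stem);
        }
        int maxSignificant = -1;
        if (parts.length == 2) {
            String option = parts[1];
            // Assumption: the option is one '@' followed by any number of '#'.
            if (!option.matches("@#*")) {
                throw new IllegalArgumentException("Malformed sig-digit option: " + option);
            }
            maxSignificant = option.length();
        }
        return new FractionRounding(zeros, zeros + hashes, maxSignificant);
    }
}
```

Under these assumptions, ".00/@##" parses to a minimum and maximum of 2 fraction digits with at most 3 significant digits, matching Rounder.fixedFraction(2).withMaxDigits(3), and ".##" parses to between 0 and 2 fraction digits with no sig-digit limit.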