New Number Formatting API for ICU 60

Introduction

Methodology

Compatibility and Current Users

Settings/Options

Notation

Unit

Rounding

Symbols

Sign Display

Decimal Display

Unit Width

Grouping

API Details

Fluent Design Pattern

Starter Examples

Comparison with DecimalFormat

More Details on the Terminal Sequence

Exhaustive Method Summary: Java

Primary "NumberFormatter" API

Enum Definitions

Notation Methods Signatures

Rounding Methods Signatures

Grouping Methods Signatures

IntegerWidth Signatures

NoUnit class for Percent/Permille MeasureUnits

NumberingSystem.LATIN Convenience Field

DecimalFormat to NumberFormatter Conversion Method

Exhaustive Method Summary: C++

Primary "NumberFormatter" API

Enum Definitions

Notation Methods Signatures

Rounding Methods Signatures

Grouping Methods Signatures

IntegerWidth Signatures

NoUnit class for Percent/Permille MeasureUnits

NumberingSystem.LATIN Convenience Field

DecimalFormat to NumberFormatter Conversion Method

Discussion

FAQ

What about C?

Why not have a mutable NumberFormatterBuilder with a .build() terminal method?

Why not have separate top-level methods for rounding, like "withMaximumFractionDigits()", "withSignificantDigits()", … ?

Why did you make Percent into a MeasureUnit?  Why not keep it as its own "style"?

What is a Skeleton and a Pattern?

Is there anything missing from the old API, compared to the new one?

Why did you name the numbering system method symbols() instead of numberingSystem()?

Can you explain the nested settings on the notation() method?

In Java, creating an immutable at every step in the fluent chain is expensive, right?

Why no "int" overload in the terminal method?

Why is RoundingMode merged into the rounding setter, but UnitWidth is its own top-level setting?

Why do you have to call .with() every time?

Why does locale have its own setter that changes the fluent method type?

On IntegerWidth, why the weird names for the setters?

Why did you rename Significant Digits to just Digits?

Can you elaborate on the new compact notation rounding strategy?

Terminal Sequence Method Naming

Why is there no Padding?

Error Codes in C++

Why did you make a new enum UnitWidth instead of re-using FormatWidth?

How about Parsing?

Why the new package name?

Default Rounding Strategy for Simple Notation

Default Rounding Strategy for Scientific Notation

Behavior for Doubles in Full-Precision Rounding Strategy

Percent/Permille Scaling

Grouping Setters

Error Getters in C++

Open Questions


Introduction

The merits of releasing a new API for number formatting have been discussed, and we concluded that moving forward with the design of a new API for DecimalFormat was the best decision for the future of ICU.  After surveying hundreds of call sites in internal code at Google, studying similar libraries, and spending countless hours brainstorming possible design patterns and feature sets, I present in this document what number formatting can look like in ICU 60 and beyond.

This document contains four parts:

  1. A description of the methodology that has gone into the design.
  2. An overview of the three primary settings and five supplemental settings that I have boiled things down to.
  3. Details of the API specification with example code.
  4. A discussion section, including FAQ and open questions.

Feel free to leave comments, but make sure to read the Discussion section first in case your question has already been addressed.

Methodology

ICU DecimalFormat contains nearly 40 different settings, which I will call "microsettings".  Microsettings have the following problems:

  1. Non-Orthogonality: When multiple different settings touch the same piece of the number formatting pipeline, you end up with undefined behavior.  Rounding is a great example of this.  What happens when someone specifies minimumFractionDigits alongside maximumSignificantDigits and roundingIncrement?  How about when they set maximumFractionDigits alongside a custom currencyUsage?  These microsettings are correlated; in order for the API to fully define its own behavior, it needs to specify what happens in all possible combinations.  There are at least 7 different microsettings that affect rounding, and with 2^7=128 different possible combinations, we get an exponential explosion of edge cases.
  2. API Clutter: The sheer quantity of methods on DecimalFormat makes it hard to find what you need.
  3. Locale Data: Since locale data is loaded in the constructor of DecimalFormat, all settings mutate data that has already been localized.  This design choice facilitates users overriding locale data and specifying how to format numbers, instead of specifying what they need.

I have attempted to whittle number formatting down to the core "macrosettings" that people actually use.  These macrosettings are as close to fully orthogonal as possible, and where they aren't orthogonal, there are clear ways to specify the edge-case behavior.  There are far fewer macrosettings than microsettings: just three primary settings that cover the vast majority of users, plus five more to facilitate more specific needs.  Finally, the settings are specified before locale data is loaded, meaning that we take what the user wants and figure out on our own how to honor their request.

Compatibility and Current Users

This API is designed for new users of ICU.  The existing DecimalFormat API is not going away since it provides compatibility with the JDK.  In addition, since DecimalFormat will be converted to a wrapper over NumberFormatter, existing users of DecimalFormat will enjoy bug fixes and performance improvements right away after upgrading to ICU 60.  Migrating your existing code is not necessary.

Although DecimalFormat will continue to be @stable and get first-class support, in the future, new features may be added to NumberFormatter without necessarily adding equivalent new features to DecimalFormat.


Settings/Options

There are three primary macrosettings: Notation, Unit, and Rounding.  Together, these cover 12 microsettings, several other settings currently housed in different classes (CompactDecimalFormat and MeasureFormat), and the large majority of current call sites.  The following five macrosettings cover 11 more microsettings and bring us to nearly complete coverage of call sites.

"Default Value" means the value the setting takes on if not explicitly specified.  "Other Options" illustrate the choices the user can make on that setting.  "Additional Customizations" are tweaks specific to the "other options."

Note that locale is covered in its own part of the API and isn't considered to be a macrosetting.

Notation

Default Value: Simple

Other Options: Scientific, Compact

Additional Customizations: Engineering notation, Exponent sign display, Min exponent digits, Compact display width

Future: option for standard notation with scientific/compact for numbers greater than X

Unit

Default Value: No unit

Other Options: Percent, Currency, CLDR units

Additional Customizations: Display width, via "unit display" setting

Unit can be specified at the top level for single values only (not lists or ranges)

Rounding

Default Value: See Open Questions*

Other Options: Fraction length, Significant digits, Currency rules (including standard style and cash style), Increment, Fraction+Significant.

Additional Customizations: RoundingMode

* Default rounding is "currency rules" if Unit is a Currency and Notation is not Compact

* Default rounding is "fraction-significant" if Notation is Compact

Symbols

Default Value: Use Locale Data

Other Options: Custom instance of DecimalFormatSymbols or NumberingSystem, which can be used for specifying ASCII digits.

Sign Display

Default Value: Automatic 1/-3

Other Options: Always +1/-3, Never 1/3, Accounting 1/(3), Accounting-Always +1/(3)

Integer Width

Default Value: Pad to 1 zero (minInt=1)

Other Options: Change minInt or maxInt

Decimal Display

Default Value: Automatic (0.9, 1, 1.1)

Other Options: Always (0.9, 1., 1.1)

Unit Width

Default Value: Short (CA$10.00)

Other Options: Narrow ($10.00), Wide (10 Canadian dollars), Hidden (10.00), and others in the UnitWidth enum

Grouping

Default Value: Use Locale Data

Other Options: No Grouping, Custom

Currently in technical preview; see FAQ


API Details

Fluent Design Pattern

The "fluent" pattern, characterized by the chaining of setters, gained popularity in Java and has been spreading in many different forms to other programming languages.  The flavor of API proposed for ICU number formatting has the following features:

  1. Terminal Function: All resolution of locale data takes place in the so-called terminal function: the one that performs business logic as opposed to returning another layer of the fluent chain.
  2. Immutability: Every setter in the chain returns an immutable object.  This concept is inspired by the Guava libraries.  Since all user-facing objects are immutable, the user need not concern themselves with thread safety issues.  See the FAQ for discussion on methods to optimize the performance of an immutable fluent chain in Java.
  3. Multiple Layers: In order to enforce optimal orthogonality, most of the macrosettings have additional knobs that can be tweaked within themselves, without polluting the top-level fluent namespace.

The fluent pattern works in both Java and C++.  In Java, a new lightweight, immutable object is returned each time; for more information, see the FAQ.  In C++, an object is returned by value upon each call in the fluent chain.  We focus on keeping the value objects small with fast copy constructors.  We rely on RVO (return value optimization), and we can also rely on C++11 move assignment to some degree.  As of the writing of this document, sizeof(UnlocalizedNumberFormatter) is 376 bytes.

At the end of the fluent chain, either before or after supplying the locale, you end up with an object that you can keep around long-term.  The locale is best supplied at the end so that server-side code can keep one unlocalized formatter for a specific call site and specify the locale according to the user being served at the moment.
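
The shape described above can be sketched in plain Java.  This is a minimal illustration only: FluentSketch and LocalizedSketch are hypothetical stand-ins, not the proposed ICU classes, and String.format substitutes for the real locale data lookup.

```java
import java.util.Locale;

// Hypothetical stand-in for an unlocalized formatter; not the ICU class.
final class FluentSketch {
    final boolean grouping;
    final String suffix;

    private FluentSketch(boolean grouping, String suffix) {
        this.grouping = grouping;
        this.suffix = suffix;
    }

    // Entry point: every setting starts at its default value.
    static FluentSketch with() {
        return new FluentSketch(true, "");
    }

    // Each setter returns a new object; the receiver is never modified.
    FluentSketch grouping(boolean grouping) {
        return new FluentSketch(grouping, this.suffix);
    }

    FluentSketch suffix(String suffix) {
        return new FluentSketch(this.grouping, suffix);
    }

    // Supplying the locale moves to the next layer of the chain.
    LocalizedSketch locale(Locale locale) {
        return new LocalizedSketch(this, locale);
    }
}

final class LocalizedSketch {
    private final FluentSketch settings;
    private final Locale locale;

    LocalizedSketch(FluentSketch settings, Locale locale) {
        this.settings = settings;
        this.locale = locale;
    }

    // Terminal function: the only place where locale-dependent work happens.
    String format(long value) {
        String pattern = settings.grouping ? "%,d" : "%d";
        return String.format(locale, pattern, value) + settings.suffix;
    }

    public static void main(String[] args) {
        FluentSketch percent = FluentSketch.with().suffix("%");  // immutable, safe to share
        System.out.println(percent.locale(Locale.ENGLISH).format(1234));                 // 1,234%
        System.out.println(percent.grouping(false).locale(Locale.ENGLISH).format(1234)); // 1234%
    }
}
```

Because the shared percent object is never modified, deriving a second formatter from it cannot affect other call sites or threads that hold the first one.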

Starter Examples

Although these are written in Java, they will work in a similar way in C++ and in any C++ wrapper languages.  For the purposes of illustration only, the locale in these examples is assumed to be ULocale.ENGLISH.  Keep in mind that the intermediate return values are all immutable and can be safely stored in a singleton if desired.

// No setters in the fluent chain

NumberFormatter.with().locale(loc).format(1234).toString();  // 1,234

// Examples with the three primary setters

NumberFormatter.with().notation(Notation.scientific()).locale(loc).format(1234).toString();  // 1.2E3

NumberFormatter.with().unit(NoUnit.PERCENT).locale(loc).format(12.34).toString();  // 12.34%

NumberFormatter.with().rounding(Rounder.fixedFraction(2)).locale(loc).format(5).toString(); // 5.00

NumberFormatter.with()

    .notation(Notation.compactLong())

    .unit(MeasureUnit.METER)

    .unitWidth(UnitWidth.FULL_NAME)

    .rounding(Rounder.maxDigits(3).withMode(RoundingMode.CEILING))

    .locale(loc)

    .format(123400)

    .toString();  // 124 thousand meters

// Examples with the five supplemental setters

NumberFormatter.with().grouping(Grouper.none()).locale(loc).format(123400).toString();  // 123400

NumberFormatter.with().symbols(DecimalFormatSymbols.getInstance(ULocale.FRENCH)).locale(loc).format(1.2).toString(); // 1,2

NumberFormatter.with().sign(SignDisplay.ALWAYS).locale(loc).format(1234).toString();  // +1,234

NumberFormatter.with().integerWidth(IntegerWidth.zeroFillTo(0)).locale(loc).format(0.05).toString(); // .05

NumberFormatter.with().decimal(DecimalMarkDisplay.ALWAYS).locale(loc).format(23).toString(); // 23.

UnlocalizedNumberFormatter unf = NumberFormatter.with()

    .symbols(NumberingSystem.getInstanceByName("mymr"))

    .sign(SignDisplay.NEVER)

    .integerWidth(IntegerWidth.zeroFillTo(6))

    .decimal(DecimalMarkDisplay.AUTO);

LocalizedNumberFormatter lnf = unf.locale(ULocale.FRENCH);

lnf.format(-98765);  // ၀၉၈ ၇၆၅

unf.locale(ULocale.ENGLISH).format(-98765);  // ၀၉၈,၇၆၅

// ASCII Digits:

NumberFormatter.with().symbols(NumberingSystem.LATIN).locale(loc).format(540).toString(); // 540

// Append to an Appendable (StringBuilder, FileWriter, etc):

Appendable a = new StringBuilder();

NumberFormatter.with().locale(loc).format(54321).appendTo(a);

// Specify locale at the beginning of the chain instead of the end (useful in applications such as Android):

NumberFormatter.withLocale(loc).unit(MeasureUnit.METER).format(123).toString(); // 123 m

Comparison with DecimalFormat

Current users of DecimalFormat do not need to change how they do anything, since DecimalFormat will continue to have first-class support. However, for comparison, here is an example of how you would do something in NumberFormatter based on what you currently know as DecimalFormat.

// Format a number fixed at two decimal places:

DecimalFormat df = (DecimalFormat) NumberFormat.getInstance(loc);

df.setMinimumFractionDigits(2);

df.setMaximumFractionDigits(2);

df.format(1234.5);  // 1,234.50

NumberFormatter.with()

    .rounding(Rounder.fixedFraction(2))

    .locale(loc)

    .format(1234.5)

    .toString();  // 1,234.50

// Convert the DecimalFormat to a NumberFormatter:

LocalizedNumberFormatter converted = df.toNumberFormatter();

converted.format(1234.5).toString();  // 1,234.50

More Details on the Terminal Sequence

The terminal sequence is an important piece of the puzzle.  In the examples above, the terminal sequence is composed of two chaining methods followed by a final terminal method: .locale(loc).format(number).toString().  Naturally, there are additional terminal sequences supporting more input and return types.

  1. Appendable: Appends the result to an Appendable (e.g., StringBuilder) and returns the Appendable subclass for chaining.
  2. CurrencyAmount/Measure: Takes a number with an associated unit.  Behavior is equivalent to calling .unit(...) with the specified unit, but it can be more efficient if the unit changes frequently.  The unit specified in the terminal method overrides any unit specified in the fluent chain.
  3. AttributedCharacterIterator/FieldPositionIterator: To view the types of fields in the output string.
  4. BigDecimal: Get the fully processed and rounded result; can be subsequently used as input to PluralRules.
  5. List or Range (future): Takes a list or range of numbers, possibly with different units, and formats them appropriately.  For example, a list with {Measure(1,FEET),Measure(5,INCH)} would produce "1 foot, 5 inches", and a range with {4,6} would produce "4-6".  The same rounding strategies, notation, and other top-level settings would apply the same to every number in the output.
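
As an illustration of the Appendable terminal method in item 1, here is a minimal plain-Java sketch.  FormattedSketch is a hypothetical stand-in for the result object, and java.io.UncheckedIOException stands in for the unchecked exception type that the real API would throw.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for the object returned by format(); not the ICU class.
final class FormattedSketch {
    private final String text;

    FormattedSketch(String text) {
        this.text = text;
    }

    @Override
    public String toString() {
        return text;
    }

    // Generic in A so that the caller gets back the same Appendable subtype
    // that was passed in and can keep chaining on it.
    <A extends Appendable> A appendTo(A appendable) {
        try {
            appendable.append(text);
        } catch (IOException e) {
            // Assumption: wrap the checked exception so callers need no try/catch.
            throw new UncheckedIOException(e);
        }
        return appendable;
    }

    public static void main(String[] args) {
        StringBuilder sb = new FormattedSketch("54,321")
                .appendTo(new StringBuilder("Total: "))
                .append('!');  // still a StringBuilder, not a bare Appendable
        System.out.println(sb);  // Total: 54,321!
    }
}
```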

I hope that at this point you are thinking, "wait, how do you reconcile Create&Destroy users with Long-Lived users if you have only a single API and terminal method?"  The answer is this.  The first time the terminal method is called, a "Create&Destroy"-optimized path is taken; let's say it takes 200ns.  However, after some heuristic number of calls to the terminal method, let's say 5, a data structure is computed to speed up the formatting process; the data structure might take a nontrivial number of cycles to compute, but from there on out, calls to the terminal method can take 100ns or less.  The API is therefore self-regulating.  (Thanks to Louis Wasserman for this tip!)

Note that with the "self-regulating" terminal method, although the formatter is externally immutable, it is mutable internally; it can switch itself from the slow path (optimized for Create&Destroy) to the faster path.  Under the hood, there is an AtomicInteger.  When a thread increments the atomic integer to the heuristic cutoff value (an internal setting subject to change, currently 3), it will go and compute the fast path data structure without blocking other threads, who can continue using the slow path in the meantime.  When the fast path data structure is ready, it can be swapped into a volatile field (Java), from which point threads will start using it instead of the slow path.
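
The switch from slow path to fast path could look roughly like the following.  This is a sketch under stated assumptions, not the actual implementation: SelfRegulatingSketch, CUTOFF, slowFormat, and buildFastPath are hypothetical names, both paths simply call Long.toString, and the thread that reaches the cutoff builds the fast path inline.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongFunction;

// Hypothetical illustration of a self-regulating terminal method.
final class SelfRegulatingSketch {
    // Assumed cutoff; the document describes the real value as an internal setting.
    private static final int CUTOFF = 3;

    private final AtomicInteger calls = new AtomicInteger();

    // Written once by the thread that reaches the cutoff; read by every thread.
    private volatile LongFunction<String> fastPath;

    String format(long value) {
        LongFunction<String> compiled = fastPath;  // single volatile read
        if (compiled != null) {
            return compiled.apply(value);
        }
        // incrementAndGet returns each count to exactly one thread, so only one
        // thread ever builds the fast path; the others keep using the slow path.
        if (calls.incrementAndGet() == CUTOFF) {
            fastPath = buildFastPath();
        }
        return slowFormat(value);
    }

    boolean isOnFastPath() {
        return fastPath != null;
    }

    // Stand-in for the "Create&Destroy"-optimized path.
    private static String slowFormat(long value) {
        return Long.toString(value);
    }

    // Stand-in for the expensive precomputed data structure.
    private static LongFunction<String> buildFastPath() {
        return v -> Long.toString(v);
    }
}
```

From the caller's point of view the object still behaves as immutable: the output for a given input is the same on either path, and only the cost per call changes.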

Exhaustive Method Summary: Java

All classes live in a new package named com.ibm.icu.number (see FAQ).

All classes have private or package-private constructors to prevent subclassing.

Primary "NumberFormatter" API

public final class NumberFormatter {

  // Entry points

  public static UnlocalizedNumberFormatter with()

  // Shortcut entry point for locale: equivalent to .with().locale(locale)

  public static LocalizedNumberFormatter withLocale(Locale locale)

  public static LocalizedNumberFormatter withLocale(ULocale locale)

  // Note: This is all of the static and non-static methods on NumberFormatter.

  // Everything else is implemented on specialized types.

}

public abstract class NumberFormatterSettings<T extends NumberFormatterSettings<?>> {

  // Primary Settings

  public T notation(Notation notation);

  public T unit(MeasureUnit unit);

  public T rounding(Rounder rounder);

  // Supplemental Settings

  public T grouping(Grouper grouper);

  public T integerWidth(IntegerWidth style);

  public T symbols(DecimalFormatSymbols symbols);

  public T symbols(NumberingSystem ns);  // See FAQ about method naming here

  public T unitWidth(UnitWidth style);

  public T sign(SignDisplay style);

  public T decimal(DecimalMarkDisplay style);

}

public class UnlocalizedNumberFormatter extends NumberFormatterSettings<UnlocalizedNumberFormatter> {

  // NOTE: All methods from NumberFormatterSettings are available here.

  // Locale Methods to move to the next phase

  public LocalizedNumberFormatter locale(Locale locale);

  public LocalizedNumberFormatter locale(ULocale locale);

}

public class LocalizedNumberFormatter extends NumberFormatterSettings<LocalizedNumberFormatter> {

  // NOTE: All methods from NumberFormatterSettings are available here.

  public FormattedNumber format(long input);  // Also covers int, short, byte

  public FormattedNumber format(double input);  // Also covers float

  public FormattedNumber format(Measure input);  // Also covers CurrencyAmount

  public FormattedNumber format(Number input);  // Includes BigDecimal and BigInteger

}

public class FormattedNumber {

  public String toString();

  public <A extends Appendable> A appendTo(A appendable); // throws ICUUncheckedIOException instead of IOException

  public void populateFieldPosition(FieldPosition fieldPosition);

  public AttributedCharacterIterator getAttributes();

  public BigDecimal toBigDecimal();

}
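
The self-typed generic parameter on NumberFormatterSettings is what lets one set of setters serve both the unlocalized and the localized formatter while keeping the caller's type in the chain.  The following plain-Java sketch shows the idea; all class names, the single boolean setting, and the create() hook are hypothetical and are not part of the proposed API.

```java
import java.util.Locale;

// Shared settings base class; T is the concrete type returned by each setter.
abstract class SettingsSketch<T extends SettingsSketch<?>> {
    final boolean alwaysShowSign;

    SettingsSketch(boolean alwaysShowSign) {
        this.alwaysShowSign = alwaysShowSign;
    }

    // Each subclass knows how to rebuild itself with a changed setting.
    abstract T create(boolean alwaysShowSign);

    // Declared once, yet returns the caller's own type.
    T sign(boolean alwaysShowSign) {
        return create(alwaysShowSign);
    }
}

final class UnlocalizedSettingsSketch extends SettingsSketch<UnlocalizedSettingsSketch> {
    UnlocalizedSettingsSketch(boolean alwaysShowSign) {
        super(alwaysShowSign);
    }

    @Override
    UnlocalizedSettingsSketch create(boolean alwaysShowSign) {
        return new UnlocalizedSettingsSketch(alwaysShowSign);
    }

    LocalizedSettingsSketch locale(Locale locale) {
        return new LocalizedSettingsSketch(alwaysShowSign, locale);
    }
}

final class LocalizedSettingsSketch extends SettingsSketch<LocalizedSettingsSketch> {
    private final Locale locale;

    LocalizedSettingsSketch(boolean alwaysShowSign, Locale locale) {
        super(alwaysShowSign);
        this.locale = locale;
    }

    @Override
    LocalizedSettingsSketch create(boolean alwaysShowSign) {
        return new LocalizedSettingsSketch(alwaysShowSign, locale);
    }

    // Only the localized type has a terminal method.
    String format(long value) {
        String text = String.format(locale, "%d", value);
        return (alwaysShowSign && value > 0) ? "+" + text : text;
    }
}
```

With this shape, a setter called after locale() still returns the localized type, so format() remains reachable, while calling format() on the unlocalized type is a compile-time error.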

Enum Definitions

Note: These all live inside the NumberFormatter class: NumberFormatter.SignDisplay.AUTO

  public static enum DecimalMarkDisplay {

    AUTO,

    ALWAYS,

  }

  public static enum SignDisplay {

    AUTO,

    ALWAYS,

    NEVER,

    ACCOUNTING, // ($123.00) for negative; undefined behavior for non-currency units

    ACCOUNTING_ALWAYS // +$123.00 for positive and ($123.00) for negative; undefined behavior for non-currency units

  }

  public static enum UnitWidth {

    NARROW, // ¤¤¤¤¤ or narrow measure unit

    SHORT, // ¤ or short measure unit (DEFAULT)

    ISO_CODE, // ¤¤; undefined for measure unit

    FULL_NAME, // ¤¤¤ or wide unit

    HIDDEN, // no unit is displayed, but other unit effects are obeyed (like currency rounding)

  }
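
To make the SignDisplay values concrete, here is a plain-Java sketch that reproduces the examples from the Sign Display setting (1/-3, +1/-3, 1/3, 1/(3), +1/(3)).  SignSketch and SignDisplaySketch are hypothetical names, no currency symbol is applied, and the treatment of zero (shown as positive) is an assumption of this sketch.

```java
// Hypothetical mirror of the SignDisplay enum; not the ICU type.
enum SignSketch { AUTO, ALWAYS, NEVER, ACCOUNTING, ACCOUNTING_ALWAYS }

final class SignDisplaySketch {
    static String apply(SignSketch style, long value) {
        String digits = Long.toString(Math.abs(value));
        boolean negative = value < 0;
        switch (style) {
            case ALWAYS:
                return (negative ? "-" : "+") + digits;
            case NEVER:
                return digits;
            case ACCOUNTING:
                return negative ? "(" + digits + ")" : digits;
            case ACCOUNTING_ALWAYS:
                return negative ? "(" + digits + ")" : "+" + digits;
            default:  // AUTO: minus sign on negative numbers only
                return negative ? "-" + digits : digits;
        }
    }

    public static void main(String[] args) {
        for (SignSketch style : SignSketch.values()) {
            System.out.println(style + ": " + apply(style, 1) + " / " + apply(style, -3));
        }
    }
}
```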

Notation Methods Signatures

public class Notation {

  // Entrypoints

  public static ScientificNotation scientific();

  public static ScientificNotation engineering();

  public static CompactNotation compactShort();

  public static CompactNotation compactLong();

  public static SimpleNotation simple();

}

public class ScientificNotation extends Notation {

  // "Nested" fluent chain for scientific notation

  public ScientificNotation withMinExponentDigits(int minExponentDigits);

  public ScientificNotation withExponentSignDisplay(SignDisplay exponentSignDisplay);

}

public class CompactNotation extends Notation {

}

public class SimpleNotation extends Notation {

}
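
The "nested" fluent chain on the scientific notation object can be pictured with a small plain-Java sketch.  ScientificSketch is hypothetical: it handles positive integers only, applies no rounding, and reduces the exponent sign setting to a boolean.

```java
import java.util.Locale;

// Hypothetical stand-in for a scientific notation settings object.
final class ScientificSketch {
    private final int minExponentDigits;
    private final boolean alwaysShowExponentSign;

    private ScientificSketch(int minExponentDigits, boolean alwaysShowExponentSign) {
        this.minExponentDigits = minExponentDigits;
        this.alwaysShowExponentSign = alwaysShowExponentSign;
    }

    static ScientificSketch scientific() {
        return new ScientificSketch(1, false);
    }

    // Nested setters live on the notation object, not on the top-level chain.
    ScientificSketch withMinExponentDigits(int digits) {
        if (digits < 1) {
            throw new IllegalArgumentException("digits must be at least 1");
        }
        return new ScientificSketch(digits, alwaysShowExponentSign);
    }

    ScientificSketch withExponentSignAlways() {
        return new ScientificSketch(minExponentDigits, true);
    }

    // Positive integers only; enough to show the effect of the nested settings.
    String format(long value) {
        if (value <= 0) {
            throw new IllegalArgumentException("sketch handles positive values only");
        }
        String digits = Long.toString(value);
        int exponent = digits.length() - 1;
        String fraction = digits.substring(1).replaceAll("0+$", "");
        String mantissa = fraction.isEmpty()
                ? digits.substring(0, 1)
                : digits.charAt(0) + "." + fraction;
        String exp = String.format(Locale.ROOT, "%0" + minExponentDigits + "d", exponent);
        return mantissa + "E" + (alwaysShowExponentSign ? "+" : "") + exp;
    }
}
```

For example, ScientificSketch.scientific().withMinExponentDigits(2).format(1234) yields 1.234E03, while the object returned by scientific() itself is left unchanged.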

Rounding Methods Signatures

public class Rounder {

  // Convenience:

  public static Rounder unlimited();

  public static FractionRounder integer();  // minFrac == maxFrac == 0

  // Fraction strategies:

  public static FractionRounder fixedFraction(int minMaxFrac)  // for minFrac == maxFrac

  public static FractionRounder minFraction(int minFrac)

  public static FractionRounder maxFraction(int maxFrac)

  public static FractionRounder minMaxFraction(int minFrac, int maxFrac)

  // Significant digits strategies (see FAQ about method naming):

  public static Rounder fixedDigits(int minMaxSig)  // for minSig == maxSig

  public static Rounder minDigits(int minSig)

  public static Rounder maxDigits(int maxSig)

  public static Rounder minMaxDigits(int minSig, int maxSig)

  // Other strategies:

  public static Rounder increment(BigDecimal roundingIncrement)

  public static CurrencyRounder currencyStyle(CurrencyUsage currencyUsage)

  // Non-static RoundingMode fluent setter for all strategies:

  public Rounder withMode(RoundingMode roundingMode)

}

public class FractionRounder extends Rounder {

  // Non-static fluent setters to enable fraction-significant rounding (a la SignificantDigitsMode)

  public Rounder withMinDigits(int minDigits)

  public Rounder withMaxDigits(int maxDigits)

}

public class CurrencyRounder extends Rounder {

  // Non-static fluent setter to specify the currency for CurrencyUsage

  // This is optional and can be used if the user wants to specify a currency here but not in the main API

  public Rounder withCurrency(Currency currency)

}
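
The key property of the Rounder factories is that a caller picks exactly one strategy and then optionally adjusts the rounding mode.  A minimal sketch of that shape, built on java.math.BigDecimal, follows.  RounderSketch covers only two of the strategies, and the HALF_EVEN default is an assumption of this sketch.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

// Hypothetical stand-in for Rounder; each factory selects exactly one strategy.
final class RounderSketch {
    private interface Strategy {
        BigDecimal round(BigDecimal value, RoundingMode mode);
    }

    private final Strategy strategy;
    private final RoundingMode mode;

    private RounderSketch(Strategy strategy, RoundingMode mode) {
        this.strategy = strategy;
        this.mode = mode;
    }

    // Fraction strategy: minFrac == maxFrac == digits.
    static RounderSketch fixedFraction(int digits) {
        return new RounderSketch((v, m) -> v.setScale(digits, m), RoundingMode.HALF_EVEN);
    }

    // Significant digits strategy: at most the given number of digits, no padding.
    static RounderSketch maxDigits(int digits) {
        return new RounderSketch((v, m) -> v.round(new MathContext(digits, m)), RoundingMode.HALF_EVEN);
    }

    // The rounding mode is a tweak on the chosen strategy, not a separate setting.
    RounderSketch withMode(RoundingMode mode) {
        return new RounderSketch(strategy, mode);
    }

    String apply(String value) {
        return strategy.round(new BigDecimal(value), mode).toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(fixedFraction(2).apply("5"));                                 // 5.00
        System.out.println(maxDigits(3).withMode(RoundingMode.CEILING).apply("123400")); // 124000
    }
}
```

Since there is no way to combine two factories on one object, questions such as "what happens when maximum fraction digits and maximum significant digits are both set" cannot arise unless a strategy explicitly defines the combination.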

Grouping Methods Signatures

public class Grouper {

  public static Grouper defaults();  // "default" is a reserved word in Java

  public static Grouper min2();

  public static Grouper none();

}

IntegerWidth Signatures

public final class IntegerWidth {

  public static final IntegerWidth DEFAULT

  public static IntegerWidth zeroFillTo(int minInt)  // i.e., setMinimumIntegerDigits(); see FAQ about naming

  public IntegerWidth truncateAt(int maxInt)  // i.e., setMaximumIntegerDigits(); see FAQ about naming

}
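
The two IntegerWidth setters can be read as "pad on the left with zeros" and "drop the high-order digits".  The sketch below shows that reading on non-negative integers.  IntegerWidthSketch is hypothetical, and treating an integer part of zero as having no digits (so that zeroFillTo(0) produces an empty integer part, as in the ".05" example) is an assumption.

```java
// Hypothetical stand-in for IntegerWidth, applied to the integer part only.
final class IntegerWidthSketch {
    private final int minInt;
    private final int maxInt;  // -1 means "no limit"

    private IntegerWidthSketch(int minInt, int maxInt) {
        this.minInt = minInt;
        this.maxInt = maxInt;
    }

    static IntegerWidthSketch zeroFillTo(int minInt) {
        return new IntegerWidthSketch(minInt, -1);
    }

    IntegerWidthSketch truncateAt(int maxInt) {
        return new IntegerWidthSketch(minInt, maxInt);
    }

    String apply(long nonNegativeValue) {
        // Assumption: zero contributes no integer digits before padding.
        String digits = nonNegativeValue == 0 ? "" : Long.toString(nonNegativeValue);
        if (maxInt >= 0 && digits.length() > maxInt) {
            digits = digits.substring(digits.length() - maxInt);  // keep low-order digits
        }
        StringBuilder out = new StringBuilder();
        for (int i = digits.length(); i < minInt; i++) {
            out.append('0');
        }
        return out.append(digits).toString();
    }

    public static void main(String[] args) {
        System.out.println(zeroFillTo(6).apply(98765));              // 098765
        System.out.println(zeroFillTo(2).truncateAt(2).apply(1997)); // 97
    }
}
```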

NoUnit class for Percent/Permille MeasureUnits

// Comes along with corresponding changes within the MeasureUnit class to add support for "none"

public class NoUnit extends MeasureUnit {

    public static final NoUnit BASE = (NoUnit) MeasureUnit.internalGetInstance("none", "base");

    public static final NoUnit PERCENT = (NoUnit) MeasureUnit.internalGetInstance("none", "percent");

    public static final NoUnit PERMILLE = (NoUnit) MeasureUnit.internalGetInstance("none", "permille");

}

NumberingSystem.LATIN Convenience Field

public class NumberingSystem { // existing class

    public static final NumberingSystem LATIN = lookupInstanceByName("latn");

}

DecimalFormat to NumberFormatter Conversion Method

public class DecimalFormat extends NumberFormat {

  public LocalizedNumberFormatter toNumberFormatter();

}

Exhaustive Method Summary: C++

Primary "NumberFormatter" API

class NumberFormatter final {

  public:

    static UnlocalizedNumberFormatter with();

    static LocalizedNumberFormatter withLocale(const Locale &locale);

  private:

    // Don't construct me!

    NumberFormatter() = delete;

};

template<typename Derived>

class NumberFormatterSettings {

  public:

    // Primary settings:

    Derived notation(const Notation &notation) const;

    Derived unit(const icu::MeasureUnit &unit) const;  // For CurrencyUnit and NoUnit

    Derived adoptUnit(const icu::MeasureUnit *unit) const;  // For all other MeasureUnits

    Derived rounding(const Rounder &rounder) const;

    // Supplemental settings:

    Derived grouping(const Grouper &grouper) const;

    Derived integerWidth(const IntegerWidth &style) const;

    Derived unitWidth(const UnitWidth &width) const;

    Derived sign(const SignDisplay &style) const;

    Derived decimal(const DecimalMarkDisplay &style) const;

    // Makes a copy of the DecimalFormatSymbols:

    Derived symbols(const DecimalFormatSymbols &symbols) const;

    // Takes ownership of the NumberingSystem:

    Derived adoptSymbols(const NumberingSystem *symbols) const;

    // Sets the UErrorCode if an error occurred in the fluent chain.

    // Preserves older error codes in the outErrorCode.

    // Returns TRUE if U_FAILURE(outErrorCode).

    UBool copyErrorTo(UErrorCode &outErrorCode) const;

};

class UnlocalizedNumberFormatter : public NumberFormatterSettings<UnlocalizedNumberFormatter> {

  public:

    LocalizedNumberFormatter locale(const icu::Locale &locale) const;

};

class LocalizedNumberFormatter : public NumberFormatterSettings<LocalizedNumberFormatter> {

  public:

    // Note: Overloads are not used because they would be ambiguous in C++: an int argument converts equally well to int64_t and to double.

    FormattedNumber formatInt(int64_t value, UErrorCode &status) const;

    FormattedNumber formatDouble(double value, UErrorCode &status) const;

    FormattedNumber formatDecimal(StringPiece value, UErrorCode &status) const;

};

class FormattedNumber {

  public:

    UnicodeString toString() const;

    Appendable &appendTo(Appendable &appendable);

    void populateFieldPosition(FieldPosition &fieldPosition);

    void populateFieldPositionIterator(FieldPositionIterator &iterator);

};

Enum Definitions

enum SignDisplay {

    UNUM_SIGN_DISPLAY_AUTO,

    UNUM_SIGN_DISPLAY_ALWAYS,

    UNUM_SIGN_DISPLAY_NEVER,

    UNUM_SIGN_DISPLAY_ACCOUNTING,

    UNUM_SIGN_DISPLAY_ACCOUNTING_ALWAYS

};

enum DecimalMarkDisplay {

    UNUM_DECIMAL_MARK_DISPLAY_AUTO, UNUM_DECIMAL_MARK_DISPLAY_ALWAYS,

};

enum UnitWidth {

    UNUM_UNIT_WIDTH_NARROW, // ¤¤¤¤¤ or narrow measure unit

    UNUM_UNIT_WIDTH_SHORT, // ¤ or short measure unit (DEFAULT)

    UNUM_UNIT_WIDTH_ISO_CODE, // ¤¤; undefined for measure unit

    UNUM_UNIT_WIDTH_FULL_NAME, // ¤¤¤ or wide unit

    UNUM_UNIT_WIDTH_HIDDEN, // no unit is displayed, but other unit effects are obeyed (like currency rounding)

};

Notation Methods Signatures

// Reserve extra names in case they are added as classes in the future:

typedef Notation NotationCompact;

typedef Notation NotationSimple;

class Notation {

  public:

    static NotationScientific scientific();

    static NotationScientific engineering();

    static NotationCompact compactShort();

    static NotationCompact compactLong();

    static NotationSimple simple();

};

class NotationScientific : public Notation {

  public:

    NotationScientific withMinExponentDigits(int32_t minExponentDigits) const;

    NotationScientific withExponentSignDisplay(SignDisplay exponentSignDisplay) const;

};

Rounding Methods Signatures

// Reserve extra names in case they are added as classes in the future:

typedef Rounder FigureRounder;

class Rounder {

  public:

    static Rounder unlimited();

    static FractionRounder integer();

    static FractionRounder fixedFraction(int32_t minMaxFrac);

    static FractionRounder minFraction(int32_t minFrac);

    static FractionRounder maxFraction(int32_t maxFrac);

    static FractionRounder minMaxFraction(int32_t minFrac, int32_t maxFrac);

    static FigureRounder fixedDigits(int32_t minMaxSig);

    static FigureRounder minDigits(int32_t minSig);

    static FigureRounder maxDigits(int32_t maxSig);

    static FigureRounder minMaxDigits(int32_t minSig, int32_t maxSig);

    static IncrementRounder increment(double roundingIncrement);

    static CurrencyRounder currency(UCurrencyUsage currencyUsage);

    Rounder withMode(icu::DecimalFormat::ERoundingMode roundingMode) const;

};

class FractionRounder : public Rounder {

  public:

    Rounder withMinDigits(int32_t minDigits) const;

    Rounder withMaxDigits(int32_t maxDigits) const;

};

class CurrencyRounder : public Rounder {

  public:

    Rounder withCurrency(const UChar *currency) const;

};

class IncrementRounder : public Rounder {

  public:

    // The below method is needed to force a fixed number of decimal places: for example, one might want to

    // round to the nearest 0.5 but have values displayed as "0.00", "0.50", "1.00", "1.50", and so forth (minFrac=2).

    // In Java, this functionality is accomplished by the scale of the BigDecimal rounding increment, but we

    // have a double rounding increment in C++ with no concept of scale.

    Rounder withMinFraction(int32_t minFrac) const;

};
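
For readers less familiar with BigDecimal, the Java behavior that the comment above refers to can be shown with java.math alone.  This is only a sketch of how the scale of the increment carries through to the result; IncrementSketch and roundToIncrement are hypothetical names, and the HALF_EVEN mode is an assumption.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class IncrementSketch {
    // Rounds value to the nearest multiple of increment. The scale of the
    // result is inherited from the increment, which fixes the decimal places.
    static BigDecimal roundToIncrement(BigDecimal value, BigDecimal increment) {
        BigDecimal steps = value.divide(increment, 0, RoundingMode.HALF_EVEN);
        return steps.multiply(increment);
    }

    public static void main(String[] args) {
        BigDecimal value = new BigDecimal("1.3");
        // Increment with scale 1: one decimal place in the output.
        System.out.println(roundToIncrement(value, new BigDecimal("0.5")).toPlainString());   // 1.5
        // Increment with scale 2: two decimal places, as with minFrac=2.
        System.out.println(roundToIncrement(value, new BigDecimal("0.50")).toPlainString());  // 1.50
    }
}
```

A double has no equivalent of scale (0.5 and 0.50 are the same value), which is why the C++ API needs the explicit withMinFraction() setter.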

Grouping Methods Signatures

class Grouper {

  public:

    static Grouper defaults();

    static Grouper minTwoDigits();

    static Grouper none();

};

IntegerWidth Signatures

class IntegerWidth {

  public:

    static IntegerWidth zeroFillTo(int32_t minInt);

    IntegerWidth truncateAt(int32_t maxInt);

};

NoUnit class for Percent/Permille MeasureUnits

class U_I18N_API NoUnit: public MeasureUnit {

public:

    static NoUnit U_EXPORT2 base();

    static NoUnit U_EXPORT2 percent();

    static NoUnit U_EXPORT2 permille();

    // Other boilerplate:

    NoUnit(const NoUnit& other);

    virtual UObject* clone() const;

    virtual UClassID getDynamicClassID() const;

    static UClassID U_EXPORT2 getStaticClassID();

    virtual ~NoUnit();

};

NumberingSystem.LATIN Convenience Field

Not in C++ since we don't have statically initialized fields.  The normal factory method can be used.

DecimalFormat to NumberFormatter Conversion Method

Not in C++ right now.  Will revisit in ICU 61.


Discussion

FAQ

What about C?

The first release of this API will be object-oriented and available in Java and C++ only.  Since the Fluent pattern does not really work in C, further design will be required before rolling out something that feels natural in C.

Why not have a mutable NumberFormatterBuilder with a .build() terminal method?

Here are several reasons why:

  1. Most users think of String.format() or printf() when they want to format a number.  They don't want to have to "build" an object.  The "building" process should be done internally, which is what I am proposing with the self-regulating terminal method.
  2. Returning an immutable at each step is the epitome of thread safety.  The Builder pattern, although widely understood by seasoned developers, is not thread-safe if used incorrectly.
  3. Since we make no contracts about mutability, we retain full control over object lifecycle.  With immutability, we could do something like pre-build the most popular formatters at startup and then provide them for free at runtime.  The Builder pattern comes with contracts about object lifecycle, so we lose this flexibility on our side of the API boundary.
  4. .build() connotes doing something expensive.  Would we still self-regulate, or would we always build the full data structures when the user calls .build()?  If we self-regulate, then we oppose the connoted meaning of .build().  If we build the full data structures each time, then we hurt the "Create&Destroy" users who use the formatter once and then throw it away.
  5. The immutable fluent pattern, popularized by Guava, is perceived as more modern than the Builder pattern.

Upsides of the mutable Builder pattern would be:

  1. If the user has a lot of settings, there is less object thrashing with a mutable Builder.  (However, I have tried to design the API such that most users will not need to call more than 2 or 3 setters, and as discussed below, we can take steps to reduce the overhead of the immutable object chain.)
  2. Others?

With this in mind, I feel that the advantages of the immutable fluent pattern outweigh the downsides.

Why not have separate top-level methods for rounding, like "withMaximumFractionDigits()", "withSignificantDigits()", … ?

Because then we fall into the trap of non-orthogonality.  Perhaps the main reason why the current API has so many "bugs" and "edge cases" is that we don't define what happens when you combine two different settings that affect the same part of the number formatting pipeline.  By keeping the top-level methods orthogonal, we reduce the number of edge cases and also make the API easier to digest for new users.

Why did you make Percent into a MeasureUnit?  Why not keep it as its own "style"?

The concept of "style" is vague and not well defined.  We had been considering scientific notation, compact notation, standard/iso/long currency, and percent/permille all as different "styles."  Within this concept of "style", there are really two things going on: how you want to display the number (scientific or compact) and some attribute that goes along with the number.  It turns out that the "some attribute" is nothing more than a unit: just as currency is a unit used to display monetary amounts, a percent is a unit used to display a dimensionless fraction or ratio.  You are free to combine the different notations with different units: for example, "$1.2K" (compact + currency) makes sense, as does "1.2E-3%" (scientific + percent).  On the other hand, it does not make sense to combine two different notations ("1.2E-3K"?) or two different units ("$1.2%"?).  Turning Percent into a MeasureUnit keeps the API clear, clean, and concise.

In C++, MeasureUnit and TimeUnit have factory methods that return a new instance.  However, CurrencyUnit operates via constructor (objects are stored by value).  My proposal makes C++ NoUnit have factory methods that return by value, much like the other fluent classes.

What is a Skeleton and a Pattern?

In ICU 61, we plan to propose a new syntax, called a "skeleton", that fully covers the "macrosettings" in the new API.  The details of this syntax are not yet settled, but it is likely going to be (logically) a list of key-value pairs; something along the lines of "notation=compact; unit=miles; symbols=ns:latn".  Any setting specified in the Skeleton will be overridden by its respective setter in the API.
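Because the skeleton syntax is not settled, the following is only an illustration of the override rule, assuming the semicolon-separated key=value form of the example string above.  SkeletonSketch, parse, and resolve are hypothetical names and are not part of the proposal.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SkeletonSketch {
    /** Parses "key=value; key=value" into an ordered map (assumed syntax). */
    static Map<String, String> parse(String skeleton) {
        Map<String, String> settings = new LinkedHashMap<>();
        for (String pair : skeleton.split(";")) {
            String trimmed = pair.trim();
            if (trimmed.isEmpty()) {
                continue;  // tolerate a trailing semicolon
            }
            int eq = trimmed.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("Missing '=' in: " + trimmed);
            }
            settings.put(trimmed.substring(0, eq).trim(), trimmed.substring(eq + 1).trim());
        }
        return settings;
    }

    /** Values set through API setters replace skeleton values for the same key. */
    static Map<String, String> resolve(String skeleton, Map<String, String> setterOverrides) {
        Map<String, String> resolved = parse(skeleton);
        resolved.putAll(setterOverrides);
        return resolved;
    }
}
```

With the example string and a setter override of unit=kilometers, the resolved settings keep notation=compact and symbols=ns:latn but report unit=kilometers.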

A Pattern is the Excel-style syntax for decimal format patterns (the syntax used in the current ICU/JDK DecimalFormat).  This syntax is problematic for a number of reasons, but most significantly because it encodes locale data, something most users should be pulling from CLDR instead of specifying themselves.  Since users should generally not be using patterns, and since users interested in patterns can use DecimalFormat, they are not part of the API proposal for NumberFormatter.

Is there anything missing from the old API, compared to the new one?

Based on the survey of existing call sites, the new API covers virtually all legitimate use cases.  That said, there are a few features that are not in the new API:

  1. Affixes (positive/negative prefix/suffix) → If absolutely required, you can use DecimalFormat.
  2. CurrencyPluralInfo → A mostly internal class that is technically part of DecimalFormat's public API.  It lets you override the locale's plural-dependent patterns for currency long name; see the test case TestCurrencyPluralInfoAndCustomPluralRules illustrating how to use the class.  I haven't found any existing users.  If you need to use the class, you can use it via DecimalFormat.
  3. Multiplier → Removing it from the API directly since it is trivial to do outside of ICU.
  4. Padding → Very few existing users (I found only one), and you can do padding outside of ICU.  See the Padding FAQ below.
  5. All of the parse-specific settings.  These will get a new home in a fluent parsing API, yet to be designed.

Additionally, since the old API is here to stay (for reasons discussed above), we should not feel bad about removing certain features that we may not have included were we to start over from scratch.

Why did you name the numbering system method symbols() instead of numberingSystem()?

Since DecimalFormatSymbols is itself effectively a NumberingSystem with bells and whistles, making the two methods have different names would introduce nonorthogonality.  What would it mean if you call both withSymbols() and withNumberingSystem()?  By naming the methods the same thing, we make it explicitly clear to the user that the two methods affect the same aspect of number formatting, without requiring them to read API docs.

Note that although changing the NumberingSystem normally affects the digits only, it can also affect symbols and patterns.

It was additionally suggested that we add a third overload that takes a numbering system by name in a string.  See #13354 for follow-up on that suggestion.

Can you explain the nested settings on the notation() method?

There are specialized properties that apply to only certain styles: for example, Scientific takes exponentSignDisplay, and Compact takes compactWidth.  There are a few ways we explored to deliver these specialized properties to the API:

  1. Put the specialized properties as their own methods on the top level.
     a. Bad: Top-level methods should apply universally, not only in special cases.
  2. Encode the specialized properties into an enum: SCIENTIFIC, SCIENTIFIC_SIGN_ALWAYS_SHOWN, ENGINEERING, ENGINEERING_SIGN_ALWAYS_SHOWN, COMPACT_SHORT, COMPACT_LONG.
     a. Bad: Causes an explosion of enum names, bad for API clutter and bad for future additions to the enum.
     b. Bad: Only supports simple, discrete settings; integer-based settings are not well supported.
  3. Use overloads that expose the specialized settings via compile-time type checking.  For example, “.withNotation(Scientific type, SignDisplay exponentSignDisplay)”, where Scientific is an enum used for exposing the second argument.
     a. Bad: Confusing to understand and contributes to API bloat since every setting and combination of settings needs its own top-level method.
  4. Do something like the Rounding interface: a “nested” fluent chain where settings are built upon each individual object type.

My proposal started with option 3, and after some revisions, I ended up going with option 4.

In Java, creating an immutable at every step in the fluent chain is expensive, right?

Not necessarily: it depends on the implementation.  In the proposed implementation, upon calling a setter, you create a lightweight object with a key, a value, and a pointer to the parent.  For example:

public class LocalizedNumberFormatter {

    private static enum KeyEnum { NOTATION, UNIT, ROUNDING, /* … */ };

    private LocalizedNumberFormatter parent;  // null for the root

    private KeyEnum key;

    private Object value;  // Note: all of the possible top-level values are Objects, so boxing is not necessary.

    // … implementation of NumberFormatter methods …

}

In other words, you build up a singly linked list of lightweight key/value pairs.  You then consume the linked list in the terminal method.  In a fluent chain with all eight setters, this implementation is about 43% faster than naively cloning the entire macroproperties object at each step.  For one extra performance improvement, the result of the backwards traversal of the linked list can be saved during the self-regulation phase of the terminal method.
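A minimal sketch of how the terminal method might consume such a chain is below.  FluentChainSketch, its three string-valued setters, and resolve() are hypothetical stand-ins; the sketch also assumes that a setting made later in the chain overrides an earlier one for the same key.

```java
import java.util.EnumMap;
import java.util.Map;

final class FluentChainSketch {
    enum Key { NOTATION, UNIT, ROUNDING }

    private final FluentChainSketch parent;  // null for the root
    private final Key key;                   // null for the root
    private final Object value;

    private FluentChainSketch(FluentChainSketch parent, Key key, Object value) {
        this.parent = parent;
        this.key = key;
        this.value = value;
    }

    static FluentChainSketch with() {
        return new FluentChainSketch(null, null, null);
    }

    // Each setter allocates one small node; nothing is copied.
    FluentChainSketch notation(String v) { return new FluentChainSketch(this, Key.NOTATION, v); }
    FluentChainSketch unit(String v)     { return new FluentChainSketch(this, Key.UNIT, v); }
    FluentChainSketch rounding(String v) { return new FluentChainSketch(this, Key.ROUNDING, v); }

    /** Walks from the newest node back to the root; the first value seen for a key wins. */
    Map<Key, Object> resolve() {
        Map<Key, Object> macros = new EnumMap<>(Key.class);
        for (FluentChainSketch node = this; node.parent != null; node = node.parent) {
            macros.putIfAbsent(node.key, node.value);
        }
        return macros;
    }
}
```

Because every node is immutable, a partially configured chain can be shared and extended by several callers without affecting one another.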

In any case, the runtime overhead of the fluent chain is less than 10% of the total formatting runtime.  This is a small enough cost to justify the safer API in the long term.

Why no "int" overload in the terminal method?

In terms of performance, there is little to no advantage of having an int overload, because on our side of the API boundary, a simple if statement can be used to cast the long to an int if the value fits inside an int.  Seeing no compelling reason to add the extra API, we err on the side of reducing API bloat.

Why is RoundingMode merged into the rounding setter, but UnitWidth is its own top-level setting?

You can pass a Measure into the terminal method, and you might want to have a UnitWidth specified.  Furthermore, Unit is a complex setting because it requires different patterns and data depending on whether you are using a CLDR unit, a currency, or a percent; there are already other settings that are non-orthogonal to Unit.

Rounding is a simpler, logical interface that takes in an unrounded number and returns a rounded number.  RoundingMode is a piece of the puzzle, but only in the capacity to which it modifies the pre-built ICU rounders.  The pre-built ICU rounders can also be used standalone, so they need to know their RoundingMode.  Adding RoundingMode to the top level would break orthogonality and reduce the elegance of the rounding interface.

Why do you have to call .with() every time?

This proposal involves a static "entrypoint" method, .with(), and non-static fluent setters, .rounding(), .notation(), and so on.  At first we had considered naming the methods "withNotation()", "withRounding()", and putting them on both the static and non-static layers of the API – duplicating a significant set of methods.  We decided against this option because:

  1. Larger API documentation, resulting in "a real cost in term of mental effort to understand/read the API" (Joachim)
  2. Some increase in maintenance cost.

We also considered promoting one of the settings to be the static method, such as Locale, Rounding, or Notation. With that, every chain would start with something like "NumberFormatter.withNotation(...).withXxx(yyy).format(zzz)".  However, we decided that there was not a single setting that was natural at the top level; doing so would have just been a forced artificial change.

This is why we settled on just using a short entrypoint static method.

Why does locale have its own setter that changes the fluent method type?

The locale is the most important user-specified setting.  In effect, the idea is that all other settings are combined with locale data in the terminal method to produce a set of internal settings (similar to so-called "microsettings"), which are then combined with the number to produce your output string.  (The self-regulating terminal method would basically store the microsettings instead of recomputing them each time.)  Here is how I visualize it:

So essentially, we ended up with two classes, UnlocalizedNumberFormatter and LocalizedNumberFormatter, connected via a .locale() method.  In order to give users the ability to express their locale at the beginning of the chain, we also have the NumberFormatter.withLocale() static method, and you can set your locale at any point in the chain.  Here are several other options we considered:

  1. Expose .format() methods that take two arguments: the locale and the number.
     a. Small performance boost: there is one fewer call in everyone's fluent chain.
     b. Shorter call site: saves 5-10 characters.
     c. However, it is inconvenient for users who want to save a constant locale.
     d. Another minor drawback is that we couldn't support both JDK Locale and ICU ULocale without contributing to combinatorial API explosion.
  2. Expose a .localize() method which builds the microsettings and exposes an API with format() methods but no more macrosetters.
     a. Susceptible to being misused since ".locale(locale).format(number)" would be faster for Create&Destroy users than ".localize(locale).format(number)".
  3. Expose a .locale() method, but, like 1, add format methods with the locale parameter until someone calls .locale(), at which point the locale parameter is removed from the format methods.
     a. Fixes problem (1a) above, since many fluent chains will no longer need the overhead of calling the extra fluent method.
     b. Contributes a bit extra API bloat, and can be confusing since there are two ways to achieve the same outcome, one of which is slightly more efficient than the other: ".locale(loc).format(123)" and ".format(loc, 123)".

If users want the default locale, they can specify it explicitly with .locale(Locale.getDefault()).
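The two-class arrangement can be sketched as follows.  UnlocalizedSketch and LocalizedSketch are hypothetical stand-ins with a single setting, and the JDK's NumberFormat is used only to have something to delegate to; this is not the proposed implementation.

```java
import java.text.NumberFormat;
import java.util.Locale;

final class UnlocalizedSketch {
    private final int maxFraction;

    private UnlocalizedSketch(int maxFraction) {
        this.maxFraction = maxFraction;
    }

    static UnlocalizedSketch with() {
        return new UnlocalizedSketch(6);
    }

    UnlocalizedSketch maxFractionDigits(int n) {
        return new UnlocalizedSketch(n);
    }

    // There is no format() on this type: a locale has to be chosen first.
    LocalizedSketch locale(Locale locale) {
        return new LocalizedSketch(maxFraction, locale);
    }
}

final class LocalizedSketch {
    private final int maxFraction;
    private final Locale locale;

    LocalizedSketch(int maxFraction, Locale locale) {
        this.maxFraction = maxFraction;
        this.locale = locale;
    }

    // Setters remain available after the locale is known.
    LocalizedSketch maxFractionDigits(int n) {
        return new LocalizedSketch(n, locale);
    }

    String format(double number) {
        NumberFormat nf = NumberFormat.getNumberInstance(locale);
        nf.setMaximumFractionDigits(maxFraction);
        return nf.format(number);
    }
}
```

A call such as UnlocalizedSketch.with().maxFractionDigits(2).locale(Locale.US).format(1234.5678) compiles, whereas calling format() before locale() is rejected by the compiler.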

On IntegerWidth, why the weird names for the setters?

The two settings used to be called "MinimumIntegerDigits" and "MaximumIntegerDigits".  However, it took effort to read the API docs to understand what the two methods actually mean.  This led to confusion, such as people opening tickets because they misunderstood what the methods were supposed to do.  I therefore put the effect of the method directly into the method name to increase clarity.

Additionally, the "truncateAt" method is intentionally non-static.  It can be called only as part of a miniature fluent chain.  For example, you can't do "IntegerWidth.truncateAt(2)", but you can do "IntegerWidth.zeroFillTo(1).truncateAt(2)".  The idea is that (1) many people who specify maxInt want to also specify minInt (perhaps to be the same as maxInt), (2) it reduces API bloat since there is only one method for minInt and one for maxInt, and (3) maxInt is an uncommon feature, so it doesn't need to have an overly prominent spot in the API.
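A minimal sketch of the miniature fluent chain is below.  IntegerWidthSketch and its apply() method are hypothetical; the sketch assumes that truncation drops high-order digits, as maximum integer digits does in DecimalFormat.

```java
final class IntegerWidthSketch {
    private final int minInt;
    private final int maxInt;  // -1 means "no maximum"

    private IntegerWidthSketch(int minInt, int maxInt) {
        this.minInt = minInt;
        this.maxInt = maxInt;
    }

    /** The only static entry point, so every chain starts with minInt. */
    static IntegerWidthSketch zeroFillTo(int minInt) {
        return new IntegerWidthSketch(minInt, -1);
    }

    /** Non-static on purpose: reachable only after zeroFillTo(). */
    IntegerWidthSketch truncateAt(int maxInt) {
        return new IntegerWidthSketch(minInt, maxInt);
    }

    /** Applies the width to a non-negative integer and returns its digits. */
    String apply(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("sketch handles non-negative values only");
        }
        StringBuilder digits = new StringBuilder(Long.toString(value));
        if (maxInt >= 0 && digits.length() > maxInt) {
            digits.delete(0, digits.length() - maxInt);  // drop high-order digits
        }
        while (digits.length() < minInt) {
            digits.insert(0, '0');  // zero-fill on the left
        }
        return digits.toString();
    }
}
```

For example, IntegerWidthSketch.zeroFillTo(3).apply(7) gives "007", and IntegerWidthSketch.zeroFillTo(1).truncateAt(2).apply(1997) gives "97".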

Why did you rename Significant Digits to just Digits?

"Significant digits" is technical and wordy.  "Digits" is more pointed.  Saying "maximum significant digits" is clear to people who understand the jargon, but saying "maximum digits" gets the point across in fewer words.  We also considered "Figures", as in "Significant Figures", but went with "Digits" because the ICU-TC felt it was likely to be more consistent with what users would expect.

Can you elaborate on the new compact notation rounding strategy?

In my survey, I found many places where users explicitly override the current rounding strategy of CompactDecimalFormat, which had defaulted to rounding to two maximum significant digits (Java) or three maximum significant digits (C++).  Anecdotally, I have spoken to users and perhaps the most common complaint about compact decimal format is that "123000" formats to "120K" instead of "123K".  My proposal therefore changes the rounding strategy on compact notation to the following:

  1. Scale the number; for example, divide by 10^3 or 10^6 as appropriate.
  2. Round to the closest integer OR to two significant digits, whichever results in MORE digits being shown.
  3. If the magnitude changes (e.g., 999 -> 1000), pick the affix according to the larger magnitude, reset, and return to step 1.
  4. Do not display any trailing zeros after the decimal point.
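The four steps above can be sketched with BigDecimal as follows.  CompactRoundingSketch, the English-style suffix table, and the half-even rounding mode are assumptions made for illustration.  The sketch also reads "reset" in step 3 as re-running the steps on the already rounded value, which is one possible interpretation.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

final class CompactRoundingSketch {
    // Hypothetical short suffixes, one per power of 1000.
    private static final String[] SUFFIXES = {"", "K", "M", "B", "T"};

    static String formatCompact(BigDecimal value) {
        if (value.signum() <= 0) {
            throw new IllegalArgumentException("sketch handles positive values only");
        }
        BigDecimal current = value;
        while (true) {
            // Step 1: scale by the power of 1000 that matches the magnitude.
            int magnitude = current.precision() - current.scale() - 1;  // floor(log10)
            int group = Math.max(0, Math.min(magnitude / 3, SUFFIXES.length - 1));
            BigDecimal scaled = current.movePointLeft(3 * group);
            int scaledMagnitude = magnitude - 3 * group;

            // Step 2: with two or more integer digits, integer rounding shows at
            // least as many digits; otherwise use two significant digits.
            BigDecimal rounded = scaledMagnitude >= 1
                    ? scaled.setScale(0, RoundingMode.HALF_EVEN)
                    : scaled.round(new MathContext(2, RoundingMode.HALF_EVEN));

            // Step 3: if rounding carried into a new magnitude, start over.
            int roundedMagnitude = rounded.precision() - rounded.scale() - 1;
            if (roundedMagnitude != scaledMagnitude) {
                current = rounded.movePointRight(3 * group);
                continue;
            }

            // Step 4: drop trailing zeros after the decimal point.
            return rounded.stripTrailingZeros().toPlainString() + SUFFIXES[group];
        }
    }
}
```

Under these assumptions, 123000 renders as "123K", 1234 as "1.2K", and 999950 as "1M" rather than "1000K".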

For example, here is how the following numbers will be rounded:

I expect this strategy to produce results that more users expect.

This change to compact notation rounding affects both the old and the new APIs.  When brought up in ICU-TC, there were no objections to changing the existing behavior.

Terminal Sequence Method Naming

The current proposal has the terminal sequence looking like this:

.locale(locale).format(number).toString();  // 42 chars

Although readable, it is long and verbose.  Here are some other options that we considered:

// shorten the method names:

.loc(locale).fmt(number).toString();  // 36 chars

.l(locale).f(number).toString();  // 32 chars

// a combined locale/number method that could supplement the expanded form:

.format(locale, number).toString();  // 35 chars

// a combined locale/number method that returns a String directly:

.format(locale, number);  // 24 chars

.toString(locale, number);  // 26 chars

// a new method parallel to format() that returns a String directly:

.locale(locale).print(number);  // 30 chars

.locale(locale).sprint(number);  // 31 chars

.locale(locale).render(number);  // 31 chars

.locale(locale).toString(number);  // 33 chars

These may be added to a future version of the API, but for the time being, we are erring on the side of caution and not bloating the API until we see what users actually want and need.

Why is there no Padding?

Padding, or adding extra characters (usually spaces) to your number in the string output to guarantee a certain width, is a feature in the old API.  We acknowledge that users might expect to have this feature since it is common in other number formatting libraries.  However, we did not include it in the first version of the API for the following reasons:

We plan to revisit the design of the Padding API for ICU 61.  To track this issue, see #13338.
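In the meantime, padding can be applied to the formatted string outside of ICU.  The helper below is a hypothetical sketch that pads on the left and counts code points; it does not account for display width (combining marks, wide characters), which a real padding API would need to consider.

```java
final class PaddingSketch {
    /** Left-pads with padChar until the string has at least width code points. */
    static String padStart(String formatted, int width, char padChar) {
        int length = formatted.codePointCount(0, formatted.length());
        StringBuilder sb = new StringBuilder();
        for (int i = length; i < width; i++) {
            sb.append(padChar);
        }
        return sb.append(formatted).toString();
    }
}
```

Strings that are already at least as long as the requested width are returned unchanged.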

Error Codes in C++

The advantage of adding error codes to every method is that it is easiest to trace where an error comes from in code.  The disadvantage is that it clutters call sites, especially since we expect users to call several methods in a chain.

During the fluent chain setup, the only errors that could occur are illegal argument errors, which are only on malformed input.  Most errors will occur during the heavy-lifting step, the .format() methods.  Because of this, we decided that errors from the fluent chain will be delayed until the .format() methods, and we will expose an API to extract information on the fluent chain errors.

Similar approaches to error codes have been done before in ICU; for example, UBool Edits::copyErrorTo(UErrorCode &errorCode).

Why did you make a new enum UnitWidth instead of re-using FormatWidth?

Because the old FormatWidth uses naming conventions that are inconsistent with currencies.

The existing enum FormatWidth has four entries:

There are five widths we should be concerned with for currency.  Entries 1-4 are from the spec, and entry 5 arises from a number of existing call sites that use workarounds to manually hide the currency symbol from output:

The new UnitWidth combines the two, and most importantly remaps "NARROW" to actually mean "narrow" currency symbols as defined in CLDR, rather than the normal currency symbol, which is now mapped to "SHORT".

How about Parsing?

This API is focused on formatting.  In the future, we plan to propose a "NumberParser" API for parse users.  DecimalFormat attempts to combine formatting and parsing, and this has caused API bloat and confusion, because the two are very different problems.

Why the new package name?

This proposal includes many new top-level names: NumberFormatter, LocalizedNumberFormatter, UnlocalizedNumberFormatter, Notation, Rounder, FractionRounder, CurrencyRounder, et cetera.  There are three ways we could expose these:

  1. Add them all as separate top-level classes to com.ibm.icu.text.
  2. Add them as static members underneath a single top-level class, NumberFormatter, in com.ibm.icu.text.
  3. Make a new package, com.ibm.icu.number.

Options 1 and 2 have downsides: Option 1 would clutter the already giant com.ibm.icu.text package; Option 2 is more feasible, but it would reduce modularity of the API and of the code by putting everything into one huge Java file.  When we discussed this in ICU-TC, no one could name a reason not to add a new top-level package.  This proposal is therefore going with Option 3.

Default Rounding Strategy for Simple Notation

The existing class DecimalFormat has a default of 3 fraction digits, which is not changing.  However, in the new API, we have the freedom to choose the default without "changing" existing behavior.

For something like rounding, I feel that it is generally better for people to be explicit in their chosen strategy.  For example, Kevin (a Googler) pointed out that it is somewhat surprising to new users that an "i18n" library does math by rounding numbers, and I agree with that sentiment.  This gives two general paths forward:

  1. Force users to explicitly specify a rounding strategy, like we do with the locale.
  2. Set a default strategy, but make it something such that if the user doesn't like it, they will notice right away.

A major downside of Path 1 is that we expect the mainstream use case to be integer numbers, and Path 1 would force an extra method to be called (as pointed out by Yoshito).  We therefore decided to go with Path 2.

We explored several choices for the default:

  1. Use the same default as in DecimalFormat, which is 3 decimal places.
  2. Always show all digits (do not perform rounding).
  3. Round to the nearest integer.
  4. Round to 4 significant digits.
  5. Mark: Round fraction to 4 significant digits (retain all integer digits), e.g., 123456 → 123456, 123.456 → 123.5, 0.123456 → 0.1235.  This is similar to the proposed Compact Notation rounding strategy, which rounds to integer but guarantees 2 significant digits.
  6. Andy: Round to 6 decimal places.  This is the default for %f, specified in §7.19.6.1 of ISO/IEC 9899 (the C standard).  %f also has the same 6 decimal places behavior in Java, Python, Perl, and probably other languages with printf.
  7. Yoshito: similar to option 5, but start enforcing 3 significant digits at 3 decimal places.  123456 → 123456, 123.456 → 123.456, 1.23456 → 1.235, 0.0123456 → 0.0123.

Based on the pros and cons, the ICU Technical Committee decided on Option 6.  Our next preference was Option 1, followed by Option 7 and Option 5.  We eliminated Options 2, 3, and 4 early on.
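The example values quoted for the options proposed by Mark, Andy, and Yoshito can be reproduced with the following sketch.  DefaultRoundingSketch and its method names are hypothetical, half-even rounding is assumed, and the results are BigDecimal values rather than formatted strings.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class DefaultRoundingSketch {
    /**
     * Keeps at most maxFraction fraction digits, or more when that is needed
     * to show minSignificant significant digits.  Never adds digits.
     */
    static BigDecimal fractionOrSignificant(BigDecimal x, int maxFraction, int minSignificant) {
        if (x.signum() == 0) {
            return x;
        }
        int magnitude = x.precision() - x.scale() - 1;  // floor(log10(|x|))
        int scaleForSignificant = minSignificant - 1 - magnitude;
        int targetScale = Math.max(maxFraction, scaleForSignificant);
        return x.scale() <= targetScale ? x : x.setScale(targetScale, RoundingMode.HALF_EVEN);
    }

    /** Mark's option: integer rounding, but at least 4 significant digits. */
    static BigDecimal markOption(BigDecimal x) {
        return fractionOrSignificant(x, 0, 4);
    }

    /** Andy's option: at most 6 fraction digits. */
    static BigDecimal andyOption(BigDecimal x) {
        return x.scale() <= 6 ? x : x.setScale(6, RoundingMode.HALF_EVEN);
    }

    /** Yoshito's option: 3 fraction digits, but at least 3 significant digits. */
    static BigDecimal yoshitoOption(BigDecimal x) {
        return fractionOrSignificant(x, 3, 3);
    }
}
```

For instance, markOption maps 123.456 to 123.5, yoshitoOption maps 0.0123456 to 0.0123, and andyOption maps 0.123456789 to 0.123457.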

Note: Based on my survey of call sites, there is no single prevailing rounding strategy that users explicitly specify.  User-specified options include integer, unlimited, 2 decimal places, 3 decimal places, and 3 significant digits (mostly used in the context of compact notation).

Default Rounding Strategy for Scientific Notation

For scientific notation rounding, we also decided to go with 6 fraction digits.  This results in 7 significant digits, including the one digit before the decimal mark.  This is also consistent with §7.19.6.1 of ISO/IEC 9899.
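For comparison, the printf behavior being matched can be observed with the JDK's own formatter.  This sketch uses java.util.Formatter through String.format, not the proposed API, and ScientificDefaultSketch is a hypothetical name.

```java
import java.util.Locale;

final class ScientificDefaultSketch {
    /** %e without an explicit precision prints 6 fraction digits (7 significant digits). */
    static String printfStyle(double x) {
        return String.format(Locale.ROOT, "%e", x);
    }
}
```

Here printfStyle(12345.678) returns "1.234568e+04": one integer digit followed by six fraction digits.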

Behavior for Doubles in Full-Precision Rounding Strategy

In the case when the user wants us to render a double to full precision, in Java, we currently fall back to Double.toString().  This is a library function that, under the hood, implements a double-to-decimal algorithm called Dragon4.  It renders a string representation of the double to the precision required to differentiate that double from the two adjacent doubles.

In C++, it seems that there is no standard library implementation of Dragon4 or any similar algorithm.  sprintf and friends all require that you specify the precision; there is no way to "automatically" choose the best precision like with Double.toString().

In C++, DecimalFormat always renders doubles to 15 significant digits rather than the automatic precision used in Java.

In ICU 60, we will continue using the DecimalFormat behavior, which is different in C++ than it is in Java.  However, we will revisit this issue in ICU 61 and possibly change the behavior to match Java.  To track this issue, see #11318.
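The difference between the two strategies is visible on a value such as 0.1 + 0.2.  In the sketch below, the fixed 15-significant-digit rendering is approximated with BigDecimal as a stand-in; it is not the ICU code path, and DoublePrecisionSketch is a hypothetical name.

```java
import java.math.BigDecimal;
import java.math.MathContext;

final class DoublePrecisionSketch {
    /** Automatic precision, as produced by the JDK. */
    static String automatic(double x) {
        return Double.toString(x);
    }

    /** Fixed 15 significant digits, with trailing zeros removed. */
    static String fifteenDigits(double x) {
        return new BigDecimal(x)
                .round(new MathContext(15))
                .stripTrailingZeros()
                .toPlainString();
    }
}
```

With x = 0.1 + 0.2, automatic(x) returns "0.30000000000000004" while fifteenDigits(x) returns "0.3".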

Percent/Permille Scaling

Since Percent is now being treated as a Unit, and since it may be added as a proper unit to CLDR soon, we are treating it the same as all other MeasureUnits.  In particular, unlike DecimalFormat, in NumberFormatter, you get "1.23%" when you try to format 1.23 with the percent unit, as opposed to multiplying by 100 and printing "123%".  This section lists 5 reasons why this may be better behavior.

Design Concept:

The old DecimalFormat is modeled based on the idea of spreadsheets.  This is evident based on the pattern syntax, method names, and first-class support for character alignment.  In this context, scaling percents makes sense, because it is most consistent with spreadsheet behavior.

On the other hand, the new NumberFormatter is modeled based on the idea of printf and context-free number formatting.  You give it a number, and it renders the number literally with whatever other attributes you assign.  In printf, you have to multiply by 100 if you want the number displayed times 100.  I argue that the expected behavior for NumberFormatter should be more similar to printf.

In DecimalFormat, you are "formatting a number as a percent".  In NumberFormatter, you are "formatting a number with a percent unit".
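The two mental models can be contrasted using only the JDK.  The sketch below uses java.text.NumberFormat as a stand-in for the spreadsheet-style behavior and String.format for the printf-style behavior; neither call is the proposed NumberFormatter API, and PercentScalingSketch is a hypothetical name.

```java
import java.text.NumberFormat;
import java.util.Locale;

final class PercentScalingSketch {
    /** "Formatting a number as a percent": the value is multiplied by 100. */
    static String asPercent(double x) {
        return NumberFormat.getPercentInstance(Locale.US).format(x);
    }

    /** "Formatting a number with a percent unit": the value is rendered literally. */
    static String withPercentUnit(double x) {
        return String.format(Locale.ROOT, "%.2f%%", x);
    }
}
```

For the input 1.23, asPercent returns "123%" and withPercentUnit returns "1.23%".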

Consistency with MeasureUnit:

If you have any of the following lines, no scaling is performed.  If you're formatting a distance, for example, we don't assume that your input is in meters and subsequently multiply by 100 if you ask for centimeters.

NumberFormatter.with().unit(MeasureUnit.METER)...

NumberFormatter.with().unit(MeasureUnit.KILOMETER)...

NumberFormatter.with().unit(MeasureUnit.CENTIMETER)...

NumberFormatter.with().unit(MeasureUnit.PINT)...

NumberFormatter.with().unit(MeasureUnit.QUART)...

In CLDR 33, when Percent is a proper unit, the call site will look the same.

NumberFormatter.with().unit(MeasureUnit.PERCENT)...

Why should "percent" be special-cased?  I think we shouldn't try to be "smart" and special-case the percent unit.

Documentation:

In addition, people will search the API documentation for "percent" to figure out how to format percents.  We can simply state in the documentation that you need a "*100" if that is the behavior you are going after.

Efficiency:

I believe this is more clear for users as well as more efficient.  If scaling happens by default, and you don't like scaling, you have to divide your number by 100.  However, if scaling doesn't happen by default, but you want it to, then you have to multiply your number by 100.  Multiplication is more efficient than division.

Orthogonality:

The concept of a "multiplier" should be thought of like a top-level setting.  (We could very well add "multiplier" as a top-level setting in a future release; it is actually already present under the hood to support compatibility mode.)  If percent units automatically scaled themselves, then they would be implicitly setting the value of the multiplier, leading to non-orthogonality.

Grouping Setters

The setting "minimum grouping digits" is locale-sensitive.  Most locales have 1, but the following locales have values going up to 4:

We should expose grouping in a locale-insensitive way.  Minimum grouping digits should not be exposed directly in the new API, keeping with the idea of "what, not how."  There are three general use cases that I envision:

  1. Suppress all grouping separators: 1000 - 10000 - 100000
  2. Use the locale’s default grouping separators: 1,000 - 10,000 - 100,000
  3. Use the locale’s default grouping separators, but widen the minimum grouping digits: 1000 - 10,000 - 100,000

For instance, this could be in an enum:

enum Grouping { NONE, DEFAULT, WIDE };

CLDR provides only a single coarse "minimum grouping digits" value, so it is unclear exactly how use case 3 would be implemented.
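One possible reading of the three use cases is sketched below.  It assumes a fixed group size of 3, a "," separator, a locale whose minimum grouping digits value is 1, and that WIDE raises that minimum to 2; all of these are illustrative assumptions, and GroupingSketch is a hypothetical name.

```java
final class GroupingSketch {
    enum Grouping { NONE, DEFAULT, WIDE }

    /** Formats a non-negative integer with the given grouping strategy. */
    static String format(long value, Grouping grouping) {
        if (value < 0) {
            throw new IllegalArgumentException("sketch handles non-negative values only");
        }
        String digits = Long.toString(value);
        int minGroupingDigits = (grouping == Grouping.WIDE) ? 2 : 1;
        // Group only when the leading group would hold at least minGroupingDigits digits.
        if (grouping == Grouping.NONE || digits.length() < 3 + minGroupingDigits) {
            return digits;
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < digits.length(); i++) {
            if (i > 0 && (digits.length() - i) % 3 == 0) {
                sb.append(',');
            }
            sb.append(digits.charAt(i));
        }
        return sb.toString();
    }
}
```

Under these assumptions, 1000 formats as "1000" with NONE and WIDE and as "1,000" with DEFAULT, while 10000 formats as "10,000" with both DEFAULT and WIDE.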

Because of these issues, we introduced the grouping strategy setter as "tech preview" instead of "@draft" to buy us more time.

Error Getters in C++

We (ICU-TC) agreed vaguely that the error code would be set in the .format() method, and that would include errors from all of the fluent settings.  However, it is important that the delayed error codes are descriptive enough to help the user identify the source of the error.

The following error codes are proposed to be added:

In addition to being available in the .format() method, the error code is available in a copyErrorTo() API similar to one introduced in the Edits class in ICU 59.

The concept of returning a descriptive error message string, similar to the custom exception string in Java, was also discussed.  However, we felt that adding new UErrorCode entries was the better path forward.  For example, Steven Loomis said: "Strings are very problematic. Users tend to start doing substring matches against them to handle certain situations."

Open Questions

None right now.