L2/13-207

Re: Which characters should have emoji-style by default?
To: UTC
From: Mark Davis
Live: http://goo.gl/yYvTNW

In unicode.org/Public/UNIDATA/StandardizedVariants.html we have variation sequences for different characters, although not for all of those that could be expected to take an emoji style.
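For readers less familiar with the mechanism, here is a minimal sketch of how a text-style or emoji-style variation sequence is formed: the base character is followed by VARIATION SELECTOR-15 (U+FE0E, text style) or VARIATION SELECTOR-16 (U+FE0F, emoji style). The helper names are hypothetical, and the sketch does not check whether a given sequence is actually listed in StandardizedVariants.

```python
# Variation selectors that request a presentation style for the preceding character.
VS15_TEXT = "\uFE0E"   # VARIATION SELECTOR-15: text style
VS16_EMOJI = "\uFE0F"  # VARIATION SELECTOR-16: emoji style


def with_text_style(base: str) -> str:
    """Return the base character followed by VS15 (hypothetical helper)."""
    return base + VS15_TEXT


def with_emoji_style(base: str) -> str:
    """Return the base character followed by VS16 (hypothetical helper)."""
    return base + VS16_EMOJI


def strip_presentation_selectors(text: str) -> str:
    """Remove VS15/VS16 so the platform default presentation applies."""
    return text.replace(VS15_TEXT, "").replace(VS16_EMOJI, "")


watch = "\u231A"  # U+231A WATCH
# Show the code points of the emoji-style sequence.
print([f"U+{ord(c):04X}" for c in with_emoji_style(watch)])
```

Without a selector, the style the reader sees is whatever the platform picks by default, which is the inconsistency this document is concerned with.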

What we do not have, however, is a shared sense of which characters typically have an emoji presentation by default. Many emoji presentations are starting to appear, but without consistency across platforms, so a piece of text may show up in a different style than intended. While this is all perfectly legitimate for Unicode (we don’t guarantee presentation style), it would be useful to have a more widely shared sense of when emoji presentation is used, so that there are fewer “jarring” presentations.

That means having a shared sense of which characters normally (where emoji presentation is supported!):

  1. Never would be expected to have an emoji presentation.
  2. Shouldn't have an emoji presentation, but it wouldn't be surprising if they did.
  3. May go either way.
  4. Should have an emoji presentation, but it wouldn't be surprising if they didn't.
  5. Always would be expected to have an emoji presentation.

We do not really need variation sequences for #1 or #5 (never or always), as long as we know which characters those are. We certainly need them for #3, however, and probably also for #2 and #4, which would increase the number of variation sequences we have.
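The reasoning above can be sketched as a small classification: five expectation levels, with the two end levels (never, always) needing no variation sequences, the middle level needing them, and the two intermediate levels probably needing them. The enum and function names are hypothetical and only illustrate the categories in this document.

```python
from enum import IntEnum


class EmojiExpectation(IntEnum):
    """The five expectation levels listed above (names are illustrative)."""
    NEVER = 1        # never expected to have an emoji presentation
    SHOULD_NOT = 2   # shouldn't, but not surprising if it did
    EITHER = 3       # may go either way
    SHOULD = 4       # should, but not surprising if it didn't
    ALWAYS = 5       # always expected to have an emoji presentation


def needs_variation_sequences(level: EmojiExpectation) -> str:
    """Return 'no', 'probably', or 'yes' following the memo's reasoning."""
    if level in (EmojiExpectation.NEVER, EmojiExpectation.ALWAYS):
        return "no"
    if level == EmojiExpectation.EITHER:
        return "yes"
    return "probably"


for level in EmojiExpectation:
    print(level.name, "->", needs_variation_sequences(level))
```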

I put together a spreadsheet with information about the current and prospective emoji-style characters (ones that get a colored glyph), at http://goo.gl/cvBEIH. The Sources are:

  1. the three carriers (based on UCD data)
  2. 'sv' for characters that have an 'emoji-style' standardized variant (based on UCD data)
  3. 'apple' for the additional characters (beyond #1 and #2) that according to apps.timwhitlock.info/emoji/tables/unicode have an emoji-style on Apple
  4. 'poss' for characters that are in the same subheading as #2 or #3, and are thus possibilities for getting an emoji-style.
     a. This is just a very rough cut based on shared Unicode subheads, so some should be dropped, and there are probably others that should be added.
     b. For the regional codes, I added all those with ISO country codes. We could use the BCP47 codes as a basis, or include the CLDR aliases (which also include some older deprecated codes: the “territory”+“deprecated” 2-letter codes on unicode.org/cldr/charts/latest/supplemental/aliases.html).
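As an illustration of the regional codes, the sketch below assumes they are pairs of REGIONAL INDICATOR SYMBOL LETTERs (U+1F1E6..U+1F1FF) derived from a two-letter country code. The function name is hypothetical, and the sketch does not validate the code against ISO 3166, BCP47, or the CLDR alias data.

```python
REGIONAL_INDICATOR_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A


def regional_indicator_pair(code: str) -> str:
    """Map a two-letter code such as 'US' to its regional indicator pair.

    Assumption: any two ASCII letters are accepted; no check is made that
    the code is actually assigned.
    """
    if len(code) != 2 or not code.isascii() or not code.isalpha():
        raise ValueError(f"expected two ASCII letters, got {code!r}")
    return "".join(
        chr(REGIONAL_INDICATOR_A + ord(ch) - ord("A")) for ch in code.upper()
    )


pair = regional_indicator_pair("US")
print([f"U+{ord(c):04X}" for c in pair])
```

Deciding which codes to accept (ISO only, BCP47, or CLDR aliases including deprecated ones) would then be a matter of swapping the validation step, not the mapping itself.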

I suggest that we put together a group of people to do the following, targeted at U7.0:

  1. Review and propose any additional characters that should have variation sequences for text-style vs. emoji-style presentation.
  2. Come up with a common list of characters that would be expected to have the emoji-style by default (where emoji presentation is supported).


Here are some samples of pre-6.0 characters that show up in emoji style by default on a Mac (v10.8.5):

Age=1.1

U+231A ( ⌚ ) WATCH

U+231B ( ⌛ ) HOURGLASS

But NOT, for example:

U+2600 ( ☀ ) BLACK SUN WITH RAYS

U+2601 ( ☁ ) CLOUD

U+260E ( ☎ ) BLACK TELEPHONE

U+2611 ( ☑ ) BALLOT BOX WITH CHECK

U+2668 ( ♨ ) HOT SPRINGS

U+261D ( ☝ ) WHITE UP POINTING INDEX

U+263A ( ☺ ) WHITE SMILING FACE

That's just U1.1. Other emoji presentation forms pre U6.0 include:

U+25FD ( ◽ ) WHITE MEDIUM SMALL SQUARE

U+25FE ( ◾ ) BLACK MEDIUM SMALL SQUARE

U+2614 ( ☔ ) UMBRELLA WITH RAIN DROPS

U+2615 ( ☕ ) HOT BEVERAGE

U+26A1 ( ⚡ ) HIGH VOLTAGE SIGN

U+267F ( ♿ ) WHEELCHAIR SYMBOL

U+2693 ( ⚓ ) ANCHOR

U+26AA ( ⚪ ) MEDIUM WHITE CIRCLE

U+26AB ( ⚫ ) MEDIUM BLACK CIRCLE

U+2B1B ( ⬛ ) BLACK LARGE SQUARE

U+2B1C ( ⬜ ) WHITE LARGE SQUARE

U+1F004 ( 🀄 ) MAHJONG TILE RED DRAGON

U+26BD ( ⚽ ) SOCCER BALL

U+26C4 ( ⛄ ) SNOWMAN WITHOUT SNOW

U+26C5 ( ⛅ ) SUN BEHIND CLOUD

U+26D4 ( ⛔ ) NO ENTRY

U+26EA ( ⛪ ) CHURCH

U+26F2 ( ⛲ ) FOUNTAIN

U+26F3 ( ⛳ ) FLAG IN HOLE

U+26F5 ( ⛵ ) SAILBOAT

U+26FA ( ⛺ ) TENT

U+26FD ( ⛽ ) FUEL PUMP

U+2757 ( ❗ ) HEAVY EXCLAMATION MARK SYMBOL

U+2B55 ( ⭕ ) HEAVY LARGE CIRCLE

U+1F21A ( 🈚 ) SQUARED CJK UNIFIED IDEOGRAPH-7121

U+1F22F ( 🈯 ) SQUARED CJK UNIFIED IDEOGRAPH-6307

but *NOT*

U+26BE ( ⚾ ) BASEBALL

and others
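A shared default list could start from observations like the samples above. The following is a rough sketch that records only the Mac (v10.8.5) samples listed in this document and looks up the observed default style for a code point. The set and function names are hypothetical, the data is limited to these samples, and it says nothing about other platforms or about what the default should be.

```python
import unicodedata

# Samples from this document that showed emoji style by default on Mac 10.8.5.
EMOJI_BY_DEFAULT = {
    0x231A, 0x231B, 0x25FD, 0x25FE, 0x2614, 0x2615, 0x26A1, 0x267F,
    0x2693, 0x26AA, 0x26AB, 0x2B1B, 0x2B1C, 0x1F004, 0x26BD, 0x26C4,
    0x26C5, 0x26D4, 0x26EA, 0x26F2, 0x26F3, 0x26F5, 0x26FA, 0x26FD,
    0x2757, 0x2B55, 0x1F21A, 0x1F22F,
}

# Samples from this document that did NOT show emoji style by default.
TEXT_BY_DEFAULT = {
    0x2600, 0x2601, 0x260E, 0x2611, 0x2668, 0x261D, 0x263A, 0x26BE,
}


def observed_default(code_point: int):
    """Return 'emoji', 'text', or None if the character is not in the samples."""
    if code_point in EMOJI_BY_DEFAULT:
        return "emoji"
    if code_point in TEXT_BY_DEFAULT:
        return "text"
    return None


# SOCCER BALL and BASEBALL sit side by side in the code chart but differ here.
for cp in (0x26BD, 0x26BE):
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))}: {observed_default(cp)}")
```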