Public Review Issues

Accumulated Feedback on PRI #343

This page is a compilation of formal public feedback received so far. See Feedback for further information on this issue, how to discuss it, and how to provide feedback.

Date/Time: Sat Dec 10 21:31:03 CST 2016
Name: Doug Ewell
Report Type: Public Review Issue
Opt Subject: Feedback on PRI #343

I agree with the proposed emoji-tag mechanism to represent flags of subdivisions 
and selected macroregions, without reservation.

The following passage seems incompletely worded:

"The tag_spec consists of all characters from U+E0020 TAG SPACE, minus U+E007F CANCEL 
TAG and minus..."

If taken literally, this passage would mean that the range of characters is open-ended 
and extends to the variation selectors beginning at U+E0100 and the private-use blocks. 
The "from" in this sentence should have a matching "to" or should otherwise indicate 
that the set of characters is constrained to the tag block.
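
To make the point concrete, here is a minimal Python sketch of an emoji tag sequence (base emoji, tag_spec, then U+E007F CANCEL TAG) in which the tag_spec range is explicitly closed at U+E007E TAG TILDE. That upper bound is an assumption matching the fix requested above, not the draft's wording. The helper names `is_tag_spec_char` and `make_tag_sequence` are hypothetical, and the draft's further exclusions (elided in the quotation) are not modeled.

```python
# Minimal sketch (assumption): tag_spec is closed at U+E007E TAG TILDE, so
# U+E007F CANCEL TAG is excluded by construction and nothing beyond the tag
# block (e.g. U+E0100 VARIATION SELECTOR-17, Plane 15 private use) qualifies.
# Any further exclusions from the draft are NOT modeled here.

TAG_SPEC_FIRST = 0xE0020  # TAG SPACE
TAG_SPEC_LAST = 0xE007E   # TAG TILDE (assumed upper bound)
CANCEL_TAG = 0xE007F
BLACK_FLAG = 0x1F3F4      # WAVING BLACK FLAG, base for subdivision flags


def is_tag_spec_char(ch: str) -> bool:
    """Return True if ch lies in the bounded tag_spec range."""
    return TAG_SPEC_FIRST <= ord(ch) <= TAG_SPEC_LAST


def make_tag_sequence(base: int, spec: str) -> str:
    """Hypothetical helper: base emoji + tag characters for spec + CANCEL TAG.

    Each printable ASCII character is shifted by 0xE0000 into the tag block.
    """
    tags = []
    for c in spec:
        if not 0x20 <= ord(c) <= 0x7E:
            raise ValueError(f"not printable ASCII: {c!r}")
        tags.append(chr(0xE0000 + ord(c)))
    return chr(base) + "".join(tags) + chr(CANCEL_TAG)


# Usage: the Scotland flag sequence, built from the subdivision code "gbsct".
scotland = make_tag_sequence(BLACK_FLAG, "gbsct")
assert all(is_tag_spec_char(ch) for ch in scotland[1:-1])
print([f"U+{ord(ch):04X}" for ch in scotland])
# ['U+1F3F4', 'U+E0067', 'U+E0062', 'U+E0073', 'U+E0063', 'U+E0074', 'U+E007F']
```

With the bound stated, `is_tag_spec_char(chr(0xE0100))` is False, whereas an open-ended reading of "from U+E0020" would admit the variation selectors and private-use characters.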

Date/Time: Wed Jan 4 09:29:04 CST 2017
Name: David Corbett
Report Type: Public Review Issue
Opt Subject: PRI #343

In section 7, “black right-pointing double triangle with vertical bar” is not
the name of U+23EF, which is BLACK RIGHT-POINTING TRIANGLE WITH DOUBLE VERTICAL BAR.

In section C.1, the example image for the invalid tag sequence 👩usca✦ should
include a woman, not a flag.

The example images for the invalid tag sequences Ausca✦ (with a non-emoji
base) and us✦a (with no base) include question marks in diamonds. Since tags
have Default_Ignorable_Code_Point=Yes, such tag sequences should be ignored
and not displayed.

emoji-variation-sequences.txt claims to have a total of 351 sequences. In
fact, there are 702.

Date/Time: Wed Jan 4 21:20:45 CST 2017
Name: Christoph Päper
Report Type: Public Review Issue
Opt Subject: PRI#343: ISO 3166-2 in UTR#51 / UTS#52

Unicode PRI #343: Proposed Update UTR #51, Unicode Emoji (Version 5.0)
======================================================

Background, history
--------------------

The original Japanese emoji set contained 10 flag symbols, representing the
US, the UK, Germany, Spain, France, Italy, Russia, Japan, South Korea and
China ([L2/09-027](http://www.unicode.org/L2/L2009/09027r2-emoji-backgrnd.pdf)).
It was ambiguous whether the appropriate encoding would be for countries (or
some other political or geographic entity) or for flags (vexillological).

The Unicode Consortium eventually chose ISO 3166-1 alpha-2 codes as its base,
perhaps because CLDR was already using that standard.  Despite its title,
“Codes for the representation of names of countries and their subdivisions –
Part 1: Country codes”, ISO 3166-1 encodes, roughly speaking, geographically
separate political entities. This includes most members of the United Nations
and some special administrative areas, which either belong to an encoded
“country”, at least for some purposes, or are “unions” of “countries” (EU, UN).
Since the Unicode Standard is co-published as ISO/IEC 10646, it probably made
sense to choose the most appropriate ISO standard.

Alpha-2 uses letter pairs of the basic roman alphabet (A–Z) without case
distinction. ISO 3166-1 also provides alternative 3-letter and 3-digit codes,
which TUS does not support, although CLDR does use some of them.  The selected
codes thus would have required 26 (universal), 52 (position-specific) or 676
(precomposed) code points to cover all possible combinations, cf. [L2/09-153 /
N3636](http://www.unicode.org/L2/L2009/09153-n3636-emoji-adhoc.pdf) #17. Over
a third of these are or have been assigned, and some are permanently reserved
for private use, which the CLDR adaptation makes use of in a handful of cases
(e.g. XK).

- The other parts of ISO 3166 provide standard codes for subregions (-2), … 
(-3) or previously assigned codes (-4, Alpha-4, since ca. 1970). The length 
of codes (1–8) for subregions and the symbols used (i.e. digits, letters, mixed,
both) vary considerably from “country” to “country”.

Problems
---------

The 26 Regional Indicator Symbol Letters U+1F1E6–U+1F1FF (RISL) are currently
almost exclusively used for the interchange of emoji flags. Sometimes, though,
users try to use them as letters with emoji-style display (and are then
surprised to find some pairs turned into flags). They cannot be reused for
3-letter or 4-letter codes, because most existing implementations display a
flag whenever one or more valid 2-letter codes are found within a string of
RISLs. For the same reason, they cannot be used for most part-2 subregion
codes, even if there were a Regional Indicator Symbol Separator/Hyphen. RISLs
are caseless, so adding a second set of 26 symbols would seem strange, and
digits are used in only some of the national code variants.
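
A minimal Python sketch of the RISL mapping just described. `to_regional_indicators` and `pair_from_start` are hypothetical helpers; the latter is a simplified model of how renderers group a run of RISLs into pairs from its start (roughly what the UAX #29 grapheme cluster rules do for regional indicators). It shows why case distinctions are lost and why a 3-letter code would surface as a flag plus a stray letter.

```python
from typing import List

RI_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A (the block runs to U+1F1FF)


def to_regional_indicators(code: str) -> str:
    """Map each Latin letter of code to its RISL; case is discarded."""
    out = []
    for c in code.upper():
        if not "A" <= c <= "Z":
            raise ValueError(f"RISLs exist only for A-Z, got {c!r}")
        out.append(chr(RI_A + ord(c) - ord("A")))
    return "".join(out)


def pair_from_start(ris: str) -> List[str]:
    """Simplified model (assumption): pair a run of RISLs from its start."""
    return [ris[i:i + 2] for i in range(0, len(ris), 2)]


# Case is lost: "gb" and "GB" yield the same two code points.
assert to_regional_indicators("gb") == to_regional_indicators("GB")

# A 3-letter code such as "FRA" is read as the pair FR (a flag) plus a lone A.
pairs = pair_from_start(to_regional_indicators("FRA"))
print([[f"U+{ord(c):X}" for c in p] for p in pairs])
# [['U+1F1EB', 'U+1F1F7'], ['U+1F1E6']]
```

Digits have no RISL counterpart at all, so `to_regional_indicators("12")` raises `ValueError` in this sketch.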

There are [thousands of existing subregion
codes](http://pastebin.com/raw/PKvWPeCj) and many more possible. Therefore,
the proposed solution leaves it up to implementors / vendors which codes are
actually supported as emoji flags. As a result, UTC/ESC would have to document
which codes are popular and interoperable.

The fallback in implementations of Emoji 4.0, i.e. a generic white flag, is
really bad. The proposed fallback in implementations of Emoji 5.0, i.e. a
generic flag with problem indicator, would still be bad.

User demand
--------------

Actual users mostly wish for emoji flags that represent one of the following:

    1. *encoded subregions* with strong national identity, possibly striving
       for (regaining) independence or already participating individually in
       some international venues, especially sports (e.g. Scotland, England,
       Wales, Northern Ireland; Tibet; Catalonia)
    2. *unencoded subregions* with strong national or local identity, possibly
       striving for (regaining) independence (e.g. South Vietnam)
    3. (unencoded) *organizations* including multiple countries, possibly
       acting like a country in some bilateral treaties (e.g. NATO, ASEAN,
       Mercosur)
    4. (unencoded) regions spanning *multiple countries*, possibly striving for
       (regained) unification (e.g. Benelux, Nordic Countries / Scandinavia,
       Yugoslavia)
    5. (unencoded) regions spanning *parts of multiple countries*, possibly
       striving for (regained) unification (e.g. Basque country, Kurdistan,
       Assyria)
    6. previous *variants* of encoded nations, possibly representing a
       different form of government (e.g. )
    7. symbolic flags used in *sports* (e.g. targets ⛳️🚩🏁, penalties, signals)
    8. other forms of *visual communication codes* (e.g. naval signal flags)
    9. transnational political, religious or ethnic *movements* (e.g.
       Pan-African Flag, Communism)
    10. non-national *identity groups*, including sexuality and fandom (e.g.
        bisexual)

Only the first of these cases is covered by the currently proposed update to UTR #51,
and indeed by any previously discussed variant based on ISO 3166-2 (e.g. TERIS).

For many of the covered codes, there is no measurable demand. Even by the
roughest possible estimates (e.g. a Twitter search), many non-subregion flags
rank higher in user demand than many of the newly proposed subregion flags.

The last case (#10) already has at least one precedent in the Rainbow Flag emoji
sequence (and Twemoji’s Jolly Roger).

Solution
--------

The problem to be solved needs to be stated explicitly. The only actual
proposal in the L2 registry by the end of 2016 seems to be
[L2/16-180](http://www.unicode.org/L2/L2016/16180-eng-scot-wales-flags.pdf) for the flags of
the “home nations” of the United Kingdom (GB), called “countries” in ISO
3166-2:GB. They are the only cases to have FIFA 3-letter codes but no top-level
ISO code, cf. [Wikipedia comparison](http://en.wikipedia.org/wiki/Comparison_of_IOC,_FIFA,_and_ISO_3166_country_codes).

Disclaimer: I have asked the ISO 3166 Maintenance Agency to add codes for
these 3 or 4 exceptional regions or, alternatively, to explain what it would
formally take to get them encoded, but have not yet received a reply.

Since UTC would need to prepare for updates to CLDR / ISO 3166 and for real-world
implementations of the proposed encoding, i.e. it would need to serve as a
registry of the supported codes among thousands defined and millions possible,
this would not be much better than becoming a registry for arbitrary (ZWJ)
sequences such as, to name just two obvious possibilities:

 - alpha-2 codes and a single (externally standardized) emoji character to 
	represent subregions,
 - existing emoji flags (🏳, 🏴) and 1 or 2 other emojis to represent different 
	entities (like the Rainbow Flag).

The process (as for any new ZWJ sequence or ISO X-code) should be:

    1. Use in the wild (e.g. 🇬🇧🐲 for Wales or 🏳🦄 for Scotland).
    2. A first implementation converts the character sequence to an inline
       flag glyph (e.g. XE for England in WhatsApp).
    3. An official proposal to UTC, which needs to prove existing use and
       document alternatives and compatibility (e.g. UN).
    4. More font support indicated (intent to implement) or observed.
    5. Registration (e.g. XK for Kosovo).
    6. Wide font support.

In conclusion, I urge the UTC not to adopt *any* flag encoding scheme based
upon ISO 3166-2! It’s inappropriate for the task and not easier to manage than
a formal registry of arbitrary sequences.

PS: [On Twitter](https://twitter.com/FakeUnicode/status/807739878406385664),
@FakeUnicode summarized most of this better than I could.

Date/Time: Thu Jan 26 02:59:30 CST 2017
Name: Christoph Päper
Report Type: Public Review Issue
Opt Subject: PRI#343: Emoji Ligatures and Alternates in UTR#51

(I know that PRI#343 is already closed, but this comment applied to previous
versions as well and will probably apply to the next draft, so please relabel
it as appropriate.)

http://www.unicode.org/reports/tr51/proposed.html#Emoji_ZWJ_Sequences

This kind of sequence should be called “Emoji Ligature Sequences” or “Joinable
Emoji Sequences” and the ZWJ character (as well as any Variation Selector 16)
should be optional if higher-level protocols can express the same semantics at
the discretion of the author.

In an OpenType font resource, for instance, valid (registered/documented)
sequences that include a ZWJ (e.g. genders) would be implemented with the
standard ligatures `liga` or required ligatures `rlig` feature, which are
usually enabled by default, but more could be supported with the opt-in
discretionary `dlig` or contextual `clig` ligature features. “Simple”
compositions of multiple emojis in the space of a single one (e.g. families
and kissing couples) should perhaps use `ccmp` instead.

The declared language of the surrounding text or the context (i.e. neighboring
characters) could also trigger alternate glyphs with the `locl` (e.g. text on
🛑) and `calt` features (e.g. train car 🚃 fitting with preceding locomotive
🚟🚋🚝🚄🚅🚈🚞🚂 or a common waterline in 🚣🏊🏄⛵️🛥🚤⛴🛳). Since emoji
design is evolving and converging among vendors, authors would sometimes
prefer to access an earlier glyph of a character with `hist` (e.g. Apple’s
infamous 😁) or an alternative design with `salt` (e.g. head portrait vs. full
body graphic for some animals) and either glyph with `aalt`.

https://en.wikipedia.org/wiki/List_of_typographic_features#Ligation_and_alternate_forms_features_intended_for_all_scripts