L2/16-042

Clarifications Requested for "Full Emoji Data" and Emoji Flags

Agustin Fonts and Roozbeh Pournader (Google)
January 26, 2015

Given the increasingly public exposure of emojis, and the role that Unicode plays in defining which emojis should be adopted and how some of the complex emojis should be handled (joiner sequences, variation sequences, etc.), it is important to have clarity on what is considered part of the standard and what isn't.

A good example of a web page that causes confusion is the "Full Emoji Data" page. This list is often used by the press and developers to see how different platforms render emojis. Some elements of this list, such as the list of possible flag emojis and their glyphs, are not actually part of the Unicode Standard but rather a suggestion. While a disclaimer is linked from the page, even after reading the disclaimer it is not immediately obvious to a user not intimately acquainted with Unicode that they may not be looking at the Unicode Standard.

To avoid such confusion between suggestions, technical reports, and standards, the content should be more clearly organized and labeled so that consumers of the information are not misled.

Proposals
=========

Here are our specific proposals:

1. At the top of the full emoji list, clarify that this page is not in any way part of the Unicode Standard, is not normative, and is just an informative page created to document some of the known implementations. Also refer to the section at the top of UTR #51 that clarifies "A Unicode Technical Report (UTR) contains informative material. Conformance to the Unicode Standard does not imply conformance to any UTR. Other specifications, however, are free to make normative references to a UTR."

2. In the full emoji list, remove the color glyphs for flags from the chart column. They have not been reviewed by the Unicode Consortium in any way, but create the impression that they have been. Replace them with what is actually in the Unicode charts, which is a sequence of two regional indicator letters in a box.

3. Move the list of flags to a separate subpage (similar to the joiner sequences page), and add clear disclaimers at the top of that page stating that the list of potential flags is not a part of the Unicode Standard, and that Unicode has no position on the suitability of the flags implemented by vendors for any region, or on whether such regions even have an appropriate flag to represent them.

4. In UTR #51, refer to the section of the Core Specification about Regional Indicator Symbols regarding what is normative about them.

5. Replace the language in UTR #51 referring to "country flags" with "region flags" (LDML and BCP 47 terminology) or "territory flags" (ISO 3166 terminology).

6. In Annex B of UTR #51, clarify that this representation of flags as text is not necessarily interoperable, and that vendors may choose different flags to represent the same region or change the flag over time. Also mention that there may be no appropriate flag for a region, or that a flag that was once appropriate for a region may cease to be appropriate within a very short period of time.

7. In UTR #51, warn of the potential security risk of emoji flags, especially regarding different region codes potentially resulting in exactly the same flag.

8. For flags of regions that are subdivisions of other regions, for example UM (United States Minor Outlying Islands) and US (United States), create confusable pairs in UTS #39 and other such standards.

Alternative approach to flags
=============================

It should be noted that it would be of great value to the text interchange community if the Unicode Consortium actually standardized specific flags, rather than general region indicators with all of the above problems. (The specific flags do not necessarily need to be encoded as characters, but could be defined as character sequences.)
We understand that the Consortium and its committees may not be interested in this, but we would like to point out that the community looks up to the Unicode Consortium, the main standardizing organization in this area, as a central authority for defining the behavior of text.
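
For readers less familiar with the encoding behind proposals 2, 7, and 8, the following is a minimal illustrative sketch in Python. It is not taken from the Unicode Standard or UTR #51, and the function name `region_to_flag_sequence` is hypothetical. It only shows that an emoji flag is a pair of Regional Indicator Symbol code points derived from a two-letter region code, so US and UM are always distinct character sequences even where a vendor chooses to draw the same glyph for both.

```python
# Illustrative sketch only: maps a two-letter region code to the pair of
# Regional Indicator Symbols (U+1F1E6..U+1F1FF) that represents it as text.
REGIONAL_INDICATOR_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A


def region_to_flag_sequence(region_code: str) -> str:
    """Return the two-code-point regional indicator sequence for a region code.

    The function name and the validation rules are assumptions made for this
    example; no check is made that the code is an assigned region.
    """
    code = region_code.upper()
    if len(code) != 2 or not all("A" <= ch <= "Z" for ch in code):
        raise ValueError(f"expected two ASCII letters, got {region_code!r}")
    return "".join(
        chr(REGIONAL_INDICATOR_A + ord(ch) - ord("A")) for ch in code
    )


us = region_to_flag_sequence("US")
um = region_to_flag_sequence("UM")

print([f"U+{ord(c):04X}" for c in us])  # ['U+1F1FA', 'U+1F1F8']
print([f"U+{ord(c):04X}" for c in um])  # ['U+1F1FA', 'U+1F1F2']

# The underlying text differs even if a font renders both as the same flag,
# which is why such pairs are candidates for confusable data.
print(us != um)  # True
```

What is displayed for each sequence depends entirely on the font or platform; the code above says nothing about which flag, if any, is shown.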