Accumulated Feedback on PRI #348

This page is a compilation of formal public feedback received so far. See Feedback for further information on this issue, how to discuss it, and how to provide feedback.

Date/Time: Tue Mar 7 05:59:41 CST 2017
Name: William Overington
Report Type: Public Review Issue
Opt Subject: PRI #348 Length of Tag Sequences

>> Review Note: The UTC is considering limiting the total length of possible valid 
>> tag sequences to some value such as 32. It would appreciate feedback on whether 
>> the length of tag sequences should be limited, and if so, to what length.

In ED-14a there is the following.

>> Though tag_spec includes the values U+E0041 TAG LATIN CAPITAL LETTER A .. U+E005A 
>> TAG LATIN CAPITAL LETTER Z, they are not used currently and are reserved for future 
>> extensions.

It would be possible to specify that the length of a particular tag sequence
is limited to a particular value such as 32 unless the first character of
tag_spec is a TAG LATIN CAPITAL LETTER that has the effect of increasing the
limit.
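The rule suggested above could be checked roughly as follows. This is a minimal Python sketch, not a proposed specification: the names `tag_spec_within_limit` and `to_tags` are hypothetical, the default limit of 32 is the example value from the review note, and a leading TAG LATIN CAPITAL LETTER is assumed simply to exempt the sequence from the default limit, because the suggestion does not say what the raised limit would be.

```python
DEFAULT_LIMIT = 32       # example value from the review note
TAG_CAPITAL_A = 0xE0041  # U+E0041 TAG LATIN CAPITAL LETTER A
TAG_CAPITAL_Z = 0xE005A  # U+E005A TAG LATIN CAPITAL LETTER Z


def tag_spec_within_limit(tag_spec: str, limit: int = DEFAULT_LIMIT) -> bool:
    """Check a tag_spec (the tag characters between the base and
    U+E007F CANCEL TAG) against the suggested rule.

    Assumption: a leading TAG LATIN CAPITAL LETTER only lifts the default
    limit; no raised limit is specified, so none is enforced in that case.
    """
    if tag_spec and TAG_CAPITAL_A <= ord(tag_spec[0]) <= TAG_CAPITAL_Z:
        return True
    return len(tag_spec) <= limit


def to_tags(ascii_text: str) -> str:
    """Map printable ASCII to the corresponding tag characters (offset 0xE0000)."""
    return "".join(chr(0xE0000 + ord(c)) for c in ascii_text)


print(tag_spec_within_limit(to_tags("gbeng")))         # True: 5 code points
print(tag_spec_within_limit(to_tags("a" * 33)))        # False: over the default limit
print(tag_spec_within_limit(to_tags("A" + "a" * 40)))  # True: extension marker present
```

A reader that does not recognize the capital-letter marker would still see a well-formed tag_spec; only the length check changes.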

That would mean that for many emoji characters there would be a limit, thus
perhaps helping in the implementation of display technology, yet that
restriction would not prevent future expansion of the system so that, for
example, a vector glyph in a platform-independent colour-font-style contour
format could be expressed using tag characters.

If a method for longer tag sequences is available, it will be possible in the
future, if that is what people choose to do, to include in a Unicode character
stream such things as a three-dimensional model of a biochemical molecule that
an end user may rotate so as to observe the model from various angles, without
a general absolute restriction on tag sequence length stopping such a
development.

I appreciate that such display possibilities are perhaps not within the scope
of Unicode at the present time, yet they and other new ideas might become part
of the everyday use of Unicode in the future. Unicode is being built to last,
so I opine that it is important to provide the infrastructure such that new
ideas can flourish.

William Overington

Tuesday 7 March 2017

Date/Time: Fri Mar 3 15:51:34 CST 2017
Name: Anonymous
Report Type: Public Review Issue
Opt Subject: PRI #348: editorial correction


Annex B says:

"While the syntax of an emoji *emoji flag sequence* is defined in *ED-14*, ..."

I suggest removing the first repetition of "emoji" (the one not in bold italics).

Please consider this an anonymous submission, i.e. do not add my name to the Acknowledgments section.



Date/Time: Fri Mar 17 05:22:50 CDT 2017
Name: Christoph Päper
Report Type: Public Review Issue
Opt Subject: PRI#348 Formal Definitions

Some definitions in section 1.4
(http://www.unicode.org/reports/tr51/proposed.html#Definitions) lack a formal
BNF-like specification. It may be helpful to have one for all of them. They
also do not follow the notation given in Appendix A of TUS exactly: for code
points they use `\x{ABCDEF}` instead of `\uABCD` and `\U00ABCDEF` (or `U+ABCD`
and `U-00ABCDEF`).

I also believe that a definition for something like _emoji presentation base_
would be good, so it could be used in ED-8a and 9a instead of the more generic
_emoji character_. It may also be helpful to define terms for the columns in
<http://unicode.org/emoji/charts-beta/text-style.html>.

Here is my attempt at a complete grammar:

;                                        Emoji Characters
emoji_character                      := \p{Emoji}
;                                        Emoji Presentation
default_emoji_presentation_character := \p{Emoji_Presentation}
                                     ;= `+EP`
default_text_presentation_character  := [\p{Emoji} - \p{Emoji_Presentation}]
                                     ;= emoji_character - default_emoji_presentation_character
                                     ;= `-EP`
;                                        Emoji and Text Presentation Sequences
text_presentation_selector           := \uFE0E
text_presentation_sequence           := emoji_presentation_base text_presentation_selector
emoji_presentation_selector          := \uFE0F
emoji_presentation_sequence          := emoji_presentation_base emoji_presentation_selector
emoji_presentation_base              ;= `+EPSq`
                                     := [ \u0023 \u002A \u0030-\u0039 \u00A9 \u00AE
                                          \u203C \u2049 \u2122 \u2139 \u2194-\u2199 \u21A9-\u21AA
                                          \u231A-\u231B \u2328 \u23CF \u23E9-\u23EA \u23ED-\u23EF \u23F1-\u23F3 \u23F8-\u23F9 \u23FA
                                          \u24C2 \u25AA-\u25AB \u25B6 \u25C0 \u25FB-\u25FE 
                                          \u2600-\u2604 \u260E \u2611 \u2614-\u2615 \u2618 \u261D \u2620 \u2622-\u2623 \u2626 \u262A \u262E-\u262F 
                                          \u2638-\u263A \u2640 \u2642 \u2648-\u2653 \u2660 \u2663 \u2665-\u2666 \u2668 \u267B \u267F 
                                          \u2692-\u2697 \u2699 \u269B-\u269C \u26A0-\u26A1 \u26AA-\u26AB \u26B0-\u26B1 \u26BD-\u26BE 
                                          \u26C4-\u26C5 \u26C8 \u26CF \u26D1 \u26D3-\u26D4 \u26E9-\u26EA \u26F0-\u26F5 \u26F7-\u26F9 \u26FA \u26FD 
                                          \u2702 \u2708-\u2709 \u270C-\u270D \u270F \u2712 \u2714 \u2716 \u271D \u2721 \u2733-\u2734 \u2744 \u2747 
                                          \u2753 \u2757 \u2763-\u2764 \u27A1 \u2934-\u2935 \u2B05-\u2B07 \u2B1B-\u2B1C \u2B50 \u2B55 
                                          \u3030 \u303D \u3297 \u3299 
                                          \U0001F004 \U0001F170 \U0001F171 \U0001F17E-\U0001F17F 
                                          \U0001F202 \U0001F21A \U0001F22F \U0001F237 
                                          \U0001F30D-\U0001F30F \U0001F315 \U0001F31C \U0001F321 \U0001F324-\U0001F32C \U0001F336 \U0001F378 \U0001F37D 
                                          \U0001F393 \U0001F396-\U0001F397 \U0001F399 \U0001F39A-\U0001F39B \U0001F39E-\U0001F39F 
                                          \U0001F3A7 \U0001F3AC-\U0001F3AE \U0001F3C2 \U0001F3C4 \U0001F3C6 \U0001F3CA-\U0001F3CE 
                                          \U0001F3D4-\U0001F3E0 \U0001F3ED \U0001F3F3 \U0001F3F5 \U0001F3F7 
                                          \U0001F408 \U0001F415 \U0001F41F \U0001F426 \U0001F43F \U0001F441 \U0001F442 \U0001F446-\U0001F449 \U0001F44D-\U0001F44E 
                                          \U0001F453 \U0001F46A \U0001F47D \U0001F4A3 \U0001F4B0 \U0001F4B3 \U0001F4BB \U0001F4BF \U0001F4CB \U0001F4DA \U0001F4DF 
                                          \U0001F4E4-\U0001F4E6 \U0001F4EA-\U0001F4ED \U0001F4F7 \U0001F4F9 \U0001F4FA-\U0001F4FB \U0001F4FD 
                                          \U0001F508 \U0001F50D \U0001F512-\U0001F513 \U0001F549 \U0001F54A \U0001F550-\U0001F567 \U0001F56F \U0001F570 \U0001F573-\U0001F579 
                                          \U0001F587 \U0001F58A-\U0001F58D \U0001F590 \U0001F5A5 \U0001F5A8 \U0001F5B1-\U0001F5B2 \U0001F5BC \U0001F5C2-\U0001F5C4 
                                          \U0001F5D1-\U0001F5D3 \U0001F5DC-\U0001F5DE \U0001F5E1 \U0001F5E3 \U0001F5E8 \U0001F5EF \U0001F5F3 \U0001F5FA 
                                          \U0001F610 \U0001F687 \U0001F68D \U0001F691 \U0001F694 \U0001F698 \U0001F6AD \U0001F6B2 \U0001F6B9 \U0001F6BA \U0001F6BC 
                                          \U0001F6CB \U0001F6CD-\U0001F6CF \U0001F6E0-\U0001F6E5 \U0001F6E9 \U0001F6F0 \U0001F6F3
                                      ]
;                                        Text vs. Emoji
emoji_only_character                 := default_emoji_presentation_character - emoji_presentation_base
                                     ;= `+EP -EPSq`
                                     ;~ emoji_character - emoji_presentation_base
emoji_opt-out_character              := emoji_presentation_base - default_text_presentation_character
                                     ;= `+EP +EPSq`
emoji_opt-in_character               := emoji_presentation_base - default_emoji_presentation_character
                                     ;= `-EP +EPSq`
;                                        Emoji Modifiers
emoji_modifier                       := \p{Emoji_Modifier}
emoji_modifier_base                  := [ \u261D  \u26F9  \u270A-\u270D 
                                          \U0001F385 \U0001F3C2-\U0001F3C4 \U0001F3C7 \U0001F3CA-\U0001F3CC
                                          \U0001F442-\U0001F443 \U0001F446-\U0001F450 \U0001F466-\U0001F478 \U0001F47C
                                          \U0001F481-\U0001F483 \U0001F485-\U0001F487 \U0001F4AA
                                          \U0001F574-\U0001F575 \U0001F57A \U0001F590 \U0001F595-\U0001F596
                                          \U0001F645-\U0001F647 \U0001F64B-\U0001F64F
                                          \U0001F6A3 \U0001F6B4-\U0001F6B6 \U0001F6C0 \U0001F6CC 
                                          \U0001F918-\U0001F91E \U0001F926 \U0001F930 \U0001F933-\U0001F939 \U0001F93C-\U0001F93E 
                                      ] 
emoji_modifier_sequence              := emoji_modifier_base emoji_modifier
;                                        Emoji Sequences
emoji_flag_sequence                  := regional_indicator regional_indicator
emoji_tag_sequence                   := tag_base tag_spec tag_term
tag_base                             := emoji_character
                                      | emoji_modifier_sequence
                                      | emoji_presentation_sequence
tag_spec                             := [\U000E0020-\U000E007E]+
tag_term                             := \U000E007F
emoji_combining_sequence             := 
                                      (  emoji_character
                                      |  emoji_presentation_sequence
                                      |  text_presentation_sequence 
                                      ) non_spacing_mark*
emoji_keycap_sequence                := emoji_keycap_base emoji_presentation_selector combining_keycap
emoji_keycap_base                    := [0-9#*]
combining_keycap                     := \u20E3
emoji_core_sequence                  := emoji_combining_sequence
                                      | emoji_modifier_sequence
                                      | emoji_flag_sequence
emoji_zwj_element                    := emoji_character
                                      | emoji_presentation_sequence
                                      | emoji_modifier_sequence
fully-qualified_emoji_zwj_element    := default_emoji_presentation_character
                                      | emoji_presentation_sequence
                                      | emoji_modifier_sequence
emoji_zwj_sequence                   := emoji_zwj_element ( ZWJ emoji_zwj_element )+
ZWJ                                  := \u200D
emoji_sequence                       := emoji_core_sequence
                                      | emoji_zwj_sequence
                                      | emoji_tag_sequence
fully-qualified_emoji_zwj_sequence     := ( fully-qualified_emoji_zwj_element ( ZWJ fully-qualified_emoji_zwj_element )+ )
                                        - text_presentation_selector
non-fully-qualified_emoji_zwj_sequence := emoji_zwj_sequence - fully-qualified_emoji_zwj_sequence
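The tag_spec and tag_term productions in the grammar above map directly onto a regular expression. The Python sketch below checks only that structure. As a labeled simplification, it accepts any non-empty run of non-tag code points as tag_base, because the Python standard library does not expose the Emoji property; a real implementation would substitute emoji data there. The names `TAG_SEQUENCE` and `split_emoji_tag_sequence` are hypothetical.

```python
import re

TAG_OFFSET = 0xE0000  # tag characters mirror ASCII at this offset

# tag_spec := [\U000E0020-\U000E007E]+    tag_term := \U000E007F
# Simplification (assumption): tag_base is any non-empty run of non-tag
# code points, since the standard library has no Emoji property data.
TAG_SEQUENCE = re.compile(
    "(?P<base>[^\U000E0000-\U000E007F]+)"
    "(?P<spec>[\U000E0020-\U000E007E]+)"
    "\U000E007F"
)


def split_emoji_tag_sequence(text: str):
    """Return (tag_base, tag_spec spelled in ASCII) if text has the shape
    tag_base tag_spec tag_term, otherwise None."""
    m = TAG_SEQUENCE.fullmatch(text)
    if m is None:
        return None
    spec = "".join(chr(ord(c) - TAG_OFFSET) for c in m.group("spec"))
    return m.group("base"), spec


# U+1F3F4 followed by the tag letters "gbeng" and CANCEL TAG.
england = "\U0001F3F4" + "".join(chr(TAG_OFFSET + ord(c)) for c in "gbeng") + "\U000E007F"
print(split_emoji_tag_sequence(england)[1])  # gbeng
```

An empty tag_spec or a missing tag_term makes `fullmatch` fail, mirroring the `+` and the mandatory terminator in the grammar.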

Date/Time: Wed Apr 5 10:22:58 CDT 2017
Name: David Corbett
Report Type: Public Review Issue
Opt Subject: PRI #348: 3-digit flag emoji tag sequences

According to the rules in section C.1 of the proposed update, a 3-digit 
unicode_region_subtag is only a valid emoji flag tag if its idStatus is 
equal to "regular" or "deprecated". However, all the 3-digit unicode_region_subtags 
have an idStatus of "macroregion". Therefore, there are no valid 3-digit 
unicode_region_subtag flag emoji tag sequences. Defining validity rules 
for something such that nothing is valid might confuse some implementers, 
so I suggest clarifying that this is intentional.

Date/Time: Wed Apr 5 10:57:27 CDT 2017
Name: David Corbett
Report Type: Public Review Issue
Opt Subject: PRI #348: 2-letter flag emoji tag sequences

babelstone.co.uk/Fonts/Flags.html says:

>> BabelStone Flags also supports Flag Emoji tag sequences for ISO 3166-1 
>> two-letter country codes (i.e. the US flag can be represented either as 
>> <1F1FA 1F1F8> or as <1F3F4 E0075 E0073 E007F>). I am unclear from 
>> the rather imprecise description of Flag Emoji tag sequences given in Annex C 
>> of Unicode Technical Standard #51 whether this is conformant with the Unicode 
>> Standard or not; but it seems logical to support all geopolitical flags as tag sequences.

Consider adding a note to clarify the validity of such sequences.
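The two encodings of the US flag quoted above differ only in how the letters are spelled: regional indicators start at U+1F1E6 for "A", while tag characters mirror ASCII at an offset of 0xE0000 and the tag form uses lowercase letters. The Python sketch below reproduces both spellings; it builds the sequences only and says nothing about whether the tag form is valid for two-letter codes, which is the question raised. The function names are hypothetical.

```python
REGIONAL_INDICATOR_A = 0x1F1E6  # U+1F1E6 REGIONAL INDICATOR SYMBOL LETTER A
WAVING_BLACK_FLAG = 0x1F3F4     # U+1F3F4 WAVING BLACK FLAG
CANCEL_TAG = 0xE007F            # U+E007F CANCEL TAG
TAG_OFFSET = 0xE0000            # tag characters mirror ASCII at this offset


def _check_region(region: str) -> None:
    if len(region) != 2 or not (region.isascii() and region.isalpha()):
        raise ValueError(f"expected a two-letter code, got {region!r}")


def flag_sequence(region: str) -> list[int]:
    """Emoji flag sequence: one regional indicator per letter."""
    _check_region(region)
    return [REGIONAL_INDICATOR_A + ord(c) - ord("A") for c in region.upper()]


def flag_tag_sequence(region: str) -> list[int]:
    """Tag-sequence spelling: black flag, lowercase tag letters, cancel tag."""
    _check_region(region)
    return [WAVING_BLACK_FLAG] + [TAG_OFFSET + ord(c) for c in region.lower()] + [CANCEL_TAG]


def hex_string(cps: list[int]) -> str:
    return " ".join(f"{cp:X}" for cp in cps)


print(hex_string(flag_sequence("US")))      # 1F1FA 1F1F8
print(hex_string(flag_tag_sequence("US")))  # 1F3F4 E0075 E0073 E007F
```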

Date/Time: Wed Apr 19 20:55:07 CDT 2017
Name: cketti
Report Type: Public Review Issue
Opt Subject: PRI #348: Error in section C.1.3

In the table in section C.1.3, the third content row uses <BLACK FLAG> as the
base character in the 'Sequence' column, but uses "A" in the 'Rec. Images' and
'TU' columns. <BLACK FLAG> should be used in those columns as well.

Furthermore, the sample sequence uses an incorrect subdivision identifier. This
weakens the example, which is meant to demonstrate one particular condition
that makes a tag sequence ill-formed.

Date/Time: Wed Apr 26 16:18:17 CDT 2017
Name: Doug Ewell
Report Type: Public Review Issue
Opt Subject: PRI #348

I oppose the classification, in C.1.1 and the corresponding data files, of
three valid emoji tag sequences as "standard" and of thousands of other valid
sequences, including four listed in the table, as "not standard." This
exclusion makes a potentially useful mechanism much less useful.

Date/Time: Fri Apr 28 06:46:18 CDT 2017
Name: William Overington
Report Type: Public Review Issue
Opt Subject: PRI #348 Length of Tag Sequences

Yesterday the document was changed.

There is now the following.

>> Review Note: The following constraint is proposed to limit the
>> length of tag sequences, to prevent parsers from having to deal with
>> unbounded sequences. The UTC would appreciate feedback on this.
>> There is one common constraint on valid emoji tag sequences:
>> the tag_spec must not be longer than 32 code points.

I oppose that absolute limit. I opine that introducing such an absolute limit
could stop progress and development. I opine that the Unicode Technical
Committee should encourage new ideas for the future.

In the Review Note the constraint is proposed "to prevent parsers from having
to deal with unbounded sequences."

There is a big difference between a tag sequence that is more than 32 code
points long and a tag sequence of unbounded length.
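The difference can be made concrete. Under a fixed limit, a parser never needs to examine more than limit + 1 tag characters before deciding, so its work per sequence is bounded however long the input run is. The Python sketch below illustrates this; the name `scan_tag_spec` and its index-based interface are hypothetical, and the limit of 32 is the value from the review note.

```python
MAX_TAG_SPEC = 32                        # limit proposed in the review note
TAG_SPEC_FIRST, TAG_SPEC_LAST = 0xE0020, 0xE007E
CANCEL_TAG = 0xE007F                     # tag_term


def scan_tag_spec(text: str, start: int, limit: int = MAX_TAG_SPEC):
    """Scan a tag_spec beginning at text[start].

    Returns the index just past CANCEL TAG if a terminated tag_spec of
    1..limit code points is found, otherwise None. The loop gives up after
    at most limit + 1 tag characters, so the work done is bounded.
    """
    i = start
    while i < len(text) and TAG_SPEC_FIRST <= ord(text[i]) <= TAG_SPEC_LAST:
        i += 1
        if i - start > limit:
            return None  # too long: stop without scanning further
    if i == start or i >= len(text) or ord(text[i]) != CANCEL_TAG:
        return None      # empty tag_spec or missing tag_term
    return i + 1
```

For a base character at index 0 followed by 32 tag characters and CANCEL TAG, `scan_tag_spec(text, 1)` returns 34; with 33 tag characters it returns None after reading only those 33.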

There could be a rule that there is usually a limit of 32 code points, and
that any sequence of tag code points more than 32 code points long starts with
a tag character indicating that it has more than 32 code points, namely a
U+E007C TAG VERTICAL LINE as the first character of the sequence. Such a
sequence could start by indicating its total length using a U+E007C TAG
VERTICAL LINE character followed by a sequence of tag digit characters
followed by another U+E007C TAG VERTICAL LINE.
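A parser could recognize the header suggested above with a few lines of code. The Python sketch below is hypothetical throughout: the name `parse_length_header`, the choice to return the declared length together with the header size, and the use of ValueError for a malformed header are assumptions, and the proposal does not say whether the declared total length counts the header, the base character, or the terminating CANCEL TAG.

```python
TAG_VERTICAL_LINE = 0xE007C              # U+E007C TAG VERTICAL LINE
TAG_DIGIT_ZERO, TAG_DIGIT_NINE = 0xE0030, 0xE0039
TAG_OFFSET = 0xE0000                     # tag characters mirror ASCII


def parse_length_header(tag_spec: str):
    """Parse the hypothetical |digits| header at the start of tag_spec.

    Returns None if tag_spec does not start with TAG VERTICAL LINE (the
    ordinary, length-limited case). Otherwise returns
    (declared_length, header_size), or raises ValueError if the header is
    malformed (no digits, a non-digit inside, or no closing line).
    """
    if not tag_spec or ord(tag_spec[0]) != TAG_VERTICAL_LINE:
        return None
    digits = []
    for i in range(1, len(tag_spec)):
        cp = ord(tag_spec[i])
        if TAG_DIGIT_ZERO <= cp <= TAG_DIGIT_NINE:
            digits.append(chr(cp - TAG_OFFSET))  # tag digit -> ASCII digit
        elif cp == TAG_VERTICAL_LINE and digits:
            return int("".join(digits)), i + 1
        else:
            raise ValueError("malformed length header")
    raise ValueError("length header is not closed")
```

If the declared value is taken to count every code point of tag_spec, a reader would accept the sequence when `declared_length == len(tag_spec)` and apply the ordinary 32-code-point limit whenever the function returns None.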

Making such a rule now would prompt implementers to include in their software
a way to check whether a tag sequence starts with a U+E007C TAG VERTICAL LINE.
That would be easy for the UTC to do now, would be straightforward for
developers now, and would provide an infrastructure for the future.

For example, in my earlier feedback I mentioned the possibility that a vector
glyph in a platform-independent colour-font-style contour format could be
expressed using tag characters.

I later elaborated on that in two mailing list posts.

http://www.unicode.org/mail-arch/unicode-ml/y2017-m04/0034.html 

http://www.unicode.org/mail-arch/unicode-ml/y2017-m04/0080.html 

Limiting the length of all tag sequences to 32 code points would stop that
being implemented.

Also, in my earlier feedback I mentioned the possibility of including in a
Unicode character stream in the future, if that is what people choose to do,
such things as a three-dimensional model of a biochemical molecule that an end
user may rotate so as to observe the model from various angles.

Recently an emoji for a molecule has been proposed. Suppose that it becomes
encoded and can be used as the base character for a tag sequence defining a
particular molecule. If such a sequence starts with a U+E007C TAG VERTICAL
LINE character, then some tag digit characters specifying the total length of
the tag sequence, then another U+E007C TAG VERTICAL LINE character, such
developments would be possible in a straightforward manner without disrupting
the usual limit of 32 tag characters in a tag sequence.

So, if the UTC chooses to have a 32 code point limit on the length of a tag
sequence often, or even usually, that is fine; yet while setting that limit,
please also specify the U+E007C TAG VERTICAL LINE method suggested above, so
that the Unicode Technical Committee encourages progress by providing an
infrastructure for futuristic developments to be able to take place.

William Overington

Friday 28 April 2017

Feedback above this line was reviewed by the Emoji Subcommittee prior to UTC #151.