CLDR Ticket #10076 (accepted data)
|Reported by:||mark||Owned by:||mark|
During CLDR development, a question came up about Extended_Pictographic. We originally formulated that property to get around a significant problem in segmentation (character/word/line break), and put it into CLDR as a vehicle. It is too late to make any changes right now, but I don't think we want the situation to remain as it is.
I think the right approach at this point would be to propose something like the following to the UTC in May:
1. Move Extended_Pictographic into the emoji data files, for the next version after Emoji 5.0 (Emoji 6.0 or perhaps a sooner small update Emoji 5.1, whatever timing is needed). The contents should be the current Extended_Pictographic + Emoji X - Emoji_Component + MALE SIGN + FEMALE SIGN.
2. After Unicode 10.0, propose modifying the segmentation rules in UAX#14 and UAX#29 based on LDML (updated somewhat):
   - GB11′ [:Extended_Pictographic:] ZWJ × [:Extended_Pictographic:]
   - WB3c′ ZWJ × [:Extended_Pictographic:]
   - LB8a′ ZWJ × (ID | [:Extended_Pictographic:])
3. Along with #2, add text to both UAX#14 and UAX#29 that
   - The rules for segmentation may use properties outside of the main property associated with the algorithm. In such a case, such properties are indicated with the UnicodeSet notation, such as [:General_Category=Letter:].
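To make the three proposed rules concrete, here is a minimal Python sketch of the "no break" checks exactly as written above. It is an illustration under stated assumptions, not the UAX#14/UAX#29 algorithms: SAMPLE_EXTENDED_PICTOGRAPHIC is a tiny hypothetical stand-in for [:Extended_Pictographic:] (the real contents would come from the emoji data files), the function names are invented for this example, and the line-break class ID is supplied by the caller because Python's standard library does not expose it. The is_letter helper shows how a rule could consult a property outside the algorithm's main property, in the spirit of [:General_Category=Letter:].

```python
import unicodedata

ZWJ = "\u200D"  # ZERO WIDTH JOINER

# Hypothetical sample standing in for [:Extended_Pictographic:].
# The real property values would be read from the emoji data files.
SAMPLE_EXTENDED_PICTOGRAPHIC = frozenset({
    "\U0001F468",  # MAN
    "\U0001F469",  # WOMAN
    "\U0001F467",  # GIRL
    "\u2764",      # HEAVY BLACK HEART
    "\u2642",      # MALE SIGN
    "\u2640",      # FEMALE SIGN
})


def is_ext_pict(ch):
    """Stand-in test for [:Extended_Pictographic:] (sample data only)."""
    return ch in SAMPLE_EXTENDED_PICTOGRAPHIC


def is_letter(ch):
    """Test for [:General_Category=Letter:] using the standard library."""
    return unicodedata.category(ch).startswith("L")


def gb11_prime_no_break(text, i):
    """GB11': Extended_Pictographic ZWJ x Extended_Pictographic.

    Returns True if the rule forbids a break immediately before text[i].
    """
    return (
        2 <= i < len(text)
        and is_ext_pict(text[i - 2])
        and text[i - 1] == ZWJ
        and is_ext_pict(text[i])
    )


def wb3c_prime_no_break(text, i):
    """WB3c': ZWJ x Extended_Pictographic."""
    return 1 <= i < len(text) and text[i - 1] == ZWJ and is_ext_pict(text[i])


def lb8a_prime_no_break(text, i, is_id=lambda ch: False):
    """LB8a': ZWJ x (ID | Extended_Pictographic).

    is_id is a caller-supplied predicate for the line-break class ID;
    the default treats no character as ID (an assumption of this sketch).
    """
    return (
        1 <= i < len(text)
        and text[i - 1] == ZWJ
        and (is_id(text[i]) or is_ext_pict(text[i]))
    )


# Usage: MAN ZWJ WOMAN ZWJ GIRL should not break before WOMAN or GIRL.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"
no_break_positions = [i for i in range(len(family)) if gb11_prime_no_break(family, i)]
```

Each function looks only at the characters adjacent to one candidate boundary, which mirrors how the rules are stated; a full segmenter would apply them in order alongside the other rules of the relevant annex.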