
CLDR Ticket #10088 (closed data: fixed)

Opened 8 months ago

Last modified 8 months ago

Problem with CLDR spec GB11′ rule

Reported by: pedberg
Owned by: pedberg
Component: segmentation
Data Locale:
Phase: rc
Review: mark
Weeks:
Data Xpath:
Xref:

Description (last modified by pedberg)

There is a problem with the modified grapheme break rule GB11′ as described in the current draft UTS 35 http://unicode.org/repos/cldr/trunk/specs/ldml/tr35.html#Extended_Pictographic and as implemented in the current ICU trunk.

Per the latest draft UTS 51 http://www.unicode.org/draft/reports/tr51/tr51.html we have:

  • ED15a emoji_zwj_element := emoji_character | emoji_presentation_sequence | emoji_modifier_sequence
  • ED16 emoji_zwj_sequence := emoji_zwj_element ( ZWJ emoji_zwj_element )+

so an emoji_zwj_element can be a presentation sequence (or a modifier sequence).

This works fine with the standard GB11 in current draft UAX 29 http://www.unicode.org/repos/draft/trunk/reports/tr29/tr29.html#Grapheme_Cluster_Boundary_Rules which has

  • GB11: ZWJ × (Glue_After_Zwj | EBG)

However, the modified version in CLDR excludes emoji_presentation_sequences, at least as the initial element:

  • GB11′: (Extended_Pictographic | EmojiNRK) ZWJ × (Extended_Pictographic | EmojiNRK)

It does not exclude emoji_modifier_sequences because of the earlier standard GB10 rule

  • GB10: (E_Base | EBG) Extend* × E_Modifier

To address this, GB11′ needs to be modified as follows:

  • GB11′: (Extended_Pictographic | EmojiNRK) Extend* ZWJ × (Extended_Pictographic | EmojiNRK)

I have demonstrated both the problem and the fix in Apple's version of ICU, using extended rbbitst.txt entries, after rolling in open-source ICU 59m1.
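The difference between the two forms of GB11′ can be illustrated with a minimal Python sketch. This is not ICU's rule engine and not the rbbitst.txt test data mentioned above; it only models the left-hand context of the rule with a regular expression over one-letter class codes. The names `PICTO`, `classify`, and `no_break_before` are hypothetical, and `PICTO` is a tiny hand-picked stand-in for (Extended_Pictographic | EmojiNRK), assumed here to contain U+1F3F3 and U+1F308.

```python
import re

ZWJ = "\u200D"
VS16 = "\uFE0F"  # variation selector-16; its Grapheme_Cluster_Break value is Extend

# Hypothetical stand-in for (Extended_Pictographic | EmojiNRK); assumption for this sketch only.
PICTO = {"\U0001F3F3", "\U0001F308"}
# Only the Extend character needed for this example.
EXTEND = {VS16}

def classify(ch):
    """Map a code point to a one-letter class code."""
    if ch == ZWJ:
        return "Z"
    if ch in EXTEND:
        return "E"
    if ch in PICTO:
        return "P"
    return "x"

# Left-hand context of each rule form, anchored at the candidate boundary.
ORIGINAL_GB11 = re.compile(r"PZ$")    # P ZWJ × P
FIXED_GB11 = re.compile(r"PE*Z$")     # P Extend* ZWJ × P

def no_break_before(text, i, rule):
    """Return True if `rule` forbids a break between text[i-1] and text[i]."""
    classes = "".join(classify(c) for c in text)
    return classes[i] == "P" and rule.search(classes[:i]) is not None

# Emoji ZWJ sequence whose first element is an emoji presentation sequence.
with_vs16 = "\U0001F3F3\uFE0F\u200D\U0001F308"
# The same sequence without the variation selector.
without_vs16 = "\U0001F3F3\u200D\U0001F308"

print(no_break_before(with_vs16, 3, ORIGINAL_GB11))
print(no_break_before(with_vs16, 3, FIXED_GB11))
print(no_break_before(without_vs16, 2, ORIGINAL_GB11))
```

In this model, the original form of the rule fails to keep the ZWJ sequence together when U+FE0F sits between the first pictograph and the ZWJ, while the form with Extend* covers both inputs.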

Change History

comment:1 Changed 8 months ago by mark

Even better would be just \uFE0F?

comment:2 Changed 8 months ago by pedberg

  • Owner changed from anybody to pedberg
  • Status changed from new to accepted
  • Milestone changed from UNSCH to 31

Agreed in CLDR that this is the right solution; it affects the spec and the CLDR version of the segmentation rules. Will file an ICU ticket and discuss there, and also send e-mail to the emoji subcommittee.

comment:3 Changed 8 months ago by pedberg

  • Phase changed from spec-beta to rc
  • Description modified

comment:4 Changed 8 months ago by pedberg

  • Description modified

comment:5 Changed 8 months ago by pedberg

  • Status changed from accepted to reviewing
  • Review set to mark

comment:6 Changed 8 months ago by mark

  • Status changed from reviewing to closed
  • Resolution set to fixed