Common Locale Data Repository: Bug Tracking

CLDR Ticket #10226 (new data)

Opened 4 weeks ago

Last modified 3 weeks ago

Fix segmentation rules

Reported by: mark Owned by: anybody
Component: unknown Data Locale:
Phase: dsub Review:
Weeks: Data Xpath:
Xref:

Description

See email chain on cldr-users, titled "Word break question". (Richard, Cameron, Philippe, thanks for tracking this down...)

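The [...] syntax may only be used for valid UnicodeSets, so the last two lines below are invalid: after the rewrite lines, $ALetter, $Hebrew_Letter, $MidNumLet and $Single_Quote are no longer sets but sequences of the form ($X $FEZ*).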

<variable id="$Hebrew_Letter">($Hebrew_Letter $FEZ*)</variable>
<variable id="$ALetter">($ALetter $FEZ*)</variable>
<variable id="$MidNumLet">($MidNumLet $FEZ*)</variable>
<variable id="$Single_Quote">($Single_Quote $FEZ*)</variable>
...
<variable id="$AHLetter">[$ALetter $Hebrew_Letter]</variable>
<variable id="$MidNumLetQ">[$MidNumLet $Single_Quote]</variable>

As for the fix, I think the cleanest approach is to move the two lines above the definition of "$FEZ" and then add the following rewrite lines at the end of the <variables> section, so that everything is parallel:

<variable id="$AHLetter">($AHLetter $FEZ*)</variable>
<variable id="$MidNumLetQ">($MidNumLetQ $FEZ*)</variable>
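To see why the order matters, here is a rough Python sketch that mimics textual substitution of variables, assuming each <variable> body is expanded using only the values defined before it (which is what the rewrite lines above rely on). The property values are simplified placeholders, not the real CLDR definitions, and `expand` is a hypothetical helper. $MidNumLetQ would follow the same pattern as $AHLetter.

```python
import re

# Matches a variable reference such as $ALetter or $Hebrew_Letter.
VAR_REF = re.compile(r"\$[A-Za-z_][A-Za-z0-9_]*")


def expand(definitions):
    """Process (name, body) pairs in order; each body sees only earlier values."""
    env = {}
    for name, body in definitions:
        # Substitute the current value of each reference; unknown names stay as-is.
        env[name] = VAR_REF.sub(lambda m: env.get(m.group(0), m.group(0)), body)
    return env


# Simplified placeholder values (assumption: not the actual CLDR data).
BASE = [
    ("$Hebrew_Letter", r"\p{Word_Break=Hebrew_Letter}"),
    ("$ALetter", r"\p{Word_Break=ALetter}"),
]
FEZ = [("$FEZ", r"[\p{Word_Break=Format}\p{Word_Break=Extend}\p{Word_Break=ZWJ}]")]
REWRITES = [
    ("$Hebrew_Letter", "($Hebrew_Letter $FEZ*)"),
    ("$ALetter", "($ALetter $FEZ*)"),
]
UNION = [("$AHLetter", "[$ALetter $Hebrew_Letter]")]

# Current order: the union is built after the rewrites, so [...] wraps sequences.
broken = expand(BASE + FEZ + REWRITES + UNION)
# Proposed order: union first, then $FEZ, then a parallel rewrite for $AHLetter.
fixed = expand(BASE + UNION + FEZ + REWRITES + [("$AHLetter", "($AHLetter $FEZ*)")])

if __name__ == "__main__":
    print(broken["$AHLetter"])  # sequence syntax ( ... *) inside set brackets
    print(fixed["$AHLetter"])   # a real set, followed by $FEZ*
```

In the current order the brackets end up enclosing parenthesized sequences; in the proposed order $AHLetter is a genuine set before being rewritten like the other letters.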

This should be posted as a known issue and fixed soon, since it may affect some of the Unicode 10.0 work.

The CLDR test code is too lenient about this syntax and also needs to be fixed to detect it.
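A check of the kind the test code needs might look roughly like the following Python sketch. `find_set_misuse` is a hypothetical helper, not part of the CLDR tools. It assumes a variable's current value is a set when its body starts with `[` or `\p`, processes definitions in document order, and flags any definition that references a sequence-valued variable inside `[...]`.

```python
import xml.etree.ElementTree as ET


def find_set_misuse(variables_xml):
    """Return ids of <variable> definitions that put a sequence-valued variable inside [...]."""
    is_set = {}  # variable name -> True if its current value looks like a UnicodeSet
    problems = []
    for var in ET.fromstring(variables_xml).iter("variable"):
        name, body = var.get("id"), (var.text or "").strip()
        depth, i, bad = 0, 0, False
        while i < len(body):
            ch = body[i]
            if ch == "\\":
                i += 2  # skip the escaped character (e.g. the p in \p{...})
                continue
            if ch == "[":
                depth += 1
            elif ch == "]":
                depth -= 1
            elif ch == "$" and depth > 0:
                # Read the referenced variable name.
                j = i + 1
                while j < len(body) and (body[j].isalnum() or body[j] == "_"):
                    j += 1
                if is_set.get(body[i:j]) is False:
                    bad = True  # a sequence appears inside set brackets
                i = j
                continue
            i += 1
        if bad:
            problems.append(name)
        # Record the kind of the new value for later definitions.
        is_set[name] = body.startswith("[") or body.startswith("\\p")
    return problems


# Simplified placeholder rules (assumption: not the actual CLDR file).
BROKEN_XML = r"""<variables>
  <variable id="$Hebrew_Letter">\p{Word_Break=Hebrew_Letter}</variable>
  <variable id="$ALetter">\p{Word_Break=ALetter}</variable>
  <variable id="$FEZ">[\p{Word_Break=Format}\p{Word_Break=Extend}\p{Word_Break=ZWJ}]</variable>
  <variable id="$Hebrew_Letter">($Hebrew_Letter $FEZ*)</variable>
  <variable id="$ALetter">($ALetter $FEZ*)</variable>
  <variable id="$AHLetter">[$ALetter $Hebrew_Letter]</variable>
</variables>"""

FIXED_XML = r"""<variables>
  <variable id="$Hebrew_Letter">\p{Word_Break=Hebrew_Letter}</variable>
  <variable id="$ALetter">\p{Word_Break=ALetter}</variable>
  <variable id="$AHLetter">[$ALetter $Hebrew_Letter]</variable>
  <variable id="$FEZ">[\p{Word_Break=Format}\p{Word_Break=Extend}\p{Word_Break=ZWJ}]</variable>
  <variable id="$Hebrew_Letter">($Hebrew_Letter $FEZ*)</variable>
  <variable id="$ALetter">($ALetter $FEZ*)</variable>
  <variable id="$AHLetter">($AHLetter $FEZ*)</variable>
</variables>"""

if __name__ == "__main__":
    print(find_set_misuse(BROKEN_XML))
    print(find_set_misuse(FIXED_XML))
```

The "starts with `[` or `\p`" test is a deliberately crude heuristic; a real check would more likely parse each expanded value with an actual UnicodeSet parser.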

Change History

comment:1 Changed 3 weeks ago by mark

  • Priority changed from assess to critical
  • Type changed from unknown to data
  • Milestone changed from UNSCH to 32