Unicode Common Locale Data Repository: Bug Tracking

CLDR Ticket #10226 (reviewing data)

Opened 5 months ago

Last modified 3 weeks ago

Fix segmentation rules

Reported by: mark
Owned by: mark
Component: unknown
Data Locale:
Phase: dvet
Review: markus
Weeks:
Data Xpath:
Xref:

Description

See the email thread on cldr-users titled "Word break question". (Richard, Cameron, Philippe: thanks for tracking this down.)

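The [...] syntax may only be used for valid UnicodeSets. The first four lines below redefine $Hebrew_Letter, $ALetter, $MidNumLet, and $Single_Quote as regular-expression sequences (each set followed by $FEZ*), so those variables are no longer UnicodeSets. The last two lines below put them inside [...], so they are invalid.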

<variable id="$Hebrew_Letter">($Hebrew_Letter $FEZ*)</variable>
<variable id="$ALetter">($ALetter $FEZ*)</variable>
<variable id="$MidNumLet">($MidNumLet $FEZ*)</variable>
<variable id="$Single_Quote">($Single_Quote $FEZ*)</variable>
...
<variable id="$AHLetter">[$ALetter $Hebrew_Letter]</variable>
<variable id="$MidNumLetQ">[$MidNumLet $Single_Quote]</variable>
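A minimal Python sketch of why the last two lines fail, assuming (as the ($X $FEZ*) rewrite lines imply) that each $variable is replaced textually by its value at the point where it is used. The `expand` helper is hypothetical, not the Segmenter.java API, and the set values are simplified placeholders rather than the real CLDR data.

```python
import re

def expand(definitions):
    """Resolve variables in definition order.

    Each $name in a value is replaced by that variable's value at this
    point, so a line like ($ALetter $FEZ*) sees the previous $ALetter.
    """
    env = {}
    for name, value in definitions:
        env[name] = re.sub(r"\$\w+",
                           lambda m: env.get(m.group(0), m.group(0)),
                           value)
    return env

# Hypothetical, simplified values in the order used by the current rules.
current_order = [
    ("$ALetter", r"[\p{WB=ALetter}]"),
    ("$Hebrew_Letter", r"[\p{WB=Hebrew_Letter}]"),
    ("$FEZ", r"[\p{WB=Extend}]"),            # the real $FEZ is larger
    ("$Hebrew_Letter", "($Hebrew_Letter $FEZ*)"),
    ("$ALetter", "($ALetter $FEZ*)"),
    ("$AHLetter", "[$ALetter $Hebrew_Letter]"),
]

print(expand(current_order)["$AHLetter"])
```

It prints `[([\p{WB=ALetter}] [\p{WB=Extend}]*) ([\p{WB=Hebrew_Letter}] [\p{WB=Extend}]*)]`: square brackets wrapped around grouped sequences with a `*` quantifier, which is not UnicodeSet syntax.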

As for the fix, I think the cleanest approach is to move the two lines above the definition of "$FEZ", so that $AHLetter and $MidNumLetQ are built while their components are still plain UnicodeSets, and then to add the following rewrite lines at the end of the <variables> section. That way everything is parallel:

<variable id="$AHLetter">($AHLetter $FEZ*)</variable>
<variable id="$MidNumLetQ">($MidNumLetQ $FEZ*)</variable>
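The same kind of sketch applied to the proposed ordering: the bracketed $AHLetter definition is moved above $FEZ, and a rewrite line is appended at the end. As before, `expand` is hypothetical, substitution is assumed to be textual and in order, and the set values are simplified placeholders. $MidNumLetQ would follow the same pattern.

```python
import re

def expand(definitions):
    """Resolve each $name to its value at the point of use, in order."""
    env = {}
    for name, value in definitions:
        env[name] = re.sub(r"\$\w+",
                           lambda m: env.get(m.group(0), m.group(0)),
                           value)
    return env

# Hypothetical, simplified values in the proposed order.
proposed_order = [
    ("$ALetter", r"[\p{WB=ALetter}]"),
    ("$Hebrew_Letter", r"[\p{WB=Hebrew_Letter}]"),
    ("$AHLetter", "[$ALetter $Hebrew_Letter]"),   # moved above $FEZ
    ("$FEZ", r"[\p{WB=Extend}]"),
    ("$Hebrew_Letter", "($Hebrew_Letter $FEZ*)"),
    ("$ALetter", "($ALetter $FEZ*)"),
    ("$AHLetter", "($AHLetter $FEZ*)"),           # appended rewrite line
]

print(expand(proposed_order)["$AHLetter"])
```

It prints `([[\p{WB=ALetter}] [\p{WB=Hebrew_Letter}]] [\p{WB=Extend}]*)`. The [...] now contains only sets, and $AHLetter ends up with the same ( ... $FEZ*) shape as $ALetter and $Hebrew_Letter.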

This should be posted as a known issue and fixed soon, since it may affect some of the Unicode 10.0 work.

The CLDR test code is too lenient about this syntax and also needs to be fixed to detect it.
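One way a test could detect this, sketched in Python under the same assumption of in-order substitution: track whether each variable currently holds a bracketed UnicodeSet, and flag any [...] definition that refers to a variable that does not. `find_invalid_set_refs` is a hypothetical helper, not existing CLDR test code, and treating "starts with [ and ends with ]" as a UnicodeSet is a simplification of real UnicodeSet parsing.

```python
import re

def find_invalid_set_refs(definitions):
    """Return names of [...] definitions that reference a variable whose
    current value is not a UnicodeSet (e.g. a ( ... $FEZ*) group)."""
    is_set = {}      # variable name -> True if its current value is [...]
    invalid = []
    for name, value in definitions:
        bracketed = value.startswith("[") and value.endswith("]")
        if bracketed:
            refs = re.findall(r"\$\w+", value)
            # Flag only references known to be non-sets; undefined
            # variables are left to other checks.
            if any(is_set.get(ref) is False for ref in refs):
                invalid.append(name)
        is_set[name] = bracketed
    return invalid

# Hypothetical, simplified values in the order used by the current rules.
current_order = [
    ("$ALetter", r"[\p{WB=ALetter}]"),
    ("$Hebrew_Letter", r"[\p{WB=Hebrew_Letter}]"),
    ("$MidNumLet", r"[\p{WB=MidNumLet}]"),
    ("$Single_Quote", r"[\p{WB=Single_Quote}]"),
    ("$FEZ", r"[\p{WB=Extend}]"),
    ("$Hebrew_Letter", "($Hebrew_Letter $FEZ*)"),
    ("$ALetter", "($ALetter $FEZ*)"),
    ("$MidNumLet", "($MidNumLet $FEZ*)"),
    ("$Single_Quote", "($Single_Quote $FEZ*)"),
    ("$AHLetter", "[$ALetter $Hebrew_Letter]"),
    ("$MidNumLetQ", "[$MidNumLet $Single_Quote]"),
]

print(find_invalid_set_refs(current_order))
```

On the current ordering it prints `['$AHLetter', '$MidNumLetQ']`; with the two bracketed lines moved above $FEZ and the rewrite lines appended, it returns an empty list.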


Change History

comment:1 Changed 5 months ago by mark

  • Priority changed from assess to critical
  • Type changed from unknown to data
  • Milestone changed from UNSCH to 32

comment:2 Changed 4 months ago by mark

  • Owner changed from anybody to mark
  • Phase changed from dsub to dvet
  • Status changed from new to accepted

comment:3 Changed 3 weeks ago by mark

  • Status changed from accepted to reviewing
  • Review set to markus

Fixed in the Unicode tools under this ticket, since Segmenter.java has moved there.
