
CLDR Ticket #4882(closed defect: fixed)

Opened 3 years ago

Last modified 2 years ago

FractionalUCA: add gap before/after ignorables

Reported by: markus Owned by: markus
Component: uca Data Locale:
Phase: Review: mark
Weeks: 0.2 Data Xpath:
Xref:

Description

FractionalUCA needs additional tailoring gaps for secondary and tertiary weights; otherwise, tailorings will yield ill-formed Collation Element Tables. For the reason and details, see IcuBug:9362.

Secondary weight lead bytes:

  • 00..02 special separators
  • 03..04 gap for tailoring secondary-before a non-ignorable
  • 05..85 "common" weight and its sort key compression range
  • 86..fpi-1 new gap for tailoring secondary-after a non-ignorable (fpi = first primary ignorable, needs to be before the first actually assigned weight)
  • fpi..xx-1 new gap for tailoring secondary-before U+0332 (low line)
  • xx..yy secondary weights currently assigned to U+0332 (low line)..U+1D360 (counting rod unit digit one) (yy is the last primary ignorable)
  • yy+1..FF gap for tailoring secondary-after the last primary ignorable

Tertiary weight lead bytes (6 bits 00..3F, without the case bits):

  • 00..02 special separators
  • 03..04 gap for tailoring tertiary-before a non-ignorable or primary ignorable
  • 05 "common" weight (its sort key compression range is created dynamically)
  • 06..fsi-1 new gap for tailoring tertiary-after a non-ignorable or primary ignorable (fsi = first secondary ignorable, needs to be before the first actually assigned weight)
  • fsi..uu-1 new gap for tailoring tertiary-before any assigned tertiary weight
  • uu..vv assigned tertiary weights; pick one byte uu=vv as long as UCA does not have secondary-ignorable characters (vv is the last secondary ignorable)
  • vv+1..3F gap for tailoring tertiary-after the last secondary ignorable

For example:

  • fpi=93
  • xx=97
  • yy=F2 (U+1D360 -> (0, F2 15, 05))
  • fsi=30
  • uu=vv=38
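
To make the layout above easier to check, here is a minimal Java sketch that names the range a given lead byte falls into. It hard-codes the example values (fpi=93, xx=97, yy=F2, fsi=30, uu=vv=38), which are illustrative rather than final data; the class and method names are hypothetical and are not part of the unicodetools code.

```java
/** Hypothetical helper: names the range that a secondary or tertiary lead byte falls into. */
final class LeadByteRanges {
    // Example boundary values from the description above; not final data.
    static final int FPI = 0x93; // start of the gap before the first assigned secondary weight
    static final int XX = 0x97;  // first assigned secondary weight (currently U+0332)
    static final int YY = 0xF2;  // last assigned secondary weight (currently U+1D360)
    static final int FSI = 0x30; // start of the gap before the first assigned tertiary weight
    static final int UU = 0x38;  // first assigned tertiary weight
    static final int VV = 0x38;  // last assigned tertiary weight

    static String secondaryRange(int b) {
        if (b < 0 || b > 0xFF) throw new IllegalArgumentException("not a byte: " + b);
        if (b <= 0x02) return "special separator";
        if (b <= 0x04) return "gap: secondary-before a non-ignorable";
        if (b <= 0x85) return "common weight and compression range";
        if (b < FPI) return "gap: secondary-after a non-ignorable";
        if (b < XX) return "gap: secondary-before U+0332";
        if (b <= YY) return "assigned secondary weights";
        return "gap: secondary-after the last primary ignorable";
    }

    /** Takes the 6-bit tertiary lead byte (00..3F), without the case bits. */
    static String tertiaryRange(int b) {
        if (b < 0 || b > 0x3F) throw new IllegalArgumentException("not a 6-bit value: " + b);
        if (b <= 0x02) return "special separator";
        if (b <= 0x04) return "gap: tertiary-before";
        if (b == 0x05) return "common weight";
        if (b < FSI) return "gap: tertiary-after";
        if (b < UU) return "gap: tertiary-before any assigned tertiary weight";
        if (b <= VV) return "assigned tertiary weights";
        return "gap: tertiary-after the last secondary ignorable";
    }

    public static void main(String[] args) {
        for (int b : new int[] {0x05, 0x86, 0x93, 0x97, 0xF3}) {
            System.out.printf("secondary %02X: %s%n", b, secondaryRange(b));
        }
        for (int b : new int[] {0x05, 0x06, 0x30, 0x38, 0x39}) {
            System.out.printf("tertiary  %02X: %s%n", b, tertiaryRange(b));
        }
    }
}
```

Every byte value maps to exactly one range, so the ranges listed in the description leave no overlaps and no holes.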

Change History

comment:1 Changed 3 years ago by markus

  • Cc markus.icu@… added
  • Weeks set to 0.2

comment:2 Changed 3 years ago by emmons

  • Owner changed from anybody to markus
  • Status changed from new to assigned
  • Milestone changed from UNSCH to 22

comment:3 Changed 3 years ago by mark

  • Owner changed from markus to mark
  • Milestone changed from 22 to soon

comment:4 Changed 3 years ago by pedberg

  • Cc pedberg added

comment:5 Changed 3 years ago by pedberg

  • Milestone changed from soon to 22dres

comment:6 Changed 3 years ago by mark

  • Milestone changed from 22dres to soon

comment:7 Changed 3 years ago by mark

  • Milestone changed from soon to 23

It looks like the code that needs to be changed is the last line in the following: lower the 0x80 and make it a named constant.

    static int fixSecondary2(int x, int gap1, int gap2) {
        int top = x;
        int bottom = 0;
        if (top == 0) {
            // ok, zero
        } else if (top == 1) {
            top = FractionalUCA.Variables.COMMON;
        } else {
            top *= 2; // create gap between elements. top is now 4 or more
            top += 0x80 + FractionalUCA.Variables.COMMON - 2; // insert gap to make top at least 87

We also need to make sure that the correct values come out for the following:

[first primary ignorable [, 87, 05]] # U+0332 COMBINING LOW LINE
[last primary ignorable [, E2 15, 05]] # U+1D360 COUNTING ROD UNIT DIGIT ONE

And extensively test the results (that's the reason we aren't doing it now!).
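
As a quick check of the arithmetic in the fragment quoted above, here is a minimal standalone sketch of just the visible branches. It assumes that `FractionalUCA.Variables.COMMON` is 0x05 (inferred from the "at least 87" comment and the common weight in the description); `GAP_OFFSET` is a hypothetical name for the literal 0x80, and the rest of `fixSecondary2` (including `bottom`, `gap1` and `gap2`) is not reproduced.

```java
/** Hypothetical sketch of the lead-byte arithmetic visible in the quoted fragment. */
final class SecondaryGapSketch {
    // Assumption: stands in for FractionalUCA.Variables.COMMON.
    static final int COMMON = 0x05;
    // Hypothetical named constant replacing the literal 0x80; lowering it moves
    // the first non-common lead byte down by the same amount.
    static final int GAP_OFFSET = 0x80;

    static int top(int x) {
        if (x == 0) {
            return 0;            // zero stays zero
        } else if (x == 1) {
            return COMMON;       // 1 maps to the common weight
        }
        int top = x * 2;         // create a gap between elements; top is now 4 or more
        return top + GAP_OFFSET + COMMON - 2;
    }

    public static void main(String[] args) {
        for (int x = 0; x <= 3; x++) {
            System.out.printf("x=%d -> top=%02X%n", x, top(x));
        }
    }
}
```

Under these assumptions the smallest input that takes the last branch, x=2, yields 4 + 0x80 + 3 = 0x87, which matches the 87 shown for the first primary ignorable.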

comment:8 Changed 3 years ago by mark

  • Owner changed from mark to markus

comment:9 Changed 3 years ago by markus

  • Milestone changed from 23 to 24

This will go into CLDR 24, with data for UCA 6.3.

comment:10 Changed 2 years ago by markus

  • Cc mark, yoshito, emmons added; markus.icu@… removed
  • Status changed from assigned to accepted
  • Review set to emmons

For the new data, see ticket:5568 r9301, http://unicode.org/repos/cldr/trunk/common/uca/FractionalUCA.txt

I reduced the common-secondary range to 05..45 and added:

[fixed secondary common byte 05]
[fixed last secondary common byte 45]
[fixed first ignorable secondary byte 80]

[fixed tertiary common byte 05]
[fixed first ignorable tertiary byte 3C]

The FractionalUCA.txt mappings put secondary and tertiary weights within the respective ranges, with some gaps between these boundaries and the first/last actual weights.
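
For readers who want to consume these boundary lines, here is a minimal Java sketch that parses the five `[fixed ... byte XX]` lines quoted above and derives the secondary range that lies between the common range and the first ignorable byte. The class, method names, and regular expression are hypothetical; this is not how the collv2 builder or the unicodetools code reads the file, and the interpretation of the in-between range is my reading of the lines above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for the "[fixed ... byte XX]" boundary lines. */
final class FixedByteLines {
    // Matches e.g. "[fixed last secondary common byte 45]" -> ("last secondary common", "45").
    private static final Pattern FIXED =
            Pattern.compile("\\[fixed ([a-z ]+?) byte ([0-9A-Fa-f]{2})\\]");

    static Map<String, Integer> parse(String text) {
        Map<String, Integer> values = new LinkedHashMap<>();
        Matcher m = FIXED.matcher(text);
        while (m.find()) {
            values.put(m.group(1), Integer.parseInt(m.group(2), 16));
        }
        return values;
    }

    /** Returns {low, high} of the secondary bytes above the common range and below the ignorables. */
    static int[] secondaryRangeAboveCommon(Map<String, Integer> v) {
        int low = v.get("last secondary common") + 1;
        int high = v.get("first ignorable secondary") - 1;
        return new int[] {low, high};
    }

    public static void main(String[] args) {
        String text = String.join("\n",
                "[fixed secondary common byte 05]",
                "[fixed last secondary common byte 45]",
                "[fixed first ignorable secondary byte 80]",
                "[fixed tertiary common byte 05]",
                "[fixed first ignorable tertiary byte 3C]");
        Map<String, Integer> v = parse(text);
        int[] r = secondaryRangeAboveCommon(v);
        System.out.printf("secondary bytes above common: %02X..%02X%n", r[0], r[1]);
    }
}
```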

More information is in the per-constant comments here:
http://unicode.org/repository/unicodetools/trunk/org/unicode/text/UCA/Fractional.java?view=markup

My "collv2" tailoring builder code uses these values for well-formed tailored CEs.

comment:11 Changed 2 years ago by emmons

  • Review changed from emmons to mark

comment:12 Changed 2 years ago by mark

  • Status changed from accepted to closed
  • Resolution set to fixed