Unicode Common Locale Data Repository: Bug Tracking

CLDR Ticket #5081 (closed enhancement: fixed)

Opened 3 years ago

Last modified 22 months ago

change FractionalUCA to use new syntax to refer to weights

Reported by: markus
Owned by: markus
Component: uca
Data Locale:
Phase:
Review: mark
Weeks: 0.2
Data Xpath:
Xref: ticket:5142

Description

Probably for UCA 6.2.1:

UCA 6.2 CollationAuxiliary.html adds a section detailing "Implicit Fractional Weight Generation".

As mentioned in the new section: “It is important to match the exact algorithm so that the weights for compatibility decomposable characters match.”

This is problematic. The current implicit-weight algorithm has to change every time a CJK ideograph is added to Unicode, so the code must be precisely synchronized with the FractionalUCA version. Hardcoding the results of the algorithm in the FractionalUCA file also makes it hard to use a different implicit-weight algorithm.

It would be much better to use a simple syntax to refer to the weights or CEs of characters.

  1. Change FractionalUCA to use new syntax to refer to weights. In particular, change the FractionalUCA data to use this new syntax for all weights and/or CEs for characters whose decompositions contain Unified Ideographs, so that the implicit weights do not appear in the file.
  2. Remove the Implicit Fractional Weight Generation text from CollationAuxiliary.html.
  3. In CollationAuxiliary.html, describe the new syntax instead.

For example:

  • [U+4E00] copies the entire CE of U+4E00.
  • [U+4E00, 09] copies the primary & secondary weights and sets the tertiary weight to 09.
  • [U+4E00, 05, 09] copies the primary weight and sets the secondary & tertiary weights to 05/09.

Each reference must name exactly one code point, and that code point must have a single CE.
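
The three forms above could be resolved roughly as follows. This is only a sketch of the proposal, not the actual UCA tool code: `resolve_reference` and `SAMPLE_CES` are hypothetical names, CEs are modelled as (primary, secondary, tertiary) tuples of hex-byte strings, and the tertiary weight 05 for U+4E00 is an assumption (the other sample weights are taken from the examples in this ticket).

```python
import re

# Hypothetical table: code point -> list of CEs, each CE being a
# (primary, secondary, tertiary) tuple of hex-byte strings.
# The tertiary weight 05 for U+4E00 is an assumption.
SAMPLE_CES = {
    0x4E00: [("E0 04 06", "05", "05")],
    0x4E0D: [("E0 04 20", "05", "05")],
}

# One weight is one or more space-separated hex bytes, e.g. "05" or "E0 04 06".
_WEIGHT = r"[0-9A-F]{2}(?: [0-9A-F]{2})*"
# A reference is [U+XXXX] followed by zero, one, or two override weights.
_REF = re.compile(r"^\[U\+([0-9A-F]{4,6})((?:\s*,\s*" + _WEIGHT + r"){0,2})\]$")


def resolve_reference(ref, table):
    """Return the (primary, secondary, tertiary) CE that a reference denotes."""
    m = _REF.match(ref.strip())
    if m is None:
        raise ValueError("not a code point reference: %r" % ref)
    code_point = int(m.group(1), 16)
    # group(2) looks like "", ", 09" or ", 05, 09"; drop the empty first piece.
    overrides = [w.strip() for w in m.group(2).split(",")[1:]]

    ces = table[code_point]
    if len(ces) != 1:
        # The proposal only allows targets that have a single CE.
        raise ValueError("U+%04X does not have exactly one CE" % code_point)

    primary, secondary, tertiary = ces[0]
    if len(overrides) == 1:
        # [U+XXXX, tt]: copy primary and secondary, set tertiary.
        tertiary = overrides[0]
    elif len(overrides) == 2:
        # [U+XXXX, ss, tt]: copy primary, set secondary and tertiary.
        secondary, tertiary = overrides
    return (primary, secondary, tertiary)
```

With the sample table, `resolve_reference("[U+4E00, 09]", SAMPLE_CES)` copies the primary and secondary weights of U+4E00 and replaces only the tertiary weight.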

This would change CJK COMPATIBILITY IDEOGRAPH-F967
from
F967; [E0 04 20, 05, 05]
to
F967; [U+4E0D]

and PARENTHESIZED IDEOGRAPH ONE
from
3220; [0A B5, 05, 09][E0 04 06, 05, 09][0A B7, 05, 3D]
to
3220; [0A B5, 05, 09][U+4E00, 09][0A B7, 05, 3D]
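
To check that the new form carries the same information, a reader of the file could expand the references back into literal weights. The following is a minimal sketch under stated assumptions: `expand_entry` and `SINGLE_CE` are hypothetical names, the lookup holds one CE per code point, and the tertiary weight 05 for U+4E00 is an assumption (the remaining weights come from the entries above).

```python
import re

# Hypothetical lookup: code point -> (primary, secondary, tertiary) of its
# single CE. The tertiary weight 05 for U+4E00 is an assumption.
SINGLE_CE = {
    0x4E00: ("E0 04 06", "05", "05"),
    0x4E0D: ("E0 04 20", "05", "05"),
}

# Matches one bracketed CE or reference, capturing the text inside.
_BRACKET = re.compile(r"\[([^\]]*)\]")


def expand_entry(line, table):
    """Replace each [U+XXXX ...] reference in an entry with literal weights."""

    def expand(match):
        parts = [p.strip() for p in match.group(1).split(",")]
        if not parts[0].startswith("U+"):
            return match.group(0)  # already a literal CE: keep it unchanged
        if len(parts) > 3:
            raise ValueError("too many weights in %r" % match.group(0))
        primary, secondary, tertiary = table[int(parts[0][2:], 16)]
        if len(parts) == 2:
            tertiary = parts[1]  # [U+XXXX, tt]
        elif len(parts) == 3:
            secondary, tertiary = parts[1], parts[2]  # [U+XXXX, ss, tt]
        return "[{}, {}, {}]".format(primary, secondary, tertiary)

    return _BRACKET.sub(expand, line)


print(expand_entry("F967; [U+4E0D]", SINGLE_CE))
print(expand_entry("3220; [0A B5, 05, 09][U+4E00, 09][0A B7, 05, 3D]", SINGLE_CE))
```

Under these assumptions, both calls reproduce the "from" lines shown above.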

It would also allow making other entries more readable, for example changing
LATIN SMALL LETTER A WITH DIAERESIS
from
00E4; [27, 05, 05][, 9D, 05]
to
00E4; [U+0061][U+0308]
Similar entries could be rewritten the same way, but this has low priority.

Change History

comment:1 Changed 3 years ago by markus

  • Type changed from unknown to enhancement

comment:2 Changed 3 years ago by mark

  • Milestone changed from UNSCH to soon

comment:3 Changed 3 years ago by markus

  • Xref set to 5142

comment:4 Changed 3 years ago by mark

  • Milestone changed from soon to 23

comment:5 Changed 3 years ago by emmons

  • Owner changed from anybody to markus
  • Priority changed from assess to medium
  • Status changed from new to assigned

comment:6 Changed 2 years ago by markus

  • Milestone changed from 23 to 24

I am doing this for UCA 6.3/CLDR 24.

comment:7 Changed 22 months ago by markus

  • Cc mark, yoshito, pedberg, emmons added
  • Status changed from assigned to accepted
  • Review set to emmons

comment:8 Changed 22 months ago by emmons

  • Review changed from emmons to mark

comment:9 Changed 22 months ago by mark

  • Status changed from accepted to closed
  • Resolution set to fixed