Unicode Common Locale Data Repository: Bug Tracking

CLDR Ticket #9932 (design unknown)

Opened 19 months ago

Last modified 6 weeks ago

Restructure GenerateUnihanCollators.java

Reported by: mark
Owned by: pedberg
Component: collation
Data Locale:
Phase: rc
Review:
Weeks:
Data Xpath:


GenerateUnihanCollators.java has a lot of old, unnecessary code that was used to "fill in" values for kMandarin and kTotalStrokes.

We can now dispense with that, and use kMandarin and kTotalStrokes directly.

The code should:

  1. read those values
  2. add values for non-Unified-Ideographs where missing
    1. For radicals, strokes, 〇, and other non-ideographs (see http://www.unicode.org/L2/L2016/16223r-augmenting-cjk-strokes.pdf), base the values on either their stroke count or, for pinyin, their mappings to Unified Ideographs.
    2. For compatibility characters, use the mapping to their regular counterparts to obtain pinyin/stroke values.
  3. generate drop-in files for Han-Latin.txt and collation/zh.xml
    1. (right now, we have to cut and paste).
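
Steps 1 and 2.2 above could be sketched roughly as follows. This is only an illustration, not the existing GenerateUnihanCollators code: the class and method names are hypothetical, input is assumed to be in the tab-separated Unihan layout (U+XXXX, property name, value), only the first of several space-separated values is kept, and canonical (NFC) normalization stands in for "the mapping to regular ones", which the real tool may derive differently.

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch; names are not taken from the CLDR or Unicode tools. */
final class UnihanValueSketch {

    /** Reads one property from Unihan-style lines: "U+8C48<TAB>kTotalStrokes<TAB>10". */
    static Map<Integer, String> readProperty(List<String> lines, String property) {
        Map<Integer, String> values = new TreeMap<>();
        for (String line : lines) {
            if (line.startsWith("#")) {
                continue; // comment line
            }
            String[] fields = line.split("\t");
            if (fields.length < 3 || !fields[0].startsWith("U+") || !fields[1].equals(property)) {
                continue; // blank line, malformed line, or a different property
            }
            int codePoint = Integer.parseInt(fields[0].substring(2), 16);
            // Assumption: keep only the first value when several are listed.
            values.put(codePoint, fields[2].trim().split(" ")[0]);
        }
        return values;
    }

    /** Returns the value for cp, or the value of its canonical equivalent if cp has none. */
    static String valueOrMapped(Map<Integer, String> values, int cp) {
        String direct = values.get(cp);
        if (direct != null) {
            return direct;
        }
        String nfc = Normalizer.normalize(new String(Character.toChars(cp)), Normalizer.Form.NFC);
        // Only follow mappings that resolve to exactly one code point.
        if (nfc.codePointCount(0, nfc.length()) != 1) {
            return null;
        }
        return values.get(nfc.codePointAt(0));
    }

    public static void main(String[] args) {
        // Illustrative sample lines, not a real Unihan file.
        List<String> lines = Arrays.asList(
            "# sample data",
            "U+8C48\tkMandarin\tq\u01D0",
            "U+8C48\tkTotalStrokes\t10");
        Map<Integer, String> mandarin = readProperty(lines, "kMandarin");
        // U+F900 is a compatibility ideograph canonically equivalent to U+8C48.
        System.out.println(valueOrMapped(mandarin, 0xF900));
    }
}
```

Radicals, strokes, and 〇 (step 2.1) have no such normalization mapping, so they would need an explicit table such as the one proposed in the linked document.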

In addition, the unicode tools should ensure that

  1. every Unified Ideograph has kTotalStrokes
  2. every character with a kHanyuPinlu, kXHC1983, or kHanyuPinyin value also has a kMandarin value.
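
The two invariants could be checked along these lines. Again this is a hedged sketch with hypothetical names: it assumes each property has already been loaded into a map from code point to value, and that the set of Unified Ideographs is supplied by the caller (for example, from the Unified_Ideograph property data).

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of the two consistency checks; not part of the Unicode tools. */
final class UnihanInvariantSketch {

    /** Invariant 1: returns the Unified Ideographs that have no kTotalStrokes value. */
    static List<Integer> missingTotalStrokes(
            Collection<Integer> unifiedIdeographs, Map<Integer, String> totalStrokes) {
        List<Integer> missing = new ArrayList<>();
        for (int cp : unifiedIdeographs) {
            if (!totalStrokes.containsKey(cp)) {
                missing.add(cp);
            }
        }
        return missing;
    }

    /**
     * Invariant 2: returns code points that have a value in at least one reading
     * source (kHanyuPinlu, kXHC1983, kHanyuPinyin) but no kMandarin value.
     */
    static Set<Integer> missingMandarin(
            List<Map<Integer, String>> readingSources, Map<Integer, String> mandarin) {
        Set<Integer> missing = new TreeSet<>();
        for (Map<Integer, String> source : readingSources) {
            for (Integer cp : source.keySet()) {
                if (!mandarin.containsKey(cp)) {
                    missing.add(cp);
                }
            }
        }
        return missing;
    }
}
```

A build step could fail, or at least log a warning, whenever either method returns a non-empty result.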


Change History

comment:1 Changed 3 months ago by pedberg

  • Cc pedberg added

comment:2 Changed 2 months ago by kristi

  • Owner changed from anybody to mark
  • Status changed from new to design
  • Component changed from unknown to collation
  • Milestone changed from UNSCH to 34

comment:3 Changed 6 weeks ago by mark

  • Owner changed from mark to pedberg
  • Phase changed from dsub to rc
