CLDR Ticket #9932 (new unknown)

Opened 15 months ago

Restructure GenerateUnihanCollators.java

Reported by: mark
Owned by: anybody
Component: unknown
Data Locale:
Phase: dsub
Review:
Weeks:
Data Xpath:


GenerateUnihanCollators.java has a lot of old, unnecessary code that was used to "fill in" values for kMandarin and kTotalStrokes.

We can now dispense with that, and use kMandarin and kTotalStrokes directly.

The code should:

  1. read those values
  2. add values for non-Unified-Ideographs where missing
    1. For radicals, strokes, 〇, and other non-ideographs (see http://www.unicode.org/L2/L2016/16223r-augmenting-cjk-strokes.pdf), derive the values from either their stroke count or, for pinyin, their mappings to Unified Ideographs.
    2. For compatibility characters, take the pinyin and stroke values from the regular characters they map to.
  3. generate drop-in files for Han-Latin.txt and collation/zh.xml
    1. (right now, we have to cut and paste).
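
As a rough illustration of steps 1 and 2, here is a minimal sketch, not the actual tool code. It assumes the input is in the usual Unihan line format (U+XXXX, tab, tag, tab, value) and that the "regular" character can be approximated by a single-code-point NFKD form from java.text.Normalizer. The class name UnihanValuesSketch, its method names, and the sample lines are hypothetical.

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class UnihanValuesSketch {

    /** Parses Unihan-style lines ("U+XXXX<TAB>tag<TAB>value") into tag -> (code point -> value). */
    static Map<String, Map<Integer, String>> parse(List<String> lines) {
        Map<String, Map<Integer, String>> byTag = new TreeMap<>();
        for (String line : lines) {
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            String[] fields = line.split("\t", 3);
            if (fields.length < 3 || !fields[0].startsWith("U+")) {
                continue; // not a data line
            }
            int codePoint = Integer.parseInt(fields[0].substring(2), 16);
            byTag.computeIfAbsent(fields[1], tag -> new TreeMap<>()).put(codePoint, fields[2]);
        }
        return byTag;
    }

    /** Assumption: the "regular" character is the single-code-point NFKD form, if there is one. */
    static int regularFormOf(int codePoint) {
        String nfkd = Normalizer.normalize(
                new String(Character.toChars(codePoint)), Normalizer.Form.NFKD);
        return nfkd.codePointCount(0, nfkd.length()) == 1 ? nfkd.codePointAt(0) : codePoint;
    }

    /** For each listed character that lacks a value, copies the value of its regular form. */
    static void fillFromRegularForms(Map<Integer, String> values, List<Integer> codePoints) {
        for (int codePoint : codePoints) {
            String inherited = values.get(regularFormOf(codePoint));
            if (inherited != null) {
                values.putIfAbsent(codePoint, inherited);
            }
        }
    }

    public static void main(String[] args) {
        // Illustrative sample lines only, not a real data file.
        List<String> sample = Arrays.asList(
                "# sample",
                "U+4E00\tkMandarin\ty\u012B",
                "U+4E00\tkTotalStrokes\t1",
                "U+8C48\tkMandarin\tq\u01D0",
                "U+8C48\tkTotalStrokes\t10");
        Map<String, Map<Integer, String>> byTag = parse(sample);
        // U+2F00 (a Kangxi radical) and U+F900 (a compatibility ideograph) have no values here.
        List<Integer> extras = Arrays.asList(0x2F00, 0xF900);
        for (Map<Integer, String> values : byTag.values()) {
            fillFromRegularForms(values, extras);
        }
        System.out.println(byTag);
    }
}
```

Characters with no such decomposition (most of the strokes and radicals covered by the L2/16-223 document) would still need an explicit table; the sketch leaves them untouched.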

In addition, the unicode tools should ensure that

  1. every Unified Ideograph has kTotalStrokes
  2. every character with a kHanyuPinlu, kXHC1983, or kHanyuPinyin value also has a kMandarin value.
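
The two invariants could be checked along these lines. This is only a sketch: it assumes the property values are already loaded as tag -> (code point -> value) maps and that the set of Unified Ideographs is supplied by the caller, since the JDK does not expose the Unified_Ideograph property directly. The class name UnihanInvariantCheck, its method names, and the placeholder values in main are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

final class UnihanInvariantCheck {

    /** The three properties named in the ticket whose presence should imply kMandarin. */
    static final List<String> PINYIN_SOURCES =
            Arrays.asList("kHanyuPinlu", "kXHC1983", "kHanyuPinyin");

    /** Returns the Unified Ideographs that have no kTotalStrokes value. */
    static SortedSet<Integer> missingTotalStrokes(
            Set<Integer> unifiedIdeographs, Map<String, Map<Integer, String>> byTag) {
        Map<Integer, String> strokes =
                byTag.getOrDefault("kTotalStrokes", Collections.emptyMap());
        SortedSet<Integer> missing = new TreeSet<>(unifiedIdeographs);
        missing.removeAll(strokes.keySet());
        return missing;
    }

    /** Returns the characters that have one of the pinyin source values but no kMandarin value. */
    static SortedSet<Integer> missingMandarin(Map<String, Map<Integer, String>> byTag) {
        Map<Integer, String> mandarin =
                byTag.getOrDefault("kMandarin", Collections.emptyMap());
        SortedSet<Integer> missing = new TreeSet<>();
        for (String tag : PINYIN_SOURCES) {
            missing.addAll(byTag.getOrDefault(tag, Collections.emptyMap()).keySet());
        }
        missing.removeAll(mandarin.keySet());
        return missing;
    }

    public static void main(String[] args) {
        // Made-up placeholder values; only the presence or absence of a key matters here.
        Map<String, Map<Integer, String>> byTag = new HashMap<>();
        byTag.put("kTotalStrokes", Collections.singletonMap(0x4E00, "placeholder"));
        byTag.put("kMandarin", Collections.singletonMap(0x4E00, "placeholder"));
        byTag.put("kXHC1983", Collections.singletonMap(0x4E8C, "placeholder"));
        Set<Integer> unified = new TreeSet<>(Arrays.asList(0x4E00, 0x4E8C));

        for (int codePoint : missingTotalStrokes(unified, byTag)) {
            System.out.printf("U+%04X has no kTotalStrokes%n", codePoint);
        }
        for (int codePoint : missingMandarin(byTag)) {
            System.out.printf("U+%04X has a pinyin source but no kMandarin%n", codePoint);
        }
    }
}
```

Returning the offending code points rather than a boolean would let the tools report exactly which characters need data added.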


