[Unicode] Common Locale Data Repository: Bug Tracking

CLDR Ticket #11086 (new unknown)

Opened 3 months ago

Last modified 3 months ago

reconcile exemplar set with kIICore

Reported by: srl
Owned by: anybody
Component: unknown
Data Locale:
Phase: dsub
Review:
Weeks:
Data Xpath:
Xref:

Description (last modified by srl) (diff)

  • According to oral tradition, CLDR's exemplar set for Chinese (zh) may have started out from a code page repertoire and then gained additions for characters that were missing but needed by CLDR. (Note that CLDR contains lists of languages, territories, and technical terms in units, so CLDR's own content may not be representative of "normal" text content.)
  • Unihan has a kIICore property (see UnicodeReports:tr38#kIICore), which is:

    the IRG-produced minimal set of required ideographs for East Asian use

  • Also see recent docs UTC:L2/18-066 and UTC:L2/18-071

Therefore:

  • Consider harmonizing exemplarSet with kIICore, and also the IDNA set.
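
As a starting point for that comparison, here is a minimal sketch of how the two sets could be diffed. It assumes the Unihan data is available as tab-separated `U+XXXX<TAB>property<TAB>value` lines and that the zh exemplar characters have already been expanded into a plain set. The function names and the sample rows are hypothetical, and the sample values are placeholders rather than real Unihan data.

```python
def parse_unihan_property(lines, prop="kIICore"):
    """Collect the characters that carry `prop` in Unihan-style lines."""
    chars = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) >= 3 and fields[0].startswith("U+") and fields[1] == prop:
            chars.add(chr(int(fields[0][2:], 16)))
    return chars


def compare_sets(exemplars, core):
    """Report which characters are unique to each set and which are shared."""
    return {
        "only_in_exemplars": sorted(exemplars - core),
        "only_in_core": sorted(core - exemplars),
        "shared": sorted(exemplars & core),
    }


# Illustrative rows only; the property values are placeholders.
sample_unihan = [
    "# hypothetical excerpt",
    "U+4E00\tkIICore\tplaceholder",
    "U+4E01\tkIICore\tplaceholder",
    "U+4E02\tkTotalStrokes\tplaceholder",
]
# Hypothetical stand-in for the expanded zh exemplar set.
sample_exemplars = {"\u4e00", "\u4e02"}

report = compare_sets(sample_exemplars, parse_unihan_property(sample_unihan))
```

The two "only in" lists are the interesting output: characters only in the exemplar set are candidates for review, and characters only in kIICore are candidates for addition.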

Change History

comment:1 Changed 3 months ago by srl

  • Description modified (diff)

comment:2 Changed 3 months ago by srl

  • Description modified (diff)

comment:3 Changed 3 months ago by mark

Adding supplied info:

The current ICANN Maximum Starting Repertoire (version 3) is documented here:

https://www.icann.org/resources/pages/msr-2015-06-21-en


The LGR xml file is here:

https://www.icann.org/sites/default/files/packages/lgr/msr/msr-3-wle-rules-28mar18-en.xml

The CJK repertoire contains 19855 ‘Hani’ characters; a couple of them are in the 30xx area.

However, CLDR shouldn't use MSR-3 for other ‘essential’ definitions, because it is intended for the root domain repertoire, which has restrictions on digits and symbol look-alike letters that don’t apply in other contexts.
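
The character count quoted in this comment could be re-checked with something like the sketch below. It assumes the LGR file follows the RFC 7940 layout (`char` elements with a `cp` attribute and `range` elements with `first-cp`/`last-cp`, in the `urn:ietf:params:xml:ns:lgr-1.0` namespace) and that script membership is recorded as a `tag` value such as `sc:Hani`. The tag value is an assumption and has not been checked against the MSR-3 file; the inline sample is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace as defined for LGR documents in RFC 7940.
LGR_NS = "{urn:ietf:params:xml:ns:lgr-1.0}"


def code_points_with_tag(xml_text, tag="sc:Hani"):
    """Return the single code points whose `tag` attribute includes `tag`.

    The default tag value is an assumption about how scripts are labeled.
    """
    root = ET.fromstring(xml_text)
    found = set()
    for elem in root.iter():
        if tag not in elem.get("tag", "").split():
            continue
        if elem.tag == LGR_NS + "char":
            cps = elem.get("cp", "").split()
            if len(cps) == 1:  # ignore multi-code-point sequences
                found.add(int(cps[0], 16))
        elif elem.tag == LGR_NS + "range":
            first = int(elem.get("first-cp"), 16)
            last = int(elem.get("last-cp"), 16)
            found.update(range(first, last + 1))
    return found


# Hypothetical miniature LGR document, not an excerpt of MSR-3.
sample_lgr = """
<lgr xmlns="urn:ietf:params:xml:ns:lgr-1.0">
  <data>
    <char cp="3007" tag="sc:Hani"/>
    <char cp="4E00" tag="sc:Hani"/>
    <range first-cp="4E07" last-cp="4E09" tag="sc:Hani"/>
    <char cp="0061" tag="sc:Latn"/>
  </data>
</lgr>
"""

hani = code_points_with_tag(sample_lgr)
# Code points in U+3000..U+30FF, i.e. the "30xx area" mentioned above.
in_30xx = sorted(cp for cp in hani if 0x3000 <= cp <= 0x30FF)
```

Running the same function over the downloaded XML file would give a count to compare against the figure quoted here, provided the tag assumption holds.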
