
CLDR Ticket #10497 (accepted data)

Opened 4 months ago

Last modified 2 months ago

Serious CLDR 30/31 regression in zh stroke collation

Reported by: pedberg
Owned by: pedberg
Component: collation
Data Locale: zh
Phase: dvet
Review:
Weeks:
Data Xpath:
Xref: ticket:9414, ticket:9765, ticket:10055, ticket:10642

Description

In CLDR 30/31, there was a serious regression in the zh stroke collation. For example, the following common characters/radicals were missing in the stroke collation:

乛   \u4E5B
冂   \u5182
卜   \u535C
又   \u53C8
小   \u5C0F
日   \u65E5
月   \u6708
牛   \u725B

This is related to changes for the following tickets:

There is likely some problem in the tooling or the way it was run.

I am checking whether the problem is still present in CLDR 32 data, updated per cldrbug 10055: Unicode 10.

Change History

comment:1 Changed 4 months ago by pedberg

  • Keywords Apple-31575085,33066940 added

comment:2 Changed 4 months ago by pedberg

  • Data Locale set to zh
  • Owner changed from anybody to pedberg
  • Status changed from new to accepted
  • Phase changed from dsub to dvet
  • Milestone changed from UNSCH to 32

Peter to check whether the problem still exists in current CLDR data, then reassign as necessary

comment:3 Changed 3 months ago by pedberg

This is still true in current CLDR trunk. Here for example is the data for <collation type='stroke' alt='short'> for index 2 in versions since CLDR 29:

CLDR 29, index 2 has 冂 \u5182, 卜 \u535C, 又 \u53C8:

  <collation type='stroke' alt='short'>
    <cr><![CDATA[
       ...
       <'\uFDD0\u2802' # INDEX 2
       <*丁丂七丄丅丆丩丷乂乃乄𠂆𠂇𠂊乜九了𠄎二亠人亻儿入八⺆冂冖冫⺇几凵⺈刀刁刂力勹匕匚匸十⺊卜卩厂厶⺀又巜讠⻏⻖𨸏〢〤〦 # 2

CLDR 30 through current trunk, they are missing:

  <collation type='stroke' alt='short'>
    <cr><![CDATA[
      ...
      <'\uFDD0\u2802' # INDEX 2
      <*丁丂七丄丅丆丩丷乂乃乄𠂆𠂇𠂊乜九了𠄎二亠人亻儿入八⺆冖冫几⺇凵刀刁⺈力勹匕匚匸十⺊卩厂厶⺀巜讠⻏𨸏〢〤〦 # 2
                                                  ^ 冂 used to be here
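
The comparison above can be reproduced mechanically. The following is a minimal Python sketch, not part of the CLDR tooling: it takes the two index 2 character lists exactly as quoted in this comment and reports every character that is in the CLDR 29 list but absent from the CLDR 30 list. The names `CLDR29_INDEX2`, `CLDR30_INDEX2` and `dropped_chars` are made up for illustration.

```python
# Index 2 lists copied from the two snippets quoted in this comment.
CLDR29_INDEX2 = "丁丂七丄丅丆丩丷乂乃乄𠂆𠂇𠂊乜九了𠄎二亠人亻儿入八⺆冂冖冫⺇几凵⺈刀刁刂力勹匕匚匸十⺊卜卩厂厶⺀又巜讠⻏⻖𨸏〢〤〦"
CLDR30_INDEX2 = "丁丂七丄丅丆丩丷乂乃乄𠂆𠂇𠂊乜九了𠄎二亠人亻儿入八⺆冖冫几⺇凵刀刁⺈力勹匕匚匸十⺊卩厂厶⺀巜讠⻏𨸏〢〤〦"


def dropped_chars(old: str, new: str) -> list[str]:
    """Return the characters of `old` that do not occur in `new`, in `old` order."""
    present = set(new)
    return [ch for ch in old if ch not in present]


# Print each dropped character with its code point.
for ch in dropped_chars(CLDR29_INDEX2, CLDR30_INDEX2):
    print(f"{ch}  U+{ord(ch):04X}")
```

The output should include at least 冂, 卜 and 又, and may list further characters as well. Reordering within the list (for example ⺇ and 几 swapping places) is deliberately ignored, since only membership is compared.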

The odd thing is that we have later rules still referring to the position of e.g. 冂 \u5182 which no longer has a position:

    &冂<<<⼌
    i.e. &\u5182<<<\u2F0C
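
A check for this kind of inconsistency could look roughly like the Python sketch below. It is a simplification under stated assumptions: it only understands `<*` list lines and single-character `&X<` resets, treats everything after `#` as a comment, and does not handle quoting or escapes. The helper names and the sample rules are hypothetical; in the sample, only `&冂<<<⼌` is taken from the actual data, and the second reset is invented to show an anchor that does have a position.

```python
import re


def placed_chars(rules: str) -> set[str]:
    """Collect characters placed by `<*` list lines, ignoring `#` comments."""
    placed: set[str] = set()
    for line in rules.splitlines():
        body = line.split("#", 1)[0].strip()
        if body.startswith("<*"):
            placed.update(body[2:])
    return placed


def unplaced_reset_anchors(rules: str) -> list[str]:
    """Return single-character reset anchors (`&X<`) not placed by any `<*` list."""
    placed = placed_chars(rules)
    anchors = re.findall(r"&(.)<", rules)
    return [a for a in anchors if a not in placed]


# Shortened, partly hypothetical sample: 冂 is absent from the list, 几 is present.
SAMPLE = "<*冖冫几 # 2\n&冂<<<⼌\n&几<<<⺇\n"
print(unplaced_reset_anchors(SAMPLE))  # prints ['冂']
```

Such a reset may still be accepted by a collation rule parser, so this is only a consistency check on the generated data, not a syntax check.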

comment:4 Changed 2 months ago by pedberg

  • Cc dongyuan added; dongyuan_liu@… removed
  • Xref changed from 9414, 9765, 10055 to 9414, 9765, 10055, 10642
  • Milestone changed from 32 to 33

This fix is for tooling, and is too much for CLDR 32 at this point. In CLDR 32 we will have to revert to the CLDR 29 stroke collation to address the problem, per cldrbug 10642. The real fix in tooling is per this bug, which is moving to CLDR 33.

comment:5 Changed 2 months ago by pedberg

The problem was introduced in r12662.
