Unicode Common Locale Data Repository: Bug Tracking

CLDR Ticket #9944 (accepted)

Opened 2 years ago

Last modified 6 months ago

Put Hani after Latin in sorting for Simplified Chinese

Reported by: xefbfbd@…
Owned by: markus
Component: collation
Data Locale:
Phase: rc
Review:
Weeks: 0.1
Data Xpath:


According to GB/T 13418-92 文字条目通用排序规则 (Character and entry filing principles)

5.3 汉字字符与非汉字字符混合出现时排序

为了便于信息处理,依据国家标准 GB 2312 中字符的排列顺序,确定以下排序的前后次序:

There is no official English version. The following translation was produced mainly by Google Translate, with a few tweaks by me.

5.3 Sort Order When Chinese Characters and Non-Chinese Characters Appear Together

To facilitate information processing, the following sort order is determined according to the character order of the national standard GB 2312:
Space - Ordinal Number - Arabic Numerals - Latin (uppercase, lowercase) - Kana (Hiragana, Katakana) - Greek - Russian - Chinese
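
As a rough illustration of where this order comes from, the sketch below sorts a few sample characters by their byte sequence in Python's built-in `gb2312` codec (the EUC-CN encoding of GB 2312). The sample list and the `gb2312_key` helper are made up for this example. It also uses ASCII digits and letters, which the codec encodes as single bytes, rather than the full-width forms in the GB 2312 table itself, and it leaves out the "Ordinal Number" group.

```python
# Sort sample characters by their GB 2312 (EUC-CN) byte sequence.
# gb2312_key and the sample list are illustrative, not from the standard.

def gb2312_key(text: str) -> bytes:
    """Return the GB 2312 byte sequence, used here as a binary sort key."""
    return text.encode("gb2312")

samples = ["中", "Я", "Δ", "ア", "あ", "b", "B", "7", " "]
ordered = sorted(samples, key=gb2312_key)
print(ordered)
# [' ', '7', 'B', 'b', 'あ', 'ア', 'Δ', 'Я', '中']
```

The result follows the sequence quoted above: space, digits, Latin (uppercase before lowercase), Hiragana, Katakana, Greek, Russian, Chinese.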


Change History

comment:1 Changed 2 years ago by xefbfbd@…

GB/T stands for "recommended national standard"; it is not a mandatory one.

see also: ticket:4020

comment:2 Changed 23 months ago by emmons

  • Status changed from new to accepted
  • Priority changed from assess to medium
  • Phase changed from dsub to rc
  • Milestone changed from UNSCH to 32
  • Owner changed from anybody to markus
  • type changed from unknown to data

comment:3 Changed 23 months ago by emmons

  • Cc fredrik, kristi, kiara added

comment:4 Changed 23 months ago by markus

  • Cc mark, pedberg, emmons, yoshito added
  • Weeks set to 0.1

This needs some discussion. How far do we want to go in matching this standard? Is it a good standard to follow at all?

The subject line suggests [reorder Latn Hani].

CLDR puts the native script first so that lists of names start with native-script ones followed by foreign-script ones. The use of pinyin in Chinese might justify putting Latn first, but this is a big change in how contacts apps sort.

The standard suggests [caseFirst upper][reorder Latn Kana] with Hani naturally going last, or possibly [caseFirst upper][reorder Latn Kana Grek Cyrl Hani]. It recommends this based on the binary order of GB 2312 which stems from character code allocation convenience and thus does not seem like a good basis for a real sort order.
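
To make the options concrete, here is a toy model of script reordering in plain Python. It is only a sketch and does not use ICU or CLDR data: `script_of`, `DEFAULT_ORDER`, `sort_key`, and the sample names are all invented for illustration, scripts are guessed from Unicode character names, and the "upper first" handling is a crude stand-in for [caseFirst upper].

```python
import unicodedata

# Hypothetical fallback order for scripts not named in the reorder list.
DEFAULT_ORDER = ["Latn", "Grek", "Cyrl", "Kana", "Hani"]

def script_of(ch: str) -> str:
    """Guess a script code from the Unicode character name (toy heuristic)."""
    name = unicodedata.name(ch, "")
    if name.startswith("LATIN"):
        return "Latn"
    if name.startswith(("HIRAGANA", "KATAKANA")):
        return "Kana"
    if name.startswith("GREEK"):
        return "Grek"
    if name.startswith("CYRILLIC"):
        return "Cyrl"
    if name.startswith("CJK UNIFIED IDEOGRAPH"):
        return "Hani"
    return "other"

def rank(script: str, reorder: list) -> int:
    """Listed scripts come first, in list order; the rest keep DEFAULT_ORDER."""
    if script == "other":
        return -1  # spaces, digits, punctuation sort before all letters
    if script in reorder:
        return reorder.index(script)
    return len(reorder) + DEFAULT_ORDER.index(script)

def sort_key(text: str, reorder: list, upper_first: bool = True):
    """Two-level key: (script rank, case-folded char), then a case tiebreak."""
    primary = tuple((rank(script_of(c), reorder), c.lower()) for c in text)
    if upper_first:
        case = tuple(0 if c.isupper() else 1 for c in text)
    else:
        case = tuple(1 if c.isupper() else 0 for c in text)
    return (primary, case)

names = ["王", "wang", "Wang", "アリ", "Олег"]

# Native script first, roughly what the comment above describes for CLDR today.
native_first = sorted(names, key=lambda s: sort_key(s, ["Hani"]))

# Toy version of [caseFirst upper][reorder Latn Kana Grek Cyrl Hani].
standard_like = sorted(
    names, key=lambda s: sort_key(s, ["Latn", "Kana", "Grek", "Cyrl", "Hani"])
)

print(native_first)   # ['王', 'Wang', 'wang', 'Олег', 'アリ']
print(standard_like)  # ['Wang', 'wang', 'アリ', 'Олег', '王']
```

In this model, passing only ["Latn", "Kana"] as the reorder list also puts 王 last, because Hani is the final entry in the assumed fallback order.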

We cannot (easily) put Arabic-script digits after ASCII digits because they are primary-equal. Does GB 2312 even have Arabic-script digits? It clearly has none before the ASCII letters. Does the Chinese text for "Ordinal Number - Arabic Numerals" mean something different?
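
One quick way to probe the GB 2312 question is to ask Python's built-in `gb2312` codec whether a character can be encoded. This is only a sketch: the `in_gb2312` helper is invented for this example, and the codec's coverage is taken as a stand-in for the standard's character table.

```python
import unicodedata

def in_gb2312(ch: str) -> bool:
    """Return True if Python's gb2312 codec can encode the character."""
    try:
        ch.encode("gb2312")
        return True
    except UnicodeEncodeError:
        return False

arabic_three = "\u0663"     # ARABIC-INDIC DIGIT THREE
fullwidth_three = "\uff13"  # FULLWIDTH DIGIT THREE
circled_one = "\u2460"      # CIRCLED DIGIT ONE

# Same digit value as ASCII "3", although the character belongs to another script.
print(unicodedata.digit(arabic_three))  # 3

print(in_gb2312(arabic_three))     # False
print(in_gb2312(fullwidth_three))  # True
print(in_gb2312(circled_one))      # True
```

By this check the codec has no Arabic-script digits, but it does have full-width 0-9 and enumerator symbols such as ①, which may be what "Arabic Numerals" and "Ordinal Number" refer to in the standard.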

I would be more comfortable looking at a typical, printed Chinese phone book that includes Latn names. Are they in the front or in the back?

Last edited 23 months ago by markus

comment:5 Changed 18 months ago by markus

  • Keywords punt32 added

comment:6 Changed 18 months ago by markus

  • Milestone changed from 32 to 33

comment:7 Changed 12 months ago by markus

  • Keywords punt33 added
  • Milestone changed from 33 to 34

comment:8 Changed 6 months ago by markus

  • Milestone changed from 34 to UNSCH
