
CLDR Ticket #10797 (accepted unknown)

Opened 8 months ago

Last modified 8 months ago

hsb (Upper Sorbian): Inconsistency between collation rules and <exemplarCharacters type="index">

Reported by: maiku.fabian@…
Owned by: mark
Component: unknown
Data Locale:
Phase: dsub
Review:
Weeks:
Data Xpath:
Xref:

Description

https://unicode.org/cldr/trac/browser/trunk/common/collation/hsb.xml

contains:

&C<č<<<Č<ć<<<Ć
&E<ě<<<Ě
&H<ch<<<cH<<<Ch<<<CH
&[before 1] L<ł<<<Ł
&R<ř<<<Ř
&S<š<<<Š
&Z<ž<<<Ž<ź<<<Ź

https://unicode.org/cldr/trac/browser/trunk/common/main/hsb.xml

contains:

<exemplarCharacters type="index">[A B C Č Ć D {DŹ} E F G H {CH} I J K Ł L M N O P Q R S Š T U V W X Y Z Ž]</exemplarCharacters>

There is an inconsistency here: in the index, DŹ is a separate entry, but it
does not occur in the sorting rules, i.e. it has no primary difference
from D. How, then, could words starting with DŹ be sorted into their own
index bucket? I guess collation rules for DŹ
like

&D<dź<<<dŹ<<<Dź<<<DŹ

are missing.

Also, the collation rules treat Ě, Ř and Ź as separate
characters, but they do not occur as index characters.
I guess these should be added to the list of index characters.
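
The mismatch described above can be checked mechanically. The following is a minimal sketch in Python, assuming the rules and the index exemplar set are pasted in as the strings quoted in this ticket. The helper names `primary_tailorings` and `index_entries` are hypothetical, and the parsing only handles the simple `<` and `<<<` relations seen here, not the full CLDR collation rule syntax.

```python
import re

# Tailoring rules as quoted in this ticket (common/collation/hsb.xml).
RULES = """\
&C<č<<<Č<ć<<<Ć
&E<ě<<<Ě
&H<ch<<<cH<<<Ch<<<CH
&[before 1] L<ł<<<Ł
&R<ř<<<Ř
&S<š<<<Š
&Z<ž<<<Ž<ź<<<Ź
"""

# Index exemplar set as quoted in this ticket (common/main/hsb.xml).
INDEX = "[A B C Č Ć D {DŹ} E F G H {CH} I J K Ł L M N O P Q R S Š T U V W X Y Z Ž]"


def primary_tailorings(rules):
    """Collect, upper-cased, every element introduced with a primary '<'."""
    found = set()
    for line in rules.splitlines():
        # parts = [reset anchor, op, element, op, element, ...]
        parts = re.split(r"(<<<|<<|<)", line.lstrip("&"))
        for op, element in zip(parts[1::2], parts[2::2]):
            if op == "<":
                found.add(element.strip().upper())
    return found


def index_entries(exemplars):
    """Split the exemplar set into labels, dropping the {} around sequences."""
    return {tok.strip("{}") for tok in exemplars.strip("[]").split()}


tailored = primary_tailorings(RULES)
index = index_entries(INDEX)
# Index labels that are not plain A-Z letters, i.e. that need a tailoring.
extra_labels = {e for e in index if not (len(e) == 1 and e.isascii())}

index_only = extra_labels - tailored   # labels without a primary rule
rules_only = tailored - index          # primary rules without a label
print("index labels without a primary rule:", sorted(index_only))
print("primary rules without an index label:", sorted(rules_only))
```

For the strings quoted here, this reports DŹ on the index side and Ě, Ř and Ź on the rules side, which matches the two points raised in the description.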


Change History

comment:1 follow-up: ↓ 2 Changed 8 months ago by mark

  • Owner changed from anybody to mark
  • Status changed from new to accepted

Looks like DZ is an issue:

Ok to group items together that have primary differences, but probably doesn't work to separate items without primary differences, like Dź vs Dz

comment:2 in reply to: ↑ 1 Changed 8 months ago by maiku.fabian@…

Replying to mark:

> Looks like DZ is an issue:
>
> Ok to group items together that have primary differences, but probably doesn't work to separate items without primary differences, like Dź vs Dz

I am not sure I understand that. Adding collation rules
like

&D<dź<<<dŹ<<<Dź<<<DŹ

would make Dź differ from Dz at the primary level. It would then sort
like this:

D

DZZ

Then it would be OK to have an index bucket {DŹ},
just as for ch, where we have

&H<ch<<<cH<<<Ch<<<CH

and there is the index bucket {CH} after H. So this is consistent.

If it is not correct to treat DŹ as a primary-different character sorting
after D, then it should not have its own bucket in the index.

So either the {DŹ} bucket should be removed *or* the

&D<dź<<<dŹ<<<Dź<<<DŹ

rules should be added.
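
To illustrate why the bucket and the rule go together, here is a toy model in Python. It assumes that an index bucket is chosen by the first collation unit of a word, where a unit is either a contraction from the tailoring or a single letter. This is a simplification for illustration only and not the algorithm of any particular index implementation; `first_unit`, `bucket_words` and the sample words are hypothetical.

```python
import unicodedata


def first_unit(word, contractions):
    """Return the longest contraction that prefixes `word`, else its first letter."""
    lowered = unicodedata.normalize("NFC", word).lower()
    for unit in sorted(contractions, key=len, reverse=True):
        if lowered.startswith(unicodedata.normalize("NFC", unit)):
            return unit.upper()
    return lowered[0].upper()


def bucket_words(words, contractions):
    """Group words under the label given by their first collation unit."""
    buckets = {}
    for word in words:
        buckets.setdefault(first_unit(word, contractions), []).append(word)
    return buckets


# Contractions in the rules quoted in this ticket: only "ch".
current = {"ch"}
# Assumption: the proposed &D<dź<<<dŹ<<<Dź<<<DŹ rule has been added.
proposed = {"ch", "dź"}

words = ["dom", "dźak", "chory", "hat"]  # illustrative sample entries

before = bucket_words(words, current)
after = bucket_words(words, proposed)
print(before)
print(after)
```

Under this model, "dźak" falls into the D bucket with the current rules and only gets a DŹ bucket once dź is a unit of its own, in the same way that "chory" already lands under CH rather than C.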
