
CLDR Ticket #6326 (accepted data)

Opened 4 years ago

Last modified 2 years ago

review almost-after-last-Latin collation tailorings

Reported by: markus
Owned by: markus
Component: collation
Data Locale:
Phase: rc
Review:
Weeks: 0.4
Data Xpath:
Xref: ticket:5549, ticket:2821, ticket:5710

Description

We should review all of the collation tailorings that appear to try to tailor near "the last character in the Latin script" or maybe "before the Latin clicks" or similar.

If there remain cases where we do want to tailor near "the last character in the Latin script", then we should add a CLDR release task, for after any Unicode release, to look for the currently-last one and see if the tailorings need to be updated to a newly-later-sorting character.

Reasons:

  1. They may need to be updated
  2. Because of them, we have wanted to add syntax like &[last Latn], see ticket:5549 and ticket:2821.

For example, sv.xml has &[before 1]ǀ<å<<<Å<ä ... which currently puts å etc. between U+0296 inverted glottal stop and U+01C0 dental click. There are currently 7 more Latin characters after the dental click.
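
To make the structure of such a rule easier to inspect, here is a minimal Python sketch that splits the truncated sv fragment quoted above into its reset anchor and the tailored strings that follow. It relies on the CLDR rule syntax in which `<`, `<<` and `<<<` introduce primary, secondary and tertiary differences. The function names `parse_tailoring` and `primary_letters` are hypothetical, and the parser is deliberately simplified: it assumes a single `&[before N]` reset and no quoting or escapes, so it is not a substitute for a real rule parser.

```python
import re

# Truncated fragment of the sv rule quoted above; U+01C0 is the dental click.
SV_FRAGMENT = "&[before 1]\u01c0<å<<<Å<ä"

# "&[before N]" followed by the reset anchor (everything up to the first "<").
RESET_RE = re.compile(r"&\[before ([123])\]([^<]+)")
# A relation operator (1 to 3 "<") followed by the tailored string.
RELATION_RE = re.compile(r"(<{1,3})([^<]+)")


def parse_tailoring(rule):
    """Return (before_level, anchor, [(strength, string), ...]).

    Simplified assumption: one reset, no quoting, no escapes.
    """
    m = RESET_RE.match(rule)
    if m is None:
        raise ValueError("rule does not start with &[before N]")
    before_level = int(m.group(1))
    anchor = m.group(2)
    rest = rule[m.end():]
    # Strength is the number of "<" characters: 1 primary, 2 secondary, 3 tertiary.
    relations = [(len(op), text) for op, text in RELATION_RE.findall(rest)]
    return before_level, anchor, relations


def primary_letters(relations):
    """Strings that start a new primary weight in the tailoring."""
    return [text for strength, text in relations if strength == 1]
```

Calling `parse_tailoring(SV_FRAGMENT)` yields the anchor U+01C0 and the relations for å, Å and ä; `primary_letters` then keeps only å and ä, the letters that receive their own primary weight before the dental click.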

It is not clear whether it really makes sense to tailor near "the last character in the Latin script no matter what is there". Why not tailor to a specific character? And why tailor primary-before something near the end of the Latin script, rather than after? Would we want this only for Latin, or also for other scripts?

Change History

comment:1 Changed 4 years ago by kent.karlsson14@…

Well, for sv (and the other Nordic languages), my goal was to put åäö (and their respective level 2 variants) after all variants of z, even if more variants of z were encoded, without the need for constant review. In DUCET, the variants of z are collated with a level 1 difference, not a level 2 difference. For Icelandic they should come after glottal stop (and its variants at level 1). So I put åäö (and their respective level 2 variants) among the glottal stops and clicks, which fulfills these requirements. Exactly where "near the end" should otherwise not matter very much, as click letters, glottal stop letters (and now saltillo and a few other Latin letters collated near the "end of Latin") aren't used for these languages.

comment:2 follow-up: ↓ 3 Changed 4 years ago by markus

da standard &[before 1]ǀ<æ<<<Æ<<ä<<<Ä<<ę<<<Ę<ø<<<Ø<<ö<<<Ö<<ő<<<Ő<<œ<<<Œ<å<<<Å<<<aa<<<Aa<<<AA
da alt="proposed" &[before 1]ʒ<æ<<<Æ<<ä<<<Ä<<ę<<<Ę<ø<<<Ø<<ö<<<Ö<<ő<<<Ő<<œ<<<Œ<å<<<Å<<<aa<<<aA<<<Aa<<<AA
fi standard &[before 1]ǀ<å<<<Å<ä<<<Ä<<æ<<<Æ<ö<<<Ö<<ø<<<Ø<<ő<<<Ő<<õ<<<Õ<<œ<<<Œ
fi alt="proposed", phonebook &[before 1]ǀ<å<<<Å<ä<<<Ä<<æ<<<Æ<ö<<<Ö<<ø<<<Ø
fo standard &[before 1]ǀ<æ<<<Æ<<ä<<<Ä<<ę<<<Ę<ø<<<Ø<<ö<<<Ö<<ő<<<Ő<<œ<<<Œ<å<<<Å<<<aa<<<Aa<<<AA
fo alt="proposed" &[before 1]ǀ<æ<<<Æ<<ä<<<Ä<<ę<<<Ę<ø<<<Ø<<ö<<<Ö<<ő<<<Ő<<œ<<<Œ<å<<<Å<<<aa<<<aA<<<Aa<<<AA
is standard &[before 1]ǀ<æ<<<Æ<<ä<<<Ä<ö<<<Ö<<ø<<<Ø<å<<<Å
kk standard &[before 1]ь<і<<<І
kl standard &[before 1]ǀ<æ<<<Æ<<ä<<<Ä<<ę<<<Ę<ø<<<Ø<<ö<<<Ö<<ő<<<Ő<<œ<<<Œ<å<<<Å
nb standard &[before 1]ǀ<æ<<<Æ<<ä<<<Ä<<ę<<<Ę<ø<<<Ø<<ö<<<Ö<<ő<<<Ő<<œ<<<Œ<å<<<Å<<aa<<<Aa<<<AA
nn standard &[before 1]ǀ<æ<<<Æ<<ä<<<Ä<<ę<<<Ę<ø<<<Ø<<ö<<<Ö<<ő<<<Ő<<œ<<<Œ<å<<<Å<<aa<<<Aa<<<AA
se standard &[before 1]ǀ<ž<<<Ž<ø<<<Ø<<œ...
sv standard, reformed &[before 1]ǀ<å<<<Å<ä<<<Ä<<æ<<<Æ<<ę<<<Ę<ö<<<Ö<<ø<<<Ø<<ő<<<Ő<<œ<<<Œ<<ô<<<Ô

I probably missed some tailorings, especially for scripts other than Latin and Cyrillic.
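
As a rough aid for this review, the following Python sketch groups the tailorings listed above by their reset anchor, so the ones that hang off the dental click stand out from those anchored elsewhere. The rule heads are copied from the list and truncated right after the anchor character; the dictionary keys and the function name `group_by_anchor` are hypothetical labels chosen for illustration, and the sketch assumes every rule starts with a single-character `&[before N]` reset.

```python
import re
from collections import defaultdict

# Rule heads from the list above, truncated after the reset anchor.
# U+01C0 = dental click, U+0292 = ezh, U+044C = Cyrillic soft sign.
RULE_HEADS = {
    "da standard": "&[before 1]\u01c0",
    "da proposed": "&[before 1]\u0292",
    "fi standard": "&[before 1]\u01c0",
    "fi proposed/phonebook": "&[before 1]\u01c0",
    "fo standard": "&[before 1]\u01c0",
    "fo proposed": "&[before 1]\u01c0",
    "is standard": "&[before 1]\u01c0",
    "kk standard": "&[before 1]\u044c",
    "kl standard": "&[before 1]\u01c0",
    "nb standard": "&[before 1]\u01c0",
    "nn standard": "&[before 1]\u01c0",
    "se standard": "&[before 1]\u01c0",
    "sv standard/reformed": "&[before 1]\u01c0",
}

# Assumption: the anchor is exactly one character after "&[before N]".
RESET_RE = re.compile(r"&\[before ([123])\](.)")


def group_by_anchor(rule_heads):
    """Map each reset anchor character to the tailorings that use it."""
    groups = defaultdict(list)
    for name, rule in rule_heads.items():
        m = RESET_RE.match(rule)
        if m is None:
            raise ValueError(f"no &[before N] reset in rule for {name}")
        groups[m.group(2)].append(name)
    return dict(groups)


if __name__ == "__main__":
    for anchor, names in group_by_anchor(RULE_HEADS).items():
        print(f"U+{ord(anchor):04X}: {', '.join(names)}")
```

With the data as listed, all Latin tailorings except the proposed da variant share the dental click as anchor, which is why a change in what sorts last in Latin would affect them together.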

comment:3 in reply to: ↑ 2 Changed 4 years ago by kent.karlsson14@…

There aren't (or really: should not be) as many as it seems in your list above. See ticket 3059.
da (327 bytes) - should be copy of nb
fo (327 bytes) - should be copy of nb
kl (327 bytes) - should be copy of nb
nn (327 bytes) - should be copy of nb
fi (327 bytes) - should be copy of sv

That leaves just 4 Latin ones (and one Cyrillic):
is (8.1 KB) -
nb (7.4 KB) -
se (9.2 KB) - "in between" sv and nb (exploiting a "trick")
sv (7.5 KB) -
kk (Cyrillic)

comment:4 Changed 4 years ago by emmons

  • Owner changed from anybody to markus
  • Priority changed from assess to medium
  • Status changed from new to assigned
  • Milestone changed from UNSCH to 24rc

comment:5 Changed 4 years ago by markus

  • Xref changed from 5549 2821 to 5549 2821 5710
  • Milestone changed from 24rc to 25dsub

Related to ticket:5710

comment:6 Changed 4 years ago by emmons

  • Milestone changed from 25dsub to 25M1

Moving all 25dsub to 25M1. Please adjust the milestone if you are not planning to complete the item in the 25M1 time frame.

comment:7 Changed 4 years ago by emmons

  • Milestone changed from 25M1 to 25rc

Moving all Markus's 25M1 to 25rc

comment:8 Changed 3 years ago by markus

  • Milestone changed from 25rc to 26rc

comment:9 Changed 3 years ago by markus

  • Milestone changed from 26rc to 27rc

comment:10 Changed 3 years ago by markus

  • Phase set to rc
  • Milestone changed from 27rc to 27

comment:11 Changed 2 years ago by markus

  • Milestone changed from 27 to 28

comment:12 Changed 2 years ago by markus

  • Type changed from task to data

comment:13 Changed 2 years ago by srl

  • Status changed from assigned to accepted

comment:14 Changed 2 years ago by markus

  • Milestone changed from 28 to UNSCH