L2/04-031
Re: | UCA Revised Latin? |
---|---|
From: | Mark Davis |
Date: | 2004-01-23 |
We should consider whether to make the following changes in the next version of the UCA.
[For the meeting, please also print http://www.unicode.org/charts/collation/chart_Latin.html]
1. Make alternate forms of letters (like the following) be secondary differences from the 'base' letter.
Base | Alternate forms |
---|---|
a | ɐ 0250, ɑ 0251, ɒ 0252 |
b | ʙ 0299, ƀ 0180, ɓ 0253, Ɓ 0181, ƃ 0183, Ƃ 0182 |
c | ƈ 0188, Ƈ 0187, ɕ 0255 |
d | đ 0111, Đ 0110, ɖ 0256, Ɖ 0189, ɗ 0257, Ɗ 018A, ƌ 018C, Ƌ 018B, ð 00F0, Ð 00D0, ƍ 018D |
etc. | |
Outliers: the following appear unrelated to the 'base' letter that they are after (in UCA order), so should be left where they are.
Ƣ 01A2, ƣ 01A3, ɤ 0264, etc.
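To illustrate what "secondary difference" would mean for item 1, here is a minimal Python sketch of a two-level sort key. The weights, the `WEIGHTS` table, and the `sort_key` helper are hypothetical and chosen only for illustration; they are not real UCA weights or any library's API.

```python
# Toy weights: each character maps to (primary, secondary).
# The numbers are illustrative assumptions, not real UCA weights.
WEIGHTS = {
    "c": (3, 0),
    "d": (4, 0),
    "đ": (4, 1),  # U+0111: same primary as d, distinct secondary
    "ɖ": (4, 2),  # U+0256: same primary as d, distinct secondary
    "e": (5, 0),
}


def sort_key(text, strength=2):
    """Build a multi-level key: all primaries first, then all secondaries."""
    primaries = tuple(WEIGHTS[ch][0] for ch in text)
    if strength == 1:
        # Primary strength ignores the secondary level entirely.
        return (primaries,)
    secondaries = tuple(WEIGHTS[ch][1] for ch in text)
    return (primaries, secondaries)


words = ["đe", "ed", "de", "ɖe", "dc"]
print(sorted(words, key=sort_key))
# Variants of d group with d instead of sorting as separate letters.
print(sort_key("đe", strength=1) == sort_key("de", strength=1))
```

Under these assumed weights, a primary-strength comparison treats "đe" and "de" as equal, while the full key still orders them deterministically.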
2. Make "æ" be a secondary difference from "ae".
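As a rough sketch of item 2, the Python below treats "æ" as an expansion into the collation elements of "a" and "e", with a secondary marker on the first element. The `ELEMENTS` table, the weights, and the `expansion_key` helper are hypothetical, for illustration only.

```python
# Each character expands to a list of (primary, secondary) collation elements.
# Weights are illustrative assumptions, not real UCA weights.
ELEMENTS = {
    "a": [(1, 0)],
    "d": [(4, 0)],
    "e": [(5, 0)],
    "f": [(6, 0)],
    "o": [(15, 0)],
    "r": [(18, 0)],
    "æ": [(1, 1), (5, 0)],  # expands to a + e, marked at the secondary level
}


def expansion_key(text, strength=2):
    """Flatten the expansions, then build a primary-then-secondary key."""
    elements = [ce for ch in text for ce in ELEMENTS[ch]]
    primaries = tuple(p for p, _ in elements)
    if strength == 1:
        return (primaries,)
    return (primaries, tuple(s for _, s in elements))


words = ["æro", "af", "aero", "ad"]
print(sorted(words, key=expansion_key))
# At primary strength a search for "aero" would also match "æro".
print(expansion_key("æro", strength=1) == expansion_key("aero", strength=1))
```

With this kind of mapping, "æro" sorts immediately after "aero" rather than after all words beginning with "a".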
For reference, here is an email related to the topic.
> ----- Original Message -----
> From: Åke Persson
> To: Mark Davis
> Sent: Wed, 2003 Dec 31 06:36
> Subject: ae << æ etc.
>
> Mark,
>
> I have browsed the latest ICU collations. Here are a few comments.
>
> The inclusion of ae << æ in several languages resembles my experience when I
> implemented the UCA in Mimer SQL. The next thing that came up was letters with
> stroke. For example, the Polish letter L-stroke, properly used in Polish names,
> did not match a Swedish or English search for names containing L. L-stroke is
> expected to be L with a stroke "accent", except for Polish (and Sorbian).
> <<Lodz.jpg>> is a snapshot from a Swedish encyclopædia (note also "oe"). To make
> a long story short, it all ended up in the European Ordering Rules (EOR)
> concept, where the base letters in the Latin alphabet are only A-Z. The first
> step was to create an EOR-tailoring as the base. Languages with additional
> letters in their alphabet were tailored on top of the EOR tailoring. The next
> step was improvement of space and performance, by making EOR the default, and to
> create a tailoring for the default UCA instead (at least needed for the
> conformance test).
>
> Here's an overview of the tailorings:
> http://developer.mimer.com/collations/charts/tailorings.htm
>
> Please, take a closer look at:
> Catalan, Croatian, Faroese, Icelandic, Latvian, Lithuanian, Romanian, and Slovak
> compared to the corresponding ICU collations.
>
> My sources are documented here:
> http://developer.mimer.com/collations/charts/sources.htm
>
> The E-ogonek (old Sami and Icelandic Ä) as a variant of Ä in Faroese, Finnish,
> Greenlandic, Norwegian, and Swedish looks a bit goofy. I would rather expect a
> search match for E in Polish and Lithuanian names containing E-ogonek. I think
> it's better to have a specific locale for Sami.
>
> [before 1] is used extensively in the ICU collations. It's easier to read the
> collation definitions, if [before 1] is used only when necessary.
>
> Happy New Year!
> Åke Persson