
CLDR Ticket #6615 (closed defect: fixed)

Opened 4 years ago

Last modified 3 years ago

Finnish collation problems

Reported by: markus
Owned by: markus
Component: collation
Data Locale: fi
Phase: rc
Review: pedberg
Weeks: 0.2
Data Xpath:
Xref:

Description

a)
There is this comment:

<!-- For the phonebook collation, the only change is to collate v and w as separate characters, i.e.: -->

This is not true -- it would be true only if we used the <collation type="standard" alt="proposed" draft="unconfirmed"> order, but the <collation type="standard" > is quite different.

We should review the standard vs. proposed rules and adjust or remove the comment.

b)
The "phonebook" rules contain this comment:

# U+0291, LATIN SMALL LETTER Z WITH CURL, the last Z-like character.

but this character does not occur in any of the Finnish rules. We should remove this comment.

c)
The fi/standard rules include &D<<ð<<<Ð<<đ<<<Đ and the proposed and phonebook rules include &D\u0335<<đ<<<Đ, but the default order is already d<<d\u0335==đ. It seems to me like the Finnish rules are not minimal, and should be reviewed and simplified.

Note: Kent Karlsson wrote to me: "I would very much like to see all of the Nordic collations consolidated (and no separate "phonebook" collation for Finnish). See ticket:3059, now a bit dated."

Change History

comment:1 Changed 4 years ago by markus

  • Cc erkki added

comment:2 Changed 4 years ago by emmons

  • Owner changed from anybody to markus
  • Priority changed from assess to medium
  • Status changed from new to assigned
  • Milestone changed from UNSCH to 25rc

comment:3 Changed 4 years ago by eik@…

Finnish no longer requires the phonebook ordering, since the standard computer ordering (SFS-EN 13710) now sorts v and w separately. If you want to maintain the combined sequence as an option, you could call it traditional. The new standard assumes that the DUCET ordering is modified by the European Ordering Rules, for which, I understand, Marc Küster has provided the data. Thereafter, the Finnish/Swedish modification defined in Annex H of the Finnish standard (to be posted separately) should be applied.
Regards, Erkki

comment:4 Changed 4 years ago by eik@…

(My apologies for the typos in the previous comment, still understandable, I hope.)

The following presents the additional modifications from the Annex H (in the ISO/IEC 14651 syntax):

% Additional specification for sorting Finnish- and Swedish-language text:

reorder-after <S00FE> % LATIN SMALL LETTER THORN

collating-symbol <S00E5> % LATIN SMALL LETTER A WITH RING ABOVE [PIENI RUOTSALAINEN O]
collating-symbol <S00E4> % LATIN SMALL LETTER A WITH DIAERESIS [PIENI Ä]
collating-symbol <S00F6> % LATIN SMALL LETTER O WITH DIAERESIS [PIENI Ö]
<S00E5>
<S00E4>
<S00F6>

reorder-end

reorder-after <SFFFF>

order_start forward;forward;forward;forward

% ü like y
<U00FC> <S0079>;"<BASE><VRNT2>";"<MIN><MIN>";<U00FC> % LATIN SMALL LETTER U WITH DIAERESIS [PIENI SAKSALAINEN Y]
<U00DC> <S0079>;"<BASE><VRNT2>";"<CAP><MIN>";<U00DC> % LATIN CAPITAL LETTER U WITH DIAERESIS [ISO SAKSALAINEN Y]

% å
<U00E5> <S00E5>;<BASE>;<MIN>;<U00E5> % LATIN SMALL LETTER A WITH RING ABOVE [PIENI RUOTSALAINEN O]
<U00C5> <S00E5>;<BASE>;<CAP>;<U00C5> % LATIN CAPITAL LETTER A WITH RING ABOVE [ISO RUOTSALAINEN O]
<U212B> <S00E5>;<BASE>;<CAP>;<U212B> % ANGSTROM SIGN [ÅNGSTRÖM-MERKKI]

% ä
<U00E4> <S00E4>;<BASE>;<MIN>;<U00E4> % LATIN SMALL LETTER A WITH DIAERESIS [PIENI Ä]
<U00C4> <S00E4>;<BASE>;<CAP>;<U00C4> % LATIN CAPITAL LETTER A WITH DIAERESIS [ISO Ä]
% æ like ä
<U00E6> <S00E4>;"<BASE><VRNT1>";"<MIN><MIN>";<U00E6> % LATIN SMALL LETTER Æ [PIENI TANSKALAINEN Ä]
<U00C6> <S00E4>;"<BASE><VRNT1>";"<CAP><MIN>";<U00C6> % LATIN CAPITAL LETTER Æ [ISO TANSKALAINEN Ä]

% ö
<U00F6> <S00F6>;<BASE>;<MIN>;<U00F6> % LATIN SMALL LETTER O WITH DIAERESIS [PIENI Ö]
<U00D6> <S00F6>;<BASE>;<CAP>;<U00D6> % LATIN CAPITAL LETTER O WITH DIAERESIS [ISO Ö]
% ø like ö
<U00F8> <S00F6>;"<BASE><VRNT1>";"<MIN><MIN>";<U00F8> % LATIN SMALL LETTER O WITH STROKE [PIENI ø]
<U00D8> <S00F6>;"<BASE><VRNT1>";"<CAP><MIN>";<U00D8> % LATIN CAPITAL LETTER O WITH STROKE [ISO ø]

reorder-end
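
For readers unfamiliar with the ISO/IEC 14651 syntax, the effect of the tailoring above can be approximated with a toy three-level sort key in Python. This is only a sketch, not the ISO/IEC 14651 or CLDR algorithm: the names `PRIMARY_ORDER`, `VARIANTS`, and `sort_key` are made up for illustration, the placement of þ directly after z is an assumption, and only the lowercase letters listed in `PRIMARY_ORDER` (plus the three variants and their capitals) are handled.

```python
# Toy model of the Annex H tailoring quoted above (illustrative only).
# Assumption: thorn follows z, so "reorder-after <S00FE>" puts å, ä, ö last.
PRIMARY_ORDER = "abcdefghijklmnopqrstuvwxyzþåäö"
PRIMARY = {ch: i for i, ch in enumerate(PRIMARY_ORDER)}

# Letters that share a base letter's primary weight and differ only at the
# second level: ü like y, æ like ä, ø like ö.
VARIANTS = {"ü": "y", "æ": "ä", "ø": "ö"}


def sort_key(word):
    """Return a (primary, secondary, tertiary) key; unknown letters raise KeyError."""
    lowered = word.lower()
    primary = tuple(PRIMARY[VARIANTS.get(ch, ch)] for ch in lowered)
    secondary = tuple(1 if ch in VARIANTS else 0 for ch in lowered)
    # Lowercase sorts before uppercase, mirroring <MIN> before <CAP>.
    tertiary = tuple(1 if ch.isupper() else 0 for ch in word)
    return (primary, secondary, tertiary)


words = ["öljy", "zebra", "ära", "åker", "yö", "müller", "myller"]
print(sorted(words, key=sort_key))
```

In this model å, ä and ö are separate letters at the end of the alphabet, while "müller" and "myller" differ only at the secondary level.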

comment:5 Changed 4 years ago by eik@…

Another comment:
There is also another Finnish ordering standard: SFS 4600, Order of characters and numerals, which is less computer-oriented and also defines rules for groupings and omissions, and which is not intended to be included in CLDR.
Regards, Erkki

comment:6 Changed 3 years ago by markus

  • Milestone changed from 25rc to 26rc

comment:7 Changed 3 years ago by eik@…

We should replace the "phonebook" ordering by the default (i.e., "standard"). The previous standard ordering should be named "traditional", since some users may still want to order v and w together.
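
The difference between the two orderings can be shown with a small Python sketch. It is a simplified model rather than the CLDR rules: `ALPHABET` and `make_key` are hypothetical names, only lowercase letters from `ALPHABET` are supported, and in the "traditional" mode w is treated as a secondary variant of v.

```python
# Toy comparison of "v and w separate" vs. "v and w together" (illustrative only).
ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"


def make_key(merge_v_w):
    """Build a sort key; with merge_v_w=True, w gets the primary weight of v."""
    def key(word):
        lowered = word.lower()
        primary = tuple(
            ALPHABET.index("v" if merge_v_w and ch == "w" else ch)
            for ch in lowered
        )
        # w is distinguished from v only at the secondary level when merged.
        secondary = tuple(1 if merge_v_w and ch == "w" else 0 for ch in lowered)
        return (primary, secondary)
    return key


words = ["wc", "vaasa", "watti", "vyö", "virta"]
standard = sorted(words, key=make_key(False))     # v and w as separate letters
traditional = sorted(words, key=make_key(True))   # v and w interleaved
print(standard)
print(traditional)
```

With separate letters, all v-words precede all w-words; with the merged key, "watti" and "wc" fall between "vaasa" and "virta".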

If further input is required, the Kotoistus initiative is happy to provide it. We envisage a possible need for further updates after the publication (approval) of the new auxiliary character set. We believe that the consolidation proposed by Kent would be premature.

Please also note that the Finnish/Swedish modification refers specifically to Finnish and Swedish in Finland.

I agree that the rules could possibly be simplified; those quoted from the standard are direct quotations, though.

Sincerely, Erkki

comment:8 Changed 3 years ago by markus

  • Cc emmons, pedberg added
  • Status changed from assigned to reviewing
  • Review set to fredrik

comment:9 Changed 3 years ago by markus

  • Phase set to rc
  • Milestone changed from 26rc to 26

comment:10 Changed 3 years ago by fredrik

  • Review changed from fredrik to pedberg

comment:11 Changed 3 years ago by pedberg

  • Cc markus added
  • Status changed from reviewing to closed
  • Resolution set to fixed

If somebody requests collation=phonebook, that type is no longer present, so they will get the default standard ordering. Since that is what used to be phonebook, the behavior should not change. So I guess it is OK to remove phonebook without an explicit alias.
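
The fallback reasoning can be sketched in Python as follows. This is a deliberately simplified model and not the actual CLDR or ICU lookup: `resolve_collation_type` and the two sets of type names are hypothetical, chosen only to mirror the situation described in this ticket.

```python
# Simplified model of collation-type fallback (not the real CLDR/ICU lookup).
def resolve_collation_type(requested, available, default="standard"):
    """Return the requested type if the locale defines it, else the default."""
    return requested if requested in available else default


# Hypothetical sets of Finnish collation types before and after the change.
fi_before = {"standard", "phonebook"}
fi_after = {"standard", "traditional"}

print(resolve_collation_type("phonebook", fi_before))  # type still defined
print(resolve_collation_type("phonebook", fi_after))   # falls back to the default
```

Under these assumptions, a request for phonebook after the change resolves to standard, which now carries the rules that phonebook used to have.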
