[Unicode]   Common Locale Data Repository : Bug Tracking

CLDR Ticket #6691 (accepted unittest)

Opened 4 years ago

Last modified 18 months ago

Fix mismatch in transliterators vs NFD/NFC

Reported by: mark
Owned by: pedberg
Component: translit
Data Locale:
Phase: rc
Review:
Weeks:
Data Xpath:


If a transliterator is written like the following, the ü rule will never match: by the time that rule is reached, the source is in NFD, so ü has already been decomposed into u followed by U+0308 (combining diaeresis).

:: NFD;
ü > x;

The ü rule would have to be:

u \u0308 > x;
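
The mismatch can be reproduced outside the transliterator. The following is a minimal sketch using Python's standard unicodedata module rather than the ICU transform engine; the sample word and the variable names are only illustrative.

```python
import unicodedata

precomposed = "\u00fc"  # ü as a single code point
decomposed = unicodedata.normalize("NFD", precomposed)

# After NFD the text is 'u' followed by U+0308 COMBINING DIAERESIS.
print([f"U+{ord(c):04X}" for c in decomposed])  # ['U+0075', 'U+0308']

# Text that has already passed through ':: NFD;'.
source_after_nfd = unicodedata.normalize("NFD", "f\u00fcr")

# A rule written with the precomposed ü has nothing left to match ...
print(precomposed in source_after_nfd)  # False
# ... while the decomposed spelling does.
print(decomposed in source_after_nfd)   # True
```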

To fix this, either

  1. add tests to verify that this doesn't happen (e.g. that the rules match the operand normalization form), or
  2. add a 'normalizing' tool to ensure that the rules are correct.
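
As a rough idea of what option 2 could look like, here is a sketch of a normalizing pass in Python. It is not an existing CLDR tool: normalize_rule is a hypothetical helper, and it only understands simple "left > right;" rules with no quoting, contexts, variables, or sets.

```python
import re
import unicodedata

def normalize_rule(rule, form="NFD"):
    """Rewrite the left side of a simple 'left > right;' rule into `form`.

    Hypothetical helper: real rule files need a proper parser.
    """
    left, sep, right = rule.partition(">")
    if not sep:
        return rule  # no '>' in the line (e.g. ':: NFD;'), leave untouched
    # Expand backslash-u escapes so the normalizer sees real characters.
    text = re.sub(
        r"\\u([0-9A-Fa-f]{4})", lambda m: chr(int(m.group(1), 16)), left
    )
    text = unicodedata.normalize(form, text)
    # Write combining marks back as escapes so they stay visible in the file.
    text = "".join(
        f"\\u{ord(c):04X}" if unicodedata.combining(c) else c for c in text
    )
    return f"{text}>{right}"

print(normalize_rule("\u00fc > x;"))
```

Running the helper a second time over its own output leaves the rule unchanged, so it could be applied to a whole file repeatedly.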


Change History

comment:1 Changed 4 years ago by emmons

  • Status changed from new to assigned
  • Component changed from unknown to data-supplemental
  • Priority changed from assess to medium
  • Milestone changed from UNSCH to 25rc
  • Owner changed from anybody to pedberg
  • Type changed from unknown to enhancement

comment:2 Changed 3 years ago by pedberg

  • Milestone changed from 25rc to 26rc

comment:3 Changed 3 years ago by pedberg

  • Component changed from data-supplemental to test

comment:4 Changed 3 years ago by pedberg

  • Cc mark added
  • Milestone changed from 26rc to 27rc

There is a third option for how to fix this, which is to eliminate the initial :: NFD; or :: NFD (NFC); rule.

This might be the best approach for several of the Cyrillic script/language -> Latin transforms, which often have :: NFD (NFC); at the beginning but then have rules for Й and й, which in NFD are И + \u0306 and и + \u0306 respectively. It is not clear that the NFD is needed for anything in those transforms.

Need some discussion on this.

comment:5 Changed 3 years ago by mark

The advantage of doing NFD is that if you have an odd accent, it gets pulled out, the base character gets converted, and then the accent applies to the new base in the new script.
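
A toy illustration of that advantage, in Python rather than in transform rules. The single mapping а > a and the function to_latin are assumptions made for the example; they are not taken from any CLDR transform.

```python
import unicodedata

# Hypothetical one-rule transform: Cyrillic а (U+0430) > Latin a.
base_map = str.maketrans({"\u0430": "a"})

def to_latin(text, use_nfd):
    """Apply the base mapping, optionally wrapped in NFD ... NFC."""
    if use_nfd:
        text = unicodedata.normalize("NFD", text)
    text = text.translate(base_map)
    return unicodedata.normalize("NFC", text) if use_nfd else text

odd = "\u04d3"  # ӓ CYRILLIC SMALL LETTER A WITH DIAERESIS, no rule of its own

# With NFD the diaeresis is pulled out, а becomes a, and NFC recombines to ä.
print(to_latin(odd, use_nfd=True) == "\u00e4")   # True
# Without NFD the precomposed letter passes through untouched.
print(to_latin(odd, use_nfd=False) == "\u04d3")  # True
```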

We should have a test that checks:

If :: NFD occurs at the top, then all the left sides (the sides matched against the source) of > or <> rules are in NFD.

If :: .. (NFD) occurs at the bottom, then all the right sides of < or <> rules are in NFD.

And the same for the other forms: NFC, NFKC, NFKD.

The API gives a way to walk through the rules, so the files don't have to be parsed by hand to do this.
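
A sketch of such a test, written in Python over rules that have already been walked. The Rule tuple and the check function are hypothetical and do not use the ICU API. The sketch assumes that the side to verify is the one matched against the text: the left side for > and <> rules, the right side for < and <> rules.

```python
import unicodedata
from typing import NamedTuple, Optional

class Rule(NamedTuple):  # hypothetical shape of an already-parsed rule
    left: str
    op: str              # ">", "<" or "<>"
    right: str

def check(rules, top_form: Optional[str], bottom_reverse_form: Optional[str]):
    """Return (rule, side, form) for every matched side not in the expected form.

    top_form: form applied first in the forward direction, e.g. "NFD".
    bottom_reverse_form: form applied first in the reverse direction.
    Either may be None when the file has no such rule.
    """
    problems = []
    for r in rules:
        if top_form and r.op in (">", "<>"):
            if unicodedata.normalize(top_form, r.left) != r.left:
                problems.append((r, "left", top_form))
        if bottom_reverse_form and r.op in ("<", "<>"):
            if unicodedata.normalize(bottom_reverse_form, r.right) != r.right:
                problems.append((r, "right", bottom_reverse_form))
    return problems

rules = [
    Rule("\u00fc", ">", "x"),   # precomposed ü: not NFD
    Rule("u\u0308", ">", "x"),  # decomposed: NFD, but not NFC
    Rule("y", "<", "\u00e9"),   # precomposed é on the reverse match side
]
for rule, side, form in check(rules, "NFD", "NFD"):
    print(f"{side} side of {rule!r} is not in {form}")
```

Because the form is passed in as a parameter, the same function covers NFC, NFKC, and NFKD.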

comment:6 Changed 3 years ago by markus

  • Phase set to rc
  • Milestone changed from 27rc to 27

comment:7 Changed 2 years ago by pedberg

  • Milestone changed from 27 to 28

comment:8 Changed 2 years ago by markus

  • Type changed from enhancement to unittest
  • Component changed from test to unknown

comment:9 Changed 2 years ago by srl

  • Status changed from assigned to accepted

comment:10 Changed 20 months ago by emmons

  • Component changed from unknown to translit

comment:11 Changed 19 months ago by pedberg

  • Milestone changed from 28 to 29

Out of time, look at early in 29 if possible

comment:12 Changed 18 months ago by emmons

  • Milestone changed from 29 to upcoming

Auto move of all 29 -> upcoming

