
CLDR Ticket #10031 (accepted tools)

Opened 14 months ago

Last modified 12 days ago

Change MyanmarZawgyiConverter to call CLDR transform

Reported by: sascha
Owned by: ccornelius
Component: unknown
Data Locale:
Phase: dsub
Review:
Weeks:
Data Xpath:


Currently, the code in http://unicode.org/cldr/trac/browser/trunk/tools/java/org/unicode/cldr/util/MyanmarZawgyiConverter.java#L117 loads a custom zawgyiUnicodeTransliterator that pre-dates, and differs from, the CLDR transform my-t-my-s0-zawgyi. We should replace it with a call to the CLDR transform.

Last time I checked, ICU didn’t support BCP47-T (http://bugs.icu-project.org/trac/ticket/12163). If that’s still the case, we’ll need to use the legacy identifier Zawgyi-my instead of the BCP47-T identifier my-t-my-s0-zawgyi. But that’s a minor detail.
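
The identifier fallback described above could be sketched roughly as follows. This is only an illustration, not the converter's actual code: TransformIds and firstAvailable are hypothetical names, and the sketch assumes the lookup function rejects an unsupported identifier by throwing IllegalArgumentException.

```java
import java.util.List;
import java.util.function.Function;

/** Hypothetical helper: picks the first transform identifier that the lookup accepts. */
final class TransformIds {
    private TransformIds() {}

    /**
     * Tries each identifier in order and returns the first successful result.
     * Assumption: the lookup signals an unsupported identifier by throwing
     * IllegalArgumentException.
     */
    static <T> T firstAvailable(List<String> ids, Function<String, T> lookup) {
        IllegalArgumentException last = null;
        for (String id : ids) {
            try {
                return lookup.apply(id);
            } catch (IllegalArgumentException e) {
                last = e; // remember the failure and try the next identifier
            }
        }
        throw new IllegalArgumentException("No supported identifier in " + ids, last);
    }
}
```

With ICU4J on the classpath, a call might look like TransformIds.firstAvailable(Arrays.asList("my-t-my-s0-zawgyi", "Zawgyi-my"), Transliterator::getInstance). Whether the BCP47-T identifier resolves depends on the ICU version in use.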


Change History

comment:1 Changed 13 months ago by mark

  • Owner changed from anybody to sascha
  • Priority changed from assess to major
  • Type changed from unknown to tools
  • Status changed from new to accepted
  • Milestone changed from UNSCH to 32

comment:2 Changed 9 months ago by sascha

  • Owner changed from sascha to ccornelius

Craig, can you look into this?

Since ICU still doesn’t support BCP47-T, I tried to replace http://unicode.org/cldr/trac/browser/trunk/tools/java/org/unicode/cldr/util/MyanmarZawgyiConverter.java#L117 with the following:

    /**
     * Transliteration to convert Burmese text in a Zawgyi-encoded string to
     * standard Unicode codepoints and ordering.
     */
    // TODO(sascha): Use "my-t-my-s0-zawgyi" as soon as ICU supports BCP47-T.
    // http://bugs.icu-project.org/trac/ticket/12163
    static final Transform<String, String> zawgyiUnicodeTransliterator =
        Transliterator.getInstance("Zawgyi-my", Transliterator.FORWARD);

This mostly worked, but running ant -Drununittest.arg=-filter:TestMyanmarZawgyi unittest makes the following test in tools/cldr-unittest/src/org/unicode/cldr/unittest/TestDisplayAndInputProcessor.java fail:

        String z_with_space = "\u0020\u102e\u0020\u1037\u0020\u1039"; // Test #5
        String u_with_space = "\u00a0\u102e\u00a0\u1037\u00a0\u103a";
        String converted_space = daip.processInput("", z_with_space, null);
        if (false && !converted_space.equals(u_with_space)) {
            errln("Myanmar with space incorrectly normalized:\n" + z_with_space
                + " to\n" + converted_space + '\n' + u_with_space);
        }

I’m not sure if this test case could be removed, or if the Zawgyi-to-Unicode converter should handle it. In either case, you’ll be the best person to make the change. Many thanks in advance!
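
When comparing the strings in this test, note that U+0020 and U+00A0 look the same in the error output. A small helper along these lines could make the difference visible; CodePointDump is a hypothetical name and is not part of the CLDR tools.

```java
import java.util.stream.Collectors;

/** Hypothetical debugging helper: renders a string as a list of code points. */
final class CodePointDump {
    private CodePointDump() {}

    /** Returns the code points of s in "U+XXXX" notation, separated by spaces. */
    static String dump(String s) {
        return s.codePoints()
            .mapToObj(cp -> String.format("U+%04X", cp))
            .collect(Collectors.joining(" "));
    }
}
```

For the two test strings, dump(z_with_space) gives U+0020 U+102E U+0020 U+1037 U+0020 U+1039 and dump(u_with_space) gives U+00A0 U+102E U+00A0 U+1037 U+00A0 U+103A, so the expected output differs in the space characters and in the final code point.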

comment:3 Changed 6 months ago by mark

  • Milestone changed from 32 to 33

No response from Craig, so moving to next release.

comment:4 Changed 4 months ago by mark

  • Phase changed from dsub to rc

comment:5 Changed 4 weeks ago by mark

  • Phase changed from rc to spec-beta

comment:6 Changed 12 days ago by mark

  • Phase changed from spec-beta to dsub
  • Milestone changed from 33 to 34

Craig couldn't get to this in v32 or v33. One more release, then we'll just close it.

