[Unicode] Common Locale Data Repository: Bug Tracking

CLDR Ticket #10899 (accepted data)

Opened 7 months ago

Last modified 5 days ago

Zawgyi to Unicode converter should delete duplicate combining marks

Reported by: ccornelius
Owned by: ccornelius
Component: translit
Phase: spec-beta

Description

In the Zawgyi font, many duplicated combining characters in the range U+102B to U+103E look no different from a single combining character. However, the current converter, my-t-my-s0-zawgyi.xml, leaves many of these duplicate combining marks in place, which produces incorrect Unicode output.

Proposal 1: make sure that duplicates of these characters are removed in the final conversion:

U+102D-U+1030, U+1032, U+1033, U+1035-U+1037, U+1039, U+103A, U+103C, and U+103D
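
Proposal 1 can be sketched outside ICU as a post-processing step. The Python below is only an illustration, not the rule syntax used in my-t-my-s0-zawgyi.xml; the names `DEDUP_MARKS` and `collapse_duplicate_marks` are hypothetical. It collapses a listed mark that is immediately followed by copies of itself.

```python
import re

# Code points listed in Proposal 1 (Unicode side of the conversion).
# Inside the regex character class, "a-b" forms a code point range.
DEDUP_MARKS = (
    "\u102d-\u1030"
    "\u1032\u1033"
    "\u1035-\u1037"
    "\u1039\u103a"
    "\u103c\u103d"
)

# One listed mark followed by one or more copies of the same mark.
_DUPLICATE_RUN = re.compile("([" + DEDUP_MARKS + "])\\1+")


def collapse_duplicate_marks(text: str) -> str:
    """Replace each run of an identical listed mark with a single copy."""
    return _DUPLICATE_RUN.sub(r"\1", text)
```

Marks outside the list (for example U+1031) and adjacent marks that differ from each other are left untouched, so the sketch only handles exact adjacent repeats.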

Proposal 2: early in the conversion, remove duplicates of the characters that convert to U+103C. This must be done early because later rules move the last U+103C to the right of the consonant; any duplicate that is still present at that point is left behind as an extra U+103C to the left of the consonant.

Fix: in the first phase of conversion, convert each run of duplicate characters that map to U+103C into a single U+103C.
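
A rough Python sketch of this first-phase step follows. It assumes that the Zawgyi code points mapping to U+103C are U+103B and U+107E through U+1084; the ticket does not list them, so treat that set, and the names `ZAWGYI_MEDIAL_RA` and `collapse_medial_ra_runs`, as assumptions. The later reordering to the right of the consonant is not shown.

```python
import re

# ASSUMPTION: Zawgyi code points that convert to U+103C (not listed in the ticket).
ZAWGYI_MEDIAL_RA = "\u103b\u107e-\u1084"

# Any run of one or more of those code points, in any mixture.
_MEDIAL_RA_RUN = re.compile("[" + ZAWGYI_MEDIAL_RA + "]+")


def collapse_medial_ra_runs(zawgyi_text: str) -> str:
    """Map every run of assumed medial-ra forms to a single U+103C.

    The result is a mixed intermediate string: only this one mapping is
    applied, and all other Zawgyi code points are left unconverted.
    """
    return _MEDIAL_RA_RUN.sub("\u103c", zawgyi_text)
```

For instance, under these assumptions the input `"\u103b\u103b\u1001 \u1080\u1080\u1000"` becomes `"\u103c\u1001 \u103c\u1000"`, with one U+103C still to the left of each consonant for later rules to move.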

The test data should also be updated with examples such as these Zawgyi inputs with duplicate combining marks:

ျျမန္​မာကာ
အခ်စ္႔႔သီေသာ
ျျခ ႀႀက
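
When such inputs are added to the test data, the converted output could be checked for leftover duplicates. The helper below is a hypothetical sketch (the name `residual_duplicates` is not part of CLDR or ICU); it reports adjacent identical code points in the range U+102B to U+103E mentioned in the description.

```python
def residual_duplicates(unicode_text: str) -> list:
    """Return (index, code point) pairs for adjacent repeated combining marks.

    Only code points in U+102B..U+103E are considered; the index is that
    of the second character of each repeated pair.
    """
    hits = []
    for i in range(1, len(unicode_text)):
        ch = unicode_text[i]
        if ch == unicode_text[i - 1] and 0x102B <= ord(ch) <= 0x103E:
            hits.append((i, f"U+{ord(ch):04X}"))
    return hits
```

An empty result for the converter output of each sample would indicate that no duplicate marks in that range survived the conversion.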

Attachments

my-t-my-s0-zawgyi.txt (8.2 KB) - added by ccornelius@… 2 weeks ago.
Updated ICU transliteration rules for Zawgyi to Unicode. Source is https://github.com/googlei18n/myanmar-tools/blob/master/genconvert/input/my-t-my-s0-zawgyi.txt

Change History

comment:1 Changed 7 months ago by jungshik

  • Cc jungshik added

comment:2 Changed 5 months ago by ccornelius

  • Owner changed from anybody to ccornelius
  • Status changed from new to accepted

Changed 2 weeks ago by ccornelius@…

Updated ICU transliteration rules for Zawgyi to Unicode. Source is https://github.com/googlei18n/myanmar-tools/blob/master/genconvert/input/my-t-my-s0-zawgyi.txt

comment:3 Changed 2 weeks ago by ccornelius@…

The updated file just attached includes numerous fixes for conversion from Zawgyi font encoding to Unicode Burmese.

The source of this is https://github.com/googlei18n/myanmar-tools/blob/master/genconvert/input/my-t-my-s0-zawgyi.txt

comment:4 Changed 5 days ago by ccornelius

  • Phase changed from dsub to spec-beta
  • Type changed from unknown to data
  • Component changed from unknown to translit