CLDR Ticket #10899 (accepted unknown)

Opened 3 months ago

Last modified 6 weeks ago

Zawgyi to Unicode converter should delete duplicate combining marks

Reported by: ccornelius
Owned by: ccornelius
Component: unknown
Data Locale:
Phase: dsub
Review:
Weeks:
Data Xpath:


With the Zawgyi font, a run of duplicate combining characters in the range 0x102b - 0x103e often looks identical to a single combining character. However, the current converter, my-t-my-s0-zawgyi.xml, does not remove many of these duplicates, so its Unicode output is incorrect.

Proposal 1: make sure that duplicates of these characters are removed in the final conversion:

102d-1030, 1032, 1033, 1035-1037, 1039, 103a, 103c, and 103d
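
A minimal sketch of Proposal 1 in Python, not the converter's actual rules (those live in my-t-my-s0-zawgyi.xml). It assumes that "duplicates" means adjacent repeats of the same code point in the converted output. The names DEDUP_CLASS and remove_duplicate_marks are made up for this illustration.

```python
import re

# Code points listed in Proposal 1:
# 102d-1030, 1032, 1033, 1035-1037, 1039, 103a, 103c, and 103d.
DEDUP_CLASS = "\u102d-\u1030\u1032\u1033\u1035-\u1037\u1039\u103a\u103c\u103d"

# One listed mark followed by one or more copies of itself.
# Assumption: only adjacent, identical repeats count as duplicates.
_REPEATED_MARK = re.compile("([" + DEDUP_CLASS + "])\\1+")


def remove_duplicate_marks(text: str) -> str:
    """Collapse each run of an identical listed mark to a single mark."""
    return _REPEATED_MARK.sub(r"\1", text)
```

For example, remove_duplicate_marks("\u1000\u102d\u102d") returns "\u1000\u102d". A repeated U+1031 is left alone because it is not in the list above.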

Proposal 2: early in the conversion, remove duplicate characters that convert to U+103c. This must be done early because later rules move only the last 103c to the right of the consonant. If duplicates are still present at that point, an extra U+103c is left to the left of the consonant.

Fix: duplicates of the characters that convert to U+103c should be converted to a single U+103c in the first phase of conversion.
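
The ordering problem can be illustrated with a small Python sketch. It is a simplified stand-in, not the real transform: ZAWGYI_MEDIAL_RA holds only the two Zawgyi code points assumed to appear in this ticket's sample inputs (U+103B and U+1080), the move rule is reduced to a single regex over consonants U+1000-U+1021, and all other Zawgyi mappings are ignored. All function names are hypothetical.

```python
import re

# Assumption: an illustrative subset of the Zawgyi characters that convert
# to U+103c, limited to the two assumed to occur in the ticket's samples.
ZAWGYI_MEDIAL_RA = "\u103b\u1080"

_RA_EACH = re.compile("[" + ZAWGYI_MEDIAL_RA + "]")
_RA_RUN = re.compile("[" + ZAWGYI_MEDIAL_RA + "]+")
# Simplified stand-in for the later rules that reorder U+103c.
_RA_BEFORE_CONSONANT = re.compile("\u103c([\u1000-\u1021])")


def map_each(text: str) -> str:
    """Map every matching character to U+103c, keeping duplicates."""
    return _RA_EACH.sub("\u103c", text)


def collapse_runs(text: str) -> str:
    """Map each run of matching characters to a single U+103c."""
    return _RA_RUN.sub("\u103c", text)


def move_after_consonant(text: str) -> str:
    """Move a U+103c that directly precedes a consonant to its right."""
    return _RA_BEFORE_CONSONANT.sub("\\1\u103c", text)


zawgyi = "\u103b\u103b\u1001"
# Without early deduplication, a stray U+103c stays left of the consonant.
stray = move_after_consonant(map_each(zawgyi))
# With runs collapsed in the first step, only one U+103c remains.
fixed = move_after_consonant(collapse_runs(zawgyi))
```

Here stray is "\u103c\u1001\u103c" while fixed is "\u1001\u103c", which is why the deduplication has to happen before the reordering rules run.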

The test data should also be updated with examples such as these Zawgyi inputs with duplicate combining marks:

ျျခ ႀႀက


Change History

comment:1 Changed 3 months ago by jungshik

  • Cc jungshik added

comment:2 Changed 6 weeks ago by ccornelius

  • Owner changed from anybody to ccornelius
  • Status changed from new to accepted
