CLDR Ticket #10123 (new data)

Opened 12 days ago

Latin-ASCII should remove Mn marks on digits too

Reported by: pedberg
Owned by: anybody
Component: translit
Data Locale:
Phase: dsub
Review:
Weeks:
Data Xpath:


Many people use a compound transform like "Any-Latin; Latin-ASCII" to produce the best ASCII-range (or mostly ASCII-range) equivalent for arbitrary Unicode text.

Currently, to enable round-trip mapping, the Arabic-Latin transform maps Persian digits 06F0-06F9 to a combination of the 0-9 ASCII equivalent plus COMBINING MACRON BELOW (to distinguish them from the mapping of 0660-0669 digits). However, Latin-ASCII does not remove COMBINING MACRON BELOW following digits, so Persian digits run through "Any-Latin; Latin-ASCII" end up with a trailing COMBINING MACRON BELOW instead of becoming plain ASCII-range digits.
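
To make the digit mapping described above concrete, here is a minimal Python sketch that only emulates that one behavior; it is not the Arabic-Latin transform and does not use ICU. The function name romanize_digits is hypothetical, and the code assumes the mark is emitted directly after the ASCII digit, as the description says.

```python
import unicodedata

MACRON_BELOW = "\u0331"  # COMBINING MACRON BELOW

def romanize_digits(text: str) -> str:
    """Emulate only the digit handling described in this ticket (hypothetical helper)."""
    out = []
    for ch in text:
        if "\u0660" <= ch <= "\u0669":
            # Arabic-Indic digit: plain ASCII digit
            out.append(str(unicodedata.digit(ch)))
        elif "\u06F0" <= ch <= "\u06F9":
            # Persian digit: ASCII digit plus a mark so the mapping can round-trip
            out.append(str(unicodedata.digit(ch)) + MACRON_BELOW)
        else:
            out.append(ch)
    return "".join(out)
```

With this sketch, U+0661 and U+06F1 both yield the digit 1, but only the second carries the combining mark, which is what a later mark-stripping step has to deal with.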

This is due to the following line in Latin-ASCII:

[:Latin:] { [:Mn:]+ → ; # maps to nothing; remove all Mn following Latin letter

That should be generalized at least to allow stripping from digits too:

[[:Latin:][0-9]] { 
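
The difference between the current rule and the proposed one can be sketched in plain Python. This is an emulation under stated assumptions, not the Latin-ASCII implementation: the proposed rule is cut off above, so the sketch assumes it keeps the same "remove the following Mn run" action as the existing rule, and it approximates [:Latin:] with a check on the Unicode character name because the standard library has no script property. The names is_latin_letter and strip_marks are hypothetical.

```python
import unicodedata

def is_latin_letter(ch: str) -> bool:
    # Rough stand-in for ICU's [:Latin:] (an assumption, not the real script test)
    return ch.isalpha() and "LATIN" in unicodedata.name(ch, "")

def strip_marks(text: str, include_digits: bool) -> str:
    """Drop runs of Mn marks that follow a qualifying base character."""
    out = []
    stripping = False  # True while the preceding base qualifies for stripping
    for ch in text:
        if unicodedata.category(ch) == "Mn":
            if not stripping:
                out.append(ch)  # mark follows a non-qualifying base: keep it
            continue
        stripping = is_latin_letter(ch) or (include_digits and ch in "0123456789")
        out.append(ch)
    return "".join(out)

sample = "a\u03311\u0331"  # "a" and "1", each followed by COMBINING MACRON BELOW
current = strip_marks(sample, include_digits=False)   # mark after "1" is kept
proposed = strip_marks(sample, include_digits=True)   # mark after "1" is removed
```

With include_digits=False the mark after the digit survives, which matches the behavior reported here; with include_digits=True the digit comes out as plain ASCII.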


