Unicode Common Locale Data Repository: Bug Tracking

CLDR Ticket #10123 (accepted data)

Opened 5 months ago

Last modified 4 months ago

Latin-ASCII should remove Mn marks on digits too

Reported by: pedberg
Owned by: pedberg
Component: translit
Data Locale:
Phase: rc
Review:
Weeks:
Data Xpath:


Many people use a compound transform like "Any-Latin; Latin-ASCII" to produce the best ASCII-range (or mostly ASCII-range) equivalent for arbitrary Unicode text.

Currently, to enable round-trip mapping, the Arabic-Latin transform maps the Persian digits U+06F0..U+06F9 to the corresponding ASCII digit 0-9 followed by U+0331 COMBINING MACRON BELOW (to distinguish them from the mapping of the U+0660..U+0669 digits). However, Latin-ASCII does not remove COMBINING MACRON BELOW following digits, so Persian digits run through "Any-Latin; Latin-ASCII" end up as ASCII digits each followed by COMBINING MACRON BELOW instead of as plain ASCII-range digits.
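To make the digit mapping concrete, here is a minimal Python sketch. It does not call ICU or CLDR; `romanize_digits` is a hypothetical stand-in that imitates only the digit behaviour described above, and it assumes that U+0660..U+0669 map to bare ASCII digits.

```python
import unicodedata

MACRON_BELOW = "\u0331"  # COMBINING MACRON BELOW, general category Mn


def romanize_digits(text: str) -> str:
    """Hypothetical stand-in for the digit handling of Arabic-Latin.

    Assumption: U+0660..U+0669 become bare ASCII digits, while
    U+06F0..U+06F9 become an ASCII digit plus COMBINING MACRON BELOW,
    so the two digit sets stay distinguishable for round-tripping.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if 0x0660 <= cp <= 0x0669:
            out.append(chr(ord("0") + cp - 0x0660))
        elif 0x06F0 <= cp <= 0x06F9:
            out.append(chr(ord("0") + cp - 0x06F0) + MACRON_BELOW)
        else:
            out.append(ch)
    return "".join(out)


romanized_arabic = romanize_digits("\u0661\u0662\u0663")
romanized_persian = romanize_digits("\u06F1\u06F2\u06F3")

# The Persian result still carries non-ASCII combining marks.
print(romanized_arabic.isascii(), romanized_persian.isascii())
print([unicodedata.name(c) for c in romanized_persian])
```

The two inputs produce different romanized strings, which is what makes the reverse mapping possible, but only the first one is pure ASCII.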

This is due to the following line in Latin-ASCII:

[:Latin:] { [:Mn:]+ → ; # maps to nothing; remove all Mn following Latin letter

That should be generalized at least to allow stripping from digits too:

[[:Latin:][0-9]] { [:Mn:]+ → ; # maps to nothing; remove all Mn following Latin letter or ASCII digit
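The effect of widening the context from Latin letters to Latin letters plus ASCII digits can be sketched in plain Python. This is an emulation, not the ICU rule engine: `strip_marks` and `is_latin_letter` are hypothetical helpers, and because the standard library exposes no Script property, `[:Latin:]` is approximated here by checking whether the character name starts with "LATIN".

```python
import unicodedata


def is_latin_letter(ch: str) -> bool:
    # Rough approximation of [:Latin:] (assumption: name-based check).
    return unicodedata.name(ch, "").startswith("LATIN")


def strip_marks(text: str, strip_after_digits: bool) -> str:
    """Drop runs of Mn marks that follow an allowed base character.

    strip_after_digits=False imitates the current rule (Latin bases only);
    strip_after_digits=True imitates the proposed rule (Latin or 0-9).
    """
    out = []
    base_allows_strip = False
    for ch in text:
        if unicodedata.category(ch) == "Mn":
            if base_allows_strip:
                continue  # mark follows an allowed base: remove it
            out.append(ch)  # mark follows some other base: keep it
        else:
            base_allows_strip = is_latin_letter(ch) or (
                strip_after_digits and ch in "0123456789"
            )
            out.append(ch)
    return "".join(out)


sample = "1\u03312\u0331 a\u0331"  # digits and a letter, each with U+0331
current = strip_marks(sample, strip_after_digits=False)
proposed = strip_marks(sample, strip_after_digits=True)
print(current.isascii(), proposed.isascii())
```

With the current behaviour only the mark after the letter is removed; with the widened context the whole sample becomes ASCII.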


Change History

comment:1 Changed 4 months ago by emmons

  • Owner changed from anybody to pedberg
  • Phase changed from dsub to rc
  • Priority changed from assess to minor
  • Status changed from new to accepted
  • Milestone changed from UNSCH to 32
