Unicode Common Locale Data Repository: Bug Tracking

CLDR Ticket #10123 (accepted data)

Opened 4 months ago

Last modified 3 months ago

Latin-ASCII should remove Mn marks on digits too

Reported by: pedberg
Owned by: pedberg
Component: translit
Data Locale:
Phase: rc
Review:
Weeks:
Data Xpath:
Xref:

Description

Many people use a compound transform like "Any-Latin; Latin-ASCII" to produce the best ASCII-range (or mostly ASCII-range) equivalent for arbitrary Unicode text.

Currently, to enable round-trip mapping, the Arabic-Latin transform maps the Persian digits U+06F0..U+06F9 to the corresponding ASCII digit 0-9 followed by COMBINING MACRON BELOW (to distinguish them from the mapping of the digits U+0660..U+0669). However, Latin-ASCII does not remove COMBINING MACRON BELOW following digits, so Persian digits run through "Any-Latin; Latin-ASCII" end up with COMBINING MACRON BELOW attached instead of as plain ASCII-range digits.

This is due to the following line in Latin-ASCII:

[:Latin:] { [:Mn:]+ → ; # maps to nothing; remove all Mn following Latin letter

That should be generalized at least to allow stripping from digits too:

[[:Latin:][0-9]] { 
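To illustrate the difference, here is a rough Python sketch that emulates only this one rule using the standard library; it is not ICU, and it does not reproduce the rest of Latin-ASCII. The helper names (`is_latin_letter`, `strip_marks`) are hypothetical, and the Unicode-name check is only an approximation of the `[:Latin:]` set. The sample input assumes the Arabic-Latin output described above (ASCII digit followed by U+0331).

```python
import unicodedata

MACRON_BELOW = "\u0331"  # COMBINING MACRON BELOW


def is_latin_letter(ch: str) -> bool:
    # Approximation of [:Latin:] (an assumption, not ICU's script property):
    # an alphabetic character whose Unicode name starts with "LATIN".
    return ch.isalpha() and unicodedata.name(ch, "").startswith("LATIN")


def strip_marks(text: str, allow_digits: bool) -> str:
    """Drop every run of Mn characters that directly follows a matching base.

    allow_digits=False mimics the current context  [:Latin:] { [:Mn:]+
    allow_digits=True  mimics the proposed context [[:Latin:][0-9]] { [:Mn:]+
    """
    out = []
    strip = False  # True while inside a mark run that follows a matching base
    for ch in text:
        if unicodedata.category(ch) == "Mn":
            if not strip:
                out.append(ch)  # mark follows a non-matching base: keep it
        else:
            strip = is_latin_letter(ch) or (allow_digits and ch in "0123456789")
            out.append(ch)
    return "".join(out)


# Persian digits 1, 2, 3 as Arabic-Latin is described to emit them.
sample = "1" + MACRON_BELOW + "2" + MACRON_BELOW + "3" + MACRON_BELOW

current = strip_marks(sample, allow_digits=False)   # marks survive
proposed = strip_marks(sample, allow_digits=True)   # plain "123"
```

With the current context the macrons after the digits are left in place; with the digit set added to the context they are removed, which is the behavior requested here.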

Change History

comment:1 Changed 3 months ago by emmons

  • Owner changed from anybody to pedberg
  • Phase changed from dsub to rc
  • Priority changed from assess to minor
  • Status changed from new to accepted
  • Milestone changed from UNSCH to 32