Unicode Common Locale Data Repository: Bug Tracking

CLDR Ticket #10962 (new data)

Opened 5 months ago

Transform InterIndic not all converted, need documentation

Reported by: pedberg
Owned by: anybody
Component: translit
Data Locale:
Phase: rc
Review:
Weeks:
Data Xpath:


From ICU http://bugs.icu-project.org/trac/ticket/13610:

When I try to transliterate from Gurmukhi to Arabic using the parameter "Gurmukhi-Arabic", the result contains the character U+E07C, which belongs to the Unicode Private Use Area. To reproduce the issue, transliterate "ਸੰਯੁਕਤ ਰਾਜ ਅਮਰੀਕਾ" from Gurmukhi to Arabic (this is "USA" in Punjabi, taken from OpenStreetMap; see name:pa at https://nominatim.openstreetmap.org/details.php?place_id=177579678). Environment: ICU 60.2, PHP 7.0.25-0ubuntu0.16.04.1, Intl 1.1.0.

It's the same for Urdu, not just for Arabic: try "Guru-ur".

The use of PUA for a common InterIndic intermediate encoding is intentional but should only be transient and internal. There are two issues here:

  • It seems that some transforms are incomplete and allow the InterIndic PUA codes to leak out.
  • The use of PUA for this purpose is not documented; it probably should be somewhere.
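
A leak of this kind can be detected mechanically by scanning a transform's output for private-use code points. The following is a minimal sketch in Python, not part of CLDR or ICU: the names `find_pua` and `check_transform` are hypothetical, and the transform is assumed to be any callable that maps a string to a string.

```python
from typing import Callable, List, Tuple

# Unicode private-use ranges: the BMP Private Use Area plus planes 15 and 16.
PUA_RANGES = (
    (0xE000, 0xF8FF),
    (0xF0000, 0xFFFFD),
    (0x100000, 0x10FFFD),
)


def find_pua(text: str) -> List[Tuple[int, str]]:
    """Return (index, 'U+XXXX') for every private-use code point in text."""
    hits = []
    for i, ch in enumerate(text):
        cp = ord(ch)
        if any(lo <= cp <= hi for lo, hi in PUA_RANGES):
            hits.append((i, f"U+{cp:04X}"))
    return hits


def check_transform(
    transform: Callable[[str], str], source: str
) -> List[Tuple[int, str]]:
    """Apply a transliteration callable (hypothetical interface) and
    report any private-use code points left in its output."""
    return find_pua(transform(source))
```

A non-empty result from `check_transform` would indicate that intermediate codes survived into the output. With PyICU installed (an assumption; it is not used in the sketch), the callable could be the `transliterate` method of `icu.Transliterator.createInstance("Gurmukhi-Arabic")`.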


