Unicode Common Locale Data Repository: Bug Tracking

CLDR Ticket #11449 (accepted)

Opened 7 weeks ago

Last modified 7 weeks ago

Test for PU characters in all transforms that pivot through Interindic.

Reported by: mark
Owned by: mark
Component: other-unittest
Phase: rc

Description

(Split from ticket:10962)

From ICU http://bugs.icu-project.org/trac/ticket/13610:


When I try to transliterate from Gurmukhi to Arabic using the transform ID "Gurmukhi-Arabic", the result contains the character U+E07C, which belongs to the Unicode Private Use Area. To reproduce the issue, transliterate "ਸੰਯੁਕਤ ਰਾਜ ਅਮਰੀਕਾ" from Gurmukhi to Arabic (it is "USA" in Punjabi, taken from OpenStreetMap; see name:pa at https://nominatim.openstreetmap.org/details.php?place_id=177579678). Environment: ICU 60.2, PHP 7.0.25-0ubuntu0.16.04.1, Intl 1.1.0.

It's the same for Urdu, not just for Arabic: try "Guru-ur".


The use of PUA code points for a common InterIndic intermediate encoding is intentional, but they should only ever be transient and internal. It seems that some transforms are incomplete and allow the InterIndic PUA codes to leak out. Two things are needed:

  1. Add a test that checks that every transform pivoting through InterIndic converts all the PU characters completely.
  2. Fix any transforms that fail the test.
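The test in step 1 could look roughly like the following. This is a minimal Python sketch, not CLDR's or ICU's actual test code: `private_use_chars`, `find_pua_leaks`, and the `transliterate(transform_id, text)` hook are hypothetical names, and wiring the hook to a real ICU transliterator is assumed rather than shown. It flags any output character whose General_Category is Co (Private Use), which covers the InterIndic PUA codes as well as any other PUA leak.

```python
import unicodedata
from typing import Callable, Dict, List


def private_use_chars(text: str) -> List[str]:
    """Return every Private Use (General_Category Co) character in text as U+XXXX."""
    return [f"U+{ord(ch):04X}" for ch in text if unicodedata.category(ch) == "Co"]


def find_pua_leaks(
    transliterate: Callable[[str, str], str],
    cases: Dict[str, str],
) -> Dict[str, List[str]]:
    """Run each transform on its sample input and collect any PUA characters
    left in the output.

    transliterate(transform_id, text) is a hypothetical hook; in a real test it
    would call the ICU transliterator for transform_id.
    cases maps a transform ID (e.g. "Gurmukhi-Arabic") to a sample input string.
    """
    leaks: Dict[str, List[str]] = {}
    for transform_id, sample in cases.items():
        found = private_use_chars(transliterate(transform_id, sample))
        if found:
            leaks[transform_id] = found
    return leaks


if __name__ == "__main__":
    # Stand-in hook that simulates the reported leak for Gurmukhi-Arabic only
    # and otherwise returns the input unchanged.
    def fake_transliterate(transform_id: str, text: str) -> str:
        return "x\ue07c" if transform_id == "Gurmukhi-Arabic" else text

    cases = {
        "Gurmukhi-Arabic": "ਸੰਯੁਕਤ ਰਾਜ ਅਮਰੀਕਾ",
        "Gurmukhi-Latin": "ਸੰਯੁਕਤ ਰਾਜ ਅਮਰੀਕਾ",
    }
    print(find_pua_leaks(fake_transliterate, cases))  # {'Gurmukhi-Arabic': ['U+E07C']}
```

A thorough version would feed each transform every character of its source script rather than a single sample phrase, since a leak only shows up when the input exercises the incomplete rule.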


Change History

comment:1 Changed 7 weeks ago by mark

  • Status changed from new to accepted
  • Component changed from unknown to translit
  • Priority changed from assess to major
  • Phase changed from dsub to rc
  • Milestone changed from UNSCH to 35
  • Owner changed from anybody to mark
  • type changed from unknown to unittest

comment:2 Changed 4 weeks ago by mark

  • Component changed from transliteration to other-unittest

comment:3 Changed 4 weeks ago by mark

  • Milestone changed from 35 to 35-optional