Unicode Common Locale Data Repository: Bug Tracking

CLDR Ticket #7864(closed: duplicate)

Opened 4 years ago

Last modified 4 years ago

transform ID source & target too loose & ambiguous

Reported by: markus
Owned by: mark
Component: xxx-spec
Phase: rc
Weeks: 0.2

Description (last modified by markus)

http://www.unicode.org/reports/tr35/tr35-general.html#Transforms

The source and target can be locale IDs, Unicode script property aliases (long & short), "Any", and various other strings. This is too loose and ambiguous.

For example, ICU Transliterator calls UScript.getCode(String) which tries to guess whether the string is a script or a locale ID. For script names, we use the Unicode loose matching rules. For locale IDs, we create a ULocale, which is extremely lenient because the old syntax was very loose, and CLDR and ICU treat underscore and dash as equivalent.

Sample problem cases:

"New_Tai_Lue" matches a long script name (for Talu). As a ULocale of "new_TAI_LUE" it also contains a valid language subtag "new" which has the likely script Deva.

"ro_RO" loosely matches a short script name (Roro = Rongorongo) once case and the underscore are ignored. As a ULocale it has a likely script of Latn.
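
The double reading can be reproduced with a minimal sketch. This is not the ICU code: `loose_key`, `as_script`, `as_locale_script` and both tables are hypothetical stand-ins, the loose matching is a simplification (it only ignores case, whitespace, underscores and hyphens), and the tables hold only the scripts and languages named in this ticket and its comments.

```python
import re

# Illustrative tables only: the entries named in this ticket, not real CLDR data.
SCRIPTS = {  # short script code -> long script name
    "Talu": "New_Tai_Lue",
    "Roro": "Rongorongo",
    "Pauc": "Pau_Cin_Hau",
    "Yiii": "Yi",
}
LIKELY_SCRIPT = {"new": "Deva", "ro": "Latn", "pau": "Latn", "yi": "Hebr"}


def loose_key(name):
    """Simplified loose matching: drop whitespace, '_' and '-', then lowercase."""
    return re.sub(r"[\s_-]", "", name).lower()


# Index both the short code and the long name under their loose keys.
SCRIPT_BY_KEY = {}
for code, long_name in SCRIPTS.items():
    SCRIPT_BY_KEY[loose_key(code)] = code
    SCRIPT_BY_KEY[loose_key(long_name)] = code


def as_script(s):
    """Return the script code if s loosely matches a script name, else None."""
    return SCRIPT_BY_KEY.get(loose_key(s))


def as_locale_script(s):
    """Lenient locale reading: '_' and '-' are equivalent, first subtag is the language."""
    language = re.split(r"[_-]", s)[0].lower()
    return LIKELY_SCRIPT.get(language)


for sample in ("New_Tai_Lue", "ro_RO", "Pau_Cin_Hau", "Yi"):
    print(sample, "script:", as_script(sample), "locale:", as_locale_script(sample))
```

Each sample yields two different non-empty answers, which is exactly the guess an implementation is forced to make.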

Ideas

  • Extend the syntax with explicit additions to indicate the type, so that implementations need not guess if a string is a script or a locale ID or something else.
  • Forbid or deprecate long script names and allow only 4-letter script codes. Match them strictly, except for case-insensitivity (and maybe allow that only for the first letter?).
  • Limit locale IDs to only lang_script_region with valid subtags.
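
A rough sketch of how the second and third ideas could combine, under stated assumptions: `classify`, `LOCALE_RE` and the `KNOWN_*` sets are hypothetical, the sets stand in for real subtag validity data, and script codes are accepted fully case-insensitively. The first idea (explicit type markers) is not covered, since the ticket does not propose a concrete syntax.

```python
import re

# Hypothetical stand-ins for real validity data (only subtags named in this ticket).
KNOWN_SCRIPTS = {"Talu", "Roro", "Pauc", "Yiii", "Latn", "Deva", "Hebr"}
KNOWN_LANGUAGES = {"new", "ro", "pau", "yi"}
KNOWN_REGIONS = {"RO"}

# lang[_script][_region]: 2-3 letter language, 4-letter script,
# 2-letter or 3-digit region. '_' and '-' are still treated as equivalent.
LOCALE_RE = re.compile(
    r"([A-Za-z]{2,3})(?:[_-]([A-Za-z]{4}))?(?:[_-]([A-Za-z]{2}|[0-9]{3}))?"
)


def classify(s):
    """Return (kind, canonical form) or raise ValueError; never guess."""
    if s == "Any":
        return ("any", "Any")
    # Exactly four letters can only be a script code, never a language subtag.
    if re.fullmatch(r"[A-Za-z]{4}", s):
        code = s.title()
        if code in KNOWN_SCRIPTS:
            return ("script", code)
        raise ValueError("unknown script code: " + s)
    m = LOCALE_RE.fullmatch(s)
    if m:
        language, script, region = m.groups()
        language = language.lower()
        script = script.title() if script else None
        region = region.upper() if region else None
        if (language in KNOWN_LANGUAGES
                and (script is None or script in KNOWN_SCRIPTS)
                and (region is None or region in KNOWN_REGIONS)):
            parts = [p for p in (language, script, region) if p]
            return ("locale", "_".join(parts))
    raise ValueError("neither a script code nor a lang_script_region ID: " + s)
```

Under these rules "ro_RO" and "Yi" can only be locale IDs, "Roro" can only be a script code, and long names such as "New_Tai_Lue" are rejected instead of being guessed at.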

IcuBug:11171 is for fixing ICU.

Change History

comment:1 Changed 4 years ago by markus

  • Description modified

comment:2 Changed 4 years ago by markus

Another example: "Pau_Cin_Hau" matches a long script name (=Pauc). As a ULocale of "pau_CIN_HAU" it contains the language subtag "pau" (Palauan) which has a likely script of Latn.

comment:3 Changed 4 years ago by markus

The "long" script name "Yi" (=Yiii) is identical, apart from the customary titlecase, to the language subtag "yi" (=Yiddish, likely script Hebr). Peter noted this in IcuBug:11217.

comment:4 Changed 4 years ago by pedberg

  • Owner changed from anybody to mark
  • Phase changed from final to rc
  • Priority changed from assess to major
  • Status changed from new to assigned
  • Milestone changed from UNSCH to 27

comment:5 Changed 4 years ago by pedberg

  • Status changed from assigned to design

comment:6 Changed 4 years ago by mark

  • Status changed from design to closed
  • Resolution set to duplicate
  • Milestone changed from 27 to 28

duplicate of ticket:6463

However, I don't see this getting done in 27.
