[Unicode] Common Locale Data Repository: Bug Tracking
 

CLDR Ticket #6463 (closed: fixed)

Opened 6 years ago

Last modified 3 years ago

Design issue: converting transform names to BCP47 tags

Reported by: mark
Owned by: sascha
Component: transliteration
Data Locale:
Phase: dsub
Review: mark
Weeks:
Data Xpath:
Xref: ticket:9155, ticket:9106, ticket:9100, ticket:9089, ticket:9088, ticket:6449, ticket:5384, ticket:3976, ticket:3012, ticket:1112

Description (last modified by mark)

I was looking at how we could convert our transform names into BCP47 tags, and ran into an issue.

The Good

Most transform names work fine:

Arabic-Latin => und-Latn-t-und-Arab
cs-cs_FONIPA => cs-fonipa-t-cs

(The order gets reversed, of course)
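
As a rough illustration of the reversal, the sketch below builds the tag by placing the target before `-t-` and the source after it. The `NAME_TO_TAG` table and the `to_bcp47` helper are hypothetical names invented for this example; they are not part of CLDR or ICU, and the table holds only the entries needed for the two examples above.

```python
# Hypothetical lookup from transform-name parts to BCP47 language tags.
# Only the entries needed for the examples in this ticket are listed.
NAME_TO_TAG = {
    "Arabic": "und-Arab",
    "Latin": "und-Latn",
    "cs": "cs",
    "cs_FONIPA": "cs-fonipa",
}


def to_bcp47(transform_name: str) -> str:
    """Convert a simple 'Source-Target' name to '<target>-t-<source>'."""
    source, target = transform_name.split("-", 1)
    return f"{NAME_TO_TAG[target]}-t-{NAME_TO_TAG[source]}"


print(to_bcp47("Arabic-Latin"))   # und-Latn-t-und-Arab
print(to_bcp47("cs-cs_FONIPA"))   # cs-fonipa-t-cs
```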

Certain variants work fine, and are defined.

/BGN => -m0-bgn
/UNGEGN => -m0-ungegn

Other variants can use the private-use transform variant:

/Unicode => -x0-unicode
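
The variant rule can be sketched as follows. This is a minimal illustration, assuming that BGN and UNGEGN are the only variants with a defined `-m0-` value (they are the only two named in this ticket); `DEFINED_M0` and `variant_suffix` are hypothetical names.

```python
# Variants this ticket lists as already defined under -m0-.
# Assumption: the real set may be larger than these two.
DEFINED_M0 = {"BGN", "UNGEGN"}


def variant_suffix(variant: str) -> str:
    """Return the tag suffix for the '/Variant' part of a transform name."""
    name = variant.lstrip("/")
    # Defined variants use m0; everything else falls back to private-use x0.
    key = "m0" if name.upper() in DEFINED_M0 else "x0"
    return f"-{key}-{name.lower()}"


print(variant_suffix("/BGN"))      # -m0-bgn
print(variant_suffix("/UNGEGN"))   # -m0-ungegn
print(variant_suffix("/Unicode"))  # -x0-unicode
```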

The Bad

The problem cases are those where the target or source can't be expressed as a language tag. There are a few such cases, and I'll suggest possible approaches.

A. There 'oughta-be-a' script. Small number of cases.

Examples: [Jamo-Latin, Latin-Jamo]

Just define and use a private use Script code, so:

Solution:
und-Qaaj-t-und-Latn
und-Latn-t-und-Qaaj

B. There is a script/language, but our name doesn't use it.

We have a misspelling, or the language/script name is not the same as what is in CLDR.

Examples: XXX-JapaneseKana

Just fix:

Solution:
und-Kana-t-xxx

C. Simple Target

Examples: Any-Lower

Encode as a private-use transform variant and target 'und'

Solution:
und-t-x0-lower

D. Reversible (but probably shouldn't be)

Examples: [Fullwidth-Halfwidth, Halfwidth-Fullwidth]


We normally don't have pairs for similar transforms. E.g., we have Any-Lower and Any-Upper, not Lower-Upper. So treat these the same way as the Any-Lower case: target 'und' with a private-use transform variant.

Solution:
und-t-x0-fullwide
und-t-x0-halfwide
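
A minimal sketch of how the simple-target and Fullwidth/Halfwidth cases could share one code path. It assumes the x0 subtag names the target (so Halfwidth-Fullwidth becomes und-t-x0-fullwide), by analogy with Any-Lower; the ticket does not spell out that pairing. `SPECIAL_TARGETS` and `special_to_bcp47` are hypothetical names.

```python
# Hypothetical table: non-language targets and their proposed x0 subtags.
SPECIAL_TARGETS = {
    "Lower": "lower",
    "Fullwidth": "fullwide",
    "Halfwidth": "halfwide",
}


def special_to_bcp47(transform_name: str) -> str:
    """Map e.g. 'Any-Lower' or 'Halfwidth-Fullwidth' to 'und-t-x0-<target>'.

    The source part is ignored: the proposal treats these as one-directional
    transforms onto 'und', not as source/target pairs.
    """
    _source, target = transform_name.split("-", 1)
    return f"und-t-x0-{SPECIAL_TARGETS[target]}"


print(special_to_bcp47("Any-Lower"))            # und-t-x0-lower
print(special_to_bcp47("Halfwidth-Fullwidth"))  # und-t-x0-fullwide
print(special_to_bcp47("Fullwidth-Halfwidth"))  # und-t-x0-halfwide
```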

E. Reversible

Examples: Hex-Any, Hex-Any/C, Hex-Any/Java, Hex-Any/Perl, Hex-Any/Unicode, Hex-Any/XML, Hex-Any/XML10, Any-Hex, Any-Hex/C, Any-Hex/Java, Any-Hex/Perl, Any-Hex/Unicode, Any-Hex/XML, Any-Hex/XML10, Any-Hex/Plain, ... These also have a bunch of variants.

I couldn't think of any nice way to do this, so the choices feel like least-of-evils. I'd favor the first as being clearer, but mention the others for comparison.

  1. Solution: use -x0 and -x0-from

und-t-x0-hexesc = Any-Hex -- like the Any-Lower above
und-t-x0-from-hexesc = Hex-Any

-- Note that after x0, we can have any subtags of more than 2 alphanumeric characters.

  2. Solution: overload 'mul' to mean a reversal

und-t-x0-xhexesc = Any-Hex -- like the Any-Lower above
mul-t-x0-xhexesc = Hex-Any

  3. Solution: use private-use language codes

qah = Hex (for CLDR), so

qah-t-und = Any-Hex
und-t-qah = Hex-Any

For any of the above, we can have an extra x0 subtag for the variant, e.g.,

und-t-x0-hexesc-unicode = Any-Hex/Unicode
und-t-x0-from-hexesc-unicode = Hex-Any/Unicode
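
The first option (x0 plus x0-from, with an optional variant subtag) could be sketched like this. `hex_to_bcp47` is a hypothetical helper written only to illustrate the proposed mapping; it is not an existing CLDR or ICU function.

```python
def hex_to_bcp47(transform_name: str) -> str:
    """Map 'Any-Hex[/Variant]' or 'Hex-Any[/Variant]' using -x0 and -x0-from."""
    base, _, variant = transform_name.partition("/")
    if base == "Any-Hex":
        subtags = ["hexesc"]
    elif base == "Hex-Any":
        # The reverse direction is marked with an extra 'from' subtag.
        subtags = ["from", "hexesc"]
    else:
        raise ValueError(f"not a Hex transform: {transform_name}")
    if variant:
        subtags.append(variant.lower())
    return "und-t-x0-" + "-".join(subtags)


print(hex_to_bcp47("Any-Hex"))          # und-t-x0-hexesc
print(hex_to_bcp47("Hex-Any"))          # und-t-x0-from-hexesc
print(hex_to_bcp47("Any-Hex/Unicode"))  # und-t-x0-hexesc-unicode
print(hex_to_bcp47("Hex-Any/Unicode"))  # und-t-x0-from-hexesc-unicode
```

The sketch simply lowercases the variant. Given the length note under the first solution, a one-letter variant such as /C would need a longer spelling, which this example does not attempt.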


What is really unfortunate is that when we did BCP47, we didn't allow for private-use variants; we only have private-use language, script, and region subtags. It would have been *much* easier if, say, any variant subtag starting with x were private use. Then we could consistently represent "Hex" by "und-xhexesc" (extra 'esc' added just to get to > 4 letters):

und-xhexesc-t-und = Any-Hex
und-t-und-xhexesc = Hex-Any

Change History

comment:1 Changed 6 years ago by mark

  • Priority changed from assess to major
  • Component changed from unknown to design
  • Milestone changed from UNSCH to 25dsub

comment:2 Changed 6 years ago by mark

  • Owner changed from anybody to mark
  • Status changed from new to assigned

comment:3 Changed 6 years ago by pedberg

  • Cc pedberg added

comment:4 Changed 6 years ago by mark

  • Description modified (diff)

comment:5 Changed 6 years ago by mark

  • Xref changed from 6449 to 6449 5384

comment:6 Changed 6 years ago by mark

  • Xref changed from 6449 5384 to 6449 5384 3012 1112

comment:7 Changed 5 years ago by emmons

  • Milestone changed from 25dsub to 25rc

Moving all 25dsub and 25design tickets to 25rc. If you plan to complete items in the 25M1 time frame, please move those tickets to 25M1.

comment:8 Changed 5 years ago by mark

  • Component changed from design to data-translit

comment:9 Changed 5 years ago by mark

  • Milestone changed from 25rc to 26rc

comment:10 Changed 5 years ago by mark

  • Milestone changed from 26rc to 27dsub

comment:11 Changed 5 years ago by markus

  • Phase set to dsub
  • Milestone changed from 27dsub to 27

comment:12 Changed 4 years ago by mark

  • Milestone changed from 27 to 28

comment:13 Changed 4 years ago by markus

  • type changed from unknown to data

comment:14 Changed 4 years ago by srl

  • Status changed from assigned to accepted

comment:15 Changed 4 years ago by mark

  • Owner changed from mark to sascha

Sascha, transferring to you. You don't have to do them all at once, and can transfer back any that are unclear. The design ones need to go back to committee with a design, and for the assess ones, report back on the priority. Some older bugs might be obsolete.

comment:16 Changed 4 years ago by emmons

  • Milestone changed from 28 to 28roll

Moving all outstanding 28 tickets to 28roll. We will discuss disposition of these at the next CLDR TC.

comment:17 Changed 3 years ago by sascha

  • Xref changed from 6449 5384 3012 1112 to 9088 6449 5384 3012 1112

comment:18 Changed 3 years ago by sascha

  • Xref changed from 9088 6449 5384 3012 1112 to 9089 9088 6449 5384 3012 1112

comment:19 Changed 3 years ago by sascha

  • Xref changed from 9089 9088 6449 5384 3012 1112 to 9089 9088 6449 5384 3976 3012 1112

comment:20 Changed 3 years ago by sascha

  • Xref changed from 9089 9088 6449 5384 3976 3012 1112 to 9106 9089 9088 6449 5384 3976 3012 1112

comment:21 Changed 3 years ago by sascha

  • Xref changed from 9106 9089 9088 6449 5384 3976 3012 1112 to 9155 9106 9089 9088 6449 5384 3976 3012 1112

comment:22 Changed 3 years ago by sascha

  • Xref changed from 9155 9106 9089 9088 6449 5384 3976 3012 1112 to 9155 9106 9100 9089 9088 6449 5384 3976 3012 1112

comment:23 Changed 3 years ago by mark

Should be combined with ticket:9100, which covers much of the same ground but has additional info.

comment:24 Changed 3 years ago by sascha

  • Status changed from accepted to reviewing
  • Review set to mark
  • Milestone changed from upcoming to 29

comment:25 Changed 3 years ago by mark

  • Status changed from reviewing to closed
  • Resolution set to fixed