From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sat Jan 19 2008 - 17:48:54 CST
> From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On
> Behalf Of Rick McGowan
> Sent: Saturday, January 19, 2008 17:58
> To: unicode@unicode.org
> Subject: Unicode Transliteration Guidelines released
>
> The Unicode CLDR committee has released
> "Unicode Transliteration Guidelines":
> http://www.unicode.org/cldr/transliteration_guidelines.html
Note the following text:
[quote]
Even within particular languages, there can be variants according to
different authorities, or even varying across time (if the authority
changes its recommendation). The canonical identifier that CLDR uses
for these has the form:
source-target/variant
The source (and target) can be a language or script, either using the
English name or a locale code. The variant should specify the
authority, and if necessary, the year. For example, the identifier for
the Russian to Latin transliteration according to the UNGEGN would be
ru-und_Latn/UNGEGN, or
Russian-Latin/UNGEGN
(...)
[/quote]
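To make the quoted form concrete, here is a minimal Python sketch of how an
identifier of the form source-target/variant might be split. The function
name parse_translit_id is hypothetical, and the rule "split at the first
hyphen" is an assumption made for illustration; the guidelines text quoted
above does not specify a parsing algorithm.

```python
def parse_translit_id(identifier):
    """Split 'source-target/variant' into (source, target, variant).

    Assumption for illustration: the source ends at the FIRST '-'.
    The variant is None when no '/' part is present.
    """
    pair, _, variant = identifier.partition("/")
    source, sep, target = pair.partition("-")
    if not sep or not source or not target:
        raise ValueError("expected source-target[/variant]: %r" % identifier)
    return source, target, (variant or None)


if __name__ == "__main__":
    print(parse_translit_id("ru-und_Latn/UNGEGN"))
    print(parse_translit_id("Russian-Latin/UNGEGN"))
```

With this naive rule, a source that itself contains a hyphen cannot be
represented: a made-up identifier such as sr-Cyrl-sr_Latn would be split into
source "sr" and target "Cyrl-sr_Latn".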
A CLDR bug about the format of this identifier has been open for quite a
long time. The proposed changes and comments attached to it point out that
the way '-' and '_' are used is inconsistent with existing practice for
locale identifiers, where the two are treated as equivalent. The placement
of the variant also becomes ambiguous when the transliteration is reversed.
This bug was accepted by a CLDR committee member but deferred for later
resolution. Apparently it still has that status and has been forgotten.
I recently proposed a solution using another format, based purely on locale
ids (because transliteration variants effectively create locale variants by
defining an alternate orthography for the associated language):
ru.und-Latn-UNGEGN
und-Latn-UNGEGN.ru
This drops support for identifying languages by full names, as in:
Russian.Latin-UNGEGN
(because most of these names are not part of the CLDR root locale, and
English names for languages are often ambiguous or could wreak havoc when a
language name includes one of the separators needed for parsing).
The format should then become simply:
<Source-locale-id>.<Target-locale-id>
where both locale ids adhere to the RFC definition.
(Note that I suggest treating "." and "/" equivalently as the separator
between the two locales, just as we should treat "_" and "-" equivalently as
tag separators within a locale id; this makes the format compatible with
existing locale id parsers and with resource bundle parsers or resolvers,
where "/" could cause problems with filesystems.)
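The proposed format can be sketched in Python as follows. This is only an
illustration under stated assumptions: the names normalize_locale_id and
parse_locale_pair are hypothetical, "-" is chosen arbitrarily as the
normalized tag separator, and the locale ids are not validated against the
RFC grammar.

```python
import re


def normalize_locale_id(locale_id):
    """Treat '_' and '-' as equivalent tag separators (normalized to '-')."""
    return locale_id.replace("_", "-")


def parse_locale_pair(identifier):
    """Split '<source-locale-id>.<target-locale-id>' into a 2-tuple.

    Both '.' and '/' are accepted as the separator between the two
    locale ids. No RFC validation of the ids is attempted here.
    """
    parts = re.split(r"[./]", identifier)
    if len(parts) != 2 or not all(parts):
        raise ValueError("expected <source>.<target>: %r" % identifier)
    source, target = parts
    return normalize_locale_id(source), normalize_locale_id(target)


if __name__ == "__main__":
    print(parse_locale_pair("ru.und-Latn-UNGEGN"))
    print(parse_locale_pair("ru/und_Latn_UNGEGN"))
    print(parse_locale_pair("und-Latn-UNGEGN.ru"))
```

Because the variant is carried inside the locale id, reversing the
transliteration only swaps the two halves, and the first two calls above
return the same pair.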