Re: Some much-needed improvements in JavaScript i18n

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Thu, 20 Dec 2012 20:55:56 +0100

Great! Now we have a formal definition on which to build a compatibility
JavaScript framework (which will also allow pluggable locales and extended
collations to be supported).

Besides the support for number and date formatters (which is not so
critical and is easily implemented in JavaScript, just like the locale
resolvers that map a requested locale to an effective set of data
localized in a smaller set of locales), the most important part of it is
collation.
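
As an illustration of how small such a locale resolver can be, here is a minimal sketch of a fallback lookup in the style of RFC 4647 (progressively truncating the requested tag). The function name lookupLocale, the list of available locales and the fallback value are all hypothetical and only serve the example; tags are compared and returned in lower case.

```javascript
// Hypothetical helper: resolve a requested BCP 47 tag against the locales
// for which data is actually available, by dropping trailing subtags.
function lookupLocale(requested, available, fallback) {
  const have = new Set(available.map((tag) => tag.toLowerCase()));
  let candidate = requested.toLowerCase();
  while (candidate.length > 0) {
    if (have.has(candidate)) {
      return candidate;
    }
    const cut = candidate.lastIndexOf('-');
    if (cut < 0) {
      break; // nothing left to truncate
    }
    candidate = candidate.slice(0, cut);
    // RFC 4647 lookup also drops a dangling single-character subtag
    // (for example the "u" that introduces a Unicode extension).
    if (/-[a-z0-9]$/.test(candidate)) {
      candidate = candidate.slice(0, -2);
    }
  }
  return fallback;
}

// Assumed data set: only these locales have localized data.
const available = ['en', 'fr', 'de', 'pt-br'];
console.log(lookupLocale('fr-CA', available, 'en'));
console.log(lookupLocale('de-AT-u-co-phonebk', available, 'en'));
console.log(lookupLocale('ja-JP', available, 'en'));
```

With the assumed data set, the first two requests fall back to "fr" and "de", and the last one to the default "en".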

So the major improvement is to the String.localeCompare(that) method, which
has long been defective (working only in a single unspecified default
locale with a single unspecified default collator), to be replaced by
Intl.Collator.prototype.compare(this, that), called on a collator
constructed with [requestedLocales[, options]].
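
A short sketch of the difference, using the city names from the blog post quoted below. It assumes an engine that implements Intl.Collator; the Danish result additionally depends on Danish collation data being present in the host, so no particular output is claimed for it here.

```javascript
const names = ['Zürich', 'Århus', 'Berlin'];

// Legacy form: the locale and the collator are whatever the host picks.
const legacy = names.slice().sort((a, b) => a.localeCompare(b));

// New form: the locale is explicit. The compare function is already bound
// to its collator, so it can be handed straight to Array.prototype.sort.
const english = names.slice().sort(new Intl.Collator('en').compare);
const danish = names.slice().sort(new Intl.Collator('da').compare);

console.log(legacy);
console.log(english);
console.log(danish);
```

With English rules, Århus sorts with the other A words, ahead of Berlin and Zürich; where Danish data is available, the Danish collator should instead place it after Zürich.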

One major extension of this ECMA specification is how collators are
instantiated and categorized: they depend not only on the requestedLocales
(i.e. here a BCP 47 language code) and the options (containing the Unicode
singleton extension for BCP 47), but also on a new "matcher" option (taking
the values "search", "lookup", and "best fit"; but depending on the sections
using the collator object, only the first two values have a defined
behavior for collation, the 2nd and 3rd values being used for numeric
formatters).
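
For comparison, here is how these options look in the API as engines implement it, as far as I can tell: locale negotiation is controlled by an option named localeMatcher (accepting "lookup" and "best fit"), while "search" is a value of a separate usage option (next to "sort"). The sketch below only constructs collators and inspects what they resolved to; the locale "de" is an arbitrary choice.

```javascript
// Locale negotiation and intended usage are two separate options.
const sorter = new Intl.Collator('de', { localeMatcher: 'lookup', usage: 'sort' });
const searcher = new Intl.Collator('de', { localeMatcher: 'best fit', usage: 'search' });

// resolvedOptions() reports what the collator actually ended up with.
console.log(sorter.resolvedOptions());
console.log(searcher.resolvedOptions().usage);

// "search" is not accepted as a localeMatcher value: the constructor throws.
let rejected = false;
try {
  new Intl.Collator('de', { localeMatcher: 'search' });
} catch (err) {
  rejected = err instanceof RangeError;
}
console.log(rejected);
```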

Also, this specification changes from a numeric collation level to a
"sensitivity" parameter taking one of four values: "base", "accent",
"case", or "variant". It looks like this mixes the collation level
parameter with the "kc" extension, whose mapping is not said to be invalid
in the spec but is not specified elsewhere.
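
The effect of the four values can be checked with a small table, sketched here under the assumption of an "en" collator: for each sensitivity, it records whether "a" is treated as equal to "á" (accent difference) and to "A" (case difference).

```javascript
const levels = ['base', 'accent', 'case', 'variant'];
const pairs = [['a', 'á'], ['a', 'A']];

const table = {};
for (const sensitivity of levels) {
  const collator = new Intl.Collator('en', { sensitivity });
  // true means the two strings compare as equal at this sensitivity.
  table[sensitivity] = pairs.map(([x, y]) => collator.compare(x, y) === 0);
}
console.log(table);
```

"base" ignores both differences, "accent" distinguishes only the accent, "case" distinguishes only the case, and "variant" distinguishes both, which is why "case" looks like a primary level combined with a case level rather than a plain numeric strength.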

Of the 10 defined Unicode extension keys for specifying a collation within
a locale identifier:

- only the 3 following are supported and defined by the Intl.Collator
object: "co" (for collation specialisations), "kn" (for sorting by numeric
value), "kf" (for case first, such as sorting capital letters before small
letters). For the "co" extension key (which specifies an additional list of
collation options), the list of options must contain an initial null member
(apparently for future extension, but absent from the Unicode locale
extension "co"), and the other listed members must not be "standard" or
"search" (the ECMA spec does not really specify why they are reserved).

- but the 5 following are not supported: "kb" (backward secondary weights
reordering, mostly for French), "kh" (for hiragana quaternary), "kk" (for
normalization type: is an NFD normalization step assumed, or the absence of
a prior normalization step?), "kr" (for reordering) and "vt" (for variable
top). See ECMA spec §10.2.3.

- for the remaining 2 extensions, it is not clear whether they are supported
or not: "ka" (for alternate sorts such as dictionary order vs. phonebook order,
or for Chinese and Japanese sort order variants based on transliterations
or other transforms, or on radical/stroke counts), "kc" (for case level).
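
The supported keys in the list above can be exercised as follows. This is a sketch that assumes an engine honouring "kn" and "kf" (support for them may vary by implementation) and uses the numeric option as the option-bag counterpart of "kn". Whether "phonebk" is actually applied depends on the collation data shipped with the host.

```javascript
// "kn": numeric ordering, requested through the tag or through an option.
const byTag = new Intl.Collator('en-u-kn-true');
const byOption = new Intl.Collator('en', { numeric: true });
const plain = new Intl.Collator('en');

console.log(plain.compare('10', '2') < 0);    // compared digit by digit
console.log(byTag.compare('2', '10') < 0);    // compared by numeric value
console.log(byOption.compare('2', '10') < 0); // same, via the option

// "kf": capital letters before small letters.
const upperFirst = new Intl.Collator('en-u-kf-upper');
console.log(upperFirst.compare('A', 'a') < 0);

// "co": a collation specialisation; resolvedOptions() shows what was kept.
console.log(new Intl.Collator('de-u-co-phonebk').resolvedOptions().collation);
```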

This specification also does not define an API to retrieve an exemplar set
of collation elements (most often single letters) for indexing (according
to the specified collator), or to compute and return this indexing value
from a string (also according to a selected collator), though they are
defined in the CLDR data (currently only at the language level, not really
at the locale level that defines a collation within a language).
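
In the absence of such an API, an index value can be approximated on top of a collator. The sketch below is only an illustration: the helper indexLabel, the "#" bucket for strings sorting before the first label, and the plain A to Z label list are all assumptions of mine. Real exemplar sets would come from the CLDR data and must already be sorted according to the collator used.

```javascript
// Hypothetical exemplar list for an English index.
const labels = Array.from('ABCDEFGHIJKLMNOPQRSTUVWXYZ');

// Hypothetical helper: return the last label that sorts at or before the
// text at the primary level, or "#" when the text sorts before all labels.
function indexLabel(text, sortedLabels, collator) {
  let found = '#';
  for (const label of sortedLabels) {
    if (collator.compare(label, text) <= 0) {
      found = label;
    } else {
      break; // labels are sorted, so no later label can match
    }
  }
  return found;
}

// Primary-level comparison, so accents and case do not split buckets.
const primary = new Intl.Collator('en', { sensitivity: 'base' });

for (const word of ['Århus', 'éclair', 'zebra', '123']) {
  console.log(word, indexLabel(word, labels, primary));
}
```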

A major piece of work will be to map the CLDR locales onto the
JavaScript/ECMAScript locales, using this API. There will be some
ambiguities, and I expect that both the ECMAScript specification and the
CLDR specification will be updated to allow their convergence. I just hope
that the ECMAScript working group will participate in the discussions
occurring in the CLDR project to solve their interoperability issues and
enhance the I18n support in both environments (most other applications will
benefit from it via their integration or support of the ICU API, or a
similar API supporting the same options, such as the Windows API itself, or
RDBMS features in an improved SQL API).
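
A first step in such a mapping is simply asking the host which of the wanted locales its collator covers. The sketch below uses Intl.Collator.supportedLocalesOf and resolvedOptions(); the list of wanted tags is an arbitrary example, and the result depends entirely on the locale data available in the host.

```javascript
// Example request list; any well-formed BCP 47 tags would do.
const wanted = ['da', 'de-u-co-phonebk', 'fr-CA'];

// Subset of the request list that the host can serve without falling back
// to its default locale.
const supported = Intl.Collator.supportedLocalesOf(wanted, { localeMatcher: 'lookup' });
console.log(supported);

// Locale actually selected when the whole priority list is requested.
console.log(new Intl.Collator(wanted).resolvedOptions().locale);
```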

2012/12/20 Mark Davis ☕ <mark_at_macchiato.com>

> I have a new google blog post about the new ECMAScript (JavaScript)
> internationalization spec.
>
> “Until now, it has been very difficult for web application designers to do
> something as simple as sort names correctly according to the user's
> language. And it matters: English readers wouldn’t expect Århus to sort
> below Zürich, but Danish speakers would.” …
>
>
> http://googledevelopers.blogspot.com/2012/12/putting-zurich-before-arhus.html
>
> Many people contributed to this multi-year effort!
>
> Mark <https://plus.google.com/114199149796022210033>
> *— Il meglio è l’inimico del bene —*
Received on Thu Dec 20 2012 - 13:58:00 CST
