Re: Some much-needed improvements in JavaScript i18n from Norbert Lindenberg on 2012-12-20 (Unicode Mail List Archive)

From: Norbert Lindenberg <unicode_at_norbertlindenberg.com>
Date: Thu, 20 Dec 2012 19:22:19 -0800

I recommend that people interested in the ECMAScript Internationalization API read the actual standard or my introduction to it, and not rely on Philippe's interpretation.

http://www.ecma-international.org/ecma-402/1.0/
http://norbertlindenberg.com/2012/12/ecmascript-internationalization-api/

Norbert

On Dec 20, 2012, at 11:55 , Philippe Verdy wrote:

> Great! Now we have a formal definition on which to base a compatibility JavaScript framework (which will also allow pluggable locales and extended collations to be supported).
>
> Besides the support for number and date formatters (which is not so critical and is easily implemented in JavaScript, just like the locale resolvers for mapping a requested locale to an effective set of data localized in a smaller set of locales), the most important part of it is about collation.
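
For reference, a minimal sketch of the number and date formatters mentioned above, assuming a runtime that implements ECMA-402 and ships locale data for the requested locales. The variable names are illustrative only.

```javascript
// Number formatting: grouping and decimal separators follow the locale.
const usNumber = new Intl.NumberFormat("en-US").format(1234567.891);
console.log(usNumber); // "1,234,567.891"

// Date formatting: field order follows the locale. The time zone is pinned
// to UTC so the result does not depend on where the code runs.
const date = new Date(Date.UTC(2012, 11, 20)); // 20 December 2012
const usDate = new Intl.DateTimeFormat("en-US", { timeZone: "UTC" }).format(date);
console.log(usDate); // "12/20/2012"

// resolvedOptions() reports the locale the implementation actually selected,
// which may differ from the requested one if its data is not available.
const resolvedLocale = new Intl.NumberFormat("de-DE").resolvedOptions().locale;
console.log(resolvedLocale);
```
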
>
> So the major improvement is in the String.localeCompare(that) method, which has long been defective (working only in a single unspecified default locale with a single unspecified default collator) and is to be replaced by Intl.Collator.prototype.compare(this, that[, requestedLocales[, options]]).
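
A minimal sketch of locale-aware comparison, assuming a runtime that implements ECMA-402. Note that in the standard the locales and options are passed to the Intl.Collator constructor, and the resulting compare function takes just the two strings; String.prototype.localeCompare accepts the same locales and options as extra arguments. The sample names are illustrative.

```javascript
const names = ["Zürich", "Århus", "Berlin"];

// Locales and options go to the constructor; compare(x, y) takes two strings.
const english = new Intl.Collator("en");
const danish = new Intl.Collator("da");

// The compare property is a bound function, so it can be passed to sort().
const sortedEn = [...names].sort(english.compare);
const sortedDa = [...names].sort(danish.compare);

console.log(sortedEn); // English order: Århus, Berlin, Zürich
console.log(sortedDa); // with Danish collation data, Århus sorts last

// The same comparison through localeCompare with an explicit locale.
const viaLocaleCompare = [...names].sort((a, b) => a.localeCompare(b, "en"));
console.log(viaLocaleCompare);
```

When many strings are compared, creating one collator and reusing its compare function avoids repeating the locale negotiation on every call.
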
>
> One major extension in this ECMA specification is how collators are instantiated and categorized: they depend not only on the requestedLocales (i.e. here a BCP 47 language code) and the options (containing the Unicode singleton extension for BCP 47), but also on a new "matcher" option (taking the values "search", "lookup", and "best fit"; depending on the sections using the collator object, however, only the first two values have a defined behavior for collation, the second and third being used for numeric formatters).
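
A minimal sketch of how these choices appear in the Intl.Collator constructor, assuming a runtime that implements ECMA-402. In the published standard they are two separate option properties: localeMatcher ("lookup" or "best fit") controls locale negotiation, and usage ("sort" or "search") selects the purpose of the collator. The helper name rejectsBadMatcher is hypothetical.

```javascript
// usage selects the collator's purpose; localeMatcher selects the
// locale negotiation algorithm. Both are optional.
const sorter = new Intl.Collator("de", { usage: "sort", localeMatcher: "lookup" });
const searcher = new Intl.Collator("de", { usage: "search", localeMatcher: "best fit" });

console.log(sorter.resolvedOptions().usage);   // "sort"
console.log(searcher.resolvedOptions().usage); // "search"

// Values outside the allowed set are rejected with a RangeError.
function rejectsBadMatcher() {
  try {
    new Intl.Collator("de", { localeMatcher: "search" });
    return false;
  } catch (e) {
    return e instanceof RangeError;
  }
}
console.log(rejectsBadMatcher()); // true
```
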
>
>
> Also, this specification changes from a numeric collation level to a "sensitivity" parameter taking one of the four values "base", "accent", "case", or "variant"; it looks like this mixes the collation level parameter with the "kc" extension, whose mapping is not said to be invalid in the spec but which is not specified elsewhere.
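
A minimal sketch of the four sensitivity values, assuming a runtime that implements ECMA-402 with English collation data. The isEqual helper is hypothetical and only wraps compare() for readability.

```javascript
// Hypothetical helper: true when the collator treats x and y as equal.
function isEqual(sensitivity, x, y) {
  return new Intl.Collator("en", { sensitivity }).compare(x, y) === 0;
}

// "base": only base letters matter; accents and case are ignored.
console.log(isEqual("base", "a", "á"), isEqual("base", "a", "A"));       // true true

// "accent": accents matter, case does not.
console.log(isEqual("accent", "a", "á"), isEqual("accent", "a", "A"));   // false true

// "case": case matters, accents do not.
console.log(isEqual("case", "a", "á"), isEqual("case", "a", "A"));       // true false

// "variant": accents and case both matter.
console.log(isEqual("variant", "a", "á"), isEqual("variant", "a", "A")); // false false

// Different base letters are unequal at every sensitivity.
console.log(isEqual("base", "a", "b")); // false
```
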
>
>
> Of the 10 defined Unicode extension keys for specifying a collation within a locale identifier:
>
> - only the following 3 are supported and defined by the Intl.Collator object: "co" (for collation specialisations), "kn" (for sorting by numeric value), and "kf" (for case first, such as sorting capital letters before small letters). For the "co" extension key (which specifies an additional list of collation options), the list of options must contain an initial null member (apparently for future extension, but absent from the Unicode locale extension "co"), and the other listed members must not be "standard" or "search" (the ECMA spec does not really explain why they are reserved).
>
> - but the following 5 are not supported: "kb" (backward secondary weights reordering, mostly for French), "kh" (for hiragana quaternary), "kk" (for normalization type; is an NFD normalization step assumed, or the absence of a prior normalization step?), "kr" (for reordering) and "vt" (for variable top); see ECMA spec §10.2.3.
>
> - for the remaining 2 extensions, it is not clear whether they are supported or not: "ka" (for alternate sorts such as dictionary order vs. phonebook order, or for Chinese and Japanese sort order variants based on transliterations or other transforms, or on radical/stroke counts), "kc" (for case level).
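
A minimal sketch of the "kn", "kf" and "co" keys, assuming a runtime that implements ECMA-402 and supports these keys for the requested locales (support for "kn" and "kf" is left to the implementation). Each key can be given either in the locale tag or, for "kn" and "kf", through the numeric and caseFirst options.

```javascript
const plain = new Intl.Collator("en");

// "kn": compare digit sequences by numeric value.
const numericByTag = new Intl.Collator("en-u-kn-true");
const numericByOption = new Intl.Collator("en", { numeric: true });
console.log(plain.compare("item 2", "item 10") > 0);           // true: "2" after "1"
console.log(numericByTag.compare("item 2", "item 10") < 0);    // true: 2 before 10
console.log(numericByOption.compare("item 2", "item 10") < 0); // true

// "kf": choose whether upper or lower case sorts first.
const upperFirst = new Intl.Collator("en", { caseFirst: "upper" });
console.log(plain.compare("a", "A") < 0);      // true: lower case first by default
console.log(upperFirst.compare("A", "a") < 0); // true

// "co": request a collation variant such as German phonebook order.
// resolvedOptions().collation shows what the implementation selected.
const phonebook = new Intl.Collator("de-u-co-phonebk");
console.log(phonebook.resolvedOptions().collation);
```
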
>
> This specification also does not define an API to retrieve an exemplar set of collation elements (most often single letters) for indexing (according to the specified collator), or to compute and return this indexing value from a string (also according to a selected collator), though they are defined in the CLDR data (currently only at the language level, not really at the locale level that defines a collation within a language).
>
> A major piece of work will be to map the CLDR locales onto the JavaScript/ECMAScript locales using this API. There will be some ambiguities, and I expect that the ECMAScript specification and the CLDR specification will both be updated to allow their convergence. I just hope that the ECMAScript working group will participate in the discussions occurring in the CLDR project to solve their interoperability and enhance the i18n support in both environments (most other applications will benefit from it via their integration or support of the ICU API, or a similar API supporting the same options, such as the Windows API itself, or RDBMS features in an improved SQL API).
>
>
>
> 2012/12/20 Mark Davis ☕ <mark_at_macchiato.com>
> I have a new Google blog post about the new ECMAScript (JavaScript) internationalization spec.
>
> “Until now, it has been very difficult for web application designers to do something as simple as sort names correctly according to the user's language. And it matters: English readers wouldn’t expect Århus to sort below Zürich, but Danish speakers would.” …
>
> http://googledevelopers.blogspot.com/2012/12/putting-zurich-before-arhus.html
>
> Many people contributed to this multi-year effort!
>
> Mark
>
> — Il meglio è l’inimico del bene —
>
>
Received on Thu Dec 20 2012 - 21:26:03 CST
