Right, Doug. I'll say a few more words.
In terms of language support, encoding of new characters in Unicode
benefits mostly digital heritage languages (via representation of historic
languages in Unicode, enabling preservation and scholarly work), although
there are some modern-use cases like Hanifi Rohingya. We do include digital
heritage under the umbrella of "digitally disadvantaged languages", but we
are not always consistent in our terminology.
But encoding is just a first step. A vital first step, but just one step.
People tend to forget that adding new characters is just a part of what
Unicode does. For script support, it is just as important to have correct
Unicode algorithms and properties, such as correct values for the
Indic_Positional_Category property (which, together with the related work
with the Universal Shaping Engine, allows for proper rendering of many
languages). Behind the
scenes we have people like Ken and Laurentiu who have to dig through the
encoding proposals and fill in the many, many gaps to come up with
reasonable properties for such basic behavior as line-break.
As important as the work is on encoding, properties, and algorithms, when
we go up a level we get CLDR and ICU. Those have more impact on language
support for far more people in the world than the addition of new scripts
does. After all, approaching half of the population of the globe owns
smartphones: ICU provides programmatic access to the Unicode encoding,
properties, and algorithms, and CLDR + ICU together provide the core
language support on essentially every one of those smartphones.
But in terms of language coverage, the chart you reference (and the
corresponding graph
<http://cldr.unicode.org/index/downloads/cldr-32#TOC-Growth>) shows
how very far CLDR still has to go. So we are gearing up for ways to extend
that graph: to move at least the basic coverage (the lower plateau in that
graph) to more languages, and to move basic-coverage languages up to more
in-depth coverage. We are focusing on ways to improve the CLDR survey tool
backend and frontend, since we know it currently cannot handle the
number of people that want to contribute, and has glitches in the UI that
make it clumsier to use than it should be.
Well, this turned out to be more than just a few words... sorry for going
on!
Mark
On Thu, Mar 1, 2018 at 9:10 PM, Doug Ewell via Unicode <unicode_at_unicode.org>
wrote:
> Tim Partridge wrote:
>
> > Perhaps the CLDR work the Consortium does is being referenced. That is
> > by language on this list
> > http://www.unicode.org/cldr/charts/32/supplemental/locale_coverage.html#ee
> > By the time it gets to the 100th entry the Modern percentage has "room
> > for improvement".
>
> I think that is a measurement of locale coverage -- whether the
> collation tables and translations of "a.m." and "p.m." and "a week ago
> Thursday" are correct and verified -- not character coverage.
>
> --
> Doug Ewell | Thornton, CO, US | ewellic.org
>
>
>
Received on Fri Mar 02 2018 - 07:30:51 CST