From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Fri Jul 18 2003 - 06:16:42 EDT
On Friday, July 18, 2003 7:36 AM, Michael Everson <everson@evertype.com> wrote:
> At 00:57 +0200 2003-07-18, Philippe Verdy wrote:
>
> > Why is row 03 so restricted? Shouldn't it include those accents and
> > diacritics that are used by other characters once canonically
> > decomposed? Or does it imply that MES-2 is only supposed to use
> > strings in NFC form?
> >
> > Also, is this list fully closed under existing character
> > properties, such as NFKD decompositions and case mappings?
>
> The MES-2 is what it is, and was developed at the time when it was.
> It is thought to be a minimum requirement for European requirements,
> and is certainly a lot better than that old Adobe glyph list that was
> supported earlier on. It doesn't depend on very smart fonts.
>
> Personally I prefer the Multilingual European Subset.
Is there any work under way at CEN to revise its MES-2 subset
(as an MES-2.1, perhaps?) so that it takes into account not only the
ISO 10646 reference but also the Unicode character properties, making
the set self-closed and actually implementable, at least under NFC
closure and case-mapping closure?
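To make "closure" concrete, here is a minimal sketch in Python, assuming
the repertoire is held as a set of single characters. The helper names
(missing_under, nfc_gaps, case_gaps) and the toy repertoire are
illustrative assumptions, not the actual MES-2 list. It only tests each
character on its own, which is a necessary condition for closure but not
a sufficient one, since sequences of characters can also compose.

```python
import unicodedata

def missing_under(repertoire, transform):
    """Characters that transform(ch) yields for some ch in the repertoire
    but that the repertoire itself lacks."""
    return {c for ch in repertoire for c in transform(ch) if c not in repertoire}

def nfc_gaps(repertoire):
    # A character that is not stable under NFC drags in its NFC form.
    return missing_under(repertoire, lambda ch: unicodedata.normalize("NFC", ch))

def case_gaps(repertoire):
    # str.upper()/str.lower() apply the full Unicode case mappings,
    # which may expand one character into several.
    return missing_under(repertoire, lambda ch: ch.upper() + ch.lower())

# Toy repertoire for illustration only; this is NOT the MES-2 list.
toy = {"e", "E", "\u00e9", "\u212b"}  # e, E, e with acute, ANGSTROM SIGN

print(sorted(f"U+{ord(c):04X}" for c in nfc_gaps(toy)))
print(sorted(f"U+{ord(c):04X}" for c in case_gaps(toy)))
```

In this toy set, U+212B ANGSTROM SIGN normalizes to U+00C5 under NFC, and
U+00E9 uppercases to U+00C9, so both are reported as gaps. Running the
same check against the real MES-2 list would show which additions a
revised subset needs.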
Support for NFKC closure should then be added as a next step, which
could optionally specify support for the corresponding decompositions
(although this would bring in combining characters, and would increase
the number of precomposed NFC characters that the repertoire has to
include).
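The knock-on effect described here can be sketched as follows, again in
Python with a set of single characters. The function names and the
two-character toy repertoire are assumptions made for illustration only.
The first function lists what NFKD decomposition brings in; the second
lists the precomposed characters that NFC would then produce from
base + combining mark pairs already in the set.

```python
import unicodedata

def compat_decomposition_gaps(repertoire):
    """Characters reached by NFKD decomposition but absent from the set."""
    return {c for ch in repertoire
            for c in unicodedata.normalize("NFKD", ch)
            if c not in repertoire}

def composition_gaps(repertoire):
    """Precomposed characters that NFC produces from a (base, mark) pair
    of repertoire members but that the repertoire lacks."""
    marks = [m for m in repertoire if unicodedata.combining(m) != 0]
    gaps = set()
    for mark in marks:
        for base in repertoire:
            composed = unicodedata.normalize("NFC", base + mark)
            if len(composed) == 1 and composed not in repertoire:
                gaps.add(composed)
    return gaps

# Toy repertoire for illustration only; this is NOT the MES-2 list.
toy = {"e", "\u00a8"}  # U+00A8 DIAERESIS decomposes (compat) to SPACE + U+0308

step1 = compat_decomposition_gaps(toy)   # SPACE and U+0308 come in
step2 = composition_gaps(toy | step1)    # now e + U+0308 composes

print(sorted(f"U+{ord(c):04X}" for c in step1))
print(sorted(f"U+{ord(c):04X}" for c in step2))
```

Adding the combining diaeresis for the sake of U+00A8 means that "e"
followed by U+0308 now composes to U+00EB, so that precomposed letter has
to join the repertoire as well. A full closure would repeat both steps
until nothing new appears.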
I don't think it's up to Unicode to do this work. CEN should be
contacted to do it, though some vendor or open-source project
may already have done it and published the result.
I also note that modern Hebrew and Arabic are excluded from MES-2,
as they are not used for any official language of the European Union,
EFTA, or future EU candidates. But they are certainly of great
interest for the countries that use these scripts and for which the EU
is a major partner. In the future, it will be necessary to include
support for modern Georgian (a subset of U+10A0..U+10FF), and modern
Armenian (a subset of U+0530..U+058F), as well as some characters
from Cyrillic Supplementary (in U+0500..U+052F).
Conversely, I don't understand why MES-2 includes characters
in row U+25xx (Box Drawing, Block Elements, Geometric Shapes),
which are not strictly needed for text (notably for legal publications
of the EU, which would do better to use markup systems), or the two
Alphabetic Presentation Forms U+FB01..U+FB02 (the <fi> and <fl>
ligatures), which are really unneeded, even for legal purposes. To be
consistent, the set should at least also have included the <ff>, <ffi>
and <ffl> ligatures.
I suppose that these come from widely used legacy encodings in
some EU, EFTA and Council of Europe countries, but CEN should have
avoided them (font renderers could still select them, if available
in fonts).
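As a small illustration of why the ligatures add nothing at the text
level, the sketch below (Python, standard unicodedata module; the
dictionary name is an assumption of this example) lists the five Latin
ligatures U+FB00..U+FB04 with their compatibility decompositions. Only
U+FB01 and U+FB02 are the ones mentioned above as being in MES-2.

```python
import unicodedata

# Latin ligatures ff, fi, fl, ffi, ffl in Alphabetic Presentation Forms.
LATIN_F_LIGATURES = {
    cp: unicodedata.normalize("NFKD", chr(cp))
    for cp in range(0xFB00, 0xFB05)
}

for cp, letters in LATIN_F_LIGATURES.items():
    # Each ligature maps to a plain sequence of ASCII letters.
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))} -> {letters}")
```

Each ligature decomposes to plain letters that are already in any Latin
repertoire, which is consistent with leaving the choice of ligature
glyphs to the font renderer.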
-- Philippe.
Spam is not tolerated: any unsolicited message will be reported to your Internet service providers.