From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Fri Nov 14 2003 - 11:40:05 EST
From: "Kent Karlsson" <kentk@cs.chalmers.se>
> Philippe Verdy wrote:
> > (1) a singleton (example the Angström symbol, canonically
> > mapped to A with diaeresis,
> The Ångström (note spelling) sign is canonically mapped to
> capital a with ring.
Apart from the spelling (is it wrong to omit the ring in English? I
don't have it on my keyboard), I should have reread my message. Of course
I meant ring and not diaeresis (which goes above the o). Sorry, that was a typo.
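
For reference, the singleton mapping can be checked with a minimal sketch
using Python's standard unicodedata module. The variable names are only
illustrative, and the results depend on the Unicode data shipped with the
interpreter:

```python
import unicodedata

ANGSTROM_SIGN = "\u212B"  # ANGSTROM SIGN
A_WITH_RING = "\u00C5"    # LATIN CAPITAL LETTER A WITH RING ABOVE

# A singleton canonical mapping: a single code point and no <tag> prefix.
singleton_mapping = unicodedata.decomposition(ANGSTROM_SIGN)

# NFC replaces the sign with the precomposed letter;
# NFD goes further, to "A" followed by COMBINING RING ABOVE (U+030A).
nfc = unicodedata.normalize("NFC", ANGSTROM_SIGN)
nfd = unicodedata.normalize("NFD", ANGSTROM_SIGN)

print(singleton_mapping, [hex(ord(c)) for c in nfc], [hex(ord(c)) for c in nfd])
```

Because the mapping is canonical, even NFC (the form recommended for XML)
never preserves the Ångström sign itself.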
> There are several meanings of "compatibility characters".
>
> The most important here are the characters that have a
> compatibility decomposition mapping. For details,
> see UTR 20: http://www.unicode.org/reports/tr20/.
Yes, but those characters are NOT excluded from XML processing, which
should also work with characters that have a compatibility decomposition,
without affecting their additional meaning (wide, narrow, font, etc.).
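
The additional meaning mentioned above is recorded as a tag on the
compatibility mapping. A small sketch, assuming Python's unicodedata
module (the helper compatibility_tag and the sample characters are my own
choice for illustration):

```python
import unicodedata

def compatibility_tag(ch):
    """Return the <tag> of a compatibility mapping, or None when the
    character has no decomposition or only a canonical one."""
    mapping = unicodedata.decomposition(ch)
    if mapping.startswith("<"):
        return mapping.split(">", 1)[0] + ">"
    return None

samples = [
    "\uFF21",  # FULLWIDTH LATIN CAPITAL LETTER A
    "\uFF76",  # HALFWIDTH KATAKANA LETTER KA
    "\u210C",  # BLACK-LETTER CAPITAL H
]

tags = {ch: compatibility_tag(ch) for ch in samples}

# NFC leaves these characters alone, so the distinction survives.
nfc_unchanged = all(unicodedata.normalize("NFC", ch) == ch for ch in samples)

for ch in samples:
    print(unicodedata.name(ch), tags[ch])
```

Since such characters have no canonical decomposition, a processor that
only applies NFC keeps the wide, narrow or font distinction intact.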
> > And the "oe ligature" has only a compatiblity decomposition,
> > and then is not a compatibility character.
>
> The oe ligature characters have no decomposition at all.
I thought it had one (it is used in French, where it is clearly a
typographic ligature but is handled and sorted like two letters), as
opposed to the ae ligature (which is a typographic ligature in French, but
a true letter in other languages).
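
The point about the oe ligature can be verified against the character
database. A quick sketch, again assuming Python's unicodedata module; the
fi ligature is included only as a contrasting example of a character that
does carry a compatibility mapping:

```python
import unicodedata

OE = "\u0153"  # LATIN SMALL LIGATURE OE
AE = "\u00E6"  # LATIN SMALL LETTER AE
FI = "\uFB01"  # LATIN SMALL LIGATURE FI

# An empty string means "no decomposition mapping at all".
oe_mapping = unicodedata.decomposition(OE)
ae_mapping = unicodedata.decomposition(AE)
fi_mapping = unicodedata.decomposition(FI)

# With no mapping, even the compatibility forms leave oe untouched.
oe_nfkc = unicodedata.normalize("NFKC", OE)
fi_nfkc = unicodedata.normalize("NFKC", FI)

print(repr(oe_mapping), repr(ae_mapping), repr(fi_mapping))
```

So the two-letter treatment of oe in French sorting would have to come
from collation rules, not from normalization.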
> > > Is somewhere a complete chart of "compatibility characters" ?
> >
> > Look at the Unicode data file which lists composition exclusions...
>
> Which is unrelated to the question posed! See UTR 20 instead.
I don't think that was the question... UTR 20 is indeed more precise, but
some of the actions listed there are debatable. For example, "use list
item" or "use <sub> markup" implies that the XML schema is HTML, but for
general XML processing HTML is not there. Such actions should have been
restricted to XHTML, and changed to "retain" in other cases.
XML is not made only to represent text with markup, and XML conformance
requires not performing unsafe actions without knowledge of the context in
which the text is used. That's why the W3C recommends only the NFC form, and
not the NFKC form...
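
To illustrate why NFKC is the unsafe choice for generic XML, here is a
minimal sketch using Python's xml.etree.ElementTree and unicodedata. The
element name "formula" and its content are hypothetical, chosen only to
show the effect:

```python
import unicodedata
import xml.etree.ElementTree as ET

# A generic XML document: there is no HTML vocabulary, so no <sup> element
# into which a superscript digit could be converted.
doc = ET.fromstring("<formula>E = mc\u00B2, width \uFF21</formula>")
text = doc.text

nfc = unicodedata.normalize("NFC", text)
nfkc = unicodedata.normalize("NFKC", text)

print(nfc == text)  # NFC leaves the content as written
print(nfkc)         # NFKC folds SUPERSCRIPT TWO and the fullwidth A
```

With NFKC the superscript two becomes a plain "2" and the fullwidth letter
a plain "A", so information is lost that the application may rely on; NFC
changes nothing here.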
So, as UTR 20 is informative and XML conformance is normative, I would
definitely not use UTR 20, which could break XML applications...
For me, the title of this UTR is wrong: it should apply only to markup
languages based on XML (including XHTML), not to XML as a whole (and this
applies also to BiDi override controls, as there's no such "dir" attribute
name in the core XML schema!)