From: Mark Davis (mark.davis@jtcsv.com)
Date: Fri Nov 14 2003 - 10:57:48 EST
Philippe, instead of trying to sound authoritative by making up a whole-cloth
definition -- one that is completely and utterly wrong -- and thereby confuse
and mislead a beginner, you should either be silent or simply point the person
to the Unicode glossary:
http://www.unicode.org/glossary/#compatibility_character
Mark
__________________________________
http://www.macchiato.com
► शिष्यादिच्छेत्पराजयम् ◄
----- Original Message -----
From: "Philippe Verdy" <verdy_p@wanadoo.fr>
To: "Alexandre Arcouteil" <lex@free.fr>
Cc: <unicode@unicode.org>
Sent: Fri, 2003 Nov 14 03:28
Subject: Re: compatibility characters (in XML context)
> ----- Original Message -----
> From: "Alexandre Arcouteil" <lex@free.fr>
> To: <unicode@unicode.org>
> Sent: Friday, November 14, 2003 10:41 AM
> Subject: compatibility characters (in XML context)
>
>
> > This is a beginner question:
> >
> > In the XML 1.1 Proposed Recommendation of 05 November 2003
> > (http://www.w3.org/TR/xml11), it is said that "Document authors are
> > encouraged to avoid "compatibility characters", as defined in section
> > 6.8 of [Unicode]", which refers to Unicode 2.0.
> >
> > I can't find any online documentation giving an explicit definition of
> > "compatibility characters" according to Unicode 2.0.
>
> Compatibility characters can be defined as the characters whose canonical
> decomposition mapping is either:
>
> (1) a singleton (for example the Ångström sign, canonically mapped to A
> with ring above, or the CJK compatibility ideographs, which map to unified
> Han ideographs and were included only for compatibility with legacy charsets
> or because of assignment errors in Unicode 1.0), so that these characters
> are implicitly excluded from recomposition in all NF* forms, or
>
> (2) a two-code-point _canonical_ decomposition mapping, for characters that
> are nevertheless excluded from canonical composition (for example the Hebrew
> letter shin with shin dot).
>
> These characters will never be part of any string in a normalized form (NFC,
> NFD, NFKC, NFKD).
>
> > At the least, I'd like to know whether characters like "é", "ç" or "œ"
> > are affected.
>
> No. "é" and "ç" have canonical decompositions, but are not excluded from
> recomposition.
> And the "oe ligature" has no decomposition mapping at all, so it is not a
> compatibility character either.
>
> > Is there a complete chart of "compatibility characters" somewhere?
>
>
> Look at the Unicode data file that lists composition exclusions
> (CompositionExclusions.txt)...
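The normalization behavior discussed in the quoted message (singleton mappings, composition exclusions, "é" and "ç" recomposing, the "oe" ligature) can be checked with Python's standard unicodedata module. This is a hypothetical sketch: the helper names normalization_facts and canonical_but_not_recomposed are illustrative inventions, and the results depend on the Unicode version bundled with the Python build. It only probes decomposition mappings and NFC/NFKC behavior; it does not implement the definition of "compatibility character", for which Mark's reply points to the Unicode glossary.

```python
import unicodedata


def normalization_facts(ch: str) -> dict:
    """Hypothetical helper: report how one character behaves under normalization."""
    mapping = unicodedata.decomposition(ch)  # '' means no decomposition mapping
    return {
        "mapping": mapping,
        # Compatibility mappings carry a tag such as '<compat>'; canonical ones do not.
        "compat_mapping": mapping.startswith("<"),
        # Full canonical decomposition, as hex code points.
        "nfd": [f"{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)],
        # True if NFC / NFKC give the character back unchanged.
        "stays_in_nfc": unicodedata.normalize("NFC", ch) == ch,
        "stays_in_nfkc": unicodedata.normalize("NFKC", ch) == ch,
    }


def canonical_but_not_recomposed(limit: int = 0x110000) -> list:
    """Hypothetical helper: code points with an untagged (canonical) mapping
    that NFC does not return unchanged. This only approximates the
    composition-exclusion data in the Unicode Character Database."""
    found = []
    for cp in range(limit):
        ch = chr(cp)
        mapping = unicodedata.decomposition(ch)
        if not mapping or mapping.startswith("<"):
            continue  # no mapping, or a compatibility mapping only
        if unicodedata.normalize("NFC", ch) != ch:
            found.append(cp)
    return found


if __name__ == "__main__":
    print("Unicode data version:", unicodedata.unidata_version)
    # Angstrom sign, a CJK compatibility ideograph, shin with shin dot,
    # e-acute, c-cedilla, oe ligature, ff ligature.
    for ch in ["\u212B", "\uF900", "\uFB2A", "\u00E9", "\u00E7", "\u0153", "\uFB00"]:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch, '?')}: {normalization_facts(ch)}")
    excluded = canonical_but_not_recomposed()
    print(len(excluded), "code points have a canonical mapping but change under NFC")
```

Running it shows U+212B and U+FB2A changing under NFC while "é" and "ç" round-trip, and U+0153 having no decomposition mapping at all. The full scan is a rough cross-check, not a substitute for the exclusion data files themselves.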
This archive was generated by hypermail 2.1.5 : Fri Nov 14 2003 - 11:58:40 EST