From: Francois Yergeau (FYergeau@alis.com)
Date: Wed May 21 2003 - 14:37:30 EDT
Philippe Verdy wrote:
> According to Unicode, CR+ACUTE is in NFC form, and so
> complies with XML requirement(?) for handling in DOM (where
> all should be performed using NFC). But according to XML (or
> HTML) the parsed document must then be converted
> (interpreted) as if it was SPACE+COMBINING ACUTE ACCENT
> which is not NFC.
It is NFC.
> If canonicalizing the document, it will become a single NON
> COMBINING ACUTE ACCENT...
From the UCD:
00B4;ACUTE ACCENT;Sk;0;ON;<compat> 0020 0301;;;;N;SPACING ACUTE;;;;
This is only a <compat> decomposition, and NFC applies canonical mappings
only, so SPACE+COMBINING ACUTE ACCENT remains unchanged in NFC.
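
For anyone who wants to check this, here is a minimal sketch using Python's standard unicodedata module. The variable names are mine, and the result reflects whatever UCD version ships with the interpreter, so treat it as an illustration rather than a reference.

```python
import unicodedata

# The UCD entry for U+00B4 ACUTE ACCENT carries only a <compat> mapping.
acute_decomposition = unicodedata.decomposition("\u00b4")

# SPACE followed by COMBINING ACUTE ACCENT.
space_acute = "\u0020\u0301"

# NFC composes through canonical mappings only, so nothing changes here.
nfc = unicodedata.normalize("NFC", space_acute)

# The compatibility forms are the ones that take U+00B4 apart.
nfkc = unicodedata.normalize("NFKC", "\u00b4")

print(acute_decomposition)
print(nfc == space_acute)
print(nfkc == space_acute)
```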
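A more interesting case is that of U+0338 COMBINING LONG SOLIDUS OVERLAY,
which combines canonically with > (U+003E) to give U+226F NOT GREATER-THAN.
If the overlay immediately follows the > that closes a tag, NFC
normalization absorbs that delimiter, which can damage XML files.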
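
A small sketch of that failure, again in Python with only the standard library. The one-element document and the helper is_well_formed are made up for illustration; the point is only that the string parses before NFC and not after.

```python
import unicodedata
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Hypothetical helper: True if the string parses as XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Element content that begins with U+0338, right after the closing '>'.
doc = "<a>\u0338text</a>"

# NFC composes '>' + U+0338 into U+226F, so the start tag loses its '>'.
normalized = unicodedata.normalize("NFC", doc)

print(is_well_formed(doc))
print(is_well_formed(normalized))
```

The same composition exists for < (giving U+226E) and = (giving U+2260), so the effect is not limited to the closing delimiter.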
-- François Yergeau