From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Mon Feb 17 2003 - 17:44:00 EST
I would like to add some information here without getting myself into the core of the discussion:
HTML recognizes far fewer "whitespace" characters than Java or Unicode does. Different specifications and APIs
define different sets of "whitespace" characters.
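To make the difference concrete, here is a small Java sketch that checks a few characters against two Java predicates and against an HTML set. The HTML set is an assumption on my part, modeled on HTML 4.01's list (space, tab, form feed, zero-width space, plus CR/LF line breaks). The helper isHtml4Whitespace is hypothetical and not part of any library.

```java
public class WhitespaceSets {
    // Hypothetical helper: assumed HTML 4.01 white space set
    // (space, tab, form feed, zero-width space, CR, LF).
    static boolean isHtml4Whitespace(char c) {
        return " \t\f\u200B\r\n".indexOf(c) >= 0;
    }

    public static void main(String[] args) {
        // Sample characters: space, tab, NBSP, EM SPACE, LINE SEPARATOR, ZWSP, ZWNBSP.
        char[] samples = {' ', '\t', '\u00A0', '\u2003', '\u2028', '\u200B', '\uFEFF'};
        for (char c : samples) {
            // Print how each notion of "whitespace" classifies the character.
            System.out.printf("U+%04X  isWhitespace=%-5b  isSpaceChar=%-5b  HTML4=%b%n",
                (int) c, Character.isWhitespace(c), Character.isSpaceChar(c),
                isHtml4Whitespace(c));
        }
    }
}
```

The Java columns depend on the Unicode version built into the JDK at hand; for example, U+200B has not kept the same general category across Unicode versions, so its Java results may differ from what a Unicode 3.2 table would suggest.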
Unicode's White_Space property (PropList.txt) contains 24 code points (Unicode 3.2) but not U+FEFF.
U+FEFF ZWNBSP is a format control (Cf), not any kind of space in the usual sense.
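As a quick check of that classification, Java reports the general category Cf as Character.FORMAT. A minimal sketch (the class and helper names are only illustrative):

```java
public class FeffCategory {
    // Illustrative helper: true if the code point's general category is Cf.
    static boolean isFormatControl(int cp) {
        return Character.getType(cp) == Character.FORMAT;
    }

    public static void main(String[] args) {
        int zwnbsp = 0xFEFF;
        // U+FEFF is Cf, and neither of Java's "space" predicates treats it as a space.
        System.out.println("Cf: " + isFormatControl(zwnbsp)
            + ", isWhitespace: " + Character.isWhitespace(zwnbsp)
            + ", isSpaceChar: " + Character.isSpaceChar(zwnbsp));
        // prints "Cf: true, isWhitespace: false, isSpaceChar: false"
    }
}
```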
U+FEFF, like all Cf, is a Default_Ignorable_Code_Point (DerivedCoreProperties.txt). (That is,
sorting, searching, matching, etc. usually ignore it unless such code points are explicitly useful.)
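A rough Java sketch of "matching ignores it": compare two strings after dropping format controls. This is a simplification, not a conformant implementation. It filters on general category Cf only, while Default_Ignorable_Code_Point is a separately derived property that does not coincide exactly with Cf. The names stripFormatControls and equalsIgnoringFormat are hypothetical.

```java
public class IgnorableMatch {
    // Removes every code point whose general category is Cf (Character.FORMAT).
    // Approximation only: Default_Ignorable_Code_Point is not the same set as Cf.
    static String stripFormatControls(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints()
         .filter(cp -> Character.getType(cp) != Character.FORMAT)
         .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    // Hypothetical matcher: string equality after dropping format controls.
    static boolean equalsIgnoringFormat(String a, String b) {
        return stripFormatControls(a).equals(stripFormatControls(b));
    }

    public static void main(String[] args) {
        System.out.println(equalsIgnoringFormat("ab\uFEFFc", "abc")); // true
        System.out.println("ab\uFEFFc".equals("abc"));               // false
    }
}
```

Stripping before comparison keeps the matcher simple; a real collation or search engine would instead assign such code points zero weight.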
RFC 2279 *is* being updated, see http://www.ietf.org/internet-drafts/draft-yergeau-rfc2279bis-03.txt
Version -04 is expected to be published shortly.
markus
-- Opinions expressed here may not reflect my company's positions unless otherwise noted.