From: Andrew C. West (andrewcwest@alumni.princeton.edu)
Date: Fri Nov 21 2003 - 07:38:27 EST
On Thu, 20 Nov 2003 21:02:49 -0800, "Doug Ewell" wrote:
>
> An invalid GB18030 sequence, like <FE 40>, or a valid but out-of-range
> sequence, like <E3 32 9A 36>, should be treated just like an invalid or
> out-of-range UTF-8 sequence. Issue an error message, format the hard
> disk, whatever; just don't try to treat it like a normal character.
>
Hmm, surely <FE 40> is a valid GB-18030 sequence, mapping to U+FA0C by my
reckoning (although Word fails to convert <FE 40> correctly when told to open a
file as GB-18030, it does save U+FA0C as <FE 40> when told to save as GB-18030).
In BabelPad I convert any invalid GB-18030 characters to U+FFFD ("used to
replace an incoming character whose value is unknown or unrepresentable in
Unicode"), and notify the user that the file has been opened with errors, which
I think is a compliant and sensible implementation. (Unfortunately I've just
noticed that BabelPad has a slight bug with out-of-range GB-18030 values such as
<E3 32 9A 36>, which works out to U+110000, one past the last Unicode code point.)
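
To show where the U+110000 figure comes from, here is a rough Python sketch of
the arithmetic for four-byte sequences in the supplementary block (lead byte
0x90 and up), with out-of-range results replaced by U+FFFD. The function names
are made up for illustration and this is not BabelPad's actual code; it assumes
the usual linear layout in which <90 30 81 30> corresponds to U+10000.

```python
REPLACEMENT = "\uFFFD"  # U+FFFD REPLACEMENT CHARACTER


def four_byte_to_scalar(seq: bytes) -> int:
    """Return 0x10000 plus the linear offset of a four-byte sequence.

    Only handles sequences whose lead byte is 0x90..0xFE (hypothetical helper;
    four-byte sequences for BMP characters need a lookup table instead).
    The result may exceed 0x10FFFF, which the caller must check.
    """
    b1, b2, b3, b4 = seq  # raises ValueError if seq is not exactly 4 bytes
    if not (0x90 <= b1 <= 0xFE and 0x30 <= b2 <= 0x39
            and 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39):
        raise ValueError("not a four-byte sequence in the supplementary block")
    # Bytes 2 and 4 each take 10 values, byte 3 takes 126 values.
    linear = (((b1 - 0x90) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)
    return 0x10000 + linear


def decode_or_replace(seq: bytes) -> str:
    """Decode one sequence, substituting U+FFFD when it is beyond U+10FFFF."""
    value = four_byte_to_scalar(seq)
    return chr(value) if value <= 0x10FFFF else REPLACEMENT


print(hex(four_byte_to_scalar(bytes([0xE3, 0x32, 0x9A, 0x36]))))  # 0x110000
print(decode_or_replace(bytes([0xE3, 0x32, 0x9A, 0x36])) == REPLACEMENT)  # True
```

The last valid sequence under this layout is <E3 32 9A 35> = U+10FFFF, so
<E3 32 9A 36> is well-formed byte-wise but has no Unicode scalar value to map to.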
Andrew