Re: creating a test font w/ CJKV Extension B characters.

From: Andrew C. West (andrewcwest@alumni.princeton.edu)
Date: Fri Nov 21 2003 - 07:38:27 EST


    On Thu, 20 Nov 2003 21:02:49 -0800, "Doug Ewell" wrote:
    >
    > An invalid GB18030 sequence, like <FE 40>, or a valid but out-of-range
    > sequence, like <E3 32 9A 36>, should be treated just like an invalid or
    > out-of-range UTF-8 sequence. Issue an error message, format the hard
    > disk, whatever; just don't try to treat it like a normal character.
    >

    Hmm, surely <FE 40> is a valid GB-18030 sequence (U+FA0C by my reckoning)?
    Word fails to convert <FE 40> correctly when told to open a file as GB-18030,
    yet it does save U+FA0C as <FE 40> when told to save as GB-18030.

    In BabelPad I convert any invalid GB-18030 characters to U+FFFD ("used to
    replace an incoming character whose value is unknown or unrepresentable in
    Unicode"), and notify the user that the file has been opened with errors, which
    I think is a compliant and sensible implementation. (Unfortunately I've just
    noticed that BabelPad has a slight bug with out-of-range GB-18030 values such as
    <E3 32 9A 36>, which would map to U+110000, one past the last Unicode code
    point U+10FFFF.)
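
    The arithmetic behind the out-of-range case can be sketched as below. This is
    only an illustration in Python, not BabelPad's actual code: the function name
    and the (code point, ok) return convention are assumptions made for the
    example. It handles only four-byte sequences in the supplementary range,
    which starts at <90 30 81 30> = U+10000, and substitutes U+FFFD for anything
    malformed or beyond U+10FFFF.

```python
REPLACEMENT = 0xFFFD       # U+FFFD REPLACEMENT CHARACTER
SUPPLEMENTARY_BASE = 0x10000
MAX_CODE_POINT = 0x10FFFF


def decode_gb18030_supplementary(seq):
    """Decode one four-byte GB18030 sequence in the supplementary range.

    Hypothetical helper: returns (code_point, ok). When ok is False the
    code point is U+FFFD and the caller should report a decoding error.
    """
    if len(seq) != 4:
        return REPLACEMENT, False
    b1, b2, b3, b4 = seq
    # Four-byte form: lead 0x90..0xFE here (supplementary range only),
    # second and fourth bytes are digits 0x30..0x39, third is 0x81..0xFE.
    well_formed = (
        0x90 <= b1 <= 0xFE
        and 0x30 <= b2 <= 0x39
        and 0x81 <= b3 <= 0xFE
        and 0x30 <= b4 <= 0x39
    )
    if not well_formed:
        return REPLACEMENT, False
    # Mixed-radix index: 10 values for byte 2, 126 for byte 3, 10 for byte 4.
    linear = (((b1 - 0x90) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)
    code_point = SUPPLEMENTARY_BASE + linear
    if code_point > MAX_CODE_POINT:
        # Well-formed bytes, but the value lies beyond the Unicode code space.
        return REPLACEMENT, False
    return code_point, True
```

    With this sketch, <E3 32 9A 35> decodes to U+10FFFF, while <E3 32 9A 36>
    works out to U+110000 and is therefore replaced by U+FFFD with the error
    flag set.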

    Andrew


