Re: Code pages and Unicode

From: Richard Wordingham <richard.wordingham_at_ntlworld.com>
Date: Tue, 23 Aug 2011 20:00:42 +0100

On Mon, 22 Aug 2011 16:18:56 -0700
Ken Whistler <kenw_at_sybase.com> wrote:

> How about Clause 12.5 of ISO/IEC 10646:
>
> <001B, 0025, 0040>
>
> You "escape" out of UTF-16 to ISO 2022, and then you can do whatever
> the heck you want, including exchange and processing of complete
> 4-byte forms, with all the billions of characters folks seem to think
> they need.

> Of course you would have to convince implementers to honor the ISO
> 2022 escape sequence...

Which they only need to if the text is in an ISO 2022 or similar
context. Your idea does suggest that a pattern of
<high><high><SO><low> would be reasonable. The shift-out code U+000E
has no meaning as a Unicode character so it wouldn't be unreasonable to
require a special check that one finds a full character if looking for
a one-character string consisting only of U+000E. We could also have
<high><high><SI><low> to give the full *two* thousand million odd
characters that would once again be supported by UTF-32.
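
To make the arithmetic concrete: two high surrogates and one low surrogate carry 10 bits each, so each pattern covers 2^30 values, and the SO/SI choice adds a 31st bit, for 2^31 (about 2.1 thousand million) in total. The following is only a rough sketch of how such a sequence might be packed. The bit layout, the function names and the idea of a plain 31-bit "index" are all assumptions made for illustration; nothing here is defined by Unicode or ISO/IEC 10646.

```python
# Hypothetical packing of a 31-bit index into <high><high><SO|SI><low>.
# The bit layout below is an assumption, not part of any standard.
SO, SI = 0x000E, 0x000F              # shift-out / shift-in control codes
HIGH_BASE, LOW_BASE = 0xD800, 0xDC00  # start of high / low surrogate ranges


def encode_extended(index):
    """Return four 16-bit code units for a 31-bit index (assumed layout)."""
    if not 0 <= index < 1 << 31:
        raise ValueError("index must fit in 31 bits")
    marker = SI if index >> 30 else SO          # bit 30 selects SO or SI
    return [
        HIGH_BASE + ((index >> 20) & 0x3FF),    # bits 29..20
        HIGH_BASE + ((index >> 10) & 0x3FF),    # bits 19..10
        marker,
        LOW_BASE + (index & 0x3FF),             # bits 9..0
    ]


def decode_extended(units):
    """Inverse of encode_extended; rejects anything not matching the pattern."""
    if len(units) != 4:
        raise ValueError("expected exactly four code units")
    h1, h2, marker, low = units
    if not (0xD800 <= h1 <= 0xDBFF and 0xD800 <= h2 <= 0xDBFF
            and marker in (SO, SI) and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a well-formed extended sequence")
    top = 1 if marker == SI else 0
    return ((top << 30) | ((h1 - HIGH_BASE) << 20)
            | ((h2 - HIGH_BASE) << 10) | (low - LOW_BASE))
```

Under this assumed layout, a search for the one-unit string U+000E would need the special check mentioned above: a match preceded by two high surrogates and followed by a low surrogate is part of a longer sequence, not a character in its own right.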

Richard.
Received on Tue Aug 23 2011 - 14:03:20 CDT