From: Doug Ewell (dewell@adelphia.net)
Date: Sat Mar 17 2007 - 15:15:22 CST
Dan Kogai <dankogai at dan dot co dot jp> wrote:
> I am really surprised to find that EUC and UTF-8 can be mashed up
> easily.
>
> The secret is \xFF. This byte NEVER appears in EUC or UTF-8. So you
> can define the combo character as follows:
>
> EUC_UTF8_CHAR = EUC_CHAR | \xFF + UTF8_CHAR
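To make the quoted proposal concrete, here is a minimal Python sketch of an encoder and decoder for it. It assumes "EUC" means EUC-JP (Python's built-in `euc_jp` codec) and reads `\xFF + UTF8_CHAR` as one escape byte per escaped character. The names `ESCAPE`, `encode_mashup` and `decode_mashup` are hypothetical; this illustrates the idea and is not anyone's actual implementation.

```python
# Hypothetical sketch of the "EUC | 0xFF + UTF-8" mash-up quoted above.
# Assumptions: "EUC" means EUC-JP (Python's "euc_jp" codec), and every
# character that EUC-JP cannot represent gets its own 0xFF escape byte.

ESCAPE = 0xFF  # never a valid byte in EUC-JP or in UTF-8


def encode_mashup(text: str) -> bytes:
    """Emit EUC-JP where the character round-trips, else 0xFF + UTF-8."""
    out = bytearray()
    for ch in text:
        try:
            euc = ch.encode("euc_jp")
            if euc.decode("euc_jp") == ch:  # only accept exact round trips
                out += euc
                continue
        except UnicodeError:
            pass  # not representable in EUC-JP
        out.append(ESCAPE)
        out += ch.encode("utf-8")
    return bytes(out)


def _utf8_len(lead: int) -> int:
    """Length of a UTF-8 sequence, judged from its lead byte."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    raise ValueError(f"invalid UTF-8 lead byte 0x{lead:02X}")


def _euc_jp_len(lead: int) -> int:
    """Length of an EUC-JP character, judged from its first byte."""
    if lead < 0x80:
        return 1  # ASCII
    if lead == 0x8E:
        return 2  # SS2 + half-width katakana (JIS X 0201)
    if lead == 0x8F:
        return 3  # SS3 + two bytes of JIS X 0212
    if 0xA1 <= lead <= 0xFE:
        return 2  # JIS X 0208
    raise ValueError(f"invalid EUC-JP lead byte 0x{lead:02X}")


def decode_mashup(data: bytes) -> str:
    """Inverse of encode_mashup: split on 0xFF escapes, decode each piece."""
    chars = []
    i = 0
    while i < len(data):
        if data[i] == ESCAPE:
            i += 1  # skip the escape byte itself
            if i >= len(data):
                raise ValueError("escape byte at end of input")
            n = _utf8_len(data[i])
            chars.append(data[i:i + n].decode("utf-8"))
        else:
            n = _euc_jp_len(data[i])
            chars.append(data[i:i + n].decode("euc_jp"))
        i += n
    return "".join(chars)


sample = "日本語 한 abc"
assert decode_mashup(encode_mashup(sample)) == sample
```

The decoder has to carry its own EUC-JP and UTF-8 length rules, because no stock codec knows what the 0xFF escape means; that is the interoperability problem the reply below objects to.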
No no no no. Please don't do this. Nobody else will implement it and
you will be effectively limited to using it internally within your own
programs.
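A quick way to see the "nobody else will implement it" problem: hand such a byte stream to stock decoders. The minimal Python sketch below builds a mash-up stream by hand (assuming "EUC" means EUC-JP); `try_decode` and `mashed` are hypothetical names. A standard UTF-8 decoder rejects the stream outright, and a standard EUC-JP decoder cannot recover the escaped character.

```python
# Sketch: how stock codecs treat a hand-built mash-up byte stream.
# Assumption: "EUC" means EUC-JP; try_decode is a hypothetical helper.


def try_decode(data: bytes, codec: str) -> str:
    """Return the decoded text, or a short description of the failure."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError as exc:
        return f"error: {exc.reason} at byte {exc.start}"


# EUC-JP text, then the 0xFF escape, then a UTF-8-encoded Hangul syllable.
mashed = "日本".encode("euc_jp") + b"\xff" + "한".encode("utf-8")

for codec in ("utf-8", "euc_jp"):
    print(codec, "->", try_decode(mashed, codec))
```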
Just use UTF-8, or if saving bytes is that important to you, use SCSU or
a general-purpose compression technique. See UTN #14 for more on
Unicode text compression.
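As a rough illustration of the "just use UTF-8, then compress" route, here is a minimal Python sketch. It swaps in zlib (DEFLATE) as the general-purpose compression technique; SCSU is not in Python's standard library and is not shown. The helper names `pack` and `unpack` are hypothetical.

```python
import zlib

# Sketch: plain UTF-8 plus general-purpose compression.
# Assumption: zlib/DEFLATE stands in for "a general-purpose compression
# technique"; pack and unpack are hypothetical helper names.


def pack(text: str) -> bytes:
    """UTF-8-encode the text, then DEFLATE-compress it at level 9."""
    return zlib.compress(text.encode("utf-8"), 9)


def unpack(blob: bytes) -> str:
    """Inverse of pack(): decompress, then decode as UTF-8."""
    return zlib.decompress(blob).decode("utf-8")


sample = "日本語のテキストと English text. " * 40
raw = sample.encode("utf-8")
blob = pack(sample)
print(len(raw), "bytes as UTF-8,", len(blob), "bytes compressed")
```

The stored form is still standard UTF-8 once decompressed, so any tool can read it. For very short strings, zlib's fixed header and checksum can outweigh the savings, which is where a Unicode-specific scheme such as SCSU is usually considered instead.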
As someone who has created a number of alternative encoding schemes, I
assure you that a scheme that "looks like" EUC or "looks like" UTF-8
will cause you much more trouble than a completely new scheme that can't
be confused for anything else.
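The "looks like EUC" trap can be shown in a few lines of Python. In this minimal sketch (assuming "EUC" means EUC-JP, with the stream built by hand), the leading EUC-JP bytes decode cleanly, so the data passes for EUC-JP. A lenient decoder, here Python's `errors="replace"` standing in for any tolerant decoder or charset-sniffing tool, then turns the escaped character into replacement characters without raising any error.

```python
# Sketch: a lenient decoder silently corrupts mash-up data.
# Assumptions: "EUC" means EUC-JP; errors="replace" stands in for any
# tolerant decoder or charset-sniffing tool.

original = "日本 한"
mashed = "日本 ".encode("euc_jp") + b"\xff" + "한".encode("utf-8")

# The EUC-JP prefix decodes fine; the escaped part becomes U+FFFD.
recovered = mashed.decode("euc_jp", errors="replace")
print(repr(recovered))
print("round trip ok:", recovered == original)
```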
-- 
Doug Ewell * Fullerton, California, USA * RFC 4645 * UTN #14
http://users.adelphia.net/~dewell/
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages