RE: Support for Japanese characters

From: Marco Cimarosti (marco.cimarosti@essetre.it)
Date: Mon Mar 11 2002 - 04:40:24 EST

Previous message: Rajat Bawa: "OS/390 & Unicode"
Maybe in reply to: Eric Ray: "Support for Japanese characters"

Eric Ray wrote:
> 1. The library does not really evaluate the Japanese characters
> to make logical decisions. We believe base64 encode the
> character array to avoid any "bad things happening in the code"
> (such as hitting a null value or other values that could
> potential cause problems).

Hint: consider revising your project in the light of the fact that both
Unicode (ISO 10646) and the Japanese character set (JIS X 0208) have
ASCII-compatible "multibyte" formats.

Unicode's ASCII-compatible format is called UTF-8. The most popular JIS
ASCII-compatible format is called EUC.

ASCII-compatible means that all bytes in the ASCII range (0-127) are only
used for ASCII characters. So, among other things, no "bad things" happen
with null terminators or control characters.
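
To make the point concrete, here is a minimal sketch in Python (my choice of
language, not necessarily the one your library uses). It assumes Python's
built-in "utf-8" and "euc_jp" codecs, and the helper name ascii_range_bytes
is made up for illustration. It lists the bytes below 0x80 in an encoded
string, so you can see that Japanese characters never produce any.

```python
def ascii_range_bytes(text, encoding):
    """Return the bytes below 0x80 found in the encoded form of text.

    Hypothetical helper, for illustration only.
    """
    return [b for b in text.encode(encoding) if b < 0x80]


sample = "日本語"  # three non-ASCII (Japanese) characters

# Non-ASCII characters contribute no bytes in the ASCII range,
# so a stray NUL or control byte cannot appear inside them.
utf8_hits = ascii_range_bytes(sample, "utf-8")
euc_hits = ascii_range_bytes(sample, "euc_jp")

# ASCII characters mixed into the text keep their usual one-byte values.
mixed_hits = ascii_range_bytes("A日本語\n", "utf-8")

print(utf8_hits, euc_hits, mixed_hits)
```

With such an encoding, C-style code that only looks for NUL, newline, or other
ASCII bytes can pass the text through untouched.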

For UTF-8, see Unicode's FAQ
<http://www.unicode.org/unicode/faq/utf_bom.html> or read the historical RFC
which proposed it <http://www.faqs.org/rfcs/rfc2279.html>.

BTW, base64 was also the base of an obsolete Unicode format called UTF-7.
Searching for UTF-7 on the web, you'll find some information and lots of bitter
comments about why this approach is obsolete.
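
If you are curious what that looks like, the sketch below uses Python's
built-in "utf-7" codec and base64 module (again, Python is only my choice for
illustration). It shows that a UTF-7 run of non-ASCII text is the base64 form
of its UTF-16 big-endian bytes, without the "=" padding.

```python
import base64

sample = "日本語"

# UTF-7 wraps runs of non-ASCII characters in "+...-" and stores them
# as base64 of their UTF-16 (big-endian) code units.
utf7_bytes = sample.encode("utf-7")

# The same payload computed by hand, with base64 padding removed.
payload = base64.b64encode(sample.encode("utf-16-be")).rstrip(b"=")

print(utf7_bytes, payload)
```

The result is plain ASCII, but unlike UTF-8 or EUC the text is no longer
readable or searchable without decoding it first.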

_ Marco

This archive was generated by hypermail 2.1.2 : Mon Mar 11 2002 - 04:55:27 EST