ICU conversion of codepage data (Was: japanese xml)

From: Carl W. Brown (cbrown@xnetinc.com)
Date: Sat Sep 01 2001 - 19:54:49 EDT


Misha,

> case of Japanese) may cover all the characters you require, in which
> Additionally, if you are thinking of XML (or
> HTML) then you can encode *all* Unicode characters in an EUC-encoded
> document, by employing numeric character references for characters
> outside the EUC character repertoire. Using the same technique, you can
> encode all Unicode characters in an ASCII-encoded document.
>

Your comment that you can encode all Unicode characters in code page text
by using numeric character references (&#nnnnn; or &#xhhhh;) reminded me
that I should change xIUA to take advantage of the power of ICU. ICU lets
you set up a converter so that it produces HTML/XML-compatible escape
sequences for every character that it cannot convert to the specified code
page.
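
To illustrate the idea, here is a minimal sketch in Python rather than ICU.
It relies on the standard codec error handler "xmlcharrefreplace", which
does the same kind of substitution; in ICU the corresponding mechanism is, as
far as I know, the escape callback (UCNV_FROM_U_CALLBACK_ESCAPE) installed on
the converter. The helper name to_codepage is made up for this example and
is not part of xIUA or ICU.

```python
import html


def to_codepage(text: str, codepage: str) -> bytes:
    """Encode text to a code page (hypothetical helper).

    Characters the code page cannot represent are written as decimal
    numeric character references (&#nnnnn;) instead of raising an error.
    """
    return text.encode(codepage, errors="xmlcharrefreplace")


# U+65E5 (a kanji) exists in EUC-JP; U+0905 (a Devanagari letter) does not.
sample = "\u65e5 \u0905"

euc = to_codepage(sample, "euc_jp")     # kanji stays native, U+0905 becomes &#2309;
plain = to_codepage(sample, "ascii")    # both non-ASCII characters become references

# A Unicode-capable receiver recovers the original text by decoding the
# bytes and then resolving the references.
restored = html.unescape(euc.decode("euc_jp"))
```

Note that the sketch does not escape a literal "&" already present in the
text; that still has to be handled by the usual HTML/XML escaping step.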

If you are sending code page data to a browser, it makes more sense to use
escape sequences, just in case you have a Unicode-capable browser and the
fonts to display the characters. This can now be selected as an option.

Carl


