ICU conversion of codepage data (Was: japanese xml

From: Carl W. Brown (cbrown@xnetinc.com)
Date: Sat Sep 01 2001 - 19:54:49 EDT


Misha,

> case of Japanese) may cover all the characters you require, in which
> Additionally, if you are thinking of XML (or
> HTML) then you can encode *all* Unicode characters in an EUC-encoded
> document, by employing numeric character references for characters
> outside the EUC character repertoire. Using the same technique, you can
> encode all Unicode characters in an ASCII-encoded document.
>

Your comment that you can encode all Unicode characters in code page text
(&#nnnnn; or &#xhhhh;) reminded me that I should make a change to xIUA to
take advantage of the power of ICU. ICU lets you set up a converter so
that it produces HTML/XML-compatible escape sequences for every character
that it cannot convert to the specified code page.
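
To illustrate the idea, here is a minimal sketch. It does not use ICU or
xIUA: it assumes Python's built-in "xmlcharrefreplace" codec error handler
as a stand-in for an ICU converter callback that emits escapes. The
function name and the sample text are hypothetical.

```python
# Sketch only: Python's codec error handler stands in for an ICU converter
# callback. Nothing here is ICU or xIUA code.

def to_codepage_with_ncrs(text: str, codepage: str = "euc_jp") -> bytes:
    """Encode text to a code page; characters the code page cannot
    represent become decimal numeric character references (&#nnnnn;)."""
    return text.encode(codepage, errors="xmlcharrefreplace")


# The Japanese text converts natively; the Hangul syllable (U+D55C) is
# outside the EUC-JP repertoire, so it is escaped instead.
encoded = to_codepage_with_ncrs("日本語 한")
print(encoded.decode("euc_jp"))  # 日本語 &#54620;

# The same technique works for a plain ASCII document.
print(to_codepage_with_ncrs("é", "ascii"))  # b'&#233;'
```

The result is still valid EUC-JP (or ASCII) byte data, so it can be sent
under the original charset label while carrying the extra characters.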

If you are sending code page data to a browser, it makes more sense to use
escape sequences, just in case the browser is Unicode capable and has the
fonts to display the characters. This can now be optionally selected.
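
A rough sketch of what such an option could look like, again using
Python's standard codecs rather than ICU. The function name and the
use_escapes flag are hypothetical and are not the xIUA interface;
html.unescape plays the part of a Unicode-capable browser.

```python
import html

# Sketch only: use_escapes is a made-up flag, not the actual xIUA option.

def encode_for_browser(text: str, codepage: str, use_escapes: bool = True) -> bytes:
    """Encode text for a browser, either escaping unconvertible characters
    as numeric character references or substituting '?' for them."""
    handler = "xmlcharrefreplace" if use_escapes else "replace"
    return text.encode(codepage, errors=handler)


page = "Price: 10 €"  # the euro sign is not in ISO-8859-1

lossy = encode_for_browser(page, "iso-8859-1", use_escapes=False)
escaped = encode_for_browser(page, "iso-8859-1", use_escapes=True)

print(lossy)    # b'Price: 10 ?'
print(escaped)  # b'Price: 10 &#8364;'

# A Unicode-capable client can recover the original character from the
# escape, while the substituted '?' has lost it for good.
print(html.unescape(escaped.decode("iso-8859-1")))  # Price: 10 €
```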

Carl


This archive was generated by hypermail 2.1.2 : Sat Sep 01 2001 - 20:49:50 EDT