Marco
>
> > Is the conversion from euc-jp to utf-8/utf-16 simple; are there
> > algorithms and/or converters, out there, that I can access?
>
> Such a conversion requires three steps:
>
> 1) decode EUC byte sequences into JIS code points (i.e. get one integer for
> each character);
> 2) convert JIS code points to Unicode code points;
> 3) encode Unicode code points into UTF-8 byte sequences (or UTF-16 word
> sequences).
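Step 1 can be sketched in a few lines of Python. This is a minimal sketch that assumes the common EUC-JP layout: bytes below 0x80 are single-byte characters, two bytes in 0xA1-0xFE form a JIS X 0208 character, 0x8E plus one byte is half-width katakana (JIS X 0201), and 0x8F plus two bytes is JIS X 0212. The function name `euc_jp_to_jis` is hypothetical. It returns one (character set, code) pair per character, with JIS X 0208/0212 codes in their 7-bit 0x2121-0x7E7E form.

```python
from typing import List, Tuple


def euc_jp_to_jis(data: bytes) -> List[Tuple[str, int]]:
    """Step 1: split EUC-JP bytes into (character set, JIS code) pairs."""
    out: List[Tuple[str, int]] = []
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:
            # Single-byte character (treated as ASCII in this sketch).
            out.append(("ascii", b))
            i += 1
        elif b == 0x8E and i + 1 < n and 0xA1 <= data[i + 1] <= 0xDF:
            # SS2: half-width katakana, keep the JIS X 0201 byte as the code.
            out.append(("jisx0201", data[i + 1]))
            i += 2
        elif (b == 0x8F and i + 2 < n
              and 0xA1 <= data[i + 1] <= 0xFE and 0xA1 <= data[i + 2] <= 0xFE):
            # SS3: JIS X 0212, clear the high bit of both trailing bytes.
            out.append(("jisx0212", ((data[i + 1] & 0x7F) << 8) | (data[i + 2] & 0x7F)))
            i += 3
        elif 0xA1 <= b <= 0xFE and i + 1 < n and 0xA1 <= data[i + 1] <= 0xFE:
            # JIS X 0208: clear the high bit of both bytes.
            out.append(("jisx0208", ((b & 0x7F) << 8) | (data[i + 1] & 0x7F)))
            i += 2
        else:
            raise ValueError(f"invalid EUC-JP sequence at byte offset {i}")
    return out
```

For example, the EUC-JP bytes A4 A2 (HIRAGANA LETTER A) come out as `("jisx0208", 0x2422)`.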
>
> Steps 1 and 3 are very simple and totally algorithmic. Step 2 is more
> complex, and requires looking up some sort of "dictionary" or conversion
> table.
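Steps 2 and 3 can be sketched the same way. The two-entry `JIS_TO_UNICODE` dictionary below is a hypothetical excerpt standing in for a complete JIS-to-Unicode mapping table, which a real converter would need. The half-width katakana range is mapped arithmetically onto U+FF61-U+FF9F, and the UTF-8 encoder follows the standard bit layout. All names are illustrative.

```python
from typing import Dict, Iterable, Tuple

# Step 2 needs a full mapping table; these entries are only a hypothetical excerpt.
JIS_TO_UNICODE: Dict[Tuple[str, int], int] = {
    ("jisx0208", 0x2422): 0x3042,  # HIRAGANA LETTER A
    ("jisx0208", 0x3021): 0x4E9C,  # first level-1 kanji
}


def jis_to_unicode(charset: str, code: int) -> int:
    """Step 2: map one (character set, JIS code) pair to a Unicode code point."""
    if charset == "ascii":
        return code  # assumption: single-byte range treated as ASCII
    if charset == "jisx0201":
        return 0xFF61 + (code - 0xA1)  # half-width katakana block is contiguous
    try:
        return JIS_TO_UNICODE[(charset, code)]  # everything else is a table lookup
    except KeyError:
        raise ValueError(f"no mapping for {charset} 0x{code:04X}") from None


def utf8_encode(cp: int) -> bytes:
    """Step 3: encode one Unicode scalar value as a UTF-8 byte sequence."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: 0x{cp:X}")
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])


def to_utf8(pairs: Iterable[Tuple[str, int]]) -> bytes:
    """Apply steps 2 and 3 to the output of step 1."""
    return b"".join(utf8_encode(jis_to_unicode(cs, code)) for cs, code in pairs)
```

Only `jis_to_unicode` depends on data; the encoder is pure arithmetic, which is why steps 1 and 3 are the easy part.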
>
> There are many free implementations of such converters available on the
> web. One of the first places to look for such things is the open source ICU
> library (go to the IBM site and search for "ICU" or "Unicode").
ICU http://oss.software.ibm.com/icu/ will convert EUC-JP directly into
Unicode UTF-16. It will also convert to UTF-8. You can get web support for
ICU with xIUA http://www.xnetinc.com/xiua/, which lets you develop
thread-safe applications that handle different forms of Unicode and code
pages with the same code. It also provides UTF-8 and code page string
handling. For example, it will support a request whose data arrives in
EUC-JP, is stored in a UTF-8 database, and is sent to a Shift_JIS WAP
device. It has special web routines, for example for processing
Accept-Charset strings. For XML support you can use the Xerces
http://xml.apache.org/xerces-c/index.html parser, which also uses ICU.
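The EUC-JP in, UTF-8 stored, Shift_JIS out scenario can be sketched in Python, with the standard library's `euc_jp`, `utf-8` and `shift_jis` codecs standing in for ICU. This is not the ICU or xIUA API. `parse_accept_charset`, `pick_charset` and `relay` are hypothetical names, and the Accept-Charset handling is simplified: no `*` wildcard, and ties keep header order.

```python
from typing import List, Tuple


def parse_accept_charset(header: str) -> List[Tuple[str, float]]:
    """Parse e.g. 'Shift_JIS, utf-8;q=0.5' into [(name, q), ...], best first."""
    prefs: List[Tuple[str, float]] = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not acceptable"
        prefs.append((parts[0].lower(), q))
    prefs.sort(key=lambda p: p[1], reverse=True)  # stable sort keeps ties in order
    return prefs


def pick_charset(header: str, supported: List[str], default: str = "utf-8") -> str:
    """Return the client's most preferred charset that we can encode to."""
    wanted = {s.lower() for s in supported}
    for name, q in parse_accept_charset(header):
        if q > 0 and name in wanted:
            return name
    return default


def relay(request_body: bytes, accept_charset: str) -> Tuple[bytes, bytes, str]:
    """EUC-JP request -> UTF-8 for storage -> response in the client's charset."""
    text = request_body.decode("euc_jp")  # decode the incoming EUC-JP data
    stored = text.encode("utf-8")         # bytes that would go into a UTF-8 database
    out_charset = pick_charset(accept_charset, ["shift_jis", "euc-jp", "utf-8"])
    return stored, text.encode(out_charset), out_charset
```

With a device sending `Accept-Charset: Shift_JIS, utf-8;q=0.5`, `relay` stores UTF-8 and answers in Shift_JIS. The point of the design is that the application logic only ever handles Unicode text; the charsets appear only at the edges.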
I cannot imagine that all phones would support all character sets; that
would be a tremendous amount of code. Therefore you might be better off
using UTF-8. With UTF-16 you have potential big-endian/little-endian
(byte order) problems.
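To illustrate the byte order issue: the same character serializes to different UTF-16 bytes depending on endianness, while its UTF-8 bytes are always the same. A minimal Python sketch follows. The function names are hypothetical, and treating BOM-less input as big-endian is an assumption of this sketch; a real protocol has to agree on the byte order some other way.

```python
from typing import List


def utf16_code_units(cp: int) -> List[int]:
    """Split a code point into one UTF-16 code unit or a surrogate pair."""
    if cp < 0x10000:
        return [cp]
    v = cp - 0x10000
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]


def utf16_bytes(text: str, big_endian: bool = True, bom: bool = False) -> bytes:
    """Serialize text as UTF-16 in the chosen byte order, optionally with a BOM."""
    order = "big" if big_endian else "little"
    units = [0xFEFF] if bom else []  # U+FEFF as a byte order mark
    for ch in text:
        units.extend(utf16_code_units(ord(ch)))
    return b"".join(u.to_bytes(2, order) for u in units)


def read_utf16(data: bytes) -> str:
    """Decode UTF-16, honoring a BOM if present (no BOM: assume big-endian)."""
    if data[:2] == b"\xfe\xff":
        return data[2:].decode("utf-16-be")
    if data[:2] == b"\xff\xfe":
        return data[2:].decode("utf-16-le")
    return data.decode("utf-16-be")
```

U+3042 is 30 42 in big-endian UTF-16 and 42 30 in little-endian, but E3 81 82 in UTF-8 regardless of the machine.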
Carl
This archive was generated by hypermail 2.1.2 : Thu Aug 30 2001 - 14:11:00 EDT