Re: FW: How to convert UChar(definded by icu same as the utf-8) to ascii?

From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Wed Oct 17 2001 - 17:49:20 EDT


ICU defines UChar to be an unsigned 16-bit unit (an unsigned short, or uint16_t). Strings are in UTF-16, not UTF-8.

If you know that your UChar string contains only US-ASCII characters and the target is US-ASCII, all you need to do is truncate each 16-bit unit to 8 bits (to char). This does not work for any value >=0x80, of course, so you should check for that.
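
As a rough sketch of the truncation approach (not an ICU API): the helper name uchars_to_ascii is made up for illustration, and UChar is typedef'd locally only so the snippet compiles without the ICU headers. With ICU installed you would include <unicode/utypes.h> and drop the typedef.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: local stand-in for ICU's UChar (an unsigned 16-bit unit).
   Remove this typedef when the ICU headers are included. */
typedef uint16_t UChar;

/* Hypothetical helper, not part of ICU.
   Copies len UTF-16 code units into dest as US-ASCII bytes and
   NUL-terminates the result; dest must have room for len + 1 bytes.
   Returns 0 on success, or -1 as soon as a unit >= 0x80 is found. */
int uchars_to_ascii(const UChar *src, size_t len, char *dest)
{
    size_t i;

    for (i = 0; i < len; ++i) {
        if (src[i] >= 0x80) {
            return -1;              /* not representable in US-ASCII */
        }
        dest[i] = (char)src[i];     /* safe truncation: value is < 0x80 */
    }
    dest[len] = '\0';
    return 0;
}
```

On failure the contents of dest are unspecified, so the caller should only use the buffer when the return value is 0.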

If you are dealing with an ASCII-based codepage such as GBK rather than US-ASCII itself, you need to use a converter.
See http://oss.software.ibm.com/icu/apiref/ucnv_h.html
and http://oss.software.ibm.com/icu/userguide/

If you have actual UTF-8 strings (not UChar *), then you do not need to do anything to convert to US-ASCII because the latter is a subset of the former, even encoding-wise. (You should still check that no value is >=0x80.)
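
The check for UTF-8 input could look roughly like this. The function name is_us_ascii is hypothetical; the cast to unsigned char is there because plain char may be signed, in which case bytes >=0x80 would otherwise compare as negative.

```c
#include <stddef.h>

/* Hypothetical helper, not part of ICU.
   Returns 1 if all len bytes of s are below 0x80, i.e. the UTF-8
   string is already valid US-ASCII; returns 0 otherwise. */
int is_us_ascii(const char *s, size_t len)
{
    size_t i;

    for (i = 0; i < len; ++i) {
        if ((unsigned char)s[i] >= 0x80) {
            return 0;               /* lead or trail byte of a non-ASCII character */
        }
    }
    return 1;
}
```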

For further technical questions about ICU, please consult the homepage and subscribe to the ICU mailing list: http://oss.software.ibm.com/icu/

Best regards,
markus

> "Magda Danish (Unicode)" wrote:
> -----Original Message-----
> From: xuxiao.263 [mailto:wind_child@263.net]
> Sent: Tuesday, October 16, 2001 9:51 PM
> To: info@unicode.org
> Subject: How to convert UChar(definded by icu same as the utf-8) to ascii?
>
> hi,I want your help.
> How to convert UChar(definded by icu same as the utf-8) to ascii?
> Could you give me an example written by C?


