From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Fri Mar 14 2003 - 12:29:31 EST
Let's try this:
ICU has C header files with macros for code point handling in UTF-8 and UTF-16 strings. See the utf8.h and
utf16.h headers (together with utf.h) in ICU's source tree at source/common/unicode/.
http://oss.software.ibm.com/icu/download/
http://oss.software.ibm.com/cvs/icu/icu/source/common/unicode/
There is also a utf32.h header, but that is empty now. I redesigned the set of macros last year to
simplify and improve them a bit.
Specifically, see below.
(Note that the UTF-8 macros [except for the "unsafe" ones] handle the complicated cases in functions
that are called from inside the macros. See source/common/utf_impl.c. Safe UTF-8 handling requires
a lot of error checks.)
askq1 askq1 wrote:
> I want C/C++ code that will give me UTF8 byte sequence representing a
> given code-point, UTF16 16 bits sequence representing a given
> code-point, UTF32 32 bits sequence representing a given code-point.
>
> e.g.
>
> UTF8_Sequence CodePointToUTF8(Unichar codePoint)
Use U8_APPEND().
http://oss.software.ibm.com/icu/apiref/utf8_8h.html#a12
To read a code point from UTF-8, use U8_NEXT()
http://oss.software.ibm.com/icu/apiref/utf8_8h.html#a10
or U8_GET() etc.
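For illustration only, here is a hand-written sketch of what the UTF-8 encode and decode steps involve when ICU is not available. utf8_encode and utf8_decode are hypothetical names, not ICU functions, and this is not ICU's implementation. The decoder shows why safe UTF-8 reading needs so many checks: truncated sequences, missing trail bytes, overlong forms, surrogates, and values above 0x10ffff all have to be rejected.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an ICU API): encodes one code point as UTF-8.
 * Writes 1..4 bytes to out and returns the byte count, or 0 if c is
 * negative, above 0x10ffff, or a surrogate code point. */
size_t utf8_encode(int32_t c, uint8_t out[4]) {
    if (c < 0 || c > 0x10ffff || (c >= 0xd800 && c <= 0xdfff)) {
        return 0; /* not encodable */
    }
    if (c <= 0x7f) {              /* 0xxxxxxx */
        out[0] = (uint8_t)c;
        return 1;
    }
    if (c <= 0x7ff) {             /* 110xxxxx 10xxxxxx */
        out[0] = (uint8_t)(0xc0 | (c >> 6));
        out[1] = (uint8_t)(0x80 | (c & 0x3f));
        return 2;
    }
    if (c <= 0xffff) {            /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (uint8_t)(0xe0 | (c >> 12));
        out[1] = (uint8_t)(0x80 | ((c >> 6) & 0x3f));
        out[2] = (uint8_t)(0x80 | (c & 0x3f));
        return 3;
    }
    /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
    out[0] = (uint8_t)(0xf0 | (c >> 18));
    out[1] = (uint8_t)(0x80 | ((c >> 12) & 0x3f));
    out[2] = (uint8_t)(0x80 | ((c >> 6) & 0x3f));
    out[3] = (uint8_t)(0x80 | (c & 0x3f));
    return 4;
}

/* Hypothetical helper: decodes the code point starting at s[*i]
 * (requires *i < length), advances *i past it and returns it.
 * Returns -1 for an ill-formed sequence, in which case *i advances by one. */
int32_t utf8_decode(const uint8_t *s, size_t *i, size_t length) {
    uint8_t b = s[*i];
    int32_t c, min;
    size_t n, k;
    if (b < 0x80) { ++*i; return b; }                       /* ASCII */
    else if (b >= 0xc2 && b <= 0xdf) { n = 1; c = b & 0x1f; min = 0x80; }
    else if (b >= 0xe0 && b <= 0xef) { n = 2; c = b & 0x0f; min = 0x800; }
    else if (b >= 0xf0 && b <= 0xf4) { n = 3; c = b & 0x07; min = 0x10000; }
    else { ++*i; return -1; }           /* stray trail byte or invalid lead */
    if (length - *i - 1 < n) { ++*i; return -1; }           /* truncated */
    for (k = 1; k <= n; ++k) {
        uint8_t t = s[*i + k];
        if ((t & 0xc0) != 0x80) { ++*i; return -1; }        /* not a trail byte */
        c = (c << 6) | (t & 0x3f);
    }
    if (c < min || c > 0x10ffff || (c >= 0xd800 && c <= 0xdfff)) {
        ++*i; return -1;              /* overlong, out of range, or surrogate */
    }
    *i += n + 1;
    return c;
}
```

For example, utf8_encode(0x20ac, buf) writes E2 82 AC and returns 3. On an ill-formed sequence this sketch simply skips one byte; ICU's own error handling may differ, so treat that as a simplifying assumption.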
> UTF16_Sequence CodePointToUTF16(Unichar codePoint)
U16_APPEND()
http://oss.software.ibm.com/icu/apiref/utf16_8h.html#a16
To read a code point from UTF-16, use U16_NEXT()
http://oss.software.ibm.com/icu/apiref/utf16_8h.html#a16
or U16_GET() etc.
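A minimal stand-alone sketch of the same idea for UTF-16, again in case ICU is not an option: code points up to U+FFFF take one 16-bit unit, and code points above that become a lead/trail surrogate pair. utf16_encode and utf16_decode are hypothetical names, not ICU APIs. Returning an unpaired surrogate unchanged from the decoder is one possible policy chosen for this sketch, not a statement about what ICU does.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an ICU API): encodes one code point as UTF-16.
 * Writes 1 or 2 code units to out and returns the count, or 0 if c is
 * negative, above 0x10ffff, or itself a surrogate code point. */
size_t utf16_encode(int32_t c, uint16_t out[2]) {
    if (c < 0 || c > 0x10ffff || (c >= 0xd800 && c <= 0xdfff)) {
        return 0;
    }
    if (c <= 0xffff) {                          /* BMP: one code unit */
        out[0] = (uint16_t)c;
        return 1;
    }
    c -= 0x10000;                               /* 20 bits remain */
    out[0] = (uint16_t)(0xd800 + (c >> 10));    /* lead (high) surrogate */
    out[1] = (uint16_t)(0xdc00 + (c & 0x3ff));  /* trail (low) surrogate */
    return 2;
}

/* Hypothetical helper: reads the code point at s[*i] (requires *i < length),
 * advances *i past it and returns it. An unpaired surrogate is returned
 * as-is, one unit at a time. */
int32_t utf16_decode(const uint16_t *s, size_t *i, size_t length) {
    uint16_t lead = s[(*i)++];
    if (lead >= 0xd800 && lead <= 0xdbff && *i < length) {
        uint16_t trail = s[*i];
        if (trail >= 0xdc00 && trail <= 0xdfff) {  /* well-formed pair */
            ++*i;
            return 0x10000 + (((int32_t)(lead - 0xd800) << 10) | (trail - 0xdc00));
        }
    }
    return lead;
}
```

For example, utf16_encode(0x1f600, u) yields the pair D83D DE00, and decoding that pair gives back 0x1f600.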
> UCS2_Sequence CodePointToUCS2(Unichar codePoint)
For UCS-2, the best strategy (in my opinion) is to treat it exactly the same as UTF-16. Most people
mean UTF-16 when they talk about UCS-2 or generally about "16-bit Unicode".
If you do want to distinguish them anyway, then this is trivial:
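if(codePoint>=0 && codePoint<=0xffff) {
    /* BMP code point: cast it to a 16-bit unit and emit it */
    uint16_t unit=(uint16_t)codePoint;
    /* ... emit unit ... */
} else {
    /* error: not representable in UCS-2 */
}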
UTF-32 is equally trivial: it just stores each code point value in a single 32-bit integer
unit. Unicode code points are the values 0..0x10ffff.
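To make the UTF-32 case concrete, here is a minimal sketch: "encoding" is only a range check followed by storing the value. utf32_encode is a hypothetical name, not an ICU API. Rejecting the surrogate range D800..DFFF is an added assumption of this sketch, based on the fact that well-formed UTF-32 does not contain surrogate code points.

```c
#include <stdint.h>

/* Hypothetical helper: stores code point c as one UTF-32 code unit.
 * Returns 1 on success, 0 if c is out of range or a surrogate. */
int utf32_encode(int32_t c, uint32_t *out) {
    /* Assumption: surrogates are rejected because well-formed
     * UTF-32 does not contain them. */
    if (c < 0 || c > 0x10ffff || (c >= 0xd800 && c <= 0xdfff)) {
        return 0;
    }
    *out = (uint32_t)c;  /* the code unit is just the code point value */
    return 1;
}
```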
See also http://oss.software.ibm.com/cvs/icu/~checkout~/icu/source/samples/ustring/ustring.cpp
I hope this helps - best regards,
markus
-- Opinions expressed here may not reflect my company's positions unless otherwise noted.
This archive was generated by hypermail 2.1.5 : Fri Mar 14 2003 - 13:06:21 EST