xICU 3.0 Status - (Simplified Unicode Implementation)

From: Carl W. Brown (cbrown@xnetinc.com)
Date: Mon Jun 11 2001 - 18:16:17 EDT


xICU (X.Net interface for ICU) Version 3.0 is now undergoing testing. It is
a runtime interface for ICU (International Components for Unicode),
http://oss.software.ibm.com/icu/

This software is free Open Source code. It serves a different purpose from
ICU. ICU is flexible, comprehensive and versatile; even though the source is
available, it is usually better not to modify it. xICU, by contrast, is
designed to be tailored to the specifics of your application. It can be used
as is, but developers can easily change it or reuse its code in their current
internationalization interface module.

xICU provides a cross-platform thread and locale management system. You can
set an independent locale for each thread, to be used for all subsequent
calls on that thread.
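The post does not say how xICU stores the locale. As a minimal sketch, assuming each thread keeps its own locale identifier in C11 thread-local storage, setting and reading it might look like this. The names demo_setThreadLocale and demo_getThreadLocale and the "en_US" default are hypothetical, not part of xICU or ICU.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical per-thread locale store (illustrative names, not xICU's API).
   _Thread_local (C11) gives every thread its own copy of this buffer, so a
   locale set on one thread is never seen by another. */
#define DEMO_LOCALE_MAX 32
static _Thread_local char demo_threadLocale[DEMO_LOCALE_MAX] = "en_US";

/* Set the locale for the calling thread; returns 0 on success, or -1 (and
   leaves the current locale unchanged) if the identifier does not fit. */
int demo_setThreadLocale(const char *localeId) {
    assert(localeId != NULL);
    size_t len = strlen(localeId);
    if (len >= DEMO_LOCALE_MAX) return -1;
    memcpy(demo_threadLocale, localeId, len + 1); /* include terminator */
    return 0;
}

/* Locale used by subsequent calls made on the calling thread. */
const char *demo_getThreadLocale(void) {
    return demo_threadLocale;
}
```

Because the storage is thread-local, a worker thread that calls demo_setThreadLocale("ja_JP") does not change the locale seen by any other thread.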

It also provides a set of common APIs that support UTF-16, UTF-8, UTF-32
and code page data. These APIs are designed to be familiar to most C
programmers, and some of them are enhanced. For example, strtok is not
thread safe, but xicu_strtok is.
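The post does not describe how xicu_strtok achieves thread safety. One common technique, used by POSIX strtok_r, keeps the scan position in a caller-owned pointer instead of in strtok's hidden static state. A minimal sketch of that technique follows; demo_strtok_r is a hypothetical name, not the xICU function.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical reentrant tokenizer in the style of POSIX strtok_r: the scan
   position lives in *saveptr (owned by the caller), not in a static
   variable, so concurrent or nested tokenizations do not interfere.
   Pass the string on the first call and NULL on later calls. */
char *demo_strtok_r(char *str, const char *delim, char **saveptr) {
    assert(delim != NULL && saveptr != NULL);
    char *start = (str != NULL) ? str : *saveptr;
    if (start == NULL) return NULL;       /* previous call hit the end */
    start += strspn(start, delim);        /* skip leading delimiters */
    if (*start == '\0') {
        *saveptr = NULL;
        return NULL;
    }
    char *end = start + strcspn(start, delim);
    if (*end != '\0') {
        *end = '\0';                      /* terminate this token in place */
        *saveptr = end + 1;               /* resume after the delimiter */
    } else {
        *saveptr = NULL;                  /* this was the last token */
    }
    return start;
}
```

Since each caller passes its own saveptr, two threads (or two nested loops) can tokenize different strings at the same time.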

Other functions have been changed to provide similar but more usable
behavior, and some functions need to be adapted to work well with Unicode
data. For example, xicu_strncpyEx(char *str1, char *str2, int32_t length) is
similar to strncpy. strncpy copies characters from str2 to str1 until either
a null end-of-string character is found or the number of characters copied
reaches the length limit. If the limit is reached first, strncpy does not
write a terminating null.

This function is usually used to copy strings into a buffer of known
length. Specifying a copy length limit prevents the program from copying too
many characters into the buffer and overflowing into other data areas with
unpredictable results. However, the program usually expects the copied data
to be a valid string with a null terminating character at the end, even if
it is truncated. To ensure this, it is common practice to put a null
character in the last position of the buffer in case of overflow.
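The byte-oriented idiom just described, as a short sketch (demo_copyTruncated is a hypothetical helper name). It is safe for single-byte data, where any prefix of a string is itself a valid string.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper showing the common idiom: a bounded copy, then a
   forced terminator in the last slot, because strncpy writes none when
   the source is at least bufSize bytes long. */
void demo_copyTruncated(char *buf, size_t bufSize, const char *src) {
    assert(buf != NULL && src != NULL && bufSize > 0);
    strncpy(buf, src, bufSize);
    buf[bufSize - 1] = '\0';
}
```

With a 6-byte buffer, "hello world" becomes "hello", and a short string such as "hi" is copied unchanged.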

This does not work for Unicode and multi-byte data. For example, a UTF-32
null terminator is a 32-bit (4-byte) null character, and if the buffer size
is not a multiple of 4 bytes it must be rounded down to the nearest
multiple. Copying by bytes and putting a byte null at the end of a UTF-8 or
multi-byte string may leave a partial character at the end of the string.
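To make the UTF-8 case concrete: "naïve" is encoded as the bytes 6E 61 C3 AF 76 65, so applying the byte-null idiom with a 4-byte buffer keeps 6E 61 C3 and drops AF, leaving a lead byte with no continuation byte. The sketch below (demo_u8EndsIncomplete is a hypothetical name) checks whether a UTF-8 string ends in such a partial sequence; it examines only the final sequence and is not a full UTF-8 validator.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if the UTF-8 string s ends in an incomplete
   (or malformed) final sequence, e.g. a lead byte whose continuation bytes
   were cut off by a byte-oriented truncation. Only the last sequence is
   examined; this is not a full validator. */
bool demo_u8EndsIncomplete(const char *s) {
    assert(s != NULL);
    size_t i = strlen(s);
    size_t cont = 0;                      /* trailing continuation bytes */
    while (i > 0 && cont < 3 && ((unsigned char)s[i - 1] & 0xC0) == 0x80) {
        i--;
        cont++;
    }
    if (i == 0) return cont > 0;          /* continuation bytes with no lead */
    unsigned char lead = (unsigned char)s[i - 1];
    size_t need;                          /* continuation bytes lead expects */
    if (lead < 0x80) need = 0;
    else if ((lead & 0xE0) == 0xC0) need = 1;
    else if ((lead & 0xF0) == 0xE0) need = 2;
    else if ((lead & 0xF8) == 0xF0) need = 3;
    else return true;                     /* not a valid lead byte */
    return cont != need;
}
```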

xicu_strncpyEx operates differently: it copies only complete characters and
always terminates the string with a valid null character. strncpy(str1,
str2, length) returns a pointer to str1, which the caller already has, so
the return value is not meaningful. xicu_strncpyEx instead returns the
actual length copied, which makes it easier to use.
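A minimal sketch of the behavior described for xicu_strncpyEx, written for UTF-8 only. The post does not define the units of length, so this sketch assumes it is the destination size in bytes including the terminator, and it returns the number of bytes copied (excluding the terminator). demo_u8_strncpyEx is a hypothetical name, not the xICU implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Number of bytes in the UTF-8 sequence introduced by `lead`. Malformed
   lead bytes are treated as single bytes so the copy loop always advances. */
static int32_t demo_u8SeqLen(unsigned char lead) {
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Hypothetical UTF-8 copy in the spirit of xicu_strncpyEx. Assumes `length`
   is the destination size in bytes including the terminator. Copies only
   whole characters, always null-terminates when length > 0, and returns
   the number of bytes copied, excluding the terminator. */
int32_t demo_u8_strncpyEx(char *dst, const char *src, int32_t length) {
    assert(dst != NULL && src != NULL);
    if (length <= 0) return 0;            /* no room even for a terminator */
    int32_t i = 0;
    while (src[i] != '\0') {
        int32_t n = demo_u8SeqLen((unsigned char)src[i]);
        if (i + n > length - 1) break;    /* keep one byte for the terminator */
        int32_t k = 1;
        while (k < n && src[i + k] != '\0') k++;
        if (k < n) break;                 /* source ends mid-character */
        memcpy(dst + i, src + i, (size_t)n);
        i += n;
    }
    dst[i] = '\0';
    return i;
}
```

With a 4-byte buffer, "naïve" is copied as "na" and the function returns 2, instead of leaving a dangling C3 byte.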

xICU provides data API compatibility by providing sets of calls where
appropriate. For example:

xicu_strncpyEx(char *str1, char *str2, int32_t length);       /* main API */
xiu2_strncpyEx(UChar *str1, UChar *str2, int32_t length);     /* UTF-16/UCS-2 API */
xiu4_strncpyEx(UChar32 *str1, UChar32 *str2, int32_t length); /* UTF-32/UCS-4 API */
xiu8_strncpyEx(char *str1, char *str2, int32_t length);       /* UTF-8 API */
xicp_strncpyEx(char *str1, char *str2, int32_t length);       /* Code page API */

The locale setting tracks the user's data format, code page, time zone and
other settings, and xICU invokes the appropriate services to perform each
function. You can use either the general API or a data-type-specific API:
xICU handles the transformation of data lengths and either converts
parameters and results or provides parallel function implementations, as
appropriate.
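How a general API call might be routed by the current data format can be sketched as follows. This assumes the locale records a per-thread data format and shows only UTF-8 and a single-byte code page; the enum, demo_setDataFormat and demo_charCount are hypothetical names, not xICU's API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical data formats a locale setting might record. */
typedef enum { DEMO_FMT_UTF8, DEMO_FMT_SBCS } DemoDataFormat;

/* Per-thread format, standing in for the locale's data-format setting. */
static _Thread_local DemoDataFormat demo_dataFormat = DEMO_FMT_UTF8;

void demo_setDataFormat(DemoDataFormat fmt) { demo_dataFormat = fmt; }

/* UTF-8: count every byte that is not a continuation byte (10xxxxxx). */
static int32_t demo_u8_charCount(const char *s) {
    int32_t count = 0;
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80) count++;
    return count;
}

/* Single-byte code page: one byte per character. */
static int32_t demo_cp_charCount(const char *s) {
    return (int32_t)strlen(s);
}

/* General entry point: picks the data-type-specific routine from the
   calling thread's current setting. */
int32_t demo_charCount(const char *s) {
    assert(s != NULL);
    switch (demo_dataFormat) {
    case DEMO_FMT_UTF8: return demo_u8_charCount(s);
    case DEMO_FMT_SBCS: return demo_cp_charCount(s);
    }
    return -1;                            /* unknown format */
}
```

The same bytes give different answers under different settings: "na\xC3\xAFve" is 5 characters as UTF-8 but 6 in a single-byte code page.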

It supports standard Unicode-compliant functions like xicu_strtoupper, with
full special-case support, and also xicu_strtoupperInplace for the special
cases where it is not practical to change the program logic to use separate
source and target buffers. It also provides alternate implementations to
support different types of applications and different ranges of control.
The collation interface is a good example: the xICU default collation
parameters reflect typical uses but are easy to tailor to specific
requirements.

This new release provides internal thread support as well as internal
working memory support, but most importantly it requires no changes to the
base ICU code.

X.Net, Inc. is working on this project pro bono to promote the use of ICU,
which we believe is the best C/C++ Unicode and globalization product
available today. Even though ICU is a free Open Source package, it is far
more comprehensive than comparable commercial software. xICU is designed to
help companies save man-months of work implementing ICU by providing a
working model interface routine.

Carl W. Brown
X.Net, Inc.


