RE: 3rd-party cross-platform UTF-8 support

From: Carl W. Brown (cbrown@xnetinc.com)
Date: Thu Sep 20 2001 - 23:31:55 EDT


Ken

> > I have to convert from UTF-8 to UTF-16 before calling ICU
> > functions (such as ucol_strcoll()).
> >
> > I'm worried about the performance overhead of this conversion.
>
> You shouldn't be.
>
> The conversion from UTF-8 to UTF-16 and back is algorithmic and very
> fast.

To make this conversion fast in xIUA (http://www.xnetinc.com/xiua/), I use an
externalized version of this converter, so I don't have to go through any of
the usual ICU conversion overhead.

However, there is much more to UTF-8 support than just a converter. Many
string-handling functions need separate UTF-8-aware implementations.
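
As a small example of why byte-oriented C string functions are not enough: strlen() counts bytes rather than characters, and a byte-limited copy such as strncpy() can cut a multi-byte character in half. The two helpers below, utf8_char_count and utf8_safe_prefix, are hypothetical names used only for illustration. They assume the input is already well-formed, NUL-terminated UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count code points in a well-formed,
 * NUL-terminated UTF-8 string. Every character has exactly one
 * byte that is not a 10xxxxxx continuation byte, so count those. */
size_t utf8_char_count(const char *s)
{
    size_t n = 0;
    for (; *s; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

/* Hypothetical helper: return the largest byte length <= max that does
 * not split a multi-byte character, so truncating the string there
 * leaves valid UTF-8. */
size_t utf8_safe_prefix(const char *s, size_t max)
{
    size_t len = strlen(s);
    if (len <= max)
        return len;
    /* If s[max] is a continuation byte, cutting at max would split the
     * character it belongs to, so back up to that character's lead byte. */
    while (max > 0 && ((unsigned char)s[max] & 0xC0) == 0x80)
        max--;
    return max;
}
```

For "h\xC3\xA9llo" ("héllo"), strlen() gives 6 while utf8_char_count() gives 5, and utf8_safe_prefix(s, 2) returns 1 rather than keeping half of the "é".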

I agree totally: it is easier to write a collator in UTF-16, and even easier
to write one in UTF-32. The cost of converting to UTF-16 is probably made up
for by the improved efficiency.

>
> If you are expecting better performance from a library that takes UTF-8
> API's and then does all its internal processing in UTF-8 *without*
> converting to UTF-16, then I think you are mistaken. UTF-8 is a bad
> form for much of the kind of internal processing that ICU has to do
> for all kinds of things -- particularly for collation weighting, for
> example. Any library worth its salt would *first* convert to UTF-16
> (or UTF-32) internally, anyway, before doing any significant semantic
> manipulation of the characters.
>
> > Are there any other cross-platform 3rd party Unicode supports
> > with better UTF-8 handling?

I would not have written xIUA if I knew of a better alternative.

I also think that many people like the setlocale style of programming, with
an API that looks like standard C library calls such as
xiua_strcoll(str1,str2);
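
For reference, this is the standard C pattern that the setlocale style refers to: select a locale once with setlocale(), then call ordinary functions such as strcoll() that consult it implicitly. The sketch below uses only the C standard library. sort_words is a hypothetical helper. The assumption that a UTF-8 xiua_strcoll could replace strcoll in the comparator rests only on the two-argument call shown above; its real signature and return convention are not given here.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparator: delegates to strcoll(), which reads the
 * process-wide LC_COLLATE category chosen by setlocale(). */
static int cmp_coll(const void *a, const void *b)
{
    const char *const *x = a;
    const char *const *y = b;
    return strcoll(*x, *y);
}

/* Hypothetical helper: sort words according to the named locale.
 * Returns 0 if the locale is not available, 1 otherwise.
 * Note that setlocale() changes global state for the whole process. */
int sort_words(const char **words, size_t n, const char *locale_name)
{
    if (setlocale(LC_COLLATE, locale_name) == NULL)
        return 0;
    qsort(words, n, sizeof *words, cmp_coll);
    return 1;
}
```

In the "C" locale, strcoll() behaves like strcmp(), so sorting {"pear", "apple", "fig"} gives {"apple", "fig", "pear"}. Other locales may order accented or mixed-case words differently.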

If all you need is UTF-8, you can still use xIUA and remove the parts you
don't need. It is easier to strip out functionality than to add it.

Carl



This archive was generated by hypermail 2.1.2 : Thu Sep 20 2001 - 22:24:45 EDT