Changjian Sun said:
> For cross-platform software (NT, Solaris, HP, AIX), the only third-party
> Unicode support I have found so far is IBM ICU. It provides very good
> support for internationalizing cross-platform software. However, ICU
> uses UTF-16 internally. Because our application uses UTF-8 for input
> and output, I have to convert from UTF-8 to UTF-16 before calling ICU
> functions (such as ucol_strcoll()).
>
> I'm worried about the performance overhead of this conversion.
You shouldn't be.
The conversion from UTF-8 to UTF-16 and back is algorithmic and very
fast.
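To make "algorithmic and very fast" concrete, here is a minimal sketch of
a UTF-8 to UTF-16 decoder in C. It is not ICU's code (ICU ships its own
converter, e.g. u_strFromUTF8() in unicode/ustring.h), and the name
utf8_to_utf16 is hypothetical. Each input byte is examined once, with a
few masks and shifts per character, so the cost is linear in the input.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not ICU's API: decode UTF-8 bytes src[0..len) into
 * UTF-16 code units in dst (capacity cap). Returns the number of UTF-16
 * units written, or -1 on malformed input or insufficient capacity. */
ptrdiff_t utf8_to_utf16(const unsigned char *src, size_t len,
                        uint16_t *dst, size_t cap)
{
    size_t i = 0, out = 0;
    while (i < len) {
        unsigned char b = src[i];
        uint32_t cp, min;   /* decoded value; smallest legal value for length */
        size_t n;           /* number of continuation bytes that follow */
        if (b < 0x80)                { cp = b;        n = 0; min = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; n = 1; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; n = 2; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; n = 3; min = 0x10000; }
        else return -1;     /* stray continuation byte or invalid lead byte */
        if (n > len - i - 1) return -1;          /* truncated sequence */
        for (size_t k = 1; k <= n; k++) {
            unsigned char c = src[i + k];
            if ((c & 0xC0) != 0x80) return -1;   /* not a 10xxxxxx byte */
            cp = (cp << 6) | (c & 0x3F);
        }
        /* reject overlong forms, surrogate code points, and > U+10FFFF */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return -1;
        i += n + 1;
        if (cp < 0x10000) {                      /* BMP: one code unit */
            if (out + 1 > cap) return -1;
            dst[out++] = (uint16_t)cp;
        } else {                                 /* supplementary: surrogate pair */
            if (out + 2 > cap) return -1;
            cp -= 0x10000;
            dst[out++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[out++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
    }
    return (ptrdiff_t)out;
}
```

This sketch returns -1 on malformed input rather than substituting
U+FFFD; that is a design choice here, not a statement about how ICU
behaves.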
If you are expecting better performance from a library that exposes
UTF-8 APIs and then does all its internal processing in UTF-8 *without*
converting to UTF-16, then I think you are mistaken. UTF-8 is a bad
form for much of the internal processing that ICU has to do,
particularly collation weighting. Any library worth its salt would
*first* convert to UTF-16 (or UTF-32) internally anyway, before doing
any significant semantic manipulation of the characters.
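One concrete reason UTF-8 is awkward as a processing form (offered as an
illustration, not necessarily the specific collation problem meant
above): it is variable-width, so even locating the n-th character means
walking every byte before it. The sketch below, with hypothetical helper
names, contrasts that scan with the single array index UTF-32 allows.
UTF-16 sits in between: surrogate pairs make it variable-width too, but
every BMP character is exactly one code unit.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: byte offset where the n-th code point (0-based)
 * starts in a well-formed UTF-8 buffer, or len if there are fewer than
 * n+1 code points. Every preceding byte must be inspected: O(len). */
size_t utf8_offset_of(const unsigned char *s, size_t len, size_t n)
{
    for (size_t i = 0; i < len; i++) {
        /* continuation bytes look like 10xxxxxx; any other byte
           starts a new code point */
        if ((s[i] & 0xC0) != 0x80) {
            if (n == 0) return i;
            n--;
        }
    }
    return len;
}

/* Hypothetical helper: in UTF-32 the same question is one index, O(1). */
uint32_t utf32_at(const uint32_t *s, size_t n)
{
    return s[n];
}
```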
> Are there any other cross-platform third-party Unicode libraries with
> better UTF-8 handling?
In my opinion, it is unlikely that there are *any* good Unicode libraries
that handle only UTF-8, inside and out. It is simply more efficient,
more elegant, and faster overall to take the form-conversion hit up
front, and then use a better processing form for manipulating the
characters.
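The other half of that conversion hit is encoding back to UTF-8 at the
output boundary. A minimal sketch, again with a hypothetical name
(utf16_to_utf8) and not ICU's implementation (ICU's counterpart is
u_strToUTF8()), is just as mechanical: combine any surrogate pair into a
code point, then emit one to four bytes depending on its magnitude.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not ICU's API: encode UTF-16 code units
 * src[0..len) as UTF-8 into dst (capacity cap bytes). Returns the number
 * of bytes written, or -1 on an unpaired surrogate or insufficient space. */
ptrdiff_t utf16_to_utf8(const uint16_t *src, size_t len,
                        unsigned char *dst, size_t cap)
{
    size_t i = 0, out = 0;
    while (i < len) {
        uint32_t cp = src[i++];
        if (cp >= 0xD800 && cp <= 0xDBFF) {          /* high surrogate */
            if (i == len || src[i] < 0xDC00 || src[i] > 0xDFFF)
                return -1;                           /* unpaired */
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(src[i++] - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return -1;                               /* stray low surrogate */
        }
        /* number of UTF-8 bytes this code point needs */
        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (out + need > cap) return -1;
        switch (need) {
        case 1:
            dst[out++] = (unsigned char)cp;
            break;
        case 2:
            dst[out++] = (unsigned char)(0xC0 | (cp >> 6));
            dst[out++] = (unsigned char)(0x80 | (cp & 0x3F));
            break;
        case 3:
            dst[out++] = (unsigned char)(0xE0 | (cp >> 12));
            dst[out++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            dst[out++] = (unsigned char)(0x80 | (cp & 0x3F));
            break;
        default:
            dst[out++] = (unsigned char)(0xF0 | (cp >> 18));
            dst[out++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            dst[out++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            dst[out++] = (unsigned char)(0x80 | (cp & 0x3F));
            break;
        }
    }
    return (ptrdiff_t)out;
}
```

Treating unpaired surrogates as errors is a choice of this sketch; a
production converter might substitute U+FFFD instead.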
UTF-8 shines as a legacy API and protocol compatibility form.
But it stinks as a processing form.
--Ken
> Thanks a lot.
>
> -Changjian Sun
This archive was generated by hypermail 2.1.2 : Thu Sep 20 2001 - 14:45:06 EDT