Writing a Unicode library from scratch vs. Off-the-shelf

From: Mike Newhall (mike.newhall@av.com)
Date: Wed Jul 05 2000 - 17:56:54 EDT


        For internationalizing string handling on a web-based C++ project, we are
in the early stages of deciding whether to use an off-the-shelf Unicode
string-handling library or to write one from scratch. Why reinvent the
wheel? Mainly because we may need to optimize the implementation heavily
for a very heavy workload, along with miscellaneous other reasons. I
emphasize that this decision has not been made yet. This brings up a few
questions:

1) What free libraries are around? Does anyone have experience with their
quality / degree of functionality, etc.?

2) Ditto for commercial libraries?

        For both commercial and free libraries, of particular interest are
opinions on the quality, elegance, and completeness of the APIs, along with
pros, cons, and comparisons. These would help us select a library or, if
we write our own, learn from the APIs that have gone before.

3) Any advice from those who have written Unicode / UTF-8 / UTF-16 / UTF-32
libraries would be appreciated. Specifically, what is the scope of the
functionality required for general Unicode text handling? Clearly one must
deal with variable-length characters, except in the case of UTF-32. What
does this mean for the library interface - how does it change the
appearance of an API from one that deals only with fixed-length characters?
Does the library / API have to be aware of other, higher-level (linguistic)
multi-character sequences? For example, a "length of string in graphemes"
function, in addition to a "length of string in characters" function, and a
"length of string in character storage units" function?
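
To make the three notions of "length" concrete, here is a rough C++ sketch
of what such functions might look like over a UTF-8 buffer. All function
names are hypothetical and not taken from any existing library. The decoder
assumes well-formed input and does no validation, and the grapheme count is
a deliberate simplification that only treats U+0300..U+036F (combining
diacritical marks) as extending the previous character; real grapheme
segmentation needs the Unicode character property data.

```cpp
#include <cstddef>
#include <string>

// Length in storage units: for UTF-8 this is simply the byte count.
std::size_t length_in_code_units(const std::string& utf8) {
    return utf8.size();
}

// Decode one code point starting at index i and advance i past it.
// Assumption: the input is well-formed UTF-8; nothing is validated here.
char32_t next_code_point(const std::string& utf8, std::size_t& i) {
    unsigned char lead = static_cast<unsigned char>(utf8[i++]);
    int extra = 0;      // number of continuation bytes that follow
    char32_t cp = 0;
    if (lead < 0x80)                { cp = lead;        extra = 0; }
    else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
    else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
    else                            { cp = lead & 0x07; extra = 3; }
    while (extra-- > 0 && i < utf8.size()) {
        cp = (cp << 6) | (static_cast<unsigned char>(utf8[i++]) & 0x3F);
    }
    return cp;
}

// Length in characters (code points): one per decoded sequence.
std::size_t length_in_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < utf8.size(); ) {
        next_code_point(utf8, i);
        ++count;
    }
    return count;
}

// Simplifying assumption: only U+0300..U+036F count as combining marks.
bool is_combining_mark(char32_t cp) {
    return cp >= 0x0300 && cp <= 0x036F;
}

// Approximate length in graphemes: a combining mark does not start a new
// grapheme unless it is the very first code point in the string.
std::size_t length_in_graphemes_approx(const std::string& utf8) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < utf8.size(); ) {
        char32_t cp = next_code_point(utf8, i);
        if (count == 0 || !is_combining_mark(cp)) {
            ++count;
        }
    }
    return count;
}
```

For the byte sequence "e\xCC\x81\xE2\x82\xAC" (the letter e, U+0301
COMBINING ACUTE ACCENT, then U+20AC EURO SIGN), the three functions give
different answers: 6 storage units, 3 characters, and 2 graphemes. Note also
that the sketch walks the string with a cursor rather than indexing by
character position, which is one way a variable-length encoding changes the
shape of an API.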

Mike Newhall
Software Engineer
AltaVista


