Re: Writing a Unicode library from scratch vs. Off-the-shelf

From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Wed Jul 05 2000 - 20:04:36 EDT


Mike Newhall wrote:
> 1) What free libraries are around? Does anyone have experience with their
> quality / degree of functionality, etc.?

Well, from IBM there is the open-source ICU library:
http://oss.software.ibm.com/icu/

We are trying hard to please customers like the DB2 database people and Lotus Notes, among many others...

For other libraries, check http://www.unicode.org/unicode/standard/UnicodeEnabledProducts.html

> 3) Any advice from those who have written Unicode / UTF-8 / UTF-16 / UTF-32
> libraries would be appreciated. Specifically, what is the scope of the
> functionality required for general Unicode text handling? Clearly one must
> deal with variable-length characters, except in the case of UTF-32. What
> does this mean for the library interface - how does it change the
> appearance of an API from one that deals only with fixed-length characters?

Yes, this is coming up. I will give a presentation about this at the next Unicode conference (IUC 17, session B2), covering how we decided to deal with variable-width UTF-16 in the formerly UCS-2-only ICU.
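
To make the variable-width point concrete, here is a minimal sketch in Python of walking a UTF-16 string by code point rather than by 16-bit unit. It is an illustration only and not ICU's actual API or design; the function name `decode_utf16` and the choice to pass unpaired surrogates through unchanged are assumptions made for this example.

```python
def decode_utf16(units):
    """Yield (start_index, code_point) pairs from a sequence of 16-bit code units.

    Hypothetical helper for illustration; not an ICU function.
    """
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        # A lead surrogate (D800..DBFF) followed by a trail surrogate
        # (DC00..DFFF) encodes one supplementary code point in two units.
        if 0xD800 <= u <= 0xDBFF and i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            yield i, cp
            i += 2
        else:
            # A BMP character, or an unpaired surrogate passed through as-is
            # (an assumption of this sketch; a real library must pick a policy).
            yield i, u
            i += 1


# "a", U+1D11E (MUSICAL SYMBOL G CLEF, a surrogate pair), "b"
units = [0x0061, 0xD834, 0xDD1E, 0x0062]
result = list(decode_utf16(units))
```

The interface consequence is visible in the indices: four storage units hold three characters, so "the character at index i" and "the next character" stop being simple array operations once surrogate pairs are allowed.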

> Does the library / API have to be aware of other, higher-level (linguistic)
> multi-character sequences? For example, a "length of string in graphemes"
> function, in addition to a "length of string in characters" function, and a
> "length of string in character storage units" function?

That, too, yes. You need strings and break iterators for that.
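
As a rough sketch of the three "lengths" asked about, the Python below counts storage units, code points, and an approximation of graphemes. All function names are hypothetical. The grapheme count is a deliberate simplification that only skips combining marks; it is not the boundary analysis a real break iterator performs, and it will miscount cases such as Hangul jamo sequences or a string that starts with a combining mark.

```python
import unicodedata


def length_in_code_units(s, encoding="utf-16-le"):
    """Number of storage units the string occupies in the given encoding."""
    unit_size = {"utf-8": 1, "utf-16-le": 2, "utf-32-le": 4}[encoding]
    return len(s.encode(encoding)) // unit_size


def length_in_code_points(s):
    """Number of Unicode code points; a Python str is indexed by code point."""
    return len(s)


def length_in_graphemes_approx(s):
    """Crude grapheme count: every character that is not a combining mark.

    Assumption for illustration only; real grapheme boundaries need a
    proper break iterator.
    """
    return sum(1 for ch in s if unicodedata.combining(ch) == 0)


# "e" + COMBINING ACUTE ACCENT, then U+1D11E (outside the BMP)
sample = "e\u0301\U0001D11E"
utf16_units = length_in_code_units(sample, "utf-16-le")
utf8_units = length_in_code_units(sample, "utf-8")
code_points = length_in_code_points(sample)
graphemes = length_in_graphemes_approx(sample)
```

For this sample the answers all differ (7 UTF-8 units, 4 UTF-16 units, 3 code points, 2 user-visible characters), which is why an API usually ends up exposing more than one notion of length.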

markus


