From: Markus Scherer (markus.icu@gmail.com)
Date: Wed Jan 04 2006 - 18:06:40 CST
On 1/4/06, Philippe Verdy <verdy_p@wanadoo.fr> wrote:
> In ICU4J there are new datatypes for handling 32-bit code units/code points and for working with Unicode strings handled internally as vectors of code points, and conversion functions between Java's native String class and the alternate UString class. But using it is rarely justified (conversion of Strings has a significant performance cost on the VM).
I don't know where you get this idea. ICU4J uses the regular Java
String class. It provides utility functions to work with code points
(as int values) much like what Java 5 added, and it correctly handles
surrogate pairs in Strings where appropriate, but there is no
separate/parallel ICU4J-specific String or UString class. It really
works like Java 5, except that you can use it with Java 1.4 as well.
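To make the "code points as int values" style concrete, here is a minimal sketch that uses only the Java 5 JDK methods String.codePointAt and Character.charCount on a regular String. The class name CodePointDemo and the method countCodePoints are made up for this illustration. ICU4J's UTF16 utility class offers comparable static methods that also run on Java 1.4, but this sketch does not depend on ICU4J.

```java
final class CodePointDemo {
    // Count code points by walking the UTF-16 code units of a regular String.
    static int countCodePoints(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);     // reads a surrogate pair as one int
            i += Character.charCount(cp);  // 1 for a BMP code point, 2 for a supplementary one
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // "a", U+1D504 (encoded as the surrogate pair D835 DD04), "b"
        String s = "a\uD835\uDD04b";
        System.out.println(s.length());                             // 4 code units
        System.out.println(countCodePoints(s));                     // 3 code points
        System.out.println(Integer.toHexString(s.codePointAt(1)));  // 1d504
    }
}
```

The String itself stays an ordinary java.lang.String; only the iteration steps by code point instead of by char.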
> I recommend that you look at the Java 5 API documentation instead of assuming there was a bug (there was none, and you could very well work using Java 1.4 and lower with any valid Unicode string containing non-BMP characters, even without the ICU4J library, provided that your code properly handled surrogate "char"s).
Except that most JDK implementation code, for example for regular
expressions, BreakIterator, etc., simply treated surrogate code units
as separate characters. Please see the document to which Naoto
pointed.
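A small illustration of what "treating surrogate code units as separate characters" means in practice. This is not the JDK's actual implementation code, only a hypothetical pair of helpers (countLettersPerChar and countLettersPerCodePoint are names invented here) that contrast a per-char loop with a code-point-aware loop written against the Java 5 API.

```java
final class SurrogateAwareness {
    // Per-char loop: each surrogate code unit is classified on its own,
    // so a supplementary letter is never recognized as a letter.
    static int countLettersPerChar(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (Character.isLetter(s.charAt(i))) {
                n++;
            }
        }
        return n;
    }

    // Code-point-aware loop: a surrogate pair is combined into one int
    // before it is classified.
    static int countLettersPerCodePoint(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (Character.isLetter(cp)) {
                n++;
            }
            i += Character.charCount(cp);
        }
        return n;
    }

    public static void main(String[] args) {
        // "x" followed by U+1D504 MATHEMATICAL FRAKTUR CAPITAL A
        String s = "x\uD835\uDD04";
        System.out.println(countLettersPerChar(s));      // 1
        System.out.println(countLettersPerCodePoint(s)); // 2
    }
}
```

The string is valid UTF-16 in both cases; the difference lies entirely in whether the processing code pairs up the surrogates before classifying them.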
markus
-- Opinions expressed here may not reflect my company's positions unless otherwise noted.