Re: Java and Unicode

From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Wed Nov 15 2000 - 20:02:45 EST


Please, let's keep the types for single characters separate from the types for strings.

ICU used to be in the same situation as Java: all character and string APIs used 16-bit types.
When we extended ICU to UTF-16, we decided to keep the string base type at 16 bits for very good reasons, such as interoperability and memory consumption.
For single characters, ICU changed APIs from 16-bit to 32-bit types.
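To illustrate why a 16-bit type cannot hold a single character once supplementary code points are in play, here is a minimal sketch (not ICU's actual code) of decoding and encoding a surrogate pair. The class and method names `Utf16Sketch`, `codePointAt`, and `encode` are hypothetical. The string stays an array of 16-bit units, but the decoded character needs a 32-bit `int`.

```java
// Hypothetical helper class, for illustration only (not an ICU or JDK API).
public final class Utf16Sketch {
    // Decode the code point starting at index i of a UTF-16 code unit array.
    // Returns a 32-bit int because supplementary code points exceed 0xFFFF.
    static int codePointAt(char[] s, int i) {
        char lead = s[i];
        if (Character.isHighSurrogate(lead) && i + 1 < s.length) {
            char trail = s[i + 1];
            if (Character.isLowSurrogate(trail)) {
                // Combine the 10+10 payload bits and add back the 0x10000 offset.
                return ((lead - 0xD800) << 10) + (trail - 0xDC00) + 0x10000;
            }
        }
        return lead; // BMP code point, or an unpaired surrogate returned as-is
    }

    // Encode one code point as one or two 16-bit code units.
    static char[] encode(int cp) {
        if (cp < 0 || cp > 0x10FFFF) {
            throw new IllegalArgumentException("not a Unicode code point: " + cp);
        }
        if (cp < 0x10000) {
            return new char[] { (char) cp };
        }
        int v = cp - 0x10000;
        return new char[] { (char) (0xD800 + (v >> 10)), (char) (0xDC00 + (v & 0x3FF)) };
    }

    public static void main(String[] args) {
        char[] units = encode(0x1D11E); // MUSICAL SYMBOL G CLEF
        System.out.println(units.length); // 2
        System.out.printf("U+%X%n", codePointAt(units, 0)); // U+1D11E
    }
}
```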

In the case of Java, the equivalent course of action would be to stick with a 16-bit char as the base type for strings. The int type could be used in _additional_ APIs for single Unicode code points, deprecating the old char-based APIs.
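For what it's worth, J2SE 5.0 (JSR 204) later went broadly this way: it added `int` overloads such as `Character.isLetter(int)` next to the `char` ones, plus helpers like `Character.toChars` and `Character.charCount`, although the `char` overloads were kept rather than deprecated. A small sketch of those JDK methods (the class name `CodePointApis` is just for illustration):

```java
// Illustrative class name; all Character methods used here are standard JDK (5.0+).
public final class CodePointApis {
    public static void main(String[] args) {
        int clef = 0x1D11E; // outside the BMP, so it does not fit in one 16-bit char
        System.out.println(Character.toChars(clef).length);           // 2 code units
        System.out.println(Character.charCount(clef));                // 2
        System.out.println(Character.isSupplementaryCodePoint(clef)); // true

        int longI = 0x10400; // DESERET CAPITAL LETTER LONG I, an uppercase letter
        // The int overload sees the whole code point.
        System.out.println(Character.isLetter(longI)); // true
        // The char overload only sees a lone high surrogate, which is not a letter.
        System.out.println(Character.isLetter(Character.toChars(longI)[0])); // false
    }
}
```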

Whatever Sun decides to do with single characters, it will be most reasonable to keep the string encoding the same and just treat it as UTF-16 where that makes a difference.
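As a sketch of what "treat it as UTF-16 where that makes a difference" can look like for an unchanged 16-bit `String`, the loop below walks code points instead of chars using the J2SE 5.0 methods `String.codePointAt`, `String.codePointCount`, and `Character.charCount`. The class and method names are hypothetical. Code that only searches for or copies ASCII delimiters can keep working on chars; only code that inspects individual characters needs this step.

```java
// Hypothetical class, for illustration only; the String/Character methods are standard JDK.
public final class Utf16StringWalk {
    // Collect the code points of s, advancing by 1 or 2 UTF-16 units per character.
    static int[] codePoints(String s) {
        int[] out = new int[s.codePointCount(0, s.length())];
        int n = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            out[n++] = cp;
            i += Character.charCount(cp);
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "a" + new String(Character.toChars(0x1D11E)) + "b";
        System.out.println(s.length());                       // 4 UTF-16 code units
        System.out.println(s.codePointCount(0, s.length()));  // 3 code points
        for (int cp : codePoints(s)) {
            System.out.printf("U+%04X%n", cp); // U+0061, U+1D11E, U+0062
        }
    }
}
```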

For details, see my presentation at the 17th International Unicode Conference (IUC 17, September 2000, session B2).
(See http://www.unicode.org/ - I am having some trouble with web access right now, so I cannot give you the URL...)

markus



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:15 EDT