Re: Java and Unicode

From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Thu Nov 16 2000 - 16:03:24 EST


Juliusz Chroboczek wrote:
> I believe that Java strings use UTF-8 internally.

.class files store their string constants in a _modified_ utf-8. at runtime, strings are always 16-bit unicode.
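
a minimal sketch of how the modified form differs from standard utf-8. it assumes that DataOutputStream.writeUTF, which is documented as writing modified utf-8, is a fair stand-in for the encoding used in .class files, and it uses library classes from java versions newer than this message. the class and method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// hypothetical demo class; the names here are illustrative only
final class ModifiedUtf8Demo {

    // encode s with DataOutputStream.writeUTF, then drop the 2-byte
    // length prefix that writeUTF puts in front of the encoded data
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = buf.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    // byte count of s in standard utf-8, for comparison
    static int standardUtf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String nul = "\0";            // U+0000
        String clef = "\uD834\uDD1E"; // U+1D11E as a surrogate pair

        // U+0000 is written as the two bytes C0 80 instead of a single 00
        System.out.println(modifiedUtf8(nul).length);     // 2
        System.out.println(standardUtf8Length(nul));      // 1

        // each surrogate is encoded separately as 3 bytes
        System.out.println(modifiedUtf8(clef).length);    // 6
        System.out.println(standardUtf8Length(clef));     // 4
    }
}
```

for plain ascii text the two encodings produce the same bytes; the differences only show up for U+0000 and for characters outside the 16-bit range.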

> At any rate the
> internal implementation is not exposed to applications -- note that
> `length' is a method in class String (while it is a field in vector
> classes).

but length() and charAt() are among the apis that expose the 16-bit unicode representation, at least semantically. length() counts 16-bit code units of ucs-2/utf-16, not utf-8 bytes or utf-32 code points. charAt(), substring(), etc. all index by those same 16-bit units.
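
a small sketch of what that looks like in practice, using one character outside the 16-bit range. note the assumptions: codePointCount() and Character.isHighSurrogate() were added to java after this message was written, and the class and constant names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// hypothetical demo class; the names here are illustrative only
final class Utf16UnitsDemo {

    // 'a', then U+1D11E (stored as a surrogate pair), then 'b'
    static final String SAMPLE = "a\uD834\uDD1Eb";

    public static void main(String[] args) {
        // length() counts 16-bit code units: 1 + 2 + 1
        System.out.println(SAMPLE.length());                                  // 4

        // code points, as utf-32 would count them
        System.out.println(SAMPLE.codePointCount(0, SAMPLE.length()));        // 3

        // bytes in standard utf-8: 1 + 4 + 1
        System.out.println(SAMPLE.getBytes(StandardCharsets.UTF_8).length);   // 6

        // charAt() returns a single code unit, here the high surrogate
        System.out.println(Integer.toHexString(SAMPLE.charAt(1)));            // d834

        // substring() indexes by code units too, so it can split the pair
        String cut = SAMPLE.substring(0, 2);
        System.out.println(Character.isHighSurrogate(cut.charAt(1)));         // true
    }
}
```

the three counts differ only because the sample contains a character that needs a surrogate pair; for text limited to the 16-bit range, length() and the code point count agree.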

markus


