From: Mark Davis (mark.davis@jtcsv.com)
Date: Wed Jun 04 2003 - 13:14:08 EDT
A few items:
I agree with your main point, which is that UCS-2 is, for all
practical purposes, just a repertoire subset of UTF-16; the code units
and bit-width are the same.
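
A minimal sketch of that point, using only standard JDK calls available in current releases (the class and method names here are illustrative, not from any library discussed in this thread). A BMP character occupies one 16-bit code unit either way; a supplementary character occupies two code units (a surrogate pair) in UTF-16 and simply falls outside the UCS-2 repertoire.

```java
public class CodeUnitDemo {
    // Number of 16-bit code units; identical for UCS-2 and UTF-16 text.
    public static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points; a surrogate pair counts once.
    public static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // True if no code unit is a surrogate, i.e. the text stays
    // within the repertoire that UCS-2 can represent.
    public static boolean fitsUcs2(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (Character.isSurrogate(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String bmp = "\u00E9";        // U+00E9, inside the BMP
        String supp = "\uD800\uDC00"; // U+10000, first supplementary code point
        System.out.println(codeUnits(bmp) + " unit(s), UCS-2 ok: " + fitsUcs2(bmp));
        System.out.println(codeUnits(supp) + " unit(s), "
                + codePoints(supp) + " code point(s), UCS-2 ok: " + fitsUcs2(supp));
    }
}
```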
> Some Java classes that assume that the "char" arithmetic will
> automatically roll over after 16 bits are wrong. The JVM spec only
> requires that char be at least 16 bits wide (but it may be larger).
> The compiled classes need to store string constants. But these
> constants are serialized to be platform-independent using a UTF-8
> encoding scheme.
I'm in the JSR 204 group looking at supplementary character support.
Although I won't speak to the details of the discussions in that
group, it is quite unlikely that char would be changed to be 32-bits.
It would break far too much.
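
For illustration, here is a small sketch of the kind of code that depends on char staying 16 bits. It assumes a current JDK, where the Java Language Specification defines char as an unsigned 16-bit integer type; the class and method names are made up for this example.

```java
public class CharWidthDemo {
    // Increment a char; the result is narrowed back to 16 bits,
    // so 0xFFFF wraps around to 0x0000.
    public static char increment(char c) {
        c++;
        return c;
    }

    public static void main(String[] args) {
        char max = (char) 0xFFFF;
        System.out.println("bits in a char: " + Character.SIZE);
        System.out.println("0xFFFF + 1 = 0x"
                + Integer.toHexString(increment(max)));
    }
}
```

Any code that relies on this wraparound, or that stores chars in 16-bit fields, would change behavior if char were widened.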
> The probable official full support of Unicode 4 and 3.2 will come
> with new classes derived from Character and String (UChar and UString
> are their names in the IBM ICU package, but Sun may also keep the
> class name but designate them under the java.text package instead of
> the core's java.lang package, and a compiler option (such as the
> target Java version) may allow a class author to compile its code
> according to the default java.lang.String or java.text.String class
> if the package name is not specified by an explicit import).
In ICU4J (which is an add-on package for Java), we don't have classes
UChar and UString. For supplementary support, we have:
- UCharacter, which provides property functions based on code
points rather than chars. (It also has all the UCD properties,
instead of just the small fraction that are in the standard JDK.)
- UTF16, which provides utilities for using supplementaries with
String, StringBuffer and char[]
The other functionality, such as Normalizer, UnicodeSet, Collator,
StringSearch, Transliterator, etc. all handle supplementary
characters.
See http://oss.software.ibm.com/icu4j/doc/index.html for details.
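
To give a feel for what "utilities for using supplementaries" with 16-bit storage involves, here is a hand-rolled sketch. It is not the ICU4J API; the class and method names are hypothetical, and it accepts any CharSequence (String, StringBuffer, and so on) rather than mirroring any particular library signature.

```java
public class SupplementaryDemo {
    // Return the code point starting at index i, combining a
    // high/low surrogate pair into one supplementary code point.
    public static int codePointAt(CharSequence s, int i) {
        char hi = s.charAt(i);
        if (hi >= 0xD800 && hi <= 0xDBFF && i + 1 < s.length()) {
            char lo = s.charAt(i + 1);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
            }
        }
        return hi; // BMP character or unpaired surrogate
    }

    // Number of 16-bit code units needed to store a code point.
    public static int charCount(int codePoint) {
        return codePoint >= 0x10000 ? 2 : 1;
    }

    // Count code points by stepping over surrogate pairs as a unit.
    public static int countCodePoints(CharSequence s) {
        int count = 0;
        for (int i = 0; i < s.length(); i += charCount(codePointAt(s, i))) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        StringBuffer text = new StringBuffer("a\uD83D\uDE00b");
        System.out.println("code units:  " + text.length());
        System.out.println("code points: " + countCodePoints(text));
        System.out.println("at index 1:  U+"
                + Integer.toHexString(codePointAt(text, 1)).toUpperCase());
    }
}
```

The key design point is that iteration advances by one or two code units depending on the code point found, instead of assuming one char per character.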
BTW, I only very quickly scan long documents, such as those that you
and a few others are blessed with the ability to produce. So there may
be other items that I don't catch.
Mark
> -- Philippe.
> ----- Original Message -----
> From: "Michael (michka) Kaplan" <michka@trigeminal.com>
> To: "Philippe Verdy" <verdy_p@wanadoo.fr>
> Sent: Wednesday, June 04, 2003 4:36 PM
> Subject: Re: Encoding converion through JDBC
>
>
> > From: "Philippe Verdy" <verdy_p@wanadoo.fr>
> >
> > Philippe, you went on for quite a while and I admit most of the
> > things you talked about are not things about which I have knowledge.
> > But some of the things you talked about, I do understand, and in
> > those cases you were wrong. Psychologically, it causes me to wonder
> > how much of the rest of this message conveys accurate information.
> >
> > Specifically, you talk about SQL Server but most of what you said
> > about it is inaccurate. You cannot store big endian data without
> > risking corruption, you can only store UCS-2, it is not surrogate
> > aware and can thus be said to truly support only UCS-2, not UTF-16,
> > and the "N" prefix fields *always* mean UCS-2 for MSSQLS, period.
> >
> > You have a gift -- that of being able to speak knowledgeably. But
> > please, use that gift for *good* and do not move past what you know.
> >
> > Please, think about it?
> >
> > MichKa
> >
>
>