UCS-2, UTF-16, and Java (was RE: U+xxxx, U-xxxxxx, and the basics )

From: Mike Brown (mbrown@corp.webb.net)
Date: Mon Mar 06 2000 - 15:22:05 EST


> when mentioning and describing UCS-2, deprecate its use
> clearly so that newcomers understand that they *need to*
> support UTF-16.

I reworded the text in my XML tutorial a little and am now HTMLizing it by
hand, since MS Word 2000 goes nuts with extraneous markup (which is
actually rather cool in its own way: lots of CSS and even some XML in
there). I'll add a mention of UCS-2 being deprecated. I'm more inclined to
go this route than to not mention UCS-2 at all, for two reasons.

First, it is in my opinion easier to understand UTF-16 if you first think of
what happens when a 16-bit code space is used in the most obvious way, with
a 1-to-1 relationship between code points and code values: 65,536 code
values representing 65,536 code points representing up to 65,536 abstract
characters. That is UCS-2.
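
To make the contrast concrete, here is a minimal sketch in Java of how a
code point maps to 16-bit code values. The class name Utf16Sketch and the
helper toUtf16 are made up for illustration and are not taken from any
library. For code points below U+10000 the result is a single code value,
exactly as in UCS-2; above that, UTF-16 uses a pair of surrogate code
values, which UCS-2 has no way to express.

```java
public class Utf16Sketch {
    // Hypothetical helper: encode one code point as UTF-16 code values.
    static char[] toUtf16(int codePoint) {
        boolean outOfRange = codePoint < 0 || codePoint > 0x10FFFF;
        boolean isSurrogate = codePoint >= 0xD800 && codePoint <= 0xDFFF;
        if (outOfRange || isSurrogate) {
            throw new IllegalArgumentException("not an encodable code point");
        }
        if (codePoint < 0x10000) {
            // One 16-bit code value, the same 1-to-1 mapping UCS-2 uses.
            return new char[] { (char) codePoint };
        }
        // Beyond 16 bits: split the remaining 20 bits across two surrogates.
        int v = codePoint - 0x10000;
        char high = (char) (0xD800 + (v >> 10));   // top 10 bits
        char low = (char) (0xDC00 + (v & 0x3FF));  // bottom 10 bits
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        int[] samples = { 0x0041, 0x20AC, 0x10000, 0x10FFFF };
        for (int cp : samples) {
            StringBuilder units = new StringBuilder();
            for (char c : toUtf16(cp)) {
                units.append(String.format("%04X ", (int) c));
            }
            System.out.println(String.format("U+%04X -> %s", cp, units.toString().trim()));
        }
    }
}
```

The first two samples come out as one code value each (0041 and 20AC), while
U+10000 becomes the pair D800 DC00 and U+10FFFF becomes DBFF DFFF.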

Second, XML is based on ISO/IEC 10646-1:1993, not The Unicode Standard
version 2.x or 3.0. So the ISO encoding forms are, unfortunately, still
relevant, deprecated as they may be in Unicode.

I have a question, though. I have seen a reference somewhere saying that
Java characters and strings are UCS-2 encoded, and I saw a reference
somewhere else saying they are UTF-16 encoded. Which is it?


