Re: U+xxxx, U-xxxxxx, and the basics

From: John Cowan (jcowan@reutershealth.com)
Date: Mon Mar 06 2000 - 09:58:53 EST


Mike Brown wrote:

> 1. In Unicode, each abstract character has a descriptive name, in English,
> like "LATIN CAPITAL LETTER A", and may have additional names that are
> translations of the English name into other languages.

Unicode 3.0 (got my copy! hurray!) has changed the effective definition of
"abstract character". Two characters which are 1-1 canonical equivalents,
like ANGSTROM SIGN and LATIN CAPITAL LETTER A WITH RING ABOVE, are now
considered the *same* abstract character, encoded two different ways.
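
A quick way to see that equivalence, as a rough sketch only: it assumes Python 3
and its standard unicodedata module, whose data tracks a later Unicode version
than 3.0. The variable names are mine, purely for illustration.

```python
import unicodedata

angstrom = "\u212B"  # ANGSTROM SIGN
a_ring = "\u00C5"    # LATIN CAPITAL LETTER A WITH RING ABOVE

# The two code points are different...
different_code_points = angstrom != a_ring

# ...but they are canonically equivalent, so both normalization forms
# map them to the same sequence.
same_nfc = unicodedata.normalize("NFC", angstrom) == unicodedata.normalize("NFC", a_ring)
same_nfd = unicodedata.normalize("NFD", angstrom) == unicodedata.normalize("NFD", a_ring)

print(different_code_points, same_nfc, same_nfd)
```

Under NFD both decompose to A followed by COMBINING RING ABOVE (U+030A), which
is the sense in which they are one abstract character encoded two ways.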

Also, we are now allowed to talk about things like LATIN CAPITAL LETTER Q
WITH CIRCUMFLEX as an abstract character that Unicode doesn't encode
(but that can be represented as Q + combining circumflex).
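
To illustrate the unencoded case, here is a small sketch under the same
assumptions (Python 3, standard unicodedata module, names of my own choosing).
It contrasts Q + combining circumflex, which has no precomposed form, with
A + combining circumflex, which does.

```python
import unicodedata

COMBINING_CIRCUMFLEX = "\u0302"  # COMBINING CIRCUMFLEX ACCENT

q_circumflex = "Q" + COMBINING_CIRCUMFLEX
a_circumflex = "A" + COMBINING_CIRCUMFLEX

# No single code point exists for Q with circumflex, so NFC leaves
# the two-code-point sequence alone.
q_nfc = unicodedata.normalize("NFC", q_circumflex)

# A with circumflex is encoded (U+00C2), so NFC composes it.
a_nfc = unicodedata.normalize("NFC", a_circumflex)

print(len(q_nfc), len(a_nfc))
```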
 
> The ISO standard also defines a 16-bit encoding form called UCS-2, in which
> a 16-bit code value in the code space 0x0..0xFFFF directly corresponds to an
> identical scalar value, but this form is, of course, inherently limited to
> representing only the first 65,536 scalar values.

UCS-2 is bogus and shouldn't be explained before UTF-16, which has been the
real deal since Unicode 2.0.
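
For concreteness, a minimal sketch of how UTF-16 maps a scalar value to 16-bit
code units, including the surrogate pairs that UCS-2 lacks. It assumes Python 3.
The function name utf16_code_units is hypothetical, not something from the
standard or any library.

```python
def utf16_code_units(scalar):
    """Return the UTF-16 code units (as ints) for one Unicode scalar value."""
    if not (0 <= scalar <= 0x10FFFF) or 0xD800 <= scalar <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if scalar <= 0xFFFF:
        # Same as UCS-2: one code unit equal to the scalar value.
        return [scalar]
    # Beyond 0xFFFF: split the 20-bit offset into a surrogate pair.
    offset = scalar - 0x10000
    high = 0xD800 + (offset >> 10)   # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits
    return [high, low]


for s in (0x0041, 0xFFFD, 0x10000, 0x10FFFF):
    units = " ".join("%04X" % u for u in utf16_code_units(s))
    print("U+%04X -> %s" % (s, units))
```

With this sketch, 0x10000 comes out as D800 DC00 and 0x10FFFF as DBFF DFFF,
while anything at or below 0xFFFF is a single code unit, exactly as in UCS-2.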
 

-- 

Schlingt dreifach einen Kreis vom dies! || John Cowan <jcowan@reutershealth.com>
Schliesst euer Aug vor heiliger Schau,  || http://www.reutershealth.com
Denn er genoss vom Honig-Tau,           || http://www.ccil.org/~cowan
Und trank die Milch vom Paradies.            -- Coleridge (tr. Politzer)


