Mike Brown wrote:
> 1. In Unicode, each abstract character has a descriptive name, in English,
> like "LATIN CAPITAL LETTER A", and may have additional names that are
> translations of the English name into other languages.
Unicode 3.0 (got my copy! hurray!) has changed the effective definition of
"abstract character". Two characters which are 1-1 canonical equivalents,
like ANGSTROM SIGN and LATIN CAPITAL LETTER A WITH RING ABOVE, are now
considered the *same* abstract character, encoded two different ways.
Also, we are now allowed to talk about things like LATIN CAPITAL LETTER Q
WITH CIRCUMFLEX as an abstract character that Unicode doesn't encode
(but that can be represented as Q + combining circumflex).
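A quick way to see both points, sketched with Python's standard unicodedata module (the variable names are mine and purely illustrative; the normalization behaviour depends on the Unicode data shipped with your Python):

```python
import unicodedata

ANGSTROM = "\u212B"  # ANGSTROM SIGN
A_RING = "\u00C5"    # LATIN CAPITAL LETTER A WITH RING ABOVE

# Two different code points...
different_code_points = ANGSTROM != A_RING

# ...that canonical normalization maps to the same sequence.
nfc_same = unicodedata.normalize("NFC", ANGSTROM) == A_RING
nfd_same = (unicodedata.normalize("NFD", ANGSTROM)
            == unicodedata.normalize("NFD", A_RING))

# Q + COMBINING CIRCUMFLEX ACCENT: there is no precomposed code point,
# so NFC leaves it as a two-code-point sequence.
q_circ = "Q\u0302"
q_circ_nfc = unicodedata.normalize("NFC", q_circ)

print(different_code_points, nfc_same, nfd_same, len(q_circ_nfc))
```

Comparing normalized forms is only one way of testing canonical equivalence, but it is probably the simplest.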
> The ISO standard also defines a 16-bit encoding form called UCS-2, in which
> a 16-bit code value in the code space 0x0..0xFFFF directly corresponds to an
> identical scalar value, but this form is, of course, inherently limited to
> representing only the first 65,536 scalar values.
UCS-2 is bogus and shouldn't be explained before UTF-16, which has been the
real deal since Unicode 2.0.
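To make the difference concrete, here is a small sketch in Python (the helper name utf16_code_units is made up for this example). A BMP character takes one 16-bit code unit, which is all UCS-2 can express; a character above U+FFFF needs a surrogate pair in UTF-16:

```python
def utf16_code_units(s):
    """Return the 16-bit code units of s in UTF-16 (big-endian, no BOM)."""
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big")
            for i in range(0, len(data), 2)]

# A BMP character: one code unit, identical to its scalar value.
bmp = utf16_code_units("A")

# U+10000, the first scalar value beyond the BMP: a surrogate pair.
supp = utf16_code_units("\U00010000")

print([hex(u) for u in bmp], [hex(u) for u in supp])
```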
-- 
Schlingt dreifach einen Kreis vom dies! || John Cowan <jcowan@reutershealth.com>
Schliesst euer Aug vor heiliger Schau,  || http://www.reutershealth.com
Denn er genoss vom Honig-Tau,           || http://www.ccil.org/~cowan
Und trank die Milch vom Paradies.            -- Coleridge (tr. Politzer)