RE: 6 questions

From: Carl W. Brown (cbrown@xnetinc.com)
Date: Tue Sep 18 2001 - 17:04:14 EDT

Previous message: Hietaniemi Jarkko (NRC/Boston): "discontent about Indic scripts and Unicode"
In reply to: Magda Danish (Unicode): "FW: 6 questions"

Bernard,

Many of your questions have been answered by others, but I want to add a few
comments.

>
>
> 1. Why does Unicode say that there are 63486 code
> values available to represent characters with single
> 16 bit values and 2048 available to represent an
> additional 1,048,544 characters as surrogates? 65536 -
> 2048 = 63488 (difference of 2) --I guess it's due to
> the 2 code values guaranteed not to be characters. But
> what about: 1024 x 1024 = 1,048,576 (difference of
> 32), what accounts for the 32?

U+FDD0 to U+FDEF are also noncharacters. They form a range of 32 code points
that font rendering engines can use as an internal working set.
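
The arithmetic in question 1 can be checked with a short sketch. It assumes
the usual definition of noncharacters: the last two code points of each of
the 17 planes, plus U+FDD0..U+FDEF. The helper name is_noncharacter is made
up for illustration. Under that assumption, the 32 in the question is the 16
supplementary planes times 2, and the U+FDD0..U+FDEF block is a further 32
code points inside the BMP that the quoted 63486 figure does not subtract.

```python
def is_noncharacter(cp: int) -> bool:
    """Assumed definition: U+FDD0..U+FDEF, plus U+xxFFFE and U+xxFFFF in every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

# Basic Multilingual Plane (single 16-bit values)
surrogates = 0xDFFF - 0xD800 + 1                      # 2048 surrogate code values
bmp_plane_end = sum(1 for cp in (0xFFFE, 0xFFFF) if is_noncharacter(cp))
bmp_single = 0x10000 - surrogates - bmp_plane_end     # figure quoted in the question

# Supplementary planes (reached through surrogate pairs)
pair_combinations = 1024 * 1024
supp_nonchars = sum(
    1 for cp in range(0x10000, 0x110000) if is_noncharacter(cp)
)
supp_available = pair_combinations - supp_nonchars

# The extra noncharacter block inside the BMP
fdd0_block = sum(1 for cp in range(0xFDD0, 0xFDEF + 1) if is_noncharacter(cp))

print(bmp_single, supp_nonchars, supp_available, fdd0_block)
```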

> 4. Greek final sigma is not considered a compatibility
> decomposition (word position variant) because it's
> usage could also be dependant on spelling convention?
> Is that right? Even if so, isn't it more consistent to
> precede sigma with a non joiner if you don't want it
> to automatically be displayed as final sigma at the
> end of a word?

The new final sigma algorithm determines whether or not to use the final
form, and it covers the range of normal Greek usage. The exceptions are a few
very obscure usages that would have made the algorithm overly complex. You
can get text from anywhere, including code pages or keyboard entry that do
not support a non-joiner or other markup. Besides, you want to keep the
amount of markup needed for text presentation to a minimum.
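
As a rough illustration of a context-based choice that needs no markup, here
is a simplified sketch. It is not the actual Unicode rule, which is defined
in terms of cased and case-ignorable characters; this version only checks
whether the neighbouring characters are letters. The function name
lower_with_final_sigma is hypothetical.

```python
SIGMA_CAPITAL = "\u03A3"  # GREEK CAPITAL LETTER SIGMA
SIGMA_SMALL = "\u03C3"    # GREEK SMALL LETTER SIGMA
SIGMA_FINAL = "\u03C2"    # GREEK SMALL LETTER FINAL SIGMA

def lower_with_final_sigma(text: str) -> str:
    """Lowercase text, picking the final form of sigma from context alone.

    Simplified assumption: a capital sigma becomes the final form when it
    is preceded by a letter and not followed by one.
    """
    out = []
    for i, ch in enumerate(text):
        if ch == SIGMA_CAPITAL:
            preceded = i > 0 and text[i - 1].isalpha()
            followed = i + 1 < len(text) and text[i + 1].isalpha()
            out.append(SIGMA_FINAL if preceded and not followed else SIGMA_SMALL)
        else:
            out.append(ch.lower())
    return "".join(out)

word = "\u039F\u0394\u039F\u03A3"       # capital omicron, delta, omicron, sigma
print(lower_with_final_sigma(word))
```

The point of the sketch is that the input carries no joiner or other markup;
the choice of form is made entirely from the surrounding text.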

> 6. Why does Unicode use "capital" vs "small letter"
> terminology instead of "uppercase" vs "lowercase"? It
> seems like lowercase is more descriptive than "small
> letter".
>

The problem is that you not only have uppercase but you have titlecase as
well. Titlecase uses titlecase letters, capital letters, small letters and
caseless letters. This avoids the confusion. There are only four letters
that are exclusively used for titlecase. The rest of the letters are used
as part of a case conversion but they do not represent a case themselves.
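
To make the three-way distinction concrete, here is a small sketch using
Python's standard unicodedata module. It assumes the four letters meant are
the Latin digraphs U+01C5, U+01C8, U+01CB and U+01F2; that list is my
assumption, not something stated above, and the helper name case_forms is
made up for illustration.

```python
import unicodedata

# Assumed list: the four Latin digraph titlecase letters.
TITLECASE_DIGRAPHS = ["\u01C5", "\u01C8", "\u01CB", "\u01F2"]

def case_forms(ch: str) -> dict:
    """Collect the general category, name, and the three case mappings of ch."""
    return {
        "category": unicodedata.category(ch),  # "Lt" marks a titlecase letter
        "name": unicodedata.name(ch),
        "upper": ch.upper(),
        "lower": ch.lower(),
        "title": ch.title(),
    }

for ch in TITLECASE_DIGRAPHS:
    forms = case_forms(ch)
    print(f"U+{ord(ch):04X}", forms["category"], forms["name"])
    print("  upper:", f"U+{ord(forms['upper']):04X}",
          "lower:", f"U+{ord(forms['lower']):04X}",
          "title:", f"U+{ord(forms['title']):04X}")
```

Each of these characters has distinct uppercase, lowercase and titlecase
forms, which is why a two-way uppercase/lowercase vocabulary falls short.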

Carl


This archive was generated by hypermail 2.1.2 : Tue Sep 18 2001 - 16:11:40 EDT