From: Sivakatirswami (katir@hindu.org)
Date: Wed Apr 27 2005 - 22:43:45 CST
Namaskar and Aloha from the offices of Himalayan Academy Publications
in Hawaii...
where we are just slowly learning about Unicode in our publications
work.
I'm writing a short article on Unicode for a "public" magazine (Hinduism
Today) about Mac OS X Tiger (10.4) support for Tamil Unicode...
I need to write at a very basic, layman's level and have only a very small
space allotment.
Despite reading all the documents (I downloaded *all* the PDFs for
the 4.0 standard book), I *still* have trouble getting my head around
the difference between:
1. The code points described as a simple series of integers from
0 to 1,114,111 (the decimal equivalent of
U+10FFFF).
This is the simplest way for a layman to visualize it; although the
last number is big, it is still easy to describe and visualize
(roughly, of course), as in:
"Unicode is just a long series of numbers from Zero to over One Million,
there is a character in each place, and the whole list includes all the
characters of all the languages known to man, past and present."
Which of course sounds at the very least "cool" to the glib-minded and
incredibly groundbreaking to those who can see it for what it is...
(if true, which it seems to be...)
2. But then we move on to: "Unicode characters may be encoded at any
code point from U+0000 to U+10FFFF", and now we begin to slide into the
"nerd realm".
I understand "004F" to be the hexadecimal representation of four
separate 4-bit groups (one group per hex digit).
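A minimal Python sketch of that reading of hexadecimal, for illustration only: each hex digit stands for exactly one 4-bit group, and the same base-16 reading turns the last code point, U+10FFFF, into an ordinary decimal integer. The function name `hex_to_nibbles` is a hypothetical helper, not anything from the Unicode standard.

```python
import sys


def hex_to_nibbles(hex_digits: str) -> str:
    """Hypothetical helper: turn each hex digit into its 4-bit binary group."""
    return " ".join(format(int(d, 16), "04b") for d in hex_digits)


print(hex_to_nibbles("004F"))  # 0000 0000 0100 1111
print(hex_to_nibbles("0041"))  # 0000 0000 0100 0001

# The last code point, U+10FFFF, read as a plain decimal integer.
LAST_CODE_POINT = int("10FFFF", 16)
print(LAST_CODE_POINT)                    # 1114111
print(LAST_CODE_POINT + 1)                # 1114112 slots, since numbering starts at U+0000 = 0
print(sys.maxunicode == LAST_CODE_POINT)  # True in Python 3.3 and later
```

So the "over One Million" figure is 1,114,112 slots in all, numbered 0 through 1,114,111.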
For purposes of a diagram, I would like to translate any given
code point designation, like A = U+0041, to its integer position in the
series. (Aside question: what do you call that kind of "label" for the
code point, "U+****"?)
E.g., expressed verbally, if one were writing an article for "mom and
pop":
The capital letter A is number "65" in the series... but computer
geeks like to express it in hexadecimal form, like this: "U+0041", and if
you really need to describe it to the computer, then it is "0000 0000
0100 0001".
or, in a diagram, simply:
A --> 65 --> U+0041 --> 0000 0000 0100 0001
And ditto for one Tamil character and one Chinese character... but my
problem is ascertaining the second segment, the simple integer...
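For illustration, here is a small Python sketch that builds the whole diagram line from the character itself. Python's built-in `ord()` returns a character's code point as a plain integer, which is exactly the "second segment". The helper name `diagram` and the choice of TAMIL LETTER KA (U+0B95) and the ideograph U+4E2D ("middle") as examples are assumptions made for this sketch. One caveat: the binary column is simply the number written in base 2, padded to at least 16 bits. How a computer actually stores it depends on the encoding form: in UTF-16, A is the single 16-bit unit 0000 0000 0100 0001, while in UTF-8 it is the single byte 0100 0001.

```python
def diagram(ch: str) -> str:
    """Hypothetical helper: 'char --> decimal --> U+ label --> binary'."""
    n = ord(ch)                                      # the code point as a plain integer
    label = f"U+{n:04X}"                             # hexadecimal, at least four digits
    width = max(16, (n.bit_length() + 3) // 4 * 4)   # whole 4-bit groups, at least 16 bits
    bits = format(n, f"0{width}b")
    groups = " ".join(bits[i:i + 4] for i in range(0, width, 4))
    return f"{ch} --> {n} --> {label} --> {groups}"


# English A, TAMIL LETTER KA (U+0B95), and the CJK ideograph U+4E2D.
for ch in ("A", "\u0B95", "\u4E2D"):
    print(diagram(ch))
```

The first line printed reproduces the diagram above exactly; the Tamil letter comes out as 2965 and the Chinese character as 20013.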
OK, so my questions are:
1) Is the decimal expression for the capital letter A, 65, exactly
its integer code point position in the total Unicode
series, expressed as a series of integers?
2) How can one ascertain the integer number for a code point
outside (above) the basic ANSI range?
E.g., in the diagram I want to put an English character, a Tamil character and a
Chinese character...
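A hedged sketch touching both questions, using only Python's standard library. For question 1, the decimal number is the code point itself; the only wrinkle is that counting starts at U+0000 = 0, so A is the 66th slot if you count from one. For question 2, nothing special happens above the ASCII/ANSI range: `ord()` works on any character, and if you don't have the character at hand, the standard `unicodedata` module can look it up by its official Unicode name. The helper `code_point_by_name` is a hypothetical name for this example.

```python
import unicodedata

# Question 1: the decimal number IS the code point (counting from U+0000 = 0).
print(ord("A"), ord("A") == 0x41)  # 65 True


def code_point_by_name(name: str) -> int:
    """Hypothetical helper: official Unicode character name -> integer code point."""
    return ord(unicodedata.lookup(name))


# Question 2: the same integer exists for any character, ASCII or not.
for name in ("LATIN CAPITAL LETTER A", "TAMIL LETTER KA", "CJK UNIFIED IDEOGRAPH-4E2D"):
    n = code_point_by_name(name)
    print(f"{name}: {n} (U+{n:04X})")
```

Going the other way, `unicodedata.name()` gives the official name for a character, which can be handy when labelling a diagram.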
So we want to be able to say, for the layman:
"The entire Tamil alphabet is contained between characters 2944 and
3071 in the Unicode series." But one needs to:
a) be able to find where those blocks are (where do you go to find the
beginnings and endings of the blocks for different languages?)
b) be able to translate "U+0BE6" (which is a position in the Tamil set)
back to a simple integer in the series. If I just "do the math" using
the same correlation as for the letter A ["0041" = "65", therefore 0BE6 must
equal ****] ... will it be correct?
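The same positional arithmetic that turns 0041 into 65 works for every code point: each hex digit is worth 16 times the digit to its right. A minimal Python sketch spelling the math out by hand (the helper `hex_to_decimal` is hypothetical; Python's built-in `int(text, 16)` gives the same answer):

```python
HEX_DIGITS = "0123456789ABCDEF"


def hex_to_decimal(hex_digits: str) -> int:
    """Hypothetical helper: convert base-16 text to an integer, one digit at a time."""
    value = 0
    for d in hex_digits:
        value = value * 16 + HEX_DIGITS.index(d.upper())
    return value


print(hex_to_decimal("0041"))                     # 65   = 4*16 + 1
print(hex_to_decimal("0BE6"))                     # 3046 = 11*256 + 14*16 + 6
print(hex_to_decimal("0BE6") == int("0BE6", 16))  # True
```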
I'm hoping I can go somewhere to find this info easily in some
tables...
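One such table is Blocks.txt in the Unicode Character Database (under www.unicode.org/Public/UNIDATA/), a plain-text file with one line per block in the form "start..end; Block Name". The sketch below parses a three-line excerpt in that format and converts the hex ranges to plain integers; the helper names `parse_blocks` and `block_of` are hypothetical, and the real file lists every block. Note that a block's range also includes unassigned slots, so "between 2944 and 3071" describes where the Tamil block sits, not a claim that every slot holds a letter.

```python
from typing import Dict, Optional, Tuple

# A three-line sample in the Blocks.txt format ("start..end; Block Name").
BLOCKS_EXCERPT = """\
# Blocks.txt style: hexadecimal start..end; block name
0000..007F; Basic Latin
0B80..0BFF; Tamil
4E00..9FFF; CJK Unified Ideographs
"""


def parse_blocks(text: str) -> Dict[str, Tuple[int, int]]:
    """Map each block name to its (first, last) code point as plain integers."""
    blocks: Dict[str, Tuple[int, int]] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        span, name = line.split(";", 1)
        start, end = span.split("..")
        blocks[name.strip()] = (int(start, 16), int(end, 16))
    return blocks


def block_of(code_point: int, blocks: Dict[str, Tuple[int, int]]) -> Optional[str]:
    """Return the name of the block containing code_point, or None if not listed."""
    for name, (first, last) in blocks.items():
        if first <= code_point <= last:
            return name
    return None


blocks = parse_blocks(BLOCKS_EXCERPT)
for name, (first, last) in blocks.items():
    print(f"{name}: {first} to {last}")
print(block_of(0x0BE6, blocks))  # Tamil
```

Run against the full Blocks.txt, the same two functions would answer "where does each language's block begin and end" for every block the file lists.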
TIA!
Sannyasin Sivakatirswami
Himalayan Academy Publications
at Kauai's Hindu Monastery
katir@hindu.org
www.HimalayanAcademy.com
www.HinduismToday.com
www.Gurudeva.org
www.Hindu.org
This archive was generated by hypermail 2.1.5 : Thu Apr 28 2005 - 09:08:12 CST