From: Kenneth Whistler (kenw@sybase.com)
Date: Wed Jan 05 2005 - 13:17:02 CST
> On Wednesday, January 5th, 2005 14:18Z Philipp Reichmuth wrote:
>
> > I wouldn't rule this out entirely. For example, I know one attempt
> > to implement a Tibetan font where the underlying representation was
> > Latin (Wylie), and the Tibetan glyphs were generated from the Latin
> > transliteration using OpenType rules
I presume Philipp Reichmuth was talking about:
http://www.nitartha.org/wylieandconverter.html
>
> Or I did not understand you, or this has nothing to do with Unicode, much
> less Unicode/10646 conformance.
>
> The Tibetan characters are _never_ encoded using Unicode in this process,
> are they?
> Looks like a clear case of nonconformance to me.
Not at all. It is simply another example of someone using
a transliteration as a kind of betacode for text, and then
adapting a font technology to display it.
If an application clearly states what it is doing, it can
do this conformantly in Unicode.
If I state that data is entered and stored as "bsgribs" or
"g.yon" or "sat+t+wa" or whatever and that I then provide
an OpenType font with rules that will map those into the
appropriate display sequences of Tibetan stacks, anybody else who
wants to share those conventions (or use the application) is
free to do so.
The Unicode *conformance* issue there is whether the Latin
letter "b" used in the Wylie transliteration is correctly
represented as U+0062, and whether, if using UTF-16, that
shows up in stored data and strings as a 16-bit code unit,
0x0062, or if using UTF-8, that shows up in stored data
and strings as an 8-bit code unit, 0x62, and so on.
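To make that concrete, here is a minimal Python sketch of the check being described. It takes the Wylie string "bsgribs" from the example above and looks at how it is stored. The variable names are made up for illustration and do not come from any Wylie tool.

```python
# Wylie transliteration stored as ordinary Latin letters
# (the example string from the message; variable names are illustrative).
wylie = "bsgribs"

# Code points: each Latin letter keeps its normal Unicode identity.
code_points = [f"U+{ord(ch):04X}" for ch in wylie]

# UTF-8: one 8-bit code unit per ASCII letter.
utf8_units = list(wylie.encode("utf-8"))

# UTF-16 (big-endian, no BOM): one 16-bit code unit per letter.
utf16_bytes = wylie.encode("utf-16-be")
utf16_units = [
    int.from_bytes(utf16_bytes[i:i + 2], "big")
    for i in range(0, len(utf16_bytes), 2)
]

print(code_points[0])               # U+0062
print(f"0x{utf8_units[0]:02X}")     # 0x62
print(f"0x{utf16_units[0]:04X}")    # 0x0062
```

Nothing Tibetan appears anywhere in the stored data; as far as conformance goes, these are just Latin letters encoded correctly.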
If somebody wants to build conventions on top of that basic
identity of the character, and use the "b" to represent
a Tibetan letter or a Semitic letter or a type of atomic
bond or a chocolate chip cookie recipe, Unicode neither
knows nor cares -- it is up to the users of such conventions
to define what they are doing for themselves.
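As a rough sketch of such a convention layered on top, the following Python fragment keeps the stored text as Latin letters and applies a private mapping only at interpretation time. The one-entry table and the function name `interpret` are hypothetical, and the Tibetan sequence given for "bsgribs" is one reading of the Wylie, assumed here for illustration. A real font or converter would use general rules rather than a lookup.

```python
import unicodedata

# Hypothetical one-entry convention table: Wylie string -> Tibetan sequence.
# Assumed reading: ba, sa, subjoined ga, subjoined ra, vowel sign i, ba, sa.
WYLIE_TO_TIBETAN = {
    "bsgribs": "\u0F56\u0F66\u0F92\u0FB2\u0F72\u0F56\u0F66",
}

def interpret(text: str) -> str:
    """Apply the private convention; text with no entry passes through unchanged."""
    return WYLIE_TO_TIBETAN.get(text, text)

stored = "bsgribs"          # what is actually stored: Latin letters
shown = interpret(stored)   # what the convention says it stands for

for ch in shown:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

The stored string is unchanged by any of this. The mapping lives entirely in the convention that sender and receiver agree on.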
Sheesh, folks, I don't know why this is so difficult to
comprehend.
The Unicode Consortium does not, and never has tried to
constrain what conventions and higher-level protocols people
can apply on top of Unicode characters. The standard does
not and never has tried to legislate how any particular
piece of textual data *must* be represented.
At the risk of verging into wacko territory again, if I
want to represent the Hebrew text of Genesis 1.1 in a Yi
cipher, I can.
All the Unicode Standard does is define a bunch of characters
and rules for how to use them interoperably for *plain text*
representation. If I get some stream of Unicode Tibetan
characters identified as Unicode plain text, then I can
presume to know what the content of that text is, by use
of the standard to interpret it. If I get some stream of Unicode
Tibetan characters, but my communicant has indicated to me
that they are using a protocol for the representation of
Elvish poetry in Tibetan characters, well, then I know it
isn't Tibetan plain text, and I can either use the protocol
to enjoy the Elvish poetry or dump the Tibetan characters
as uninterpretable.
Capiche?
--Ken