From: Kenneth Whistler (kenw@sybase.com)
Date: Thu May 20 2004 - 18:51:03 CDT

Patrick said:
> >In this case, I think it's important to be picky because there are
> >no current Unicoding practices for Phoenician.
> >
> You may mean that the Unicode book does not document how Phoenician (or
> Paleo-Hebrew) may be encoded. This is not to say that no one is using
> Unicode to encode Paleo-Hebrew texts.
             ^^^^^^
             represent

I prefer to make this distinction, because the whole notion of
what it means to "encode a text" tends to derail the discussion
immediately.

The Unicode Standard *encodes* abstract characters.
There are many potential abstract characters, but one of the
general principles used is that each significant "letter" (grapheme)
from a *script* will be encoded once as a character in the
standard. That, of course, raises the question of identifying
the "script" and its exact repertoire of "letters". The identification
of the "script" is what the Phoenician argument has been about,
since there is no serious question about the repertoire of
"letters" for it.

Once a repertoire of abstract characters has been *encoded*
in the Unicode Standard, those encoded characters can then
be used to *represent* the plain text content of documents.
This is deliberately different from talking about "encoding the
text", because people don't have common understandings about
what that means, and often expect various aspects of format
and appearance to also be "encoded" -- hence the way these
discussions tend to veer off into ditches.
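
To make the distinction concrete, here is a minimal Python
sketch (illustrative only, assuming nothing beyond the
already-encoded Hebrew repertoire): the standard *encodes*
HEBREW LETTER ALEF once, as U+05D0; a document then *represents*
its content as a sequence of such code points; and byte
serialization (UTF-8 and friends) is a separate layer again.

    # Encoded characters vs. represented text.
    # U+05D0 HEBREW LETTER ALEF and U+05D1 HEBREW LETTER BET are
    # abstract characters, each encoded once by the standard; this
    # string *represents* two-letter plain-text content with them.
    text = "\u05D0\u05D1"

    print([hex(ord(c)) for c in text])   # ['0x5d0', '0x5d1']
    print(text.encode("utf-8"))          # b'\xd7\x90\xd7\x91'
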
Now, returning to Patrick's statement and substituting a
different unencoded script:
> the Unicode standard does not document how *Avestan*
> may be encoded. This is not to say that no one is using
> Unicode to represent *Avestan* texts.

Also true, right? Or...

> the Unicode standard does not document how *Tifinagh*
> may be encoded. This is not to say that no one is using
> Unicode to represent *Tifinagh* texts.

O.k., I guess you can see that this particular argument is not
going to go anywhere. Any script which is not currently encoded
in the standard can be (and probably is) represented *somehow*
by Unicode characters, either via PUA or transliteration or
some other arbitrary intermediate encoding of entities. That it
is (or could be) so represented has little or no bearing on the
question of whether the script in question is or is not
distinct enough from some already encoded but historically
related script to warrant a distinct encoding as a "script" in
the Unicode sense of a script.
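
To illustrate, a minimal Python sketch of two such interim
representations of Phoenician (the PUA assignments below are
hypothetical -- any implementer could pick others; the second
mapping simply masquerades Phoenician as the historically
related, already-encoded Hebrew script):

    # Hypothetical Private Use Area assignments for three letters.
    PUA_MAP = {"alf": "\uE000", "bet": "\uE001", "gaml": "\uE002"}

    # Masquerading via already-encoded Hebrew letters
    # U+05D0, U+05D1, U+05D2.
    HEBREW_MAP = {"alf": "\u05D0", "bet": "\u05D1", "gaml": "\u05D2"}

    def represent(letters, mapping):
        # Map a sequence of abstract letter names to a string.
        return "".join(mapping[name] for name in letters)

    word = ["bet", "alf"]
    print(represent(word, PUA_MAP))     # needs a matching PUA font
    print(represent(word, HEBREW_MAP))  # renders as Hebrew; the
                                        # script identity is lost
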
John Hudson asked, again:
> My question, again, is whether there is a need for the plain
> text distinction in the first place?

And I claim that there is no final answer to this question. We
simply have irresolvable differences of opinion, with some
asserting that it is self-evident that there is such a need,
and others asserting that it is ridiculous to even consider
encoding Phoenician as a distinct script, and that there is
no such need.

My own take on this seemingly irreconcilable clash of opinion is
that if *some* people assert a need (and if they seem to be
reasonable people instead of crackpots with no demonstrable
knowledge of the standard and of plain text) then there *is*
a need. And that people who assert that there is *no* need
are really asserting that *they* have no need and are making
the reasonable (but fallacious) assumption that since they
are rational and knowledgeable, the fact that *they* have no
need demonstrates that there *is* no need.

If such is the case, then there *is* a need -- the question
then just devolves to whether the need is significant enough
for the UTC and WG2 to bother with it, and whether even if
the need is met by encoding of characters, anyone will actually
implement any relevant behavior in software or design fonts
for it.

In my opinion, Phoenician as a script has passed a
reasonable need test, and has also passed a significant-enough-
to-bother test.

Note that these considerations need to be matters of
reasonableness and appropriateness. There is no absolutely
correct answer to be sought here. A character encoding standard
is an engineering construct, not a revelation of truth, and
we are seeking solutions that will enable software handling
text content and display to do reasonable things with it at
reasonable costs.

If you start looking for absolutes here, it is relatively easy
to apply reductio ad absurdum. In an absolute sense, there is
no "need" to encode *any* other script, because they can *all*
be represented by one or another transliteration scheme or
masquerading scheme and be rendered with some variety or
other of symbol font encoding. After all, that's exactly what
people have been doing for these scripts to date -- or they
are making use of encodings outside the context of Unicode,
which they could go on using, or they are making use of graphics
and facsimiles, and so on. The world wouldn't end if all such
methods and "hacks" continued in use.

The question is rather whether, given the fundamental nature of
the Unicode Standard as enabling text processing for modern
software, it is cost-effective and *reasonable* to provide a
Unicode encoding for one particular script or another, unencoded
to date, so as to maximize the chances that it will be handled
more easily by modern software in the global infrastructure and
to minimize the costs associated with doing so.

*That* is the test which should be applied when trying to
make decisions about which of the remaining varieties of
unencoded writing systems rise to the level of distinctness,
utility, and cost-effectiveness to be encoded as another
script in the standard.

--Ken