From: Peter Kirk (peterkirk@qaya.org)
Date: Mon Nov 08 2004 - 18:47:07 CST
On 08/11/2004 20:06, Edward H. Trager wrote:
>...
>
>While the Unicode code space is by definition mathematically finite, still it is
>for all practical intents and purposes a very large code space that should be
>able to incorporate the "legitimate needs" of scholars, researchers, historians,
>among others. Regardless of whether one agrees completely or not about the encoding
>of Phoenician in Unicode, I --perhaps naively I admit-- fail to see how it does
>any more harm than the encoding of that HUGE number of "CJK Unified Ideographs
>Extension B" which, as far as I can tell (given my lack of scholarship in this area),
>is of more use to esoteric scholars
>than it is to ordinary speakers and writers of Chinese, Japanese, or Korean.
>It is no worse than the encoding of a large number of Arabic ligatures --a clear
>case of encoding glyphs, not characters-- that occurred in Unicode to support legacy
>systems that had already been defined for Arabic at the time when Unicode came around.
>Thankfully a similar thing did not happen for, say, Syriac. It is no worse than
>the encoding of Hangul syllables.
>
>I don't closely follow what additional planes of Unicode are being designated
>for, but perhaps there should be a plane set aside for the encoding of historical
>"script nodes" that would be useful to scholars, but not as useful to others.
>Then again, perhaps I'm too naive in this area to know what I'm talking about ... ;-)
>
>
>
Thank you for your mostly helpful comments.
But I would like to address your argument that it does no harm to add
additional characters which people can use or not, as they please. As a
general principle, I disagree. The aim of Unicode
standardisation is surely to define a single and unambiguous
representation of text. That requires that there be a single code point
for each character, or perhaps a set of canonically equivalent
representations. Where for historical reasons there are alternative
representations, e.g. the Arabic presentation forms, their use is clearly
(though sometimes not clearly enough) deprecated, and anyway they
usually have compatibility decompositions. But if we get into the position
where there is more than one (not canonically equivalent) way of
representing the same text, we are moving quickly away from
standardisation. There may be good reasons for some departures, but their
impact can be minimised by mechanisms like compatibility decompositions
and folding variants together for collation. But the suggestion of
encoding alternative representations for variant forms of scripts for
use alongside the original ones is likely to lead rapidly to chaos.
Imagine, for example, that Fraktur were defined as a "historical script
node" on your scheme, for use by scholars only. Some scholars would then
encode texts with the special Fraktur characters, but others, as well as
the general public, would go on encoding them as glyph variants of Latin
script, as they do now. The same text would then have two representations
that are not canonically equivalent, and the result would quickly be chaos.
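The distinction drawn above, between canonically equivalent representations and representations linked only by compatibility decompositions, can be checked directly. What follows is a minimal sketch, assuming Python 3 and its standard `unicodedata` module. The helper names `canonically_equivalent` and `fold_key` are hypothetical. `fold_key` (NFKC plus case folding) is only a crude stand-in for the folding a real collation algorithm performs.

```python
import unicodedata

def canonically_equivalent(a: str, b: str) -> bool:
    # Canonically equivalent strings have identical NFD forms.
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

def fold_key(s: str) -> str:
    # Hypothetical comparison key: NFKC applies compatibility decompositions,
    # and casefold() removes case distinctions. This is only a rough stand-in
    # for collation folding, not the Unicode Collation Algorithm itself.
    return unicodedata.normalize("NFKC", s).casefold()

# U+00E9 (precomposed e-acute) vs U+0065 U+0301 (e + combining acute accent)
precomposed = "\u00e9"
decomposed = "e\u0301"

# U+FEFB ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM vs U+0644 U+0627
ligature = "\ufefb"
letters = "\u0644\u0627"

if __name__ == "__main__":
    print(canonically_equivalent(precomposed, decomposed))  # True
    print(unicodedata.decomposition(ligature))             # <isolated> 0644 0627
    print(canonically_equivalent(ligature, letters))        # False
    print(fold_key(ligature) == fold_key(letters))          # True
```

The precomposed and decomposed e-acute are interchangeable under canonical equivalence. The lam-alef presentation form and the plain letters meet only under the compatibility-folding key. Characters encoded with no decomposition at all would not meet even there.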
... (omitted by request)
-- Peter Kirk peter@qaya.org (personal) peterkirk@qaya.org (work) http://www.qaya.org/
This archive was generated by hypermail 2.1.5 : Mon Nov 08 2004 - 18:53:41 CST