From: Peter Kirk (peterkirk@qaya.org)
Date: Sat Jun 05 2004 - 12:17:41 CDT
On 05/06/2004 08:25, John Hudson wrote:
> Peter Kirk wrote:
>
>>> All Hudson is pointing out is that long PRIOR to Unicode, Semitic
>>> scholars reached the conclusion that all Semitic languages share the same
>>> 22 characters. A long-standing and quite useful conclusion that has
>>> nothing at all to do with your proposal.
>>
>
>> But I dispute his last sentence. If the writing systems of these
>> languages share the same abstract characters, they form a single
>> script, which conflicts with the proposal to encode Phoenician as a
>> separate script.
>
> Did you read, also, my messages regarding the perception of instances
> of a script continuum? Restating your perception that the instances of
> Phoenician and Hebrew represent the same 'script' for Unicode purposes
> is just reverting to the fundamental disagreement with those who have
> stated a desire or need to distinguish such instances in plain text.
> 'Script' in Unicode is a generic term that does not necessarily relate
> to notions of script outside Unicode. The determining feature of a
> Unicode script, i.e. a labelled subset of characters, is that it is
> something that can be differentiated from other subsets of characters
> *in plain text*. Whether things so differentiated are considered
> individual scripts outside of Unicode isn't very relevant to this
> usage. Indeed, Unicode might have avoided all this debate by not using
> the term script at all.
Well, I tend to agree that the word "script" has not helped. It doesn't
help that the definition you use here conflicts with the one Michael
Everson uses when he insists that Phoenician is a separate script. On
your definition it is clearly not one until the UTC defines that it is.
So we end up with a circular argument.
On your definition, the set of fullwidth forms FF01-FF5E is a separate
script, because it is a labelled subset of characters which can be
differentiated from any other such set in plain text. So are each of the
subsets of mathematical alphanumeric symbols. But they have
compatibility decompositions to regular Latin script. If these are
separate scripts, I might accept that Phoenician should also be one. But
Ken Whistler disagrees: he wrote yesterday "These are not separate scripts."
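As a minimal illustration (a sketch using only Python's standard `unicodedata` module; the helper name `nfkc_target` is hypothetical), the compatibility decompositions mentioned above can be checked directly: every character in U+FF01..U+FF5E folds under NFKC to the Basic Latin character 0xFEE0 code points lower, and a mathematical alphanumeric symbol such as U+1D400 folds to plain "A".

```python
import unicodedata

# The fullwidth range cited in the message: U+FF01..U+FF5E inclusive.
fullwidth = [chr(cp) for cp in range(0xFF01, 0xFF5F)]

def nfkc_target(ch):
    # Hypothetical helper: apply compatibility normalization (NFKC).
    return unicodedata.normalize("NFKC", ch)

# Each fullwidth form folds to the Basic Latin character 0xFEE0 lower.
assert all(nfkc_target(c) == chr(ord(c) - 0xFEE0) for c in fullwidth)

print(unicodedata.decomposition("\uFF21"))      # <wide> 0041
print(unicodedata.decomposition("\U0001D400"))  # <font> 0041 (MATHEMATICAL BOLD CAPITAL A)
print(nfkc_target("\U0001D400"))                # A
```

The `<wide>` and `<font>` tags mark these as compatibility (not canonical) decompositions, which is why they disappear under NFKC but survive NFC.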
So let's drop "script" for now. My basic contention is that each letter
of the Phoenician abjad is not a separate abstract character, but that
it and the corresponding square Hebrew letter are glyph variants of the
same abstract character. And this is clearly the understanding of
Semitic scholars, as summarised by Patrick Durusau and quoted above. On
the other hand, nearly everyone agrees that there should be a mechanism
for distinguishing them in plain text.
Is this a novel situation? No, for Unicode has clearly recognised this
kind of situation in TUS section 15.6, which I quoted earlier, and
Unicode has defined a mechanism for dealing with it: variation
selectors. If this mechanism is not appropriate in this
particular case, let the UTC come up with another mechanism to meet the
user requirement. To define a new set of abstract characters for what
are actually glyph variants is to ignore the character-glyph model.
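A minimal sketch of how the variation-selector approach would behave in plain text, assuming a purely hypothetical sequence <U+05D0 HEBREW LETTER ALEF, U+FE00 VARIATION SELECTOR-1> standing for a Phoenician-style alef. This sequence is not registered in StandardizedVariants.txt, and unregistered sequences are meant to be ignored in rendering; the helper `base_text` is also hypothetical and strips only U+FE00..U+FE0F, not the supplementary selectors U+E0100..U+E01EF.

```python
import unicodedata

ALEF = "\u05D0"  # HEBREW LETTER ALEF
VS1 = "\uFE00"   # VARIATION SELECTOR-1

# HYPOTHETICAL: <ALEF, VS1> as a "Phoenician-style alef"; not a registered variant.
variant = ALEF + VS1

def base_text(s):
    # Hypothetical helper: drop U+FE00..U+FE0F to recover the shared base characters.
    return "".join(ch for ch in s if not 0xFE00 <= ord(ch) <= 0xFE0F)

assert variant != ALEF             # the two are distinguishable in plain text
assert base_text(variant) == ALEF  # but share one underlying abstract character
# Normalization does not remove the selector, so the distinction survives NFKC.
assert unicodedata.normalize("NFKC", variant) == variant
print(unicodedata.name(VS1))       # VARIATION SELECTOR-1
```

The design point is that searching or collating on `base_text` treats both forms as one letter, while the selector keeps the distinction available to anyone who needs it.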
-- Peter Kirk peter@qaya.org (personal) peterkirk@qaya.org (work) http://www.qaya.org/