From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Thu Mar 30 2006 - 16:38:09 CST
On 3/30/2006 1:58 AM, Andrew West wrote:
> On 29/03/06, Kent Karlsson <kent.karlsson14@comhem.se> wrote:
>
>>> The sort of case I am thinking of is that in which a letter L may have
>>> two contextual forms, L1 and L2 which are selected in different
>>> contexts (e.g. L1 before one set of vowels and L2 before another set
>>> of vowels). However, when writing a foreign word L2 is always used,
>>> regardless of context.
>>>
>> You are convincing me even more that these variants should have
>> been encoded as separate characters, that should have separate
>> shaping properties.
>>
>
> There are some things about the Mongolian encoding model that I really
> do not like, and which I think go against Unicode's fundamental
> encoding principles, but variation selectors are not one of them. Your
> suggestion of encoding contextual glyph variants separately goes
> against both the character-glyph model and the Mongolians' own sense
> of what letters their script is composed of.
>
That defines the issue very clearly.
> Just to reiterate, variation selectors for Mongolian are used sparsely
> in ordinary running text as the rendering system can select the
> correct glyph form of a letter from context in most cases, and the
> user (or IME) only needs to enter a VS when the context is ambiguous
> or needs to be overridden.
>
This is no different from the use of ZWNJ in Persian to get a
disconnected shape. The character-glyph model correctly distinguishes
between underlying entities (characters) and their shapes (glyphs),
whether these shapes are selected by font switching or by a complex
shaping algorithm.
Algorithms other than display tend to process text based on the
underlying entities, not their surface appearance; therefore, using
variation selectors (or other modifiers, such as ZWNJ) allows those
algorithms to arrive at the correct results without costly mapping
tables, simply by treating the modifier as transparent.
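As a rough illustration of what "treating the modifier as transparent"
can look like, here is a minimal Python sketch. The function names and
the particular set of code points are assumptions made for this example;
a real implementation would consult Unicode property data rather than a
hand-written set.

```python
# Code points treated as transparent in this sketch (an illustrative
# assumption, not a list taken from the standard).
TRANSPARENT = (
    {0x200C, 0x200D}                # ZWNJ, ZWJ
    | set(range(0x180B, 0x180E))    # Mongolian FVS1..FVS3
    | set(range(0xFE00, 0xFE10))    # VS1..VS16
)

def strip_transparent(text: str) -> str:
    """Drop the modifiers so only the underlying characters remain."""
    return "".join(ch for ch in text if ord(ch) not in TRANSPARENT)

def same_underlying_text(a: str, b: str) -> bool:
    """Compare two strings while ignoring shape-selecting modifiers."""
    return strip_transparent(a) == strip_transparent(b)

# A Persian word written with and without ZWNJ (U+200C).
with_zwnj = "\u0645\u06cc\u200c\u062e\u0648\u0627\u0647\u0645"
without_zwnj = "\u0645\u06cc\u062e\u0648\u0627\u0647\u0645"
print(same_underlying_text(with_zwnj, without_zwnj))
```

No per-glyph mapping table is involved: the comparison needs only the
short list of modifiers to skip.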
If the Greek small letter final sigma had been realized with some sort
of modifier, either a VS or perhaps even ZW(N)J, instead of being
encoded separately, it would be possible to round-trip words between
lower and upper case.
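The round-trip problem can be observed with a short Python sketch. It
assumes CPython 3.3 or later, whose str.lower() applies the contextual
final-sigma rule; the helper name round_trips is made up for this
example.

```python
def round_trips(word: str) -> bool:
    """True if upper-casing and then lower-casing restores the word."""
    return word.upper().lower() == word

final_form = "\u03bf\u03b4\u03bf\u03c2"    # ends in U+03C2 FINAL SIGMA
medial_form = "\u03bf\u03b4\u03bf\u03c3"   # ends in U+03C3 SMALL SIGMA
isolated_final = "\u03c2"                  # final sigma used on its own

for sample in (final_form, medial_form, isolated_final):
    print(repr(sample), round_trips(sample))
```

Both sigmas upper-case to the same U+03A3, so the lower-casing step has
to guess from context. The guess restores the first sample but not the
other two, which is the information loss that a modifier-based encoding
would have avoided.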
Now, for Greek, where the history of computer implementations is firmly
rooted in 8-bit, single-language implementations, that would have meant
dragging a lot of complexity into editing and rendering for a single
character.
(There's also the issue that this shape has been used contrastively in
various notations that simply plundered the Greek type case, which
makes a VS-based approach less useful.)
For Mongolian, which not only has many characters affected by this
issue, but already has one of the most complex shaping algorithms and
needed de novo implementations of rendering software in any case, those
arguments don't apply, and FVS is much preferable to burdening all the
other algorithms with unnecessary complexity.
In fact, the FVS solution makes it possible for most generic
implementations of algorithms to handle Mongolian data without having
to carry a mapping table.
> Incidentally, there are a couple of cases for Mongolian where
> variation selectors are used to select simple glyph variants, which I
> agree should better have been encoded as separate characters:
>
> U+1880 MONGOLIAN LETTER ALI GALI ANUSVARA ONE
> U+1881 MONGOLIAN LETTER ALI GALI VISARGA ONE
>
> In fact, I think that the spurious "ONE" in the names of these
> characters must be a relic of an early draft which included MONGOLIAN
> LETTER ALI GALI ANUSVARA ONE, MONGOLIAN LETTER ALI GALI ANUSVARA TWO,
> MONGOLIAN LETTER ALI GALI VISARGA ONE and MONGOLIAN LETTER ALI GALI
> VISARGA TWO (just my hypothesis, but if Ken or anyone can confirm or
> deny it ...).
>
I can't confirm that, but this reasoning is plausible. These are not of
the same nature as the overridden automatic shapes.
>> It's not really too late yet, I think, to deprecate
>> the FVSs..
>>
>
> Well, yes it is.
>
>
I would have to firmly agree with Andrew on this conclusion. For the use
in overriding automatic shaping, there's no way that we would deprecate
the FVS.
However, we have never ruled out adding additional character codes in
cases where there is a *semantic* difference between two variant
shapes. For example, should some of the mathematical variants become
used in a (future) development of notation such that they acquire
strongly contrastive meaning, we reserve the right to add a character
code with the same representative shape as was previously covered by a
variation sequence. This flexibility is necessary to guarantee that
algorithms can continue to *ignore* VS (and FVS).
Users relying on strong semantic differentiation would therefore need an
actual character code, and we need to affirm our right to be able to
accommodate future users in that regard. However, existing documents
would simply continue to display correctly, but for them, the alternate
shape would not carry the same semantic distinction.
Ordinary variation selectors are a solution to a coding problem: what to
do when the nature of the use of glyph variants is not well known or is
ill-defined. They allow the (possibly) stylistic difference to be
marked in the text, in case there is a semantic difference to the
reader.
This is not the same problem as overriding automatic shaping, which is
the primary role of the FVS. Here, there is normally no semantic
difference, but there may be an orthographic distinction (foreign
words). In these cases *some* Mongolian-specific algorithms
(spell-checkers, etc.) would need to process the FVS, while many others
(sorting, etc.) would have no need.
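The split between the two kinds of algorithm might be sketched as
follows in Python. Everything here is hypothetical: the function names,
the one-entry lexicon, and the letter sequences, which are placeholders
rather than real Mongolian words.

```python
# MONGOLIAN FREE VARIATION SELECTOR ONE..THREE
FVS = {"\u180b", "\u180c", "\u180d"}

def sort_key(word: str) -> str:
    """Generic consumer: ignore FVS so shape variants collate together."""
    return "".join(ch for ch in word if ch not in FVS)

def is_known_spelling(word: str, lexicon: set) -> bool:
    """Script-specific consumer: FVS is significant, so match verbatim."""
    return word in lexicon

# Placeholder sequences (assumption): NA + A, and NA + FVS1 + A.
plain = "\u1828\u1820"
overridden = "\u1828\u180b\u1820"

# Hypothetical lexicon that lists only the plain spelling.
lexicon = {plain}

print(sort_key(plain) == sort_key(overridden))
print(is_known_spelling(plain, lexicon), is_known_spelling(overridden, lexicon))
```

The sorting side needs no knowledge of Mongolian beyond the list of
selectors to skip, while the spell-checking side keeps the FVS because
the orthographic distinction matters to it.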
A./