Re: Sequences of combining characters (from Romanization of Cyrillic and Byzantine legal codes)

From: Kenneth Whistler (kenw@sybase.com)
Date: Fri Sep 20 2002 - 20:07:30 EDT

Next message: Peter_Constable@sil.org: "Re: Sequences of combining characters (from Romanization of Cyrillic and Byzantine legal codes)"

Previous message: Peter_Constable@sil.org: "Re: Sequences of combining characters (from Romanization of Cyrillic and Byzantine legal codes)"
Maybe in reply to: William Overington: "Sequences of combining characters (from Romanization of Cyrillic and Byzantine legal codes)"
Next in thread: James E. Agenbroad: "Re: Sequences of combining characters (from Romanization of Cyrillic and Byzantine legal codes)"
Reply: James E. Agenbroad: "Re: Sequences of combining characters (from Romanization of Cyrillic and Byzantine legal codes)"
Reply: James E. Agenbroad: "Re: Sequences of combining characters (from Romanization of Cyrillic and Byzantine legal codes)"

Peter said:

> >This stuff *can* all be handled with appropriately designed
> >ligations in fonts, so there are options for display:
> >
> ><U+0074, U+0361, U+0073, U+0307>
> >
> > ==>
> > maps via ligation table to:
> >
> >{t-s-tie-ligature-with-dot-above} glyph
>
> I would consider this an anomalous rendering. It is counter-exemplified by
> figure 7-6 in TUS3.0. I'd be concerned about longer-term problems if we
> decided to say that this was a valid alternate rendering from
>
> >{t-s-dot-tie-ligature} glyph

Well, yes, it would be anomalous, which is why it would require
somebody to go to the trouble to make a special ligation table
entry for it.

But what longer-term problems are you talking about? I didn't
say we should put in a formal rendering *rule* in the Unicode
Standard that says something different from Figure 7-6, along
the lines of converting one form to the other as above.

Look, let's consider again what problem we are trying to solve
here. We have two funky forms from the ALA-LC transliteration
tables, for which we haven't heard back yet from bibliographic
sources whether there is any *actual* data representation
problem in USMARC records.

We can try to invent and promulgate a generic rendering solution
for these cases (and anything like them) in the Unicode Standard,
despite the fact that they are an edge case of an edge case for
Latin script rendering... Or, if it turns out that it isn't a
general-enough problem to force everyone to deal with it in terms
of generic rendering, we could suggest alternatives:

a. markup solutions
b. specific font ligation solutions for specialized data

Now consider again the function of these things in the ALA-LC
transliteration. The Cyrillic transliteration recommendations
make rather extensive use of ligature ties. Why? Because the
ALA-LC transliteration schemes make some effort to be round-trippable.
In other words, the Cyrillic transliteration they recommend is
not merely a useful romanization that might be in more general
use, as for a newspaper, but is a romanization from which, in
principle, you ought to be able to recover the Cyrillic it
was transliterated from. Thus these schemes distinguish t-s
from t-s-tie-ligature, since the ligated form might be a
transliteration of a tse or similar letter, whereas the t-s
would be a transliteration of a te+es, and so on. In other
words, the tie-ligatures are being sprinkled in to make ad hoc
digraphs for the transliteration, to aid in recovery of the
Cyrillic from the romanization.
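
To make the round-trip point concrete, here is a minimal sketch in
Python. The three-entry table is a toy assumption for illustration,
not the full ALA-LC scheme, and the function names are hypothetical.
It only shows why the tie is needed: with it, tse and te+es romanize
to different strings and both can be recovered.

```python
# Toy illustration only: a three-letter table, not the real ALA-LC tables.
TIE = "\u0361"  # COMBINING DOUBLE INVERTED BREVE, used as the ligature tie

TSE, TE, ES = "\u0446", "\u0442", "\u0441"  # Cyrillic tse, te, es

TO_LATIN = {TSE: "t" + TIE + "s", TE: "t", ES: "s"}
FROM_LATIN = {latin: cyr for cyr, latin in TO_LATIN.items()}
# Try longer romanized units first so "t" + TIE + "s" wins over plain "t".
KEYS = sorted(FROM_LATIN, key=len, reverse=True)


def romanize(text):
    """Map each Cyrillic letter in the toy table to its Latin form."""
    return "".join(TO_LATIN[ch] for ch in text)


def deromanize(text):
    """Recover the Cyrillic by longest-match lookup in the toy table."""
    out = []
    i = 0
    while i < len(text):
        for key in KEYS:
            if text.startswith(key, i):
                out.append(FROM_LATIN[key])
                i += len(key)
                break
        else:
            raise ValueError("no table entry at offset %d" % i)
    return "".join(out)


if __name__ == "__main__":
    for word in (TSE, TE + ES):
        latin = romanize(word)
        print([hex(ord(c)) for c in latin], deromanize(latin) == word)
```

Drop the tie from the tse entry and both inputs romanize to plain
"ts", at which point the original Cyrillic can no longer be recovered.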

Now the dots above typically represent an articulatory diacritic,
as for palatalization, or the like.

So the combination of the two is to indicate: we are transliterating
a letter with a palatal (say) diacritic, using a digraph.

Do we have alternatives in Unicode for that? Well, yes, depending
on whether the problem is:

  a. enabling exact transcoding of the USMARC data records
     using ALA-LC romanization recommendations and the ANSEL
     character set, for interoperability with Unicode systems.

  b. typesetting the ALA-LC romanization document guide in
     Unicode, treating all the data therein as plain text and
     using generic Unicode rendering rules.

I contend that the primary problem is a), and that we ought
to examine the general usefulness of this dot-above-double-diacritic
and related rendering, before we insist it has to be representable
in plain text and go looking for an encoding solution and specify a
bunch of rendering rules for it.

If the essential requirement here is to capture the data
functionality of the transliteration: a roundtrippable form,
with a palatal diacritic, using a digraph, we could suggest,
for instance:

<U+0074, U+034F, U+0073, U+0307>

<U+0074, U+0307, U+034F, U+0073>

where we end up with an explicitly indicated digraph, with a
dot-above diacritic (pick which letter you want it on), as
a grapheme cluster. This is distinct from:

<U+0074, U+0073, U+0307>

<U+0074, U+0307, U+0073>

so you have your transliteration round-trippability intact.
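
A quick way to check that distinction is to compare the four
sequences as stored and after normalization. The sketch below uses
only Python's standard unicodedata module; the variable and function
names are mine. It relies on U+034F having combining class 0, so
normalization does not reorder the dot above across it.

```python
import unicodedata

CGJ = "\u034f"        # COMBINING GRAPHEME JOINER
DOT_ABOVE = "\u0307"  # COMBINING DOT ABOVE

sequences = [
    "t" + CGJ + "s" + DOT_ABOVE,   # digraph, dot on the s
    "t" + DOT_ABOVE + CGJ + "s",   # digraph, dot on the t
    "t" + "s" + DOT_ABOVE,         # plain t + s, dot on the s
    "t" + DOT_ABOVE + "s",         # plain t + s, dot on the t
]


def code_points(text):
    """Render a string as a list of U+XXXX code point labels."""
    return " ".join("U+%04X" % ord(ch) for ch in text)


def count_distinct(strings, form=None):
    """Count distinct strings, optionally after normalizing to `form`."""
    if form is not None:
        strings = [unicodedata.normalize(form, s) for s in strings]
    return len(set(strings))


if __name__ == "__main__":
    for seq in sequences:
        nfc = unicodedata.normalize("NFC", seq)
        print(code_points(seq), "->", code_points(nfc))
    print("distinct as stored:", count_distinct(sequences))
    print("distinct under NFC:", count_distinct(sequences, "NFC"))
    print("distinct under NFD:", count_distinct(sequences, "NFD"))
```

All four stay distinct in every case, which is the property the
round trip depends on.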

And for your special-purpose application, which is a Unicode system
to display USMARC bibliographic records using the ALA-LC romanization
presentation conventions, you add ligation entries to your font
so that

<U+0074, U+034F, U+0073, U+0307>

and similar forms using a U+034F COMBINING GRAPHEME JOINER display
with a visible tie-ligature, rather than nothing, despite the fact
that no U+0361 double diacritic is being used in the data. Problem
solved.
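
As a rough picture of what such a ligation entry does, here is a
small Python model of a ligature lookup. It is not a real font or
shaping-engine API: the glyph names, the two-entry table, and the
shape() function are all hypothetical, and an actual font would
express this in its own ligature substitution tables.

```python
CGJ = "\u034f"        # COMBINING GRAPHEME JOINER
DOT_ABOVE = "\u0307"  # COMBINING DOT ABOVE

# Hypothetical ligation table: code point sequence -> glyph name.
LIGATURES = {
    "t" + CGJ + "s" + DOT_ABOVE: "t_s_tie_dotabove",
    "t" + CGJ + "s": "t_s_tie",
}
LONGEST = max(len(key) for key in LIGATURES)


def shape(text):
    """Return glyph names, preferring the longest ligature match."""
    glyphs = []
    i = 0
    while i < len(text):
        # Try candidate lengths from longest down to 2 code points.
        for n in range(min(LONGEST, len(text) - i), 1, -1):
            chunk = text[i:i + n]
            if chunk in LIGATURES:
                glyphs.append(LIGATURES[chunk])
                i += n
                break
        else:
            # No ligature applies: fall back to a per-character glyph.
            glyphs.append("uni%04X" % ord(text[i]))
            i += 1
    return glyphs


if __name__ == "__main__":
    print(shape("t" + CGJ + "s" + DOT_ABOVE))  # digraph with joiner
    print(shape("t" + "s" + DOT_ABOVE))        # plain t + s, no ligature
```

Only the sequence carrying the joiner picks up the tie glyph; the
plain t + s sequence falls through to ordinary per-character glyphs,
so the display difference tracks the data difference.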

Of course, that doesn't mean that your converted USMARC data
records involving digraphs for Cyrillic transliteration will
display with the tie-ligature in a generic web application using
off-the-shelf fonts -- but is that the problem we are trying
to solve here? I doubt it. The forms would be legible -- perhaps
more legible without the obtrusive ties cluttering them up --
and the data distinctions would still be preserved in such
contexts.

--Ken


This archive was generated by hypermail 2.1.5 : Fri Sep 20 2002 - 20:51:45 EDT