Re: Sequences of combining characters (from Romanization of Cyrillic and Byzantine legal codes)

From: James E. Agenbroad ([email protected])
Date: Mon Sep 23 2002 - 11:22:51 EDT

Next message: James E. Agenbroad: "Re: Sequences of combining characters (from Romanization of Cyrillic and Byzantine legal codes)"

Previous message: [email protected]: "Re: entities with breve"
In reply to: Kenneth Whistler: "Re: Sequences of combining characters (from Romanization of Cyrillic and Byzantine legal codes)"
Next in thread: James E. Agenbroad: "Re: Sequences of combining characters (from Romanization of Cyrillic and Byzantine legal codes)"

On Fri, 20 Sep 2002, Kenneth Whistler wrote:

> Peter said:
>
> > >This stuff *can* all be handled with appropriately designed
> > >ligations in fonts, so there are options for display:
> > >
> > ><U+0074, U+0361, U+0073, U+0307>
> > >
> > > ==>
> > > maps via ligation table to:
> > >
> > >{t-s-tie-ligature-with-dot-above} glyph
> >
> > I would consider this an anomalous rendering. It is counter-exemplified by
> > figure 7-6 in TUS3.0. I'd be concerned about longer-term problems if we
> > decided to say that this was a valid alternate rendering from
> >
> > >{t-s-dot-tie-ligature} glyph
>
> Well, yes, it would be anomalous, which is why it would require
> somebody to go to the trouble to make a special ligation table
> entry for it.
>
> But what longer-term problems are you talking about? I didn't
> say we should put in a formal rendering *rule* in the Unicode
> Standard that says something different from Figure 7-6, along
> the lines of converting one form to the other as above.
>
> Look, let's consider again what problem we are trying to solve
> here. We have two funky forms from the ALA-LC transliteration
> tables, for which we haven't heard back yet from bibliographic
> sources whether there is any *actual* data representation
> problem in USMARC records.
>
> We can try to invent and promulgate a generic rendering solution
> for these cases (and anything like them) in the Unicode Standard,
> despite the fact that they are an edge case of an edge case for
> Latin script rendering... Or, if it turns out that it isn't a
> general-enough problem to force everyone to deal with it in terms
> of generic rendering, we could suggest alternatives:
>
> a. markup solutions
> b. specific font ligation solutions for specialized data
>
> Now consider again the function of these things in the ALA-LC
> transliteration. The Cyrillic transliteration recommendations
> make rather extensive use of ligature ties. Why? Because the
> ALA-LC transliteration schemes make some effort to be round-trippable.
> In other words, the Cyrillic transliteration they recommend is
> not merely a useful romanization that might be in more general
> use, as for a newspaper, but is a romanization from which, in
> principle, you ought to be able to recover the Cyrillic it
> was transliterated from. Thus these schemes distinguish t-s
> from t-s-tie-ligature, since the ligated form might be a
> transliteration of a tse or similar letter, whereas the t-s
> would be a transliteration of a te+es, and so on. In other
> words, the tie-ligatures are being sprinkled in to make ad hoc
> digraphs for the transliteration, to aid in recovery of the
> Cyrillic from the romanization.
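
The round-trip point above can be illustrated with a small sketch. It assumes a toy three-entry table (not the ALA-LC tables themselves) and represents the ligature tie as U+0361 between the two letters; the names ROMAN_TO_CYRILLIC and to_cyrillic are made up for this example.

```python
# Toy table for illustration only; it covers three Cyrillic letters
# and is not a copy of the ALA-LC romanization tables.
TIE = "\u0361"  # COMBINING DOUBLE INVERTED BREVE, used here as the ligature tie

ROMAN_TO_CYRILLIC = {
    "t" + TIE + "s": "\u0446",  # tied digraph -> CYRILLIC SMALL LETTER TSE
    "t": "\u0442",              # CYRILLIC SMALL LETTER TE
    "s": "\u0441",              # CYRILLIC SMALL LETTER ES
}


def to_cyrillic(roman: str) -> str:
    """Back-transliterate with greedy longest-match over the toy table."""
    keys = sorted(ROMAN_TO_CYRILLIC, key=len, reverse=True)
    out = []
    i = 0
    while i < len(roman):
        for key in keys:
            if roman.startswith(key, i):
                out.append(ROMAN_TO_CYRILLIC[key])
                i += len(key)
                break
        else:
            raise ValueError(f"no mapping at offset {i}: {roman[i]!r}")
    return "".join(out)


tied = to_cyrillic("t" + TIE + "s")  # recovers the single letter tse
untied = to_cyrillic("ts")           # recovers the two letters te + es
```

Without the tie, both inputs would collapse to "ts" and the original Cyrillic could not be recovered unambiguously.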
>
> Now the dots above typically represent an articulatory diacritic,
> as for palatalization, or the like.
>
> So the combination of the two is to indicate: we are transliterating
> a letter with a palatal (say) diacritic, using a digraph.
>
> Do we have alternatives in Unicode for that? Well, yes, depending
> on whether the problem is:
>
> a. enabling exact transcoding of the USMARC data records
> using ALA-LC romanization recommendations and the ANSEL
> character set, for interoperability with Unicode systems.
>
> or
>
> b. typesetting the ALA-LC romanization document guide in
> Unicode, treating all the data therein as plain text and
> using generic Unicode rendering rules.
>
> I contend that the primary problem is a), and that we ought
> to examine the general usefulness of this dot-above-double-diacritic
> and related rendering, before we insist it has to be representable
> in plain text and go looking for an encoding solution and specify a
> bunch of rendering rules for it.
>
> If the essential requirement here is to capture the data
> functionality of the transliteration: a roundtrippable form,
> with a palatal diacritic, using a digraph, we could suggest,
> for instance:
>
> <U+0074, U+034F, U+0073, U+0307>
>
> or
>
> <U+0074, U+0307, U+034F, U+0073>
>
> where we end up with an explicitly indicated digraph, with a
> dot-above diacritic (pick which letter you want it on), as
> a grapheme cluster. This is distinct from:
>
> <U+0074, U+0073, U+0307>
>
> or
>
> <U+0074, U+0307, U+0073>
>
> so you have your transliteration round-trippability intact.
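
One way to check that claim is to compare the four sequences after normalization. This is a minimal sketch using Python's standard unicodedata module; the dictionary labels and the code_points helper are invented for the example, and the result depends on the Unicode data shipped with the interpreter.

```python
import unicodedata

CGJ = "\u034F"  # COMBINING GRAPHEME JOINER
DOT = "\u0307"  # COMBINING DOT ABOVE

# The four sequences from the message; labels are illustrative only.
sequences = {
    "digraph, dot on s": "t" + CGJ + "s" + DOT,
    "digraph, dot on t": "t" + DOT + CGJ + "s",
    "plain, dot on s": "ts" + DOT,
    "plain, dot on t": "t" + DOT + "s",
}


def code_points(text: str) -> str:
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# Normalize every sequence under both NFC and NFD.
normalized = {
    form: {label: unicodedata.normalize(form, s) for label, s in sequences.items()}
    for form in ("NFC", "NFD")
}

for form, table in normalized.items():
    for label, s in table.items():
        print(form, label, code_points(s))

# True when no two sequences collapse to the same string in either form.
all_distinct = all(
    len(set(table.values())) == len(sequences) for table in normalized.values()
)
```

If all_distinct is true, the digraph and non-digraph spellings survive normalization as separate strings, which is what the round-trip argument needs.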
>
> And for your special-purpose application, which is a Unicode system
> to display USMARC bibliographic records using the ALA-LC romanization
> presentation conventions, you add ligation entries to your font
> so that
>
> <U+0074, U+034F, U+0073, U+0307>
>
> and similar forms using U+034F COMBINING GRAPHEME JOINER display with a
> visible tie-ligature, rather than nothing, despite the fact that
> no U+0361 double diacritic is being used in the data. Problem
> solved.
>
> Of course, that doesn't mean that your converted USMARC data
> records involving digraphs for Cyrillic transliteration will
> display with the tie-ligature in a generic web application using
> off-the-shelf fonts -- but is that the problem we are trying
> to solve here? I doubt it. The forms would be legible -- perhaps
> more legible without the obtrusive ties cluttering them up --
> and the data distinctions would still be preserved in such
> contexts.
>
> --Ken
>
>

     Regards,
          Jim Agenbroad ( [email protected] )
     "It is not true that people stop pursuing their dreams because they
grow old, they grow old because they stop pursuing their dreams." Adapted
from a letter by Gabriel Garcia Marquez.
     The above are purely personal opinions, not necessarily the official
views of any government or any agency of any.
     Addresses: Office: Phone: 202 707-9612; Fax: 202 707-0955; US
mail: I.T.S. Sys.Dev.Gp.4, Library of Congress, 101 Independence Ave. SE,
Washington, D.C. 20540-9334 U.S.A.
Home: Phone: 301 946-7326; US mail: Box 291, Garrett Park, MD 20896.


This archive was generated by hypermail 2.1.5 : Mon Sep 23 2002 - 12:08:41 EDT