From: James E. Agenbroad (jage@loc.gov)
Date: Mon Sep 23 2002 - 11:22:51 EDT
On Fri, 20 Sep 2002, Kenneth Whistler wrote:
> Peter said:
> 
> > >This stuff *can* all be handled with appropriately designed
> > >ligations in fonts, so there are options for display:
> > >
> > ><U+0074, U+0361, U+0073, U+0307>
> > >
> > >   ==>
> > >   maps via ligation table to:
> > >
> > >{t-s-tie-ligature-with-dot-above} glyph
> > 
> > I would consider this an anomalous rendering. It is counter-exemplified by
> > figure 7-6 in TUS3.0. I'd be concerned about longer-term problems if we
> > decided to say that this was a valid alternate rendering from
> > 
> > >{t-s-dot-tie-ligature} glyph
> 
> Well, yes, it would be anomalous, which is why it would require
> somebody to go to the trouble to make a special ligation table
> entry for it.
> 
> But what longer-term problems are you talking about? I didn't
> say we should put in a formal rendering *rule* in the Unicode
> Standard that says something different from Figure 7-6, along
> the lines of converting one form to the other as above.
> 
> Look, let's consider again what problem we are trying to solve
> here. We have two funky forms from the ALA-LC transliteration
> tables, for which we haven't heard back yet from bibliographic
> sources whether there is any *actual* data representation
> problem in USMARC records.
> 
> We can try to invent and promulgate a generic rendering solution
> for these cases (and anything like them) in the Unicode Standard,
> despite the fact that they are an edge case of an edge case for
> Latin script rendering... Or, if it turns out that it isn't a
> general-enough problem to force everyone to deal with it in terms
> of generic rendering, we could suggest alternatives:
> 
>    a. markup solutions
>    b. specific font ligation solutions for specialized data
> 
> Now consider again the function of these things in the ALA-LC
> transliteration. The Cyrillic transliteration recommendations
> make rather extensive use of ligature ties. Why? Because the
> ALA-LC transliteration schemes make some effort to be round-trippable.
> In other words, the Cyrillic transliteration they recommend is
> not merely a useful romanization that might be in more general
> use, as for a newspaper, but is a romanization from which, in
> principle, you ought to be able to recover the Cyrillic it
> was transliterated from. Thus these schemes distinguish t-s
> from t-s-tie-ligature, since the ligated form might be a
> transliteration of a tse or similar letter, whereas the t-s
> would be a transliteration of a te+es, and so on. In other
> words, the tie-ligatures are being sprinkled in to make ad hoc
> digraphs for the transliteration, to aid in recovery of the
> Cyrillic from the romanization.
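The round-trip argument above can be sketched in Python. This is only an illustration under stated assumptions: the three-letter table is a toy subset chosen for this example, not the ALA-LC table, and `romanize` and `cyrillicize` are hypothetical helper names.

```python
# Toy subset for illustration only; not the full ALA-LC romanization table.
TIE = "\u0361"  # COMBINING DOUBLE INVERTED BREVE, used as the ligature tie

TO_LATIN = {
    "\u0442": "t",              # CYRILLIC SMALL LETTER TE
    "\u0441": "s",              # CYRILLIC SMALL LETTER ES
    "\u0446": "t" + TIE + "s",  # CYRILLIC SMALL LETTER TSE, tied digraph
}
TO_CYRILLIC = {latin: cyr for cyr, latin in TO_LATIN.items()}


def romanize(text):
    """Map each Cyrillic letter to its romanized form."""
    return "".join(TO_LATIN[ch] for ch in text)


def cyrillicize(text):
    """Recover the Cyrillic, trying the longest romanized form first."""
    keys = sorted(TO_CYRILLIC, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(TO_CYRILLIC[key])
                i += len(key)
                break
        else:
            raise ValueError("no mapping at position %d" % i)
    return "".join(out)
```

Because the tie marks the digraph explicitly, te+es and tse romanize to different strings, and each converts back to the Cyrillic it came from. Without the tie both would romanize to plain "ts" and the distinction would be lost.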
> 
> Now the dots above typically represent an articulatory diacritic,
> as for palatalization, or the like.
> 
> So the combination of the two is to indicate: we are transliterating
> a letter with a palatal (say) diacritic, using a digraph.
> 
> Do we have alternatives in Unicode for that? Well, yes, depending
> on whether the problem is:
> 
>   a. enabling exact transcoding of the USMARC data records
>      using ALA-LC romanization recommendations and the ANSEL
>      character set, for interoperability with Unicode systems.
> 
> or
> 
>   b. typesetting the ALA-LC romanization document guide in
>      Unicode, treating all the data therein as plain text and
>      using generic Unicode rendering rules.
> 
> I contend that the primary problem is a), and that we ought
> to examine the general usefulness of this dot-above-double-diacritic
> and related rendering, before we insist it has to be representable
> in plain text and go looking for an encoding solution and specify a
> bunch of rendering rules for it.
> 
> If the essential requirement here is to capture the data
> functionality of the transliteration: a roundtrippable form,
> with a palatal diacritic, using a digraph, we could suggest,
> for instance:
> 
> <U+0074, U+034F, U+0073, U+0307>
> 
> or
> 
> <U+0074, U+0307, U+034F, U+0073>
> 
> where we end up with an explicitly indicated digraph, with a
> dot-above diacritic (pick which letter you want it on), as
> a grapheme cluster. This is distinct from:
> 
> <U+0074, U+0073, U+0307>
> 
> or
> 
> <U+0074, U+0307, U+0073>
> 
> so you have your transliteration round-trippability intact.
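A minimal sketch of that distinctness claim, using Python's standard `unicodedata` module. The variable names and the `distinct_under` helper are invented for this example; only the four code point sequences come from the message above.

```python
import unicodedata

# The four sequences quoted above, written out as code points.
digraph_dot_on_s = "\u0074\u034F\u0073\u0307"  # t, U+034F, s, dot above
digraph_dot_on_t = "\u0074\u0307\u034F\u0073"  # t, dot above, U+034F, s
plain_dot_on_s = "\u0074\u0073\u0307"          # t, s, dot above
plain_dot_on_t = "\u0074\u0307\u0073"          # t, dot above, s

sequences = [digraph_dot_on_s, digraph_dot_on_t,
             plain_dot_on_s, plain_dot_on_t]


def distinct_under(form):
    """True if all four sequences stay different after normalization."""
    normalized = {unicodedata.normalize(form, s) for s in sequences}
    return len(normalized) == len(sequences)


if __name__ == "__main__":
    for form in ("NFC", "NFD"):
        print(form, distinct_under(form))
```

U+034F has canonical combining class 0, so normalization neither removes it nor reorders marks across it. That is what keeps the digraph forms apart from the plain forms even after the dot above composes with its base letter under NFC.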
> 
> And for your special-purpose application, which is a Unicode system
> to display USMARC bibliographic records using the ALA-LC romanization
> presentation conventions, you add ligation entries to your font
> so that
> 
> <U+0074, U+034F, U+0073, U+0307>
> 
> and similar forms using a U+034F COMBINING GRAPHEME JOINER display with a
> visible tie-ligature, rather than nothing, despite the fact that
> no U+0361 double diacritic is being used in the data. Problem
> solved.
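The font-side idea can be modeled as a lookup table. The sketch below is a toy model in Python, not a real font format or shaping engine: the glyph names, the `uniXXXX` fallback naming, and the `shape` function are all assumptions made for illustration.

```python
CGJ = "\u034F"  # the joiner used to mark the digraph in the data

# Hypothetical ligation entries for a special-purpose font.
SPECIAL_FONT_LIGATURES = {
    "t" + CGJ + "s\u0307": "t_s_tie_dotabove",  # invented glyph name
    "t" + CGJ + "s": "t_s_tie",                 # invented glyph name
}


def shape(text, ligatures):
    """Return glyph names for text, preferring the longest ligature match."""
    keys = sorted(ligatures, key=len, reverse=True)
    glyphs = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                glyphs.append(ligatures[key])
                i += len(key)
                break
        else:
            ch = text[i]
            if ch != CGJ:  # with no ligature entry the joiner shows nothing
                glyphs.append("uni%04X" % ord(ch))
            i += 1
    return glyphs
```

With the special table, `shape("t\u034Fs\u0307", SPECIAL_FONT_LIGATURES)` yields the single tied glyph. With an empty table, standing in for an off-the-shelf font, the same data falls back to separate t, s and dot-above glyphs with no visible tie, while the underlying code points still carry the distinction.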
> 
> Of course, that doesn't mean that your converted USMARC data
> records involving digraphs for Cyrillic transliteration will
> display with the tie-ligature in a generic web application using
> off-the-shelf fonts -- but is that the problem we are trying
> to solve here? I doubt it. The forms would be legible -- perhaps
> more legible without the obtrusive ties cluttering them up --
> and the data distinctions would still be preserved in such
> contexts.
> 
> --Ken
> 
> 
     Regards,
          Jim Agenbroad ( jage@LOC.gov )
     "It is not true that people stop pursuing their dreams because they
grow old, they grow old because they stop pursuing their dreams." Adapted
from a letter by Gabriel Garcia Marquez.
     The above are purely personal opinions, not necessarily the official
views of any government or any agency of any.
     Addresses: Office: Phone: 202 707-9612; Fax: 202 707-0955; US
mail: I.T.S. Sys.Dev.Gp.4, Library of Congress, 101 Independence Ave. SE, 
Washington, D.C. 20540-9334 U.S.A.
Home: Phone: 301 946-7326; US mail: Box 291, Garrett Park, MD 20896.  