Re:Canonical ordering

From: Kenneth Whistler (kenw@sybase.com)
Date: Mon May 01 2000 - 19:00:49 EDT

Next message: Apurva Joshi: "RE: Encoding Bengali Vowel forms (again)"
Previous message: Peter Constable: "Re:Canonical ordering"
Maybe in reply to: Peter Constable: "Canonical ordering"
Next in thread: Peter Constable: "Re:Canonical ordering"

Peter,

I agree with Tim's general assessment of this issue.

You have to know the rendering rules for the writing system you are
working with.

> So, there's no list of these anywhere? Can anybody provide a
> list?

The ones everyone knows about are Vietnamese, Greek, and Hebrew.

I would expect that any language that made extensive use of more than
a single accent on top of a letter might have some history of
horizontal accommodations for the accents in its typography, however.
It is just the natural thing for typographers to do when trying to
create typefaces that work while having to deal with multiple
accents.

>
> You mentioned language tagging. I could certainly create an
> implementation in my software that positions the diacritics
> side-by-side, perhaps based on a language tag. But that doesn't
> mean that the text can be exchanged in a way that everybody
> knows that that's supposed to happen. A solution involving
> language tagging would have to assume that everybody who's ever
> going to read the text will know that the presence of that
> language tag implies different diacritic behaviour - i.e. that
> information is registered somewhere. Is that the best solution?

You would really only need to start language tagging if you had to
deal with aggressively multilingual text in which mixed conventions
regarding accent stacking were significant and had to be rendered
correctly. Frankly, I think that is a small percentage case inside a
small percentage case.

The options I see are:

1. A default renderer making no particular concessions to any particular
conventions: just stack all the accents according to the Unicode default
rules.

2. A default renderer which is aware of the common exceptions and wishes
to accommodate them: catalog the behavior for Vietnamese, Greek, and Hebrew,
and use appropriate triplets in the fonts (when dealing with combining
character sequences, at least) to display glyphs with preformed side-by-side
rendering as required.

3. A special-purpose renderer for particular languages: tune all combinations,
including the exact size and shape of accents and accent combinations.
(This is necessary, for example, to deal with the East European
typographic conventions versus the West European typographic conventions
for accents on letters.)
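
As a rough illustration of option 2, the catalog of exceptions could be
a table keyed by language tag. The Python below is only a sketch under
stated assumptions: the SIDE_BY_SIDE table and the clusters() and
placement() helpers are hypothetical names made up for this example, the
mark pairs listed are illustrative and far from a complete catalog, and
Hebrew is left out for brevity. It relies only on the standard
unicodedata module, and it treats any character with a nonzero combining
class as a mark, which is a simplification.

```python
import unicodedata

# Hypothetical, incomplete catalog: language tag -> pairs of adjacent
# combining marks that are conventionally drawn side by side.
SIDE_BY_SIDE = {
    "vi": {("\u0302", "\u0300"), ("\u0302", "\u0301")},   # circumflex + grave/acute
    "el": {("\u0313", "\u0301"), ("\u0313", "\u0300"),    # psili + acute/grave
           ("\u0314", "\u0301"), ("\u0314", "\u0300")},   # dasia + acute/grave
}

def clusters(text):
    """Decompose text (NFD) and group each base with its combining marks."""
    out = []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch) and out:
            out[-1][1].append(ch)      # nonzero combining class: attach to base
        else:
            out.append((ch, []))       # start a new cluster
    return [(base, tuple(marks)) for base, marks in out]

def placement(text, lang=None):
    """Return (base, marks, mode) per cluster; mode is 'stack' by default."""
    pairs = SIDE_BY_SIDE.get(lang, set())
    result = []
    for base, marks in clusters(text):
        adjacent = zip(marks, marks[1:])
        mode = "side-by-side" if any(p in pairs for p in adjacent) else "stack"
        result.append((base, marks, mode))
    return result
```

With this table, placement("\u1ea7", "vi") reports side-by-side for the
circumflex and grave, while the same text with no language tag falls
back to default stacking, as in option 1. A real renderer would then
pick a preformed glyph from the font instead of returning a label.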

> What would it take to incorporate support for this purely in
> Unicode?

Unicode is not intended as a generic text layout macro language, though
there are those who keep trying to push it in that direction. *Some*
aspects of text layout need to be left to text markup and text
description languages. :-) And it isn't clear that a plain text
character mechanism for describing exactly how accents are placed
over a letter makes sense to include in the character encoding
per se.

--Ken

>
>
> Peter

>
> > On page 50 of U3, last paragraph, I read the following:
>
> > "Some specific nonspacing marks override the default
> > stacking behaviour by being positioned side-by-side rather
> > than stacking or by ligaturing with an adjacent nonspacing
> > mark."
>
> > How do I tell which this applies to?
>
> Sometimes the Unicode script description mentions it, other
> times knowledge of the writing system is needed.
> Greek and Hebrew immediately spring to mind.
>
> > If, suppose, there were some writing system in which
> > diacritics like acute and circumflex (say) occurred, but they
> > could co-occur over the same base character and in that
> > situation were expected to be positioned side-by-side, what
> > would it take to handle that?
>
> Vietnamese uses the Latin script but doesn't behave as normal.
> See for example U+1EA7. Language tagging might be the best
> way around the problem. (Special forms of the accent seem to
> be frowned upon - see the deprecated U+0341.)
>
> Tim
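
The two code points Tim mentions can be inspected with Python's standard
unicodedata module. This is just a small check, not part of the original
exchange, and the code_points() helper is a hypothetical name introduced
for the example. It shows that U+1EA7 decomposes to a base letter
followed by circumflex and grave, that both marks share combining class
230 (so canonical ordering never swaps them), and that U+0341 has a
canonical decomposition to the ordinary acute, U+0301.

```python
import unicodedata

def code_points(text):
    """Format each character of text as a U+XXXX label."""
    return [f"U+{ord(ch):04X}" for ch in text]

# U+1EA7 LATIN SMALL LETTER A WITH CIRCUMFLEX AND GRAVE, fully decomposed.
nfd = unicodedata.normalize("NFD", "\u1ea7")
print(code_points(nfd))                          # ['U+0061', 'U+0302', 'U+0300']

# Canonical combining classes: base is 0, both accents are 230 (above).
print([unicodedata.combining(ch) for ch in nfd]) # [0, 230, 230]

# U+0341 COMBINING ACUTE TONE MARK maps canonically to U+0301.
print(unicodedata.decomposition("\u0341"))       # 0301
```

Because the two accents have the same combining class, the order in
which they were entered survives normalization, so a renderer can rely
on that order when deciding how to place them.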


This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:02 EDT