Re: How to make "oo" with combining breve/macron over pair?

From: Peter_Constable@sil.org
Date: Wed Mar 06 2002 - 11:11:16 EST


On 03/05/2002 08:00:58 PM Kenneth Whistler wrote:

>Actually, I am finding myself attracted to the parsimony of this
>approach.

Parsimony? Thinking in terms of formal grammars and formal languages, it's
a simple mechanism that overgenerates big time. Not everyone would call
that parsimony. Isn't it a powerful mechanism that deals with a very small
problem? A howitzer to shoot two rabbits? As Rick has suggested, how many
double diacritics are we *really* likely to encounter? (Or are we
considering this so that people will be able to invent new ways to notate
things in writing?) And how many triple, quadruple, n-tuple (n > 4)
diacritics are we *really* likely to encounter?

>1. Rendering applications already have to deal with combining
> enclosing marks (well, at least if they choose to support them).

That qualifier is pretty significant here. I can't imagine too many font
developers getting terribly excited about implementing U+20DD to enclose
more than one preceding character, for example. Font developers will
implement multi-character enclosing marks for Arabic because (i) they know
that these really are needed, and (ii) they know that there is a
well-contained limit to what length spans they have to accommodate. But if
you ask those font developers to implement a combining tilde that can span
up to eight base characters, I think you'll get a very cool response (or
else an earful of laughter).

Sure, it's a slick idea. But it seems to be a solution begging for a need.

>On the downside, it might be awhile before rendering engines
>and font definitions really catch up to it.

If you demonstrated a need for particular double diacritics, you'd
probably get implementations before too long. But you'd need to spell out
*exactly* what diacritics were involved -- font developers aren't going to
go inventing typographic oddities of their own volition. And don't expect
spans longer than two base characters unless you can come up with specific
needs that clearly point to a text-based solution rather than a
general-graphics-based solution. I suspect that the list of clear needs
you could come up with is very short -- a handful at best. And if it's so
few, why not just encode them directly rather than create a generative
mechanism that's never going to be used except in very limited ways?

>That is, the whole
>notion of "adjusting" a diacritic to apply to an enclosure is
>fairly sophisticated, since it may involve context-dependent
>rules and arbitrary shape modifications -- not merely moving
>a glyph origin point based on a preceding glyph's metrics.

Not "may involve", but "will involve". That's why font developers are only
likely to implement this mechanism for a very small set of documented
needs.

>On the other hand, hacked up fonts for limited dictionary
>usage could be rather quick and easy. For the old Webster's
>pronunciation guides, the entities are really the oomacr
>and oobreve shown in the examples that started this thread.
>Simply preform those entities as glyphs in a font, and map them
>to <o, CGJ, o, CGJ, combining_macron> and
>to <o, CGJ, o, CGJ, combining_breve> respectively. Presto,
>you have a Unicode representation for the text, and a
>reliable font rendering for them, without any fancy-dancing
>about dynamic positional adjustments.

And if it's only those two things that have any likelihood of getting
implemented in fonts, doesn't it make more sense to encode those two
diacritics rather than create a hypergenerative mechanism that will be
ignored?
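
To make the two quoted sequences concrete, here is a minimal Python sketch of
what <o, CGJ, o, CGJ, combining_macron> and <o, CGJ, o, CGJ, combining_breve>
look like as code points. The helper name spanning_sequence is hypothetical
and only illustrates the proposal as described in this thread; it is not an
API from any library or from the standard. The last lines build the
eight-base tilde case mentioned above, to show that nothing in the mechanism
itself stops at two bases.

```python
import unicodedata

CGJ = "\u034f"      # COMBINING GRAPHEME JOINER
MACRON = "\u0304"   # COMBINING MACRON
BREVE = "\u0306"    # COMBINING BREVE
TILDE = "\u0303"    # COMBINING TILDE


def spanning_sequence(bases, mark):
    """Hypothetical helper: join the base characters with CGJ, then
    attach the combining mark after one more CGJ, as in the quoted proposal."""
    return CGJ.join(bases) + CGJ + mark


oomacr = spanning_sequence("oo", MACRON)
oobreve = spanning_sequence("oo", BREVE)

# Print each sequence as a list of code points with their character names.
for label, text in (("oomacr", oomacr), ("oobreve", oobreve)):
    print(label, [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text])

# The same rule happily generates a tilde spanning eight bases.
eight = spanning_sequence("abcdefgh", TILDE)
print(len(eight), "code points for an eight-base span")
```

A font that only wants the two Webster's entities would treat oomacr and
oobreve as fixed strings to match, and would never need the general case.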

>The fallback rendering,
>in applications and fonts not wise to the CGJ rules would
>be {o o-macron} and {o o-breve}, which while not exact,
>is at least comprehensible and close enough for gummint work.

Or the fallback rendering would be {o o box-macron}, since TUS 3.2 has
basically told font implementers that CGJ between a base and a combining
mark indicates bad data, and since fonts developed prior to TUS 3.2 could
map CGJ to .notdef anyway.
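
The .notdef case can be sketched with a toy character-to-glyph lookup in
Python. Everything here is an assumption made for illustration: the cmap
contents, the glyph names, and the map_to_glyphs helper are hypothetical and
do not describe any real font or rendering engine. The sketch only shows that
a font with no entry for CGJ hands back .notdef wherever CGJ occurs in the
sequence.

```python
NOTDEF = ".notdef"

# Hypothetical cmap for a font that predates the CGJ convention:
# it covers the letter and the two combining marks, but not U+034F.
TOY_CMAP = {
    "o": "o",
    "\u0304": "macroncomb",  # COMBINING MACRON (glyph name is illustrative)
    "\u0306": "brevecomb",   # COMBINING BREVE (glyph name is illustrative)
}


def map_to_glyphs(text, cmap):
    """Map each character to a glyph name, falling back to .notdef
    for any character the toy cmap does not cover."""
    return [cmap.get(ch, NOTDEF) for ch in text]


# <o, CGJ, o, CGJ, combining_macron>
glyphs = map_to_glyphs("o\u034fo\u034f\u0304", TOY_CMAP)
print(glyphs)
```

Whether those .notdef glyphs show up as visible boxes or get suppressed
depends on the renderer, which this sketch does not model.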

I don't generally find myself arguing against generative mechanisms. I
won't be as horrified as Rick if this gets implemented, but I'm inclined
to agree with him that it isn't really needed. If it is going to
happen, I'd suggest it get done at the next meeting while people are still
working on implementing 3.2 CGJ behaviour and the new end-of-ayah type of
behaviour. Or let's just encode a double macron and double breve and move
on.

- Peter

---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <peter_constable@sil.org>