Re: Taiwanese: unicode of o with dot right above

From: Peter_Constable@sil.org
Date: Sat Aug 12 2000 - 08:20:40 EDT


Kiatgak:

Interesting questions! Here are my thoughts, for whatever they're worth:

>1. U+0186/U+0254 (LATIN CAPITAL/SMALL LETTER OPEN O)
> with alternative form in font design...

> but it induces different outlooks of OPEN O. Is that allowed or
>adequate?

Probably not a good idea: most fonts will use the real open o shapes, so
documents won't render on a receiver's machine as the author intended
except with specific fonts. Not a good design for the web.

>2. U+004F/U+006F(O/o) + U+00B7(MIDDLE DOT)
> with the GSUB to fix the outlooks in font design.
>
> The problem is U+00B7 is not a combining character.

True. You may not need a combining character to solve this problem (I'll
provide another suggestion below), but it does make the most sense. This
solution has problems comparable to the previous one: the need for
proprietary fonts.

> Another problem of the same reason:
> Is it a valid sequence if a combining character follows them, eg.
> U+004F/U+006F(O/o) + U+00B7(MIDDLE DOT) + U+0301(COMBINING ACUTE
>ACCENT)
> Is such a solution allowed or adequate?

Such an encoding is allowed, but the expected appearance is that the acute
will be positioned over the middle dot, and both of these to the right,
beyond the right side-bearing of the o.
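
To make the attachment point concrete, here is a minimal sketch using
Python's standard unicodedata module. The helper name mark_attachments is
my own, and it uses a deliberately simplified rule (a mark attaches to the
nearest preceding non-mark character); real rendering engines do more.

```python
import unicodedata

def mark_attachments(text):
    """Pair each combining mark with the nearest preceding non-mark character.

    Simplified model: any character whose General Category starts with 'M'
    is treated as a mark; everything else can serve as a base.
    """
    pairs = []
    base = None
    for ch in text:
        if unicodedata.category(ch).startswith("M"):
            pairs.append((ch, base))
        else:
            base = ch
    return pairs

MIDDLE_DOT = "\u00b7"
ACUTE = "\u0301"

# U+00B7 is punctuation, not a mark, so it becomes the base for the acute.
print(unicodedata.category(MIDDLE_DOT))
for mark, base in mark_attachments("o" + MIDDLE_DOT + ACUTE):
    print(unicodedata.name(mark), "attaches to", unicodedata.name(base))
```

Under this model the acute pairs with MIDDLE DOT rather than with the o,
which is the behaviour described above.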

>3. U+004F/U+006F(O/o) + U+05C1(HEBREW POINT SHIN DOT).

Here's an interesting solution! Technically, this is a legal encoding
sequence. A couple of issues, though. First, U+05C1 has a fixed-position
combining class. What this means is that combinations with other
diacritics that go above or above right, where you might want the order
relative to the dot to signify different meanings (as would be the case
for tilde + acute vs. acute + tilde), will not be distinguished in any
normalised forms. This may not be a problem for you. Secondly, U+05C1 in
most fonts will have certain behaviours associated with it, and likewise
in rendering engines such as Uniscribe. You're unlikely to get the exact
positioning support you want except with specific fonts, and then you've
got to get the needed cooperation of any required rendering engines such
as Uniscribe. You might not get the cooperation you need from
implementers; you can always try, but it's a gamble.
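
The normalisation point can be checked directly. The following is a small
sketch, assuming Python's standard unicodedata module; the variable names
are mine. It compares the two orderings of shin dot and acute, then the two
orderings of tilde and acute, after canonical decomposition (NFD).

```python
import unicodedata

SHIN_DOT = "\u05c1"
ACUTE = "\u0301"
TILDE = "\u0303"

def nfd(text):
    """Canonical decomposition, which also applies canonical reordering."""
    return unicodedata.normalize("NFD", text)

# Shin dot and acute have different combining classes, so canonical
# reordering sorts them into one fixed order regardless of input order.
dot_then_acute = "o" + SHIN_DOT + ACUTE
acute_then_dot = "o" + ACUTE + SHIN_DOT
print(nfd(dot_then_acute) == nfd(acute_then_dot))

# Tilde and acute share a combining class, so their relative order is
# preserved and the two sequences stay distinct.
tilde_then_acute = "o" + TILDE + ACUTE
acute_then_tilde = "o" + ACUTE + TILDE
print(nfd(tilde_then_acute) == nfd(acute_then_tilde))
```

The first comparison is true and the second is false: an order distinction
involving the shin dot does not survive normalisation, while one between
two ordinary above-marks does.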

> One more serious problem: is a glyph with 2 scripts (Latin and Hebrew)
>allowed in unicode?

Nothing prevents it. It's not expected to be common, and so the issue is
one of whether implementers will design their systems to anticipate this.
Not high probability of that.

> Is it allow in Truetype?

TrueType doesn't make *any* rules about what character sequences are
allowed. But really your question should have been whether it is allowed
in OpenType or other smart-font rendering technologies, such as AAT.
Again, the character sequence would be allowed, but font developers aren't
likely to support it. And in the case of OpenType, I think it also takes
cooperation from something like Uniscribe (or the application software
itself) to activate appropriate features. Again, the likelihood of support
from implementers for this special treatment of shin dot is not high.

> Is script-language-feature structure adequate in Truetype?

You mean in OpenType, yes? A question for another list. (You could create
an AAT font or a Graphite font that handles this use of shin dot without
concern for feature issues, but you've got the problem of needing special
fonts to get the desired appearance.)

>4. U+004F/U+006F(O/o) + U+031B (COMBINING HORN) or precomposed ones
> U+01A0/U+01A1(LATIN CAPITAL/SMALL LETTER O WITH HORN).
>
> This solution is based on the similar outlooks.

The appearance vs linguistic meaning isn't a big deal: in general,
characters in Unicode don't have any explicit linguistic meaning attached.
Even characters in the IPA extensions could, in principle, be used in the
writing system of a given language to represent some other phone.

On the other hand, appearance and character semantics are an issue in that
they're both factors in what implementers do. So, as in all of the
solutions you've presented so far, there's a question of whether you're
likely to get this language-specific usage supported. Again, it will not
likely happen in most fonts.
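
One thing worth knowing about option 4 is that its two spellings are
canonically equivalent, so software will treat them as the same existing
letter with horn. A short sketch, assuming Python's standard unicodedata
module (variable names are mine):

```python
import unicodedata

O_WITH_HORN = "\u01a1"            # precomposed LATIN SMALL LETTER O WITH HORN
O_PLUS_HORN = "o" + "\u031b"      # o followed by COMBINING HORN

# The precomposed letter has a canonical decomposition to o + horn.
print(unicodedata.decomposition(O_WITH_HORN))

# Normalising either spelling gives the same result in both directions.
print(unicodedata.normalize("NFC", O_PLUS_HORN) == O_WITH_HORN)
print(unicodedata.normalize("NFD", O_WITH_HORN) == O_PLUS_HORN)
```

Because the two forms are interchangeable under normalisation, any
language-specific appearance would have to be supplied by the font for
both of them.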

Before going on to 5, I'll mention one other possible alternative:
U+004F/U+006F(O/o) + U+02D9 DOT ABOVE. This is a spacing modifier, not a
combining character. Whether you get the desired appearance would depend
upon the size of the side-bearings for this glyph and the o/O glyphs and
the actual vertical position of the dot glyph in a given font, but there
isn't any dependence on smart rendering technologies. Now, you asked about
having a combining acute above the dot. Again, in this solution U+02D9 +
U+0301 would place the acute over the dot, not over the o/O. That may not
be what you need.
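
The character properties behind this can be inspected with a few lines of
Python, again assuming only the standard unicodedata module. The last check
goes slightly beyond what I said above: U+02D9 carries a compatibility
decomposition, so compatibility normalisation (NFKC/NFKD) would rewrite it.

```python
import unicodedata

DOT_ABOVE = "\u02d9"

# A spacing modifier symbol: General Category Sk, combining class 0.
print(unicodedata.name(DOT_ABOVE))
print(unicodedata.category(DOT_ABOVE))
print(unicodedata.combining(DOT_ABOVE))

# Compatibility decomposition: SPACE followed by COMBINING DOT ABOVE.
print(unicodedata.decomposition(DOT_ABOVE))

# Canonical normalisation leaves the sequence alone; compatibility
# normalisation replaces the spacing dot.
text = "o" + DOT_ABOVE
print(unicodedata.normalize("NFC", text) == text)
print(unicodedata.normalize("NFKC", text) == text)
```

If documents using this option might pass through NFKC or NFKD, that would
be one more thing to watch for.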

>5. To apply a new combining character.

This may be your best solution. You're probably more likely to find
implementers supporting one more combining diacritic (it's an incremental
change that doesn't need much in the way of special handling) than a marked
(i.e. uncommon) alternate behaviour for an existing character.

> It is a long long way to go (and maybe there is no end).

I take it you mean it will take some time to get this through the approval
process? Yes, but even a couple of years isn't much if you consider the
lifespan of the solution: the representation adopted for this need will,
presumably, be used for many years to come.

> In fact, Te Khai-su and Michael Everson had applied on 1997-06-22, but
>their proposal
> was rejected(or withdrawn). But that proposal inquires many precomposed
>characters.

A proposal for precomposed characters would probably not get accepted
today. But a proposal for a COMBINING DOT ABOVE RIGHT probably has a
reasonable chance as long as the need can be established (the committees
will want to see that it will actually get used). There aren't any
problematic implementation issues that I can think of, so there shouldn't
be problems in that regard.

> If apply only a new combining character, will it be accepted?

My guess is that it will. I'd suggest you pursue that. In the meantime, if
you need to encode text and have to do something for this, use a
private-use character. You'll have to make your own fonts and input
methods, you won't get support for that character in commercial software,
and documents won't be generally interchangeable, but it will at least
give you a way to do something for the interim.
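
As a rough illustration of the interim approach, here is a minimal Python
sketch. The choice of U+E000 and the names PUA_DOT_ABOVE_RIGHT and migrate
are hypothetical, picked only for this example; the replacement character
is left as a parameter because no standard code point is assumed here.

```python
import unicodedata

# Hypothetical interim assignment from the Private Use Area.
PUA_DOT_ABOVE_RIGHT = "\ue000"

def is_private_use(ch):
    """True if the character belongs to a private-use range (category Co)."""
    return unicodedata.category(ch) == "Co"

def migrate(text, replacement):
    """Swap the interim private-use character for a chosen replacement."""
    return text.replace(PUA_DOT_ABOVE_RIGHT, replacement)

interim = "o" + PUA_DOT_ABOVE_RIGHT
print(is_private_use(PUA_DOT_ABOVE_RIGHT))

# Stand-in replacement used purely for demonstration.
print(migrate(interim, "\u0307") == "o\u0307")
```

Keeping the private-use character in one named constant means existing
text can be converted in a single pass if a standard character becomes
available.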

Hope this helps.

- Peter

---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <peter_constable@sil.org>


