RE: Phetsarat font, Lao unicode

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed Jul 11 2007 - 09:55:09 CDT

Next message: Aiet Kolkhi: "Re: that font-letter spacing problem - MS Word issue?"

Previous message: Tom Gewecke: "OS X Georgian"
In reply to: James Kass: "RE: Phetsarat font, Lao unicode"
Next in thread: Brian Wilson: "Re: Phetsarat font, Lao unicode"
Reply: Brian Wilson: "Re: Phetsarat font, Lao unicode"

James Kass wrote:
> Sent: Tuesday, 10 July 2007 05:16
> To: unicode@unicode.org
> Subject: RE: Phetsarat font, Lao unicode
>
>
> Philippe Verdy wrote,
>
> > One problem is that fonts (at least with TrueType/OpenType) are not designed
> > to support reordering and positioning with an unbounded number of base
> > characters.
>
> Font engines handle reordering.

Not completely, and not always. Some fonts have to use GSUB for local
reordering that depends on the style rather than just on the script properties.

> > For example the GSUB/GPOS tables in TrueType require listing
> > somewhere the complete list of codepoints where such reordering and
> > positioning may be applied, ...
>
> A listing of glyph IDs is stored in the font. Fonts only store
> codepoints in the "cmap" table. The listing of glyph IDs may
> be a complete list of every glyph ID involved, or it may be
> done using ranges in order to minimize table size.
> > ... something that can't be performed in fonts with
> > the current format, because they don't allow defining character classes
> > in them,
>
> The OpenType GDEF table format requires assignment of
> glyphs to various character classes. These classes are neither
> user- nor developer-definable, though. Unicode also assigns
> character classes, but only to characters. Complex script
> fonts generally have scads of "presentation form" glyphs
> which aren't characters in the Unicode sense.

I said "somewhere"; you misunderstand what I mean here. I was speaking about
the possibility of creating a group of code points (even if they are
remapped internally to glyph IDs within a "cmap" table or other tables) and
assigning it a single identifier that can be used in GSUB/GPOS rule
tables. Without it, you have to create as many rules as there are pairs in
the product of the possible base characters in one class and the possible
combining vowel signs in another class. As there may be many candidate base
characters to which such combinations apply, this will rapidly exhaust the
maximum size allowed for GSUB/GPOS tables.
Allowing GSUB/GPOS selectors to include a pseudo-glyph ID mapped to a class
of code points would greatly simplify the design of fonts that need to
contain many characters (possibly from several or many scripts).
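James describes a lookup's glyph list as stored either as a complete list of glyph IDs or as ranges. In OpenType these correspond to the two Coverage table formats: format 1 (a sorted array of glyph IDs) and format 2 (sorted range records, each holding a start glyph ID, an end glyph ID, and the coverage index of the start glyph). The sketch below is a simplified model of how a glyph's coverage index is found in each form; the glyph IDs are hypothetical, and real fonts store these tables in binary form.

```python
from bisect import bisect_left, bisect_right

# Simplified model of an OpenType Coverage table: plain Python lists stand
# in for the binary data. Both functions return a glyph's coverage index,
# or None if the glyph is not covered.

def coverage_index_list(glyph_ids, gid):
    """Format 1: a sorted array of glyph IDs; the index is the array position."""
    i = bisect_left(glyph_ids, gid)
    if i < len(glyph_ids) and glyph_ids[i] == gid:
        return i
    return None

def coverage_index_ranges(range_records, gid):
    """Format 2: sorted (start_gid, end_gid, start_coverage_index) records."""
    starts = [start for start, _, _ in range_records]
    i = bisect_right(starts, gid) - 1   # last record starting at or before gid
    if i >= 0:
        start, end, start_index = range_records[i]
        if start <= gid <= end:
            return start_index + (gid - start)
    return None

# The same hypothetical set of covered glyphs, stored both ways.
as_list = [10, 11, 12, 13, 14, 20, 21]
as_ranges = [(10, 14, 0), (20, 21, 5)]

for gid in (12, 15, 21):
    print(gid, coverage_index_list(as_list, gid), coverage_index_ranges(as_ranges, gid))
```

The range form stores two records instead of seven IDs, which is the size saving James mentions; lookups give identical indices either way.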

> > ... and assigning them pseudo-glyph IDs that can be used in GSUB tables.
>
> Pseudo-glyph ID might be a misleading phrase. A Glyph ID is
> simply the number of the position of a glyph's data in a font.
> The first glyph, contrary to conventional counting methods,
> is given the glyph ID of zero. And so forth.

It was not misleading. I really intended a special ID that can be used to
designate a class of glyphs (mapped from a class of characters) as if it were
a single glyph ID, so that a single composition rule can be written instead
of one composition rule per element of the product of the two classes. It
would certainly be more useful in GPOS than in GSUB.
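The cost difference described here can be counted directly. The toy model below is not the actual GPOS binary format; it assumes 40 hypothetical base glyphs and 12 hypothetical vowel-sign glyphs that all share one mark adjustment. Per-pair rules grow with the product of the two sets, while a class-based rule (the "pseudo-glyph ID" idea) needs one class assignment per glyph plus a single rule.

```python
from itertools import product

# Hypothetical glyph IDs, chosen only for illustration.
bases = list(range(100, 140))   # 40 base consonant glyphs
marks = list(range(200, 212))   # 12 combining vowel-sign glyphs
OFFSET = (0, 500)               # hypothetical (dx, dy) adjustment for the mark

# Without classes: one positioning rule per (base, mark) pair.
pair_rules = {(b, m): OFFSET for b, m in product(bases, marks)}

# With classes: each glyph gets a class ID once, and a single rule is
# written against the pair of class IDs.
BASE_CLASS, MARK_CLASS = 1, 2
class_of = {**{g: BASE_CLASS for g in bases}, **{g: MARK_CLASS for g in marks}}
class_rules = {(BASE_CLASS, MARK_CLASS): OFFSET}

def position_by_class(base_gid, mark_gid):
    """Look up the mark adjustment through the class IDs of both glyphs."""
    return class_rules.get((class_of.get(base_gid), class_of.get(mark_gid)))

print(len(pair_rules))                    # 480 pair rules
print(len(class_of) + len(class_rules))   # 53: 52 class assignments plus 1 rule
```

Both approaches yield the same adjustment for every (base, mark) pair; only the table size differs, and the gap widens as more base characters from more scripts are added.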

> > ... the renderer
> > for example could be looking for rules based on the dotted circle symbol,
> > and automatically infer the other applicable rules for other Common symbols,
>
> Does this assume that the dotted circle is part of the encoded text?

Yes, for the intended purpose of showing the diacritic in isolation, with an
arbitrary base symbol.

> It normally isn't, it's inserted (to the display only) by (at least) one
> popular font engine.

No. A renderer should not have to do this unless explicitly instructed to
do so, or unless there is no other way to display the diacritic combined
with an associable base character.
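A display-only dotted-circle fallback of the kind discussed here can be sketched as follows. This is a minimal illustration, not any particular engine's behavior: it assumes "no base available" simply means the combining mark begins the text or follows whitespace, whereas real shaping engines apply script-specific cluster rules. The function names are hypothetical; the stored text is left untouched and only a copy for display is produced.

```python
import unicodedata

DOTTED_CIRCLE = "\u25CC"

def is_combining_mark(ch):
    # General categories Mn, Mc, Me: nonspacing, spacing, and enclosing marks.
    return unicodedata.category(ch) in ("Mn", "Mc", "Me")

def with_display_placeholders(text):
    """Return a copy of text for display only, in which a combining mark with
    no preceding base (start of text or after whitespace) gets a U+25CC
    DOTTED CIRCLE base. Marks stacked after it share that placeholder."""
    out = []
    prev = None
    for ch in text:
        if is_combining_mark(ch) and (prev is None or prev.isspace()):
            out.append(DOTTED_CIRCLE)
        out.append(ch)
        prev = ch
    return "".join(out)

lao_ko, lao_vowel_i = "\u0E81", "\u0EB4"   # LAO LETTER KO, LAO VOWEL SIGN I
print(ascii(with_display_placeholders(lao_vowel_i)))           # '\u25cc\u0eb4'
print(ascii(with_display_placeholders(lao_ko + lao_vowel_i)))  # '\u0e81\u0eb4'
```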

But even in the case of, for example, a combining cedilla following a
Hebrew base letter, for which a font is very unlikely to implement a
composition rule and for which the renderer will be of no help, displaying
the uncomposable combining cedilla with a dotted circle is not the ultimate
solution. Many renderers will instead attempt some reasonable default
positioning, for example by centering the diacritic horizontally on the
base letter. (The renderer will probably not be able to move the cedilla
vertically or find a more appropriate place, since that would depend on the
exact style of each base glyph, which does not necessarily specify
attachment points for general Latin diacritics.)
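The default horizontal centering described above amounts to aligning the centers of the two glyphs' ink boxes while leaving the vertical position alone. A minimal sketch, assuming simplified metrics in which both boxes are measured in font units from the base glyph's origin; the InkBox type and function name are hypothetical, and a real renderer would also account for advance widths and use the font's own attachment data when it exists.

```python
from dataclasses import dataclass

@dataclass
class InkBox:
    """Horizontal extent of a glyph's outline, in font units, measured from
    the base glyph's origin (hypothetical simplified metrics)."""
    x_min: float
    x_max: float

def centered_mark_dx(base: InkBox, mark: InkBox) -> float:
    """Horizontal shift that puts the mark's center over the base's center.
    The vertical position is not changed."""
    base_center = (base.x_min + base.x_max) / 2
    mark_center = (mark.x_min + mark.x_max) / 2
    return base_center - mark_center

# A hypothetical base glyph and a zero-advance mark drawn left of its origin.
print(centered_mark_dx(InkBox(50, 550), InkBox(-200, -50)))  # 425.0
```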

> Regardless, other symbols will most always
> have completely different metrics. It's unlikely that a font engine
> will calculate the different heights, advance widths, and so forth,
> in order to approximate a correct placement of the combining
> character glyph. It's probably equally unlikely that a font developer
> will add a potentially infinite number of GPOS rules to a font's tables
> in order to accomplish this with every conceivable arbitrary base
> character glyph.

For notational purposes, this is still what renderers do when positioning
diacritics with a dotted circle, given that the fonts themselves do not
specify such advanced positioning (or substitution for resizing).

What is the difference between positioning a diacritic (like a Lao vowel)
with a base dotted circle, and positioning the same diacritic with another
base symbol like a cross (or something else like a circle, square, dotted
square, cross-hatching, horizontal stroke, or checkerboard grid)? I've seen
various symbols used to denote the absence of a specific base letter in
Latin-script texts. Why would this not exist for Lao too?

Is the proposer of the x-like cross sure that this convention is not
arbitrary and specific to some authors? What is clear is that the chosen
symbol should not be confusable with another existing letter (that is why a
simple circle or a cross was not an appropriate base symbol for denoting the
position of a Latin, Greek, or Cyrillic base letter).

But is the Unicode convention of using a dotted circle for such notational
use the best option for all scripts? Is there no script in which a dotted
circle character carries a meaning other than that of a purely symbolic
graphical feature, so that the conventional dotted circle could become
confusable in that script? I have not seen anything in Unicode saying that
using a dotted circle in this case is normative, and this is a good reason
not to implement this feature within fonts but only in renderers, which
have better knowledge of the context of use and can decide whether the
symbol really needs to be displayed and which symbolic glyph is the most
appropriate.
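A renderer-side choice of placeholder, as proposed here, could look roughly like the sketch below. The candidate list and its order are hypothetical, and requiring a Unicode symbol category (S*) is only one simple proxy for "not confusable with a letter"; a real renderer would need script-specific knowledge as well. The `cmap` argument is any mapping keyed by code point, for example the dict returned by fontTools' `TTFont.getBestCmap()`.

```python
import unicodedata

# Hypothetical preference order of placeholder symbols a renderer might try:
# U+25CC DOTTED CIRCLE, U+00D7 MULTIPLICATION SIGN (an x-like cross),
# U+25A1 WHITE SQUARE.
CANDIDATES = ("\u25CC", "\u00D7", "\u25A1")

def choose_placeholder(cmap, candidates=CANDIDATES):
    """Return the first candidate that is a symbol (general category S*),
    so it is not read as a letter, and that the font can display.
    `cmap` is any mapping keyed by Unicode code point."""
    for ch in candidates:
        if not unicodedata.category(ch).startswith("S"):
            continue  # letters, digits, etc. risk being read as real text
        if ord(ch) in cmap:
            return ch
    return None  # no usable placeholder: fall back to bare positioning

# A hypothetical font lacking U+25CC but covering the multiplication sign.
print(ascii(choose_placeholder({0x00D7: "multiply", 0x25A1: "uni25A1"})))  # '\xd7'
```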

This archive was generated by hypermail 2.1.5 : Wed Jul 11 2007 - 09:57:57 CDT