From: Brian Wilson (bountonw@gmail.com)
Date: Wed Jul 11 2007 - 11:01:50 CDT
I was in Laos last week and purchased the two latest dictionaries. I have
also seen elementary school primers. These all list the consonants and
vowels separately, as in Thai. In Thai, the convention is to use a
hyphen-like symbol as the base character. In Lao, it is to use an x-like
symbol.
I do not see the point in opening the door to an unbounded number of
possible base characters. As Thai and Lao are close cousins, I would go
for overkill and allow vowels in both languages to attach to either a "-"
or an "x" base character.
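
As a rough illustration of the proposal, the Python sketch below builds isolated-vowel strings by attaching a combining vowel sign to either placeholder base. The two sample vowels (U+0E34 and U+0EB4) and the helper name isolated_forms are assumptions chosen only for illustration. Whether a given font and renderer actually position the mark correctly over "-" or "x" is the open question in this thread, and the sketch does not answer it.

```python
import unicodedata

# Placeholder bases discussed above: hyphen (Thai convention), x (Lao convention).
PLACEHOLDER_BASES = ("-", "x")

# Sample above-base vowel signs, picked only for illustration.
SAMPLE_VOWELS = {
    "thai": "\u0e34",  # THAI CHARACTER SARA I
    "lao": "\u0eb4",   # LAO VOWEL SIGN I
}


def isolated_forms(vowel):
    """Return the vowel attached to each placeholder base (hypothetical helper)."""
    if unicodedata.category(vowel) != "Mn":
        raise ValueError("expected a nonspacing combining mark")
    return [base + vowel for base in PLACEHOLDER_BASES]


for script, vowel in SAMPLE_VOWELS.items():
    print(script, unicodedata.name(vowel), isolated_forms(vowel))
```
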
Brian
On 7/11/07, Philippe Verdy <verdy_p@wanadoo.fr> wrote:
>
> James Kass wrote:
> > Sent: Tuesday, July 10, 2007 05:16
> > To: unicode@unicode.org
> > Subject: RE: Phetsarat font, Lao unicode
> >
> >
> > Philippe Verdy wrote,
> >
> > > One problem is that fonts (at least with TrueType/OpenType) are not
> > > designed to support reordering and positioning with an unbounded
> > > number of base characters.
> >
> > Font engines handle reordering.
>
> Not completely, and not always. Some fonts do have to use GSUB for local
> reordering according to style rather than just the script properties.
>
> > > For example the GSUB/GPOS tables in TrueType require listing
> > > somewhere the complete list of codepoints where such reordering and
> > > positioning may be applied, ...
> >
> > A listing of glyph IDs is stored in the font. Fonts only store
> > codepoints in the "cmap" table. The listing of glyph IDs may
> > be a complete list of every glyph ID involved, or it may be
> > done using ranges in order to minimize table size.
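
The trade-off between a full glyph ID list and a range list can be sketched in a few lines of Python. This is a toy model, not the binary layout of any font table: the glyph IDs are made up, and to_ranges and covers are hypothetical helper names.

```python
def to_ranges(glyph_ids):
    """Collapse glyph IDs into inclusive (start, end) runs of consecutive IDs."""
    ranges = []
    for gid in sorted(set(glyph_ids)):
        if ranges and gid == ranges[-1][1] + 1:
            # Extend the current run by one glyph ID.
            ranges[-1] = (ranges[-1][0], gid)
        else:
            # Start a new run.
            ranges.append((gid, gid))
    return ranges


def covers(ranges, gid):
    """Return True if the glyph ID falls inside any of the ranges."""
    return any(start <= gid <= end for start, end in ranges)


glyph_ids = [3, 4, 5, 6, 10, 11, 40]  # made-up glyph IDs
ranges = to_ranges(glyph_ids)
print(ranges)  # [(3, 6), (10, 11), (40, 40)]
```

Seven listed glyph IDs shrink to three ranges here. A list with few consecutive IDs would gain nothing from the range form.
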
> > > ... something that can't be performed in fonts with
> > > the current format, because they don't allow defining character
> > > classes in them,
> >
> > The OpenType GDEF table format requires assignment of
> > glyphs to various character classes. These classes are neither
> > user- nor developer-definable, though. Unicode also assigns
> > character classes, but only to characters. Complex script
> > fonts generally have scads of "presentation form" glyphs
> > which aren't characters in the Unicode sense.
>
> I said "somewhere". You misunderstood what I meant here. I was speaking
> about the possibility of creating a group of code points (even if they
> are remapped internally to glyph IDs within a "cmap" table or other
> tables) and assigning them a single identifier that can be used in
> GSUB/GPOS rule tables. Without it, you have to create as many rules as
> there are pairs in the product of the possible base characters in one
> class and the possible combining vowel signs in another. As there may
> be many candidate base characters needing such combinations, this will
> rapidly exhaust the maximum size allowed for such GSUB/GPOS tables.
> Creating GSUB/GPOS tables whose selector can include a pseudo-glyph ID
> mapped to a class of code points would simplify the design a lot for
> fonts that need to contain many characters (possibly from several or
> many scripts).
>
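
The rule-count argument above can be made concrete with a small Python sketch. It is only a toy model of the arithmetic, not the actual GSUB/GPOS binary format, and every glyph name, class name, and helper in it is a hypothetical placeholder.

```python
from itertools import product

# Hypothetical glyph names standing in for glyph IDs.
bases = ["dotted_circle", "hyphen", "x_cross", "square"]
marks = ["vowel_i", "vowel_ii", "vowel_u"]

# Without classes: one rule per (base, mark) pair, so the table grows
# with the product of the two sets.
pairwise_rules = {(b, m): "attach mark to base" for b, m in product(bases, marks)}

# With classes: each glyph maps to a class identifier, and a single rule
# keyed on the two class identifiers covers every pair.
glyph_class = {g: "BASE" for g in bases}
glyph_class.update({g: "MARK" for g in marks})
class_rules = {("BASE", "MARK"): "attach mark to base"}


def find_rule(base, mark):
    """Look up the rule through the glyphs' classes (hypothetical helper)."""
    return class_rules.get((glyph_class[base], glyph_class[mark]))


print(len(pairwise_rules), len(class_rules))  # 12 1
print(find_rule("x_cross", "vowel_u"))
```

Adding a fifth base symbol costs three more entries in the pairwise table but only one more entry in the class map.
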
> > > ... and assigning them pseudo-glyph IDs that can be used in GSUB
> > > tables.
> >
> > Pseudo-glyph ID might be a misleading phrase. A Glyph ID is
> > simply the number of the position of a glyph's data in a font.
> > The first glyph, contrary to conventional counting methods,
> > is given the glyph ID of zero. And so forth.
>
> It was not misleading. I really intended a special ID that can be used to
> designate a class of glyphs (mapped from a class of characters) as if it
> were a single glyph ID, to create a single composition rule instead of
> one composition rule per element of the product of the two classes. It
> would certainly be more useful in GPOS than in GSUB.
>
> > > ... the renderer
> > > for example could be looking for rules based on the dotted circle
> > > symbol, and automatically infer the other applicable rules for other
> > > Common symbols,
> > Does this assume that the dotted circle is part of the encoded text?
>
> Yes, for the intended purpose of showing the diacritic in isolation, with
> an arbitrary base symbol.
>
> > It normally isn't, it's inserted (to the display only) by (at least) one
> > popular font engine.
>
> No. A renderer should not have to do this unless explicitly instructed to
> do so, or if there's no other way to display the diacritic in combination
> with a base character it can attach to.
>
> But even in the case of, for example, a combining cedilla occurring after
> a base Hebrew letter, for which it is very unlikely that a font would
> implement a composition rule, and for which the renderer will be of no
> help, displaying the uncomposable combining cedilla with a dotted circle
> is not the ultimate solution. Many renderers will instead attempt some
> reasonable default positioning, for example by centering the diacritic
> horizontally on the center of the base letter. (The renderer will
> probably not be able to move the cedilla vertically, or find a more
> appropriate place, given that this would depend on the exact style of
> each base glyph, which does not necessarily specify attachment points for
> general Latin diacritics.)
>
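
A minimal sketch of the horizontal-centering fallback described above, in Python. It assumes metrics in font units, uses the base glyph's advance width as a stand-in for its visual width, and leaves the vertical position untouched. The function name and the sample numbers are hypothetical and do not describe any particular renderer.

```python
def center_mark_x(base_advance, mark_xmin, mark_xmax):
    """Return the x offset, relative to the base glyph's origin, that centers
    the mark's ink box over the base glyph's advance width."""
    base_center = base_advance / 2
    mark_center = (mark_xmin + mark_xmax) / 2
    return base_center - mark_center


# Made-up metrics: a base 600 units wide, and a nonspacing mark whose ink
# spans -300..-100 (drawn to the left of its own origin).
offset = center_mark_x(base_advance=600, mark_xmin=-300, mark_xmax=-100)
print(offset)  # 500.0
```

With these numbers the mark's ink ends up spanning 200..400, centered on the base's midpoint at 300. Nothing here accounts for the shape of the base glyph, which is the limitation the paragraph points out.
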
> > Regardless, other symbols will most always
> > have completely different metrics. It's unlikely that a font engine
> > will calculate the different heights, advance widths, and so forth,
> > in order to approximate a correct placement of the combining
> > character glyph. It's probably equally unlikely that a font developer
> > will add a potentially infinite number of GPOS rules to a font's tables
> > in order to accomplish this with every conceivable arbitrary base
> > character glyph.
>
> For notational purposes, this is still what renderers are doing when
> positioning diacritics with a dotted circle, given that fonts themselves
> are not specifying such advanced positioning (or substitution for
> resizing).
>
> What is the difference between positioning a diacritic (like a Lao vowel)
> with a base dotted circle, and positioning the same diacritic with another
> base symbol like a cross (or something else like a circle, square, dotted
> square, crossed hatch, horizontal stroke, or checkers grid)? I've seen
> various symbols used to denote the absence of a specific base letter in
> Latin-written texts. Why would this not exist for Lao too?
>
> Is the proposer of the x-like cross sure that this convention is not
> arbitrary and specific to some authors? What is clear is that the chosen
> symbol should not be confusable with another existing letter (that's why
> choosing a simple circle or a cross was not appropriate as the base symbol
> for denoting the position of a base Latin, Greek, or Cyrillic letter).
>
> But is the Unicode convention of using a dotted circle for such
> notational use the best option for all scripts? Isn't there a script
> where a dotted circle character has a meaning beyond that of a purely
> symbolic graphical feature, so that the conventional dotted circle could
> become confusable in that script? I have not seen anything in Unicode
> that says that using a dotted circle for this case is normative, and this
> is a good reason for not implementing this feature within fonts, but only
> in renderers, which have better knowledge of the context of use and can
> determine whether that symbol really needs to be displayed, and which
> symbolic glyph is the most appropriate.
>
-- 
Brian Wilson, Director
Mission College Translation Center
P.O. Box 4
Muaklek, Saraburi 18180
THAILAND
Tel: 66-36-344-777 ext 1221
Mobile: 66-86-921-0108
Fax: 66-36-341-629
This archive was generated by hypermail 2.1.5 : Wed Jul 11 2007 - 11:04:12 CDT