Re: Glyph codes (was Re: Questions on Greek characters)

From: Peter_Constable@sil.org
Date: Fri May 19 2000 - 10:44:46 EDT


This whole topic involves a lot of murky water.

Marco is right in distinguishing character codes, which are visible to the
outside world, and codes that may be used in some (hypothetical) rendering
process and are invisible to the user. But let's get away from the
hypothetical.

First of all, talking reality, the key issue is (using TrueType jargon)
whether or not the "glyph code" is in the cmap of the font. This would make
the glyph accessible using that code as a *character* code, in which case
assigning glyphs to unassigned Unicode values is an Extremely Bad Thing
since it is almost certain to result in non-conformant data. (Some people
will be sure to find it and start using it.)
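
To make the cmap concern concrete, here is a minimal sketch in Python. It assumes a font's cmap can be modelled as a plain dictionary from code point to glyph name; the dictionary, the glyph names and the helper function are all hypothetical. "Unassigned" here means general category Cn in whatever Unicode version the Python build ships with, so the result can change as the standard grows.

```python
import unicodedata

def unassigned_cmap_entries(cmap):
    """Return the code points in a cmap-like dict that Unicode has not assigned.

    `cmap` is a hypothetical {code point: glyph name} mapping standing in for
    a font's cmap table. General category "Cn" means unassigned in the
    Unicode version bundled with this Python build.
    """
    return sorted(cp for cp in cmap if unicodedata.category(chr(cp)) == "Cn")

# Hypothetical font: one assigned character, one Private Use Area entry,
# and one glyph parked on a value that is unassigned at the time of writing.
example_cmap = {
    0x03B1: "alpha",        # GREEK SMALL LETTER ALPHA, assigned
    0xE000: "alpha.swash",  # Private Use Area (category Co), not flagged
    0x0378: "alpha.final",  # reserved value in the Greek block
}

suspect = unassigned_cmap_entries(example_cmap)
print([hex(cp) for cp in suspect])
```

Any code point the check reports is one that a user could type or paste to reach the glyph, which is exactly how non-conformant data gets created.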

Secondly, I'm guessing that the system that Mark and others had in mind was
the Omega version of TeX (at least, it wasn't too long ago that a thread on
this list or the OpenType list was discussing the fact that Omega - or
maybe it was somebody's particular implementation with Omega - was using
unassigned Unicode values to access presentation-form glyphs). Now, I don't
know much about the workings of Omega (I could ask one of my colleagues but
she hasn't come in yet today), so I don't know whether this is potentially
a problem. The issues (whether the system in question is Omega or anything
else) are:

(i) Can somebody encode text using that unassigned Unicode value in the
text in order to display that glyph?

(ii) If those unassigned code points are later given assignments within the
standard and people start using them to encode text, will the rendering
system have problems displaying them?

If either of these is true, then that's a serious problem.

Thirdly, there's the very murky water of glyph identification standards.
The industry has already explored those depths and found very little
treasure. People (especially those that weren't involved in the initial
expedition) will probably continue to suggest for many years that there
must be treasure there. Who knows whether another expedition will ever be
launched, let alone succeed. In the meantime, there is no industry
standard for "glyph codes" or glyph identifiers of any kind. (Adobe has a
spec for creating Postscript names for glyphs, but it's not a de facto
industry standard, nor does it specify an exact name for every glyph in
some defined set - it's based on an open set, and they assign specific
names to some and provide patterns for creating new names, including names
for presentation forms.)

As for using G+ to indicate "glyph codes", I don't particularly care for it
because it suggests that the codes are in some sense standardised. The
point is, in current technologies, glyph identifiers that are needed for
rendering are very implementation specific. For example, if you're
generating a TrueType font from Fontographer, there's no guarantee that the
glyph ID (a 16-bit value) for a given glyph will be the same from one build
to the next. (Not that they will change at random; it's just that it's
possible for the designer to make changes in the font or in the build
process that will result in different gIDs, and there's no way to specify
that a given glyph should always have a specific gID.) A font developer can
always use textual glyph names that remain constant through every build,
but there is nothing in OT, ATT or Graphite (the three rendering systems
I'm familiar with) that requires that these names, or any other
identifiers, conform to any standard. (Adobe's CoolType probably requires
Adobe-conformant Postscript names, but then CoolType isn't a
general-purpose rendering engine - Adobe has so far only shown interest in
supporting a limited number of scripts.)
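
A small Python sketch of the gID point, under the assumption that a build's glyph IDs are simply the positions in its glyph order. The two glyph orders and all glyph names below are made up for illustration and do not come from any real font or tool.

```python
def glyph_ids(glyph_order):
    """Map each glyph name to its index (gID) in a build's glyph order."""
    return {name: gid for gid, name in enumerate(glyph_order)}

# Two hypothetical builds of the same font; the second adds one glyph.
build_1 = [".notdef", "alpha", "alpha.final", "beta"]
build_2 = [".notdef", "alpha", "alpha.alt", "alpha.final", "beta"]

ids_1 = glyph_ids(build_1)
ids_2 = glyph_ids(build_2)

# Glyphs present in both builds whose gID differs between them.
shifted = {
    name: (ids_1[name], ids_2[name])
    for name in ids_1
    if name in ids_2 and ids_1[name] != ids_2[name]
}
print(shifted)
```

The names stay constant while the numbers move, so anything keyed on the numeric gID is tied to one particular build.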

Given existing rendering technologies (at least those I'm familiar with),
it simply isn't necessary to talk about specific glyph "codes" or
identifiers. If there is a system, such as Omega, that uses some predefined
set of codes, then I'd say there are some important questions as to whether
or not that's a wise thing. Even if there aren't problems with the issues
mentioned above, there's the question of how the system will be extended
for rendering additional scripts, or even extended for dealing with
different implementations of the scripts already supported. E.g. diacritic
stacking and positioning can be implemented in a font either by using only
composite glyphs, using dynamic positioning (if supported by the rendering
system), or both; also e.g. very different approaches to Nastaliq
contextualisation, ligation and positioning have been taken in existing
implementations: some using glyphs that correspond roughly to atomic
strokes, some with glyphs for contextual forms of characters (with or
without positional variants), and some with precomposed glyphs for entire
syllables. A rendering system that requires a fixed set of glyph
identifiers is really begging for an industry-wide glyph registry, and
history raises doubts as to whether such a registry would ever succeed.

Of course, there's not really a problem if a system is only intended for
rendering certain texts involving certain scripts and uses a fixed set of
"glyph codes" to do so, provided the issues mentioned above aren't a
problem. But in that case, the only people that would ever need to discuss
those "glyph codes" are people interested in the internal workings of that
rendering system - not likely to be a particularly large group.

Peter

From: <marco.cimarosti@europe.com> AT Internet on 05/19/2000 03:27 AM

To: <unicode@unicode.org> AT Internet@Ccmail
cc: (bcc: Peter Constable/IntlAdmin/WCT)

Subject: Glyph codes (was Re: Questions on Greek characters)

Kevin Bracey wrote:
> Excuse my ignorance, but what system are we talking about that
> has "glyph codes" similar to Unicode, but not identical,

I'll be frank: I was talking about a rendering engine library whose source
code is mainly in my fantasy :-)

I don't know exactly what kind of system Mark had in mind, but I assume
that we both were just speculating.

> and at what point are these codes visible to the
> user/programmer?
> I'm not familiar with the technology involved.
> If these codes are visible externally, and could be mistaken
> for Unicode code points, it strikes me as the top of a
> slippery slope.

Users and application programmers should never ever need to be aware of the
internals of any rendering engine.

But font designers probably should: the set of glyph codes is precisely the
interface between the font and the rendering software.

Of course the environment assumed behind such a design is poor old fonts (à
la BDF). With newer technologies like OT or ATSUI all this (including the
very idea of an independent Unicode rendering engine) doesn't make much
sense, because it would mean duplicating or violating parts of the
architecture.

> On our systems, the glyph ordering within fonts bears very
> little relation to any encoding -

This is OK. I was just stating that glyph indices *may* be inspired by
Unicode values, not that they *must* be.

> indeed Unicode ordering would be hugely inefficient due to the
> huge gaps in the table.

It's hard to evaluate the generality of this statement. In my fantasy, my
code is always infinitely fast :-) In a real implementation, there are
many details that contribute in different ways to efficiency.
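
One of those details can be put in numbers. The Python sketch below only counts slots for two hypothetical table layouts, one indexed directly by Unicode value and one compact; the coverage set is invented, and a real implementation could of course use a smarter structure than either.

```python
def table_sizes(code_points):
    """Compare slot counts for two hypothetical glyph-table layouts."""
    dense_slots = max(code_points) + 1     # one slot per value up to the highest
    compact_slots = len(set(code_points))  # one slot per glyph actually present
    return dense_slots, compact_slots

# Hypothetical coverage: Latin capitals A-Z plus Greek lowercase alpha-omega.
coverage = list(range(0x41, 0x5B)) + list(range(0x3B1, 0x3CA))
dense, compact = table_sizes(coverage)
print(dense, compact)
```

Whether the gap between the two figures matters depends on lookup cost, memory budget and the rest of the design.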

_ Marco


