Re: Latin w/ diacritics (was Re: benefits of unicode)

From: James Kass (jameskass@worldnet.att.net)
Date: Wed Apr 18 2001 - 13:01:48 EDT

Next message: Marco Cimarosti: "RE: Latin w/ diacritics (was Re: benefits of unicode)"
Previous message: Michael \(michka\) Kaplan: "Re: Latin w/ diacritics (was Re: benefits of unicode)"
In reply to: Marco Cimarosti: "RE: Latin w/ diacritics (was Re: benefits of unicode)"
Next in thread: Peter_Constable@sil.org: "RE: Latin w/ diacritics (was Re: benefits of unicode)"

Marco Cimarosti wrote:

MC>
> > > I thought that the PUA was being considered here as a place to put
> > > the extra *glyphs* needed internally by a rendering engine -- not as
> > > a direct means of encoding text.
> >

JK>
> > In TrueType/OpenType, glyphs don't have to be mapped (assigned to
> > code points).

MC>
> This is a myth that I hope to see eradicated as soon as possible.
>

JK>
> > Many fonts have glyphs which the user can't access
> > directly, this is especially true now with OpenType.

MC>
> Sorry, I miss the implication of this.
>
> If the user can't access them it is probably because she doesn't need
> them... The rendering engine (whether or not embedded in the font) clearly
> can access and use these glyphs when necessary.
>

Exactly, and I'm sorry for failing to be clear. In a Latin font, the
character LATIN CAPITAL LETTER A WITH ACUTE may be a compound
glyph formed internally (by the font, not an engine) by combining
the "A" glyph, which is mapped, with a "capital letter combining acute"
glyph, which need not be mapped. In a Chinese font, a character in
the font might be a compound glyph consisting of two, three, four,
or more un-mapped component glyphs. This is in TrueType. As you
note, in OpenType there are additionally many unmapped glyphs
which can be called by the rendering engine based on context.
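
To make the mapped/unmapped distinction concrete, here is a minimal sketch in Python. It is a toy model, not a font parser: the glyph names, the three tables, and the helper functions are all hypothetical and only mimic the idea of a compound glyph built from components, one of which has no code point.

```python
# Toy model of a TrueType-style font. All names and data are hypothetical.

# Glyph order: a glyph's ID is simply its position in this list.
GLYPH_ORDER = [".notdef", "A", "acutecomb.cap", "Aacute"]

# 'cmap'-like table: only some glyphs are reachable from a code point.
CMAP = {0x0041: "A", 0x00C1: "Aacute"}

# Compound glyphs list their components; simple glyphs do not appear here.
COMPONENTS = {"Aacute": ["A", "acutecomb.cap"]}


def resolve(glyph):
    """Flatten a glyph into the simple glyphs that actually get drawn."""
    parts = COMPONENTS.get(glyph)
    if parts is None:
        return [glyph]
    flat = []
    for part in parts:
        flat.extend(resolve(part))
    return flat


def unmapped_glyphs():
    """Return the glyphs that no code point maps to, in glyph order."""
    mapped = set(CMAP.values())
    return [g for g in GLYPH_ORDER if g not in mapped]


if __name__ == "__main__":
    print(resolve(CMAP[0x00C1]))
    print(unmapped_glyphs())
```

In this model, U+00C1 is drawn from the mapped "A" glyph plus the "acutecomb.cap" glyph, and the latter is used without ever being assigned a code point, PUA or otherwise.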

You had wondered whether the proposed PUA use wasn't just to have
a place to store the glyphs in the font which might be needed
by the rendering engine. I was pointing out that since such glyphs
don't need to be mapped, there would be no need of the PUA for
that reason. I misunderstood your point at first: the PUA encodings
would be needed in order to display the glyphs on the older OS.

>
> Unfortunately, I'll probably never find the time (or funds) to demonstrate
> it. But, at the point where I stopped years ago, I demonstrated to myself
> that at least European and Middle Eastern scripts (with most of the involved
> complexities: bidi algorithm, contextual shapes, 2-3 orders of combining
> accents) may be displayed and edited under DOS at a speed that is not
> noticeably different from any other text editor.
>
> Also consider that a dwarf 100,000 rows table (such as the Unicode's
> Database or a Unicode font) would be considered no rocket science in the
> world of, e.g., relational databases -- which do exist and do perform well
> even under DOS, Linux, older Windows, etc.
>

Displaying dot matrices on VGA screens also isn't rocket science.
The reason I mentioned converting a page to PUA rather than
using an internal display buffer is that the source page would
only need to be processed once, and then it could be operated upon
by any application on the system. Another reason for using PUA
at all is that apps already exist which can handle PUA and new apps
wouldn't have to be built. With some kind of ad hoc PUA registry for
Latin w/diacritics, only one conversion program would be needed
to cover the hundreds or thousands of languages involved.
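
A one-pass conversion of this kind might look roughly like the following Python sketch. The registry contents and the `to_pua` function are invented for illustration; no actual PUA registry or assignment is implied. The sketch assumes that sequences with a precomposed Unicode form should simply be normalized, and that only the remaining base-plus-diacritic sequences need a PUA code point.

```python
import unicodedata

# Hypothetical ad hoc registry: base letter + combining marks -> PUA code point.
# These assignments are made up for the example.
PUA_REGISTRY = {
    "m\u0303": "\uE000",  # m + COMBINING TILDE (no precomposed form exists)
    "q\u0301": "\uE001",  # q + COMBINING ACUTE ACCENT (no precomposed form exists)
}


def to_pua(text):
    """Replace registered base+diacritic sequences with their PUA code points."""
    # Use precomposed characters wherever Unicode already provides them.
    text = unicodedata.normalize("NFC", text)
    out = []
    i = 0
    while i < len(text):
        # Gather one base character plus any combining marks that follow it.
        j = i + 1
        while j < len(text) and unicodedata.combining(text[j]):
            j += 1
        cluster = text[i:j]
        # Unregistered clusters pass through unchanged.
        out.append(PUA_REGISTRY.get(cluster, cluster))
        i = j
    return "".join(out)


if __name__ == "__main__":
    converted = to_pua("m\u0303a A\u0301")
    print([hex(ord(ch)) for ch in converted])
```

The converted page then contains one code point per displayed letter, which is what lets existing PUA-capable applications work with it without any rendering logic of their own.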

> What does the 'cmap' table do if not converting code points to another set
> of numbers? Letting apart the fact that the second domain (the glyph
> indexes) is not standardized and not uniform across different fonts, what is
> the difference --in terms of performance-- with using pseudo-Unicode scalars
> as glyph indexes?

The 'cmap', as you say, converts code points to glyph IDs, nothing more.
But I'm not sure what you mean by using pseudo-Unicode scalars as
glyph indexes. The first glyph in the font is glyph index zero, the
second glyph in the font is glyph index one, and so forth. A glyph ID
reflects only the glyph's position in the font, not its appearance
or its mapping.
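
As a small illustration of glyph IDs being nothing more than positions, the following Python sketch builds a 'cmap'-like lookup for two hypothetical fonts that hold the same glyphs in a different order. The font lists, the `ENCODING` table, and `build_cmap` are assumptions made up for the example, not any real font API.

```python
# Two hypothetical fonts holding the same glyphs in a different order.
FONT_A = [".notdef", "A", "B", "acute"]
FONT_B = [".notdef", "acute", "B", "A"]

# Which code points should reach which glyphs; "acute" is left unmapped.
ENCODING = {0x41: "A", 0x42: "B"}


def build_cmap(glyph_order, encoding):
    """Map code points to glyph IDs, where a glyph ID is just a list position."""
    index = {name: gid for gid, name in enumerate(glyph_order)}
    return {cp: index[name] for cp, name in encoding.items() if name in index}


if __name__ == "__main__":
    print(build_cmap(FONT_A, ENCODING))
    print(build_cmap(FONT_B, ENCODING))
```

The same code point U+0041 resolves to glyph ID 1 in the first font and glyph ID 3 in the second, so the IDs carry no meaning outside the font that defines them.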

> Thanks for the info about Pahawh Hmong!

Intriguing, isn't it?

Best regards,

James Kass.


This archive was generated by hypermail 2.1.2 : Fri Jul 06 2001 - 00:17:16 EDT