Re: A new word for the English language

From: James Kass (jameskass@worldnet.att.net)
Date: Mon Aug 05 2002 - 13:40:22 EDT


From Tex Texin's Compelling Unicode Demo in the current PDF
version, it is possible to select text for copying and then paste
it into another Unicode-aware application like Outlook Express.

Here, the text is displayed correctly (more or less), even though
this system uses one default font for all these scripts. This
indicates that PDF is character-based rather than glyph-based.
It is certainly possible to use a custom font encoding and embed
that font in a PDF file, though. In that case copy/paste would fail
if the pasted text were displayed with an inappropriate font, and
the text wouldn't be Unicode even if the appropriate font were
available.
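A toy model of why the font encoding matters for copy/paste. In this
sketch all byte values and names are hypothetical: the page content
holds font-specific glyph codes, and text extraction can only recover
Unicode when a code-to-Unicode table is available (the job a ToUnicode
CMap does in a real PDF file).

```python
from typing import Dict, Optional

# Hypothetical content-stream bytes written with a custom font encoding:
# each byte selects a glyph in the embedded font, not a Unicode character.
CONTENT_CODES = bytes([0x01, 0x02, 0x03])

# Hypothetical glyph-code -> Unicode table (the role of a ToUnicode CMap).
TO_UNICODE: Dict[int, str] = {0x01: "Ö", 0x02: "s", 0x03: "t"}


def extract_text(codes: bytes, table: Optional[Dict[int, str]]) -> str:
    """Recover text from glyph codes, as a copy operation would have to."""
    if table is None:
        # No mapping available: the raw codes are all a viewer can hand
        # over, and here they come out as meaningless control characters.
        return codes.decode("latin-1")
    # Codes missing from the table become U+FFFD REPLACEMENT CHARACTER.
    return "".join(table.get(code, "\uFFFD") for code in codes)


print(repr(extract_text(CONTENT_CODES, TO_UNICODE)))  # 'Öst'
print(repr(extract_text(CONTENT_CODES, None)))        # '\x01\x02\x03'
```

With the table, the pasted text is real Unicode; without it, the same
glyphs on screen turn into junk once pasted elsewhere.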

Some examples from Tex's PDF document:

<quote>
The names below are actual celebrities in their home countries. See
<snip>
Armenia | Aram Khachatryan (composer) | Հայաստան Արամ Խաչատրյան
Australia | Nicole Kidman (actress) | Australia Nicole Kidman
Austria | Johann Strauss (The "waltz king") | Österreich Johann Strauß
Belgium (Flemish) | Rene Magritte (painter) | België René Magritte
Belgium (French) | Rene Magritte (painter) | Belgique René Magritte
Belgium (German) | Rene Magritte (painter) | Belgien René Magritte
Bhutan | Gonpo Dorji (film actor) | འབྲུག་ཡུལ། མགོན་པོ་རྡོ་རྗེ།
Cambodia (Khmer) | Venerable PreahBuddhaghosachar Chuon Nath | ប្ រទេស កម្ ពុ ជា ព្ រះ ពុ ទ្ ឋឃោសាចារ ្ យ ជួ ន ណាត
Canada | Celine Dion (singer) | Canada Céline Dion
Canada - Nunavut (Inuktitut language) | Susan Aglukark (singer) | ᓄᓇᕗᒻᒥᐅᑦ ᓱᓴᓐ ᐊᒡᓗᒃᑲᖅ
People's Rep. of China | ZHANG Ziyi (actress) | 中国章子怡
<snip>
</quote>

> I may be wrong, but it's my impression that others involved in font
> technologies would talk about "encoding glyphs" in the sense I used.
>

Yes, people do, but not on the Unicode list. And, IIRC, the last
time I erred in this fashion, you were kind enough to remind me
that we encode characters, not glyphs.
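Unicode does carry some presentation forms as characters, for
compatibility, and those are what a "presentation-form encoding"
leans on. A short sketch using Python's standard `unicodedata` module
(the two sample characters are just illustrations) shows NFKC
normalization folding them back into the ordinary characters they
stand for:

```python
import unicodedata


def fold(ch: str) -> str:
    """Map a compatibility presentation form to its plain characters."""
    return unicodedata.normalize("NFKC", ch)


# U+FB01 LATIN SMALL LIGATURE FI and
# U+FEFB ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM.
for ch in ("\uFB01", "\uFEFB"):
    codes = " ".join(f"U+{ord(c):04X}" for c in fold(ch))
    print(f"{unicodedata.name(ch)} -> {codes}")
```

This prints `U+0066 U+0069` for the fi ligature and `U+0644 U+0627`
for lam-alef: the "glyph" characters reduce to ordinary characters.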

We map glyphs; we encode characters. In a certain font editor, the
practice is to draw the glyph first, then map it to a Unicode
character.

Even though, technically, the cmap is the "character-to-glyph mapping
table", there's nothing to prevent an application from converting
a GID (glyph ID) back to a character code or, in the one-to-many
case, to several character codes.
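A minimal sketch of that reverse lookup, assuming the cmap has
already been read into a plain dict of code point to glyph name
(fontTools' `getBestCmap()` returns a dict of that shape; the table
below is made up). Because several code points may share one glyph,
each glyph maps to a list of characters:

```python
from collections import defaultdict
from typing import Dict, List


def invert_cmap(cmap: Dict[int, str]) -> Dict[str, List[int]]:
    """Turn a character-to-glyph cmap into a glyph-to-characters lookup."""
    reverse: Dict[str, List[int]] = defaultdict(list)
    # Sorting keeps each glyph's code points in ascending order.
    for codepoint, glyph in sorted(cmap.items()):
        reverse[glyph].append(codepoint)
    return dict(reverse)


# Hypothetical cmap: SPACE and NO-BREAK SPACE both point at one glyph.
cmap = {0x0041: "A", 0x0020: "space", 0x00A0: "space"}

for glyph, codepoints in invert_cmap(cmap).items():
    print(glyph, "->", " ".join(f"U+{cp:04X}" for cp in codepoints))
```

Ligature glyphs reached only through glyph substitution are usually
not in the cmap at all, so mapping one of those back to several
characters needs information beyond this table.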

William Overington has indicated that he has been experimenting
with the Softy font editor (his message 2002-07-08).

From Softy's help files:

<quote>
The vertical Glyph Display on the left shows a small scale outline of either all glyphs in the current font (optionally including,
for registered copies of Softy, the kerning pairs for each glyph), or the glyphs mapped to the character set for the selected
platform
...
</quote>

Best regards,

James Kass.

----- Original Message -----
From: <Peter_Constable@sil.org>
To: "Unicode List" <unicode@unicode.org>
Sent: Monday, August 05, 2002 9:12 AM
Subject: Re: A new word for the English language

>
> On 08/05/2002 10:04:14 AM "James Kass" wrote:
>
> >Seriously, if there's one thing I've learned from this list, it is
> >that one should think twice before arguing with Mr. Constable.
>
> Oh, I hope not!
>
>
> >> That's about as good for me. In practice, I'd probably say "it is almost
> >> never necessary to encode a ligature glyph."
> >
> >Come now, we don't encode glyphs, we encode characters.
>
> Case in point. My comment certainly is not beyond debate.
>
> I have often heard people refer to encoding of glyphs, even in the context
> of documents rather than fonts. For instance, when someone maps contextual
> forms, ligatures or positional variants of diacritics from the cmap and
> then uses that to encode these directly in a document, that is often
> referred to as a glyph encoding or a presentation-form encoding. (I prefer
> the latter, myself. Of course, some also refer to it as "a hack"... ) Now,
> arguably we are encoding *characters* and just treating the presentation
> forms as though they were distinct characters. On the other hand, there is
> no question that something like a PDF document is encoding specifically
> glyphs, so at least in some contexts it's valid to talk that way.
>
> I may be wrong, but it's my impression that others involved in font
> technologies would talk about "encoding glyphs" in the sense I used.
>
> - Peter
>
>
> ---------------------------------------------------------------------------
> Peter Constable
>
> Non-Roman Script Initiative, SIL International
> 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
> Tel: +1 972 708 7485
> E-mail: <peter_constable@sil.org>
>



This archive was generated by hypermail 2.1.2 : Mon Aug 05 2002 - 11:51:26 EDT