Re: Proofreading fonts

From: Gregg Reynolds (unicode@arabink.com)
Date: Mon Jul 11 2005 - 20:01:22 CDT

    Asmus Freytag wrote:
    > At 03:26 PM 7/11/2005, Peter Kirk wrote:
    >
    >> In fact I think Gregg started this thread with a bad example. The two
    >> encodings for a with circumflex are canonically equivalent and so
    >> different encodings of the same data. The cases Gregg really needs to
    >> deal with are when the alternatives are not canonically equivalent but
    >> semantically distinct.
    >

    It was a great example! I just didn't make myself clear. ;) I meant
    it as a graphic design problem, not as a practical problem to be solved.
    >
    > I'm still waiting for an actual (or correctly contrived) example.
    >

    Ok, you asked for it. Here's an example taken from my own little
    speculative semantic encoding design for Arabic. Soon to be inflicted
    on an innocent world.

    The letterform waw U+0648 has at least four distinct functions in
    written Arabic.

    1. waw-rad. latin-1 translit: W; phono: consonant /w/; semantics:
    radical; e.g. Wjd وجد /wajada/; shows up in the dictionary under the
    letter waw.

    2. waw-nonrad. latin-1 translit: w; phono: consonant /w/; semantics:
    non-radical; e.g. bwâdr بوادر /bawâdir/; shows up under b-d-r, the waw
    is ignored for (first-level) lexical lookup.

    3. sister of damma. latin-1 translit: û; phono: short vowel /u/;
    semantics: non-lexical (though it can change meaning within a lexical
    category, e.g. from active to passive voice); e.g. mktûb مكتوب
    /maktoob/; like damma, does not affect lexical ordering (except
    within subentries under the root k-t-b); mnemonic: called sister of
    damma because it always comes after damma (which may not be written
    explicitly) and denotes a lengthening of the vowel /u/.

    4. lazy waw. latin-1 translit: o; phono: null; semantics: null; e.g. bo's
    بؤس /bu's/, where ' is hamza; purely graphotactic; mnemonic: too lazy to
    bear the burden of phonological or lexical meaning; too lazy to grow the
    tail that would make it look like a real waw.
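
The four functions above can be written down as a small lookup table. This is only a sketch in Python, not part of the actual encoding design: the names `WawFunction`, `WAW_FUNCTIONS` and `classify_waws` are made up for illustration, and it assumes that W, w, û and o are used for nothing but waw in the transliteration, which the list above does not actually promise.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class WawFunction:
    name: str
    translit: str  # Latin-1 transliteration symbol from the list above
    phono: str
    radical: bool  # True only for waw-rad, which is filed under the letter waw


# Hypothetical table; the field values paraphrase the four list items.
WAW_FUNCTIONS = (
    WawFunction("waw-rad", "W", "consonant /w/", True),
    WawFunction("waw-nonrad", "w", "consonant /w/", False),
    WawFunction("sister of damma", "\u00fb", "lengthens /u/", False),
    WawFunction("lazy waw", "o", "null", False),
)

BY_SYMBOL = {f.translit: f for f in WAW_FUNCTIONS}


def classify_waws(translit: str) -> list[str]:
    """Name the function of each waw symbol found in a transliterated word."""
    # Normalize to NFC so a decomposed u + combining circumflex still matches.
    text = unicodedata.normalize("NFC", translit)
    return [BY_SYMBOL[ch].name for ch in text if ch in BY_SYMBOL]


if __name__ == "__main__":
    for word in ("Wjd", "bw\u00e2dr", "mkt\u00fbb", "bo's"):
        print(word, classify_waws(word))
```

Classification is trivial here only because the transliteration already carries the distinction; nothing comparable can be recovered from the Arabic text alone.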

    Ok, so now we have four different encoding elements. BTW, they don't
    have to map to single codepoints. My scheme maps them to latin-1, for
    the transliteration. They could be mapped to PUA points, or to XML
    elements. In any case, they all have the same typographic denotation,
    namely waw U+0648. But you probably would have a hard time writing
    software that could automatically check spelling/encoding. So you need
    a font with four almost but not quite identical waw glyphs. I think.

    For example, lazy waw might use a small subfixed ring or null sign.
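
To make the proofreading problem concrete, here is a rough Python sketch of the two views. The ordinary view maps all four encoding elements to U+0648, so the distinction vanishes on screen; the proofreading view attaches a different combining mark to each one. Only the ring below for lazy waw comes from the suggestion above. The other marks, and the names `PROOF_MARKS`, `plain` and `proof`, are arbitrary placeholders standing in for whatever glyph variants a real proofreading font would use.

```python
WAW = "\u0648"  # ARABIC LETTER WAW

# Hypothetical proofreading marks, keyed by the Latin-1 transliteration symbol.
PROOF_MARKS = {
    "W": "",             # waw-rad: left unmarked
    "w": "\u0323",       # waw-nonrad: COMBINING DOT BELOW (arbitrary choice)
    "\u00fb": "\u0304",  # sister of damma: COMBINING MACRON (arbitrary choice)
    "o": "\u0325",       # lazy waw: COMBINING RING BELOW ("subfixed ring")
}


def plain(symbol: str) -> str:
    """Ordinary rendering: every waw element becomes the same character."""
    if symbol not in PROOF_MARKS:
        raise KeyError(symbol)
    return WAW


def proof(symbol: str) -> str:
    """Proofreading rendering: waw plus a mark that reveals its function."""
    return WAW + PROOF_MARKS[symbol]


if __name__ == "__main__":
    symbols = ["W", "w", "\u00fb", "o"]
    print(len({plain(s) for s in symbols}))  # one shape in the ordinary view
    print(len({proof(s) for s in symbols}))  # four shapes in the proof view
```

Combining marks from the Latin range may not render well on an Arabic base letter, so this only illustrates the idea; a purpose-built font would carry real glyph variants instead.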

    -gregg


