Re: Proofreading fonts

From: Kenneth Whistler
Date: Mon Jul 11 2005 - 21:01:02 CDT



    > Ok, you asked for it. Here's an example taken from my own little
    > speculative semantic encoding design for Arabic. Soon to be inflicted
    > on an innocent world.
    > The letterform waw U+0648 has at least four distinct functions in
    > written Arabic.

    O.k., but as you surmised in an earlier note, what you are trying
    to do here is distinct from a *character* encoding of the sort
    that the Unicode Standard does.

    The Unicode encoding sees a waw in the written form, and represents
    that by a waw in the text representation, with a single waw
    character encoded. (Compatibility presentation form gorp aside,
    of course.) It doesn't get into issues of morphological or
    phonological analysis, nor should it, in my assessment.

    What you are presenting might well be a very interesting and useful
    way to represent Arabic text, but from the Unicode point-of-view
    it is a *markup* of the plain text with more information beyond
    what is simply carried by the surface form of the letters.

    Another way to look at it is simply to correlate your Latin-1
    transliteration scheme with the plain text representation, and
    consider that the markup (however implemented):

    1. waw-rad: waw --> W

    2. waw-nonrad: waw --> w

    3. sister of damma: waw --> û (Latin-1 u-circumflex, in case anyone
                                     gets character hash here)
    4. lazy waw: waw --> o

    As long as your markup scheme synchronizes the plain text element
    on the left with your Latin-1 transcriptional equivalent on the
    right, by whatever means, you have the piece of information
    then available to make the distinctions you are after.
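
    A minimal sketch of that synchronization, assuming a standoff
    approach in which the plain text is left untouched and each waw
    is tagged by its offset. The tag names, the Annotation class and
    both functions are hypothetical illustrations, not part of any
    existing scheme; only the four Latin-1 symbols come from the list
    above.

```python
from dataclasses import dataclass

WAW = "\u0648"  # ARABIC LETTER WAW

# Hypothetical tag names for the four functions listed above, each mapped
# to the Latin-1 transcription symbol given for it.
WAW_TAGS = {
    "waw-rad": "W",
    "waw-nonrad": "w",
    "sister-of-damma": "\u00fb",  # LATIN SMALL LETTER U WITH CIRCUMFLEX
    "lazy-waw": "o",
}


@dataclass(frozen=True)
class Annotation:
    offset: int  # code point offset of the waw in the plain text
    tag: str     # one of the keys of WAW_TAGS


def annotate(text, tags):
    """Pair each waw in `text`, in order, with the corresponding tag."""
    offsets = [i for i, ch in enumerate(text) if ch == WAW]
    if len(offsets) != len(tags):
        raise ValueError("need exactly one tag per waw in the text")
    for tag in tags:
        if tag not in WAW_TAGS:
            raise ValueError(f"unknown tag: {tag!r}")
    return [Annotation(o, t) for o, t in zip(offsets, tags)]


def transcribe_waws(text, annotations):
    """Swap each annotated waw for its Latin-1 symbol; keep everything else."""
    by_offset = {a.offset: a.tag for a in annotations}
    return "".join(
        WAW_TAGS[by_offset[i]] if i in by_offset else ch
        for i, ch in enumerate(text)
    )


# Placeholder text with arbitrary tags; this is not a real analysis.
sample = WAW + "x" + WAW
notes = annotate(sample, ["waw-rad", "lazy-waw"])
latin = transcribe_waws(sample, notes)
```

    The plain text still contains only U+0648; the distinctions live
    entirely in the annotation list.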

    How that is *rendered* then is "an exercise left for the implementer".
    *hehe*. It could be simply interlinear annotation, or it could be
    popup tooltips, or it could be in the kind of hacked up font you
    all are talking about that would visually diacriticalize waw's of
    different types. Or you could just color code the text, separating
    out all the radical waw's in green, and the lazy waw's in pink, or
    whatever, based on the information you have represented in the
    markup.
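
    As one illustration of the color-coding option, here is a rough
    sketch that renders plain text as HTML and colors each waw from
    a per-offset tag lookup. The tag names, the COLORS table and
    render_html are assumptions made up for this example; any other
    rendering (interlinear, tooltips, a special font) could consume
    the same lookup.

```python
from html import escape

WAW = "\u0648"  # ARABIC LETTER WAW

# Hypothetical tag names; the colors follow the green/pink example above.
COLORS = {"waw-rad": "green", "lazy-waw": "pink"}


def render_html(text, tag_at):
    """Return HTML for `text`, coloring each waw whose offset has a known tag.

    `tag_at` maps a code point offset to a tag name. Characters without a
    tag, or with a tag that has no color assigned, are emitted unstyled.
    """
    parts = []
    for i, ch in enumerate(text):
        color = COLORS.get(tag_at.get(i)) if ch == WAW else None
        if color:
            parts.append(f'<span style="color:{color}">{escape(ch)}</span>')
        else:
            parts.append(escape(ch))
    return "".join(parts)


# Placeholder text with an arbitrary tag; this is not a real analysis.
html = render_html(WAW + "a" + WAW, {0: "waw-rad"})
```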

    The important thing, from my point of view, is that this kind
    of issue and this kind of representation of text is not
    a character encoding issue per se, but rather builds on top
    of the character encoding to present a deeper analysis of the
    text that carries information not simply the result of the
    identification of the characters alone.

    In principle, this is no different than color coding all the
    "c's" in English text to indicate their different pronunciations,
    for example -- which could also be carried around by
    subcategorizing and marking them up with phonetic information,
    including c's participation in digraphs:

    c(=k)olor c(=s)ic(=k)ada

    [ch](=esh)ute [ch](=k)yle [ch](=t-esh)ime

    and so on and so on.
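
    To make the layering concrete, here is a small sketch that
    splits the inline notation used above back into plain text plus
    a separate list of annotations. The function name and the exact
    grammar (a single lowercase letter or a bracketed run of letters,
    followed by "(=value)") are assumptions read off the examples.

```python
import re

# One annotated unit: either "[letters]" or a single letter, then "(=value)".
_MARK = re.compile(r"(?:\[([a-z]+)\]|([a-z]))\(=([^)]+)\)")


def split_markup(marked):
    """Split inline-marked text into (plain_text, annotations).

    Each annotation is (offset_in_plain_text, unit, value), so the plain
    text carries only the characters and the analysis sits beside it.
    """
    plain = []
    notes = []
    pos = 0      # read position in the marked-up input
    length = 0   # length of the plain text built so far
    for m in _MARK.finditer(marked):
        between = marked[pos:m.start()]
        plain.append(between)
        length += len(between)
        unit = m.group(1) or m.group(2)
        notes.append((length, unit, m.group(3)))
        plain.append(unit)
        length += len(unit)
        pos = m.end()
    plain.append(marked[pos:])
    return "".join(plain), notes


cicada = split_markup("c(=s)ic(=k)ada")
chime = split_markup("[ch](=t-esh)ime")
```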

