Re: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

From: Doug Ewell (dewell@adelphia.net)
Date: Sun Dec 07 2003 - 20:40:25 EST

    Peter Kirk <peterkirk at qaya dot org> wrote:

    > Well, this is W3C's problem. They seem to have backed themselves into
    > a corner which they need to get out of but have no easy way of doing
    > so.

    Only if this issue of applying style to individual combining marks is
    considered a sufficiently important text operation do they "need to
    get out of" this so-called corner into which they have supposedly
    backed themselves.

    There are plenty of things one can do with writing that aren't supported
    by computer encodings, and aren't really expected to be. The idea of a
    black "i" with a red dot was mentioned. Here's another: the
    piece-by-piece "exploded diagrams" used to teach young children how to
    write the letters of the alphabet. For "a" you first draw a "c", then a
    half-height vertical stroke to the right of (and connected with) the
    "c". Two diagrams. For "b" you draw a full-height stroke, then a
    backwards "c" to its right. Another two diagrams. And so on.

    For each letter, each diagram except the last shows an incomplete
    letter, which might or might not accidentally correspond to the glyph
    for a real letter. Also, each diagram except the first might show the
    "new" stroke to be drawn in a different color from the strokes already
    drawn, to clarify the step-by-step process. There might also be little
    arrows with dashed lines alongside each stroke, to show the direction in
    which to draw it.

    None of these pedagogical elements can be represented in Unicode or any
    other computer encoding, or in HTML or XML markup, yet we don't conclude
    that these technologies are fatally broken because of it. These very
    specialized uses of text are, and will continue to be, handled in
    graphics.

    > Unicode is of course very familiar with this kind of situation e.g.
    > with character name errors, combining class errors, 11000+ redundant
    > Korean characters without decompositions, etc etc.

    "Without decompositions"? What about the canonical equivalence between
    jamos and syllables described in Section 3.12? What about the algorithm
    to derive the canonical decomposition shown on page 88? What am I
    missing here?
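
    For what it's worth, that arithmetic decomposition fits in a few
    lines. Here is a minimal sketch of it in Python, using the constants
    published in the standard; it is offered as an illustration, not as
    the standard's own text or a conformant implementation:

        # Sketch of the arithmetic Hangul syllable decomposition.
        # Constants as published in the Unicode Standard.
        S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
        L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
        N_COUNT = V_COUNT * T_COUNT   # 588
        S_COUNT = L_COUNT * N_COUNT   # 11172 precomposed syllables

        def decompose_hangul(ch):
            """Return the canonical jamo decomposition of a syllable."""
            s_index = ord(ch) - S_BASE
            if not 0 <= s_index < S_COUNT:
                return ch             # not a precomposed Hangul syllable
            l = chr(L_BASE + s_index // N_COUNT)
            v = chr(V_BASE + (s_index % N_COUNT) // T_COUNT)
            t_index = s_index % T_COUNT
            return l + v + (chr(T_BASE + t_index) if t_index else "")

        # decompose_hangul("\uD4DB") -> "\u1111\u1171\u11B6"

    So the decompositions are there; they are simply derived by rule
    rather than listed one by one in the data file.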

    > So no doubt it can extend its sympathy; and possibly even offer to
    > help by encoding the kind of character I was suggesting earlier (perhaps
    > in exchange for some W3C readiness to accept correction of errors in
    > the normalisation data?). But really this is not a Unicode issue.

    You encode a character that violates your principles and existing
    encoding models, and we'll break the promises we made to users to
    maintain normalization stability. Sounds like a great political
    compromise to me.

    -Doug Ewell
     Fullerton, California
     http://users.adelphia.net/~dewell/
