RE: Phetsarat font, Lao unicode

From: Peter Constable (petercon@microsoft.com)
Date: Mon Jul 09 2007 - 20:27:09 CDT


    > From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On
    > Behalf Of Kent Karlsson

    > They may not "attach" (in a ligature-like fashion, like cedilla
    > attaches to a c or an s), but otherwise any base letter **should**
    > work, placementwise. It may not look ideal, but **should** look
    > roughly ok.
    >
    > If it does not, it is a flaw in the display system, not in the
    > Unicode standard.
    >
    > Likewise, reordrant vowels **should** reorder around any base
    > character. There is no need to add any characters for this, but
    > display systems may need to be updated.

    In some idealized world, maybe. But realistically, I don't see that vision becoming a reality.

    IMO, the behavior of, e.g., Indic vowel marks is defined in relation to the particular Indic script, not arbitrary characters in general. It's simplistic to say, e.g., that the Oriya e vowel mark re-orders and so should re-order around any base. In a rendering implementation, other Oriya marks also need to get re-ordered relative to elements of a cluster in ways that aren't immediately obvious and aren't reflected in Unicode's code charts. That's all part of the script behavior, and it's defined in relation to the elements of that particular script, not characters in general. For example, given a cluster with a conjoining consonant plus the vowel I, <C1, virama, C2, i-kaar>, the rendering implementation will probably need to re-order that as <C1, i-kaar, virama, C2> so that the vowel mark gets positioned on the glyph that becomes the base in the visual derivation (the glyph for C1), not on what Unicode would define as the base character for that combining mark (C2). Note a few points here:

    - In such scripts, the "base" for display purposes is not necessarily the same as what Unicode would characterize as the base character.
    - These relationships are necessarily defined in terms of the behavior of the script, not in terms of arbitrary characters.
    - In the above example, it would be complete nonsense to ask how the i-kaar should re-order in a sequence <"x", virama, C2, i-kaar> -- that is simply undefined.
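
    To make the example above concrete, here is a minimal Python sketch of that one re-ordering. It is an illustration under stated assumptions, not how any particular rendering engine does it: the function names are hypothetical, the consonant test is a rough range check on the Oriya block, and only the single pattern <C1, virama, C2, i-kaar> is handled. The point is that the rule is written in terms of Oriya characters, so a non-Oriya C1 simply has no defined result.

```python
import unicodedata

ORIYA_VIRAMA = "\u0B4D"        # ORIYA SIGN VIRAMA
ORIYA_VOWEL_SIGN_I = "\u0B3F"  # ORIYA VOWEL SIGN I (the "i-kaar")


def is_oriya_consonant(ch):
    """Rough check (an assumption): assigned letters in U+0B15..U+0B39."""
    return "\u0B15" <= ch <= "\u0B39" and unicodedata.category(ch) == "Lo"


def reorder_for_display(cluster):
    """Hypothetical helper: map <C1, virama, C2, i-kaar> to
    <C1, i-kaar, virama, C2>, or return None when the input is not
    that Oriya pattern (the behavior is then simply undefined)."""
    if len(cluster) != 4:
        return None
    c1, virama, c2, vowel = cluster
    if virama != ORIYA_VIRAMA or vowel != ORIYA_VOWEL_SIGN_I:
        return None
    if not (is_oriya_consonant(c1) and is_oriya_consonant(c2)):
        return None
    # Move the vowel sign next to C1, the glyph that becomes the visual base.
    return c1 + vowel + virama + c2


oriya_cluster = "\u0B15\u0B4D\u0B24\u0B3F"  # KA, virama, TA, i-kaar
mixed_cluster = "x\u0B4D\u0B24\u0B3F"       # Latin "x" in place of C1

reordered = reorder_for_display(oriya_cluster)
undefined = reorder_for_display(mixed_cluster)
```

    Returning None for the mixed cluster is a design choice of the sketch: it declines to guess rather than inventing a behavior that the script does not define.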

    There are other implementation realities that make this vision of the ideal world somewhat unlikely. Software typically needs to address multiple problems in displaying multilingual text, and implementations typically differentiate between scripts to do these things. The logic needed to render Latin vs. Arabic vs. Oriya vs. Chinese runs is not the same. The fonts needed to display these different runs most likely will not be the same. To generalize that a display system should be designed so that some behavior of script A works seamlessly when characters of script A interact with script B is to require a very large degree of implementation complexity for behavior that has little or no clear user scenario.
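
    As a rough illustration of what "differentiating between scripts" looks like, here is a small Python sketch that splits text into runs before any shaping happens. It rests on a loud assumption: Python's standard library does not expose the Unicode Script property, so the hypothetical helper below uses the first word of each character's name as a crude stand-in. A real implementation would use the Script property and would also have to resolve spaces, punctuation and combining marks that are shared across scripts.

```python
import unicodedata
from itertools import groupby


def rough_script(ch):
    """Crude stand-in for the Script property (an assumption):
    the first word of the character name, e.g. LATIN, ARABIC, ORIYA, CJK."""
    try:
        return unicodedata.name(ch).split()[0]
    except ValueError:
        # Characters without a name (e.g. most control characters).
        return "UNKNOWN"


def script_runs(text):
    """Group consecutive characters that share the same rough script label."""
    return [
        (script, "".join(chars))
        for script, chars in groupby(text, key=rough_script)
    ]


# Latin "ab", Oriya KA + i-kaar, Arabic SHEEN
runs = script_runs("ab\u0B15\u0B3F\u0634")
```

    Each run would then be handed to script-specific shaping logic and, most likely, a different font, which is why an Oriya vowel sign following an Arabic letter ends up at a run boundary with no rule covering it.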

    Maybe you think a rendering implementation needs to re-order an Indic vowel mark around an Arabic base letter; but my thought is that the behavior for that kind of character interaction is simply not defined, and I don't know of any user who would need it anyway, so this is a bit of a pipe dream.

    My $.02.

    Peter


