Re: ZWJ, ZWNJ and VS in Latin and other Greek-derived scripts

From: John Hudson (john@tiro.ca)
Date: Thu Jan 25 2007 - 23:08:12 CST

    Adam Twardoch wrote:

    >> Right, but deciding to use fraktur is itself a stylistic preference.
    >> I'm not sure that going through a text that one has decided to set in
    >> fraktur -- or which might possibly be displayed in Fraktur -- and
    >> inserting ZWJ everywhere one wanted ligation to occur and/or ZWNJ
    >> everywhere one didn't want it is a sensible way to enable the
    >> orthographic impact of this decision.

    > It is. It’s like typing "k" vs. "c" or "s" vs. "z" in some languages (if
    > they’re homophones).

    I habitually write e.g. organise instead of organize, but I'm aware that I am sometimes
    writing for publication in a US magazine or book, and that the orthographic conventions
    differ. So I rely on language tagging and spellchecking and, to a lesser extent, grammar
    checking to 'convert' my particular brand of UK English to US English.

    Now let's say that I have a text in typical modern German, which I decide I want to
    display in blackletter type (noting your accurate objection to use of the specific term
    fraktur). What degree of this conversion should I be able to rely on to be automated, and
    what degree will require editorial intervention *in the text*? I don't know the answer to
    that question, and I suspect it is something that could generate a good deal of debate.
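
    To make concrete what intervention "in the text" means at the character level,
    here is a minimal Python sketch. The word "Auflage" is only an illustrative
    choice (the f and l fall on either side of a morpheme boundary, where German
    typesetting conventionally avoids the fl ligature), and strip_joiners is a
    hypothetical helper, not part of any library.

```python
# Illustrative only: how a ligature-control character changes the encoded text.
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER: requests that adjacent letters not ligate
ZWJ = "\u200D"   # ZERO WIDTH JOINER: requests a more connected form, e.g. a ligature

plain = "Auflage"                 # text as normally typed
marked = "Auf" + ZWNJ + "lage"    # same word with the fl ligature suppressed

print(len(plain), len(marked))    # 7 8
print(plain == marked)            # False


def strip_joiners(s: str) -> str:
    """Hypothetical helper: remove ZWJ/ZWNJ so the text compares as typed."""
    return s.replace(ZWNJ, "").replace(ZWJ, "")


print(strip_joiners(marked) == plain)  # True
```

    The marked string is a different character sequence from the plain one, so any
    process that compares raw strings (search, spellchecking) has to be aware of the
    inserted characters. That is part of the cost of encoding the choice in the text.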

    > In OpenType, it is actually possible to share Unicode codepoints between
    > different writing systems. The languagesystem tagging mechanism allows one to
    > specify that a certain glyph represents the codepoint U+0041 in Latin
    > script (script tag "latn") or it represents the codepoint U+0041 in
    > Blackletter script (potentially, script tag e.g. "blak"). What do you
    > think, John?

    There's a pretty big assumption in OpenType, or at least in implementations of OpenType
    Layout support, that script tags are mappable either to Unicode ranges or to discrete
    subsets of Unicode characters drawn from different ranges. A user may be able to set
    language in an application (e.g. InDesign CS2 ME) to access specific glyph variants or
    shaping behaviour, but the script is usually (always?) presumed from the characters in the
    string.

    I'm not commenting on whether this is a good idea, but so long as this is the case, U+0041
    is always going to be <latn> script.
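
    A rough sketch of that assumption in Python: the script tag is derived from the
    code points alone, so no choice of font or style can change it. The range table
    is hand-written and approximate, and presumed_script_tag and itemize are
    hypothetical names; real layout engines use the full Unicode script data and
    treat spaces and punctuation more carefully.

```python
# Hypothetical, approximate table: Unicode ranges and the OpenType script tag
# that a layout engine would typically presume for characters in each range.
SCRIPT_RANGES = [
    (0x0041, 0x005A, "latn"),  # A-Z
    (0x0061, 0x007A, "latn"),  # a-z
    (0x00C0, 0x024F, "latn"),  # accented Latin letters (approximate)
    (0x0370, 0x03FF, "grek"),  # Greek and Coptic block (approximate)
    (0x0400, 0x04FF, "cyrl"),  # Cyrillic block
]


def presumed_script_tag(ch: str) -> str:
    """Return the script tag presumed from the character itself."""
    cp = ord(ch)
    for lo, hi, tag in SCRIPT_RANGES:
        if lo <= cp <= hi:
            return tag
    return "DFLT"  # OpenType's default script tag, used here as a fallback


def itemize(text: str) -> list[tuple[str, str]]:
    """Split text into runs of (script_tag, substring) by presumed script."""
    runs: list[tuple[str, str]] = []
    for ch in text:
        tag = presumed_script_tag(ch)
        if runs and runs[-1][0] == tag:
            runs[-1] = (tag, runs[-1][1] + ch)
        else:
            runs.append((tag, ch))
    return runs


print(presumed_script_tag("A"))  # latn
print(itemize("Aα"))             # [('latn', 'A'), ('grek', 'α')]
```

    Nothing in this lookup takes the font into account, which is why a blackletter
    font would still be asked for its <latn> features when the text contains U+0041.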

    John Hudson

    -- 
    Tiro Typeworks        www.tiro.com
    Vancouver, BC         john@tiro.ca
    Marie Antoinette was a woman whose core values were chocolate,
    sex, love, nature and Japanese ceramics. Frankly, there are
    worse principles of government than that.  - Karen Burshtein
    

