ZWNJ in IDN (was: Hebrew script in IDN)

From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Sat Nov 19 2005 - 11:17:21 CST


    Neil Harris wrote:
    > Richard Wordingham wrote:
    >> Neil Harris wrote:
    >>
    >>> I think you might meet some opposition to including the following in
    >>> IDNs:

    >>> ZWNJ and ZWJ (unless Indic experts can make a _very_ good case for these
    >>> being used only in contexts where they cause _visible_ and _unambiguous_
    >>> rendering changes)
    >>
    >> Well, that rules out about half the words in Burmese! I suppose there's
    >> the workaround of replacing the virama - U+1039 U+200C ('VIRAMA' ZWNJ) -
    >> by U+1039 U+005F ( 'VIRAMA' LOW LINE) - extremely unnatural for a
    >> language that doesn't have spaces between words.

    > Well, that's a problem for IDN in its present form, because Nameprep (RFC
    > 3491) uses table B.1 of Stringprep (RFC 3454), which maps ZWNJ to nothing.

    At what point does the ZWNJ disappear? If it remains in what is entered and
    displayed by the user, but is ignored when comparing names, then there is no
    problem.
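
    To make the "ignored when comparing names" reading concrete, here is a
    minimal sketch using Python's standard stringprep module. It applies only
    the table B.1 "map to nothing" step, not the whole of Nameprep, and the
    helper names map_to_nothing and same_name are my own, purely for
    illustration. Note the consequence: the two Burmese spellings below are
    visibly different but compare equal.

```python
import stringprep

ZWNJ = "\u200c"


def map_to_nothing(label: str) -> str:
    """Drop every character listed in Stringprep table B.1 (RFC 3454).

    This is only the first mapping step of Nameprep, shown in isolation.
    """
    return "".join(ch for ch in label if not stringprep.in_table_b1(ch))


def same_name(a: str, b: str) -> bool:
    """Hypothetical comparison: equal once table B.1 characters are removed."""
    return map_to_nothing(a) == map_to_nothing(b)


# MYANMAR LETTER KA, VIRAMA, ZWNJ, KA: what the user types and sees.
displayed = "\u1000\u1039" + ZWNJ + "\u1000"
# The same letters without ZWNJ.
without_zwnj = "\u1000\u1039\u1000"

print(same_name(displayed, without_zwnj))
```

    If the mapped form is used only for comparison and the entered form is
    kept for display, the user never sees the ZWNJ disappear.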

    > ZWNJ also appears to be used for a similar purpose in Bengali. See
    > http://www.unicode.org/faq/indic.html#21
    >
    > From my perspective, it would seem that ZWNJ should be usable in
    > identifiers, if, and only if, it is used in a context where it makes a
    > visible difference to the rendered output. This begs some questions:
    >
    > * what to do if the rendering engine does not support the script in
    > question?

    Probably not an issue. The Uniscribe that comes with Windows XP supports
    neither Burmese nor Khmer, but I can still interpret what it produces. A
    more significant issue is the lack of font support - Uniscribe supports
    kana, but I don't have a font for the Katakana Phonetic Extensions. In this
    instance, can we be sure that font mixing will not be a problem? With my
    mix of fonts, underdotted Latin letters often come from a font with a larger
    x-height than the normal letters.

    > * how to phrase the rules for acceptable use of ZWNJ in an unambiguous way
    > that can be coded as an algorithm?

    Some cases may just have to be unsupported. What stops Unicode viramas
    spoofing one another? If you require that viramas be consistent with the
    script-specific characters on either side, then you may allow certain
    combinations of virama, ZWNJ and consonants of a specific script.
    Devanagari virama + ZWNJ may be too unsafe to allow as distinct from plain
    virama, while Burmese virama + ZWNJ + consonant is always distinct from just
    virama + consonant.
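
    As a rough sketch of how such a rule might be coded, the following accepts
    ZWNJ only in the one combination I am confident about: Burmese consonant +
    Burmese virama + ZWNJ + Burmese consonant. The function names are
    hypothetical, and treating U+1000..U+1021 as "the Burmese consonants" is
    my simplifying assumption, not a worked-out proposal.

```python
ZWNJ = "\u200c"
MYANMAR_VIRAMA = "\u1039"


def is_myanmar_consonant(ch: str) -> bool:
    # Assumption: U+1000..U+1021 stands in for the Burmese consonants.
    return "\u1000" <= ch <= "\u1021"


def zwnj_use_is_acceptable(label: str) -> bool:
    """Return True if every ZWNJ sits in consonant + virama + ZWNJ + consonant,
    with all three neighbours taken from the Burmese block."""
    for i, ch in enumerate(label):
        if ch != ZWNJ:
            continue
        # Need two characters before the ZWNJ and one after it.
        if i < 2 or i + 1 >= len(label):
            return False
        before, virama, after = label[i - 2], label[i - 1], label[i + 1]
        if virama != MYANMAR_VIRAMA:
            return False
        if not (is_myanmar_consonant(before) and is_myanmar_consonant(after)):
            return False
    return True


# Burmese KA + virama + ZWNJ + KA is accepted.
print(zwnj_use_is_acceptable("\u1000\u1039\u200c\u1000"))
# Devanagari KA + virama + ZWNJ + KA is rejected by this conservative rule.
print(zwnj_use_is_acceptable("\u0915\u094d\u200c\u0915"))
```

    Anything outside that single pattern is rejected, which is the "some cases
    may just have to be unsupported" position.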

    Richard.


