Re: ZWNJ in IDN (Burmese Issues)

From: Ngwe Tun (ngwestar@gmail.com)
Date: Tue Nov 29 2005 - 08:28:55 CST

    Dear Javier & Group.

    On 11/28/05, Javier SOLA <lists@khmeros.info> wrote:
    >
    >
    > >>> How appropriate would ZWSP be in the middle of words like 'Myanma(r)'
    > >>> and 'Yangon'?
    > >>
    > ZWSP indicates a breaking opportunity. This would be inappropriate if
    > the word should not be broken at the end of a line,
    > as in Myan
    > mar.
    > (which is probably the case).
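
    The distinction drawn above can be sketched in a few lines of Python. This
    is only an illustration under simplifying assumptions: the helper name
    break_segments is hypothetical, and real line breaking follows the much
    richer rules of UAX #14 rather than a plain split. It shows that ZWSP
    (U+200B) marks a place where a line may break, while ZWNJ (U+200C) marks
    none.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE: an invisible line-break opportunity
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER: affects joining, not line breaking


def break_segments(text: str) -> list[str]:
    """Hypothetical helper: split text at each ZWSP.

    Each returned piece is a run that a renderer could keep on one line;
    a new line may start between two pieces.
    """
    return text.split(ZWSP)


# With ZWSP the word may be broken as "Myan" / "mar".
print(break_segments("Myan" + ZWSP + "mar"))

# With ZWNJ the word stays in one piece: no break opportunity is marked.
print(break_segments("Myan" + ZWNJ + "mar"))
```

    Latin letters stand in for the Myanmar clusters here, as in the example
    quoted above.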

    It's my understanding that ZWSP marks a word-breaking opportunity. OK, but
    how can I mark a cluster break within a Myanmar word? In your example,
    "Myan" is one cluster and "mar" is another. We really need cluster breaking
    for Burmese language computing.

    In my last request, I found some pitfalls in Chapter 10 of the Unicode
    Standard, so I will propose some additional cases for ZWNJ. The reason is
    that either some clusters will not end with ZWNJ, or ZWNJ is placed between
    the virama sign and a tone mark. So we hope to add ZWNJ or ZWJ at the end
    of every cluster in Burmese.

    > I am not an expert in Myanmar (even if I am trying to make it render in
    > ICU). I would tend to see ZWNJ and ZWJ as part of a cluster, and not as
    > word separators. A ZWNJ could be the last character of a cluster... and
    > this signals that the cluster is finished... but it is not a word
    > separator. A ZWNJ at the end of the first cluster of a two-cluster word
    > would not be a separator (if the word should not be divided).
    >
    > ZWNJ is an element used in the standard order of components; ZWSP could
    > never be.
    >
    > I would assume that two different renderings (with and without ZWNJ)
    > would lead to different IDNs. IDNs are first expanded (character by
    > character) and then compared byte-by-byte, and this would mean that two
    > strings do not match if one of them has an extra character (the ZWNJ). I
    > do not think that the BIND program used for DNS resolution can do any
    > type of normalisation... and I agree that - as it is contemplated in the
    > standard order of components - ZWNJ should be usable in IDNs.
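
    The comparison described above can be sketched in Python. This is a minimal
    illustration under stated assumptions: the two labels are hypothetical (KA
    followed by VIRAMA, with and without a trailing ZWNJ), raw_equal is an
    invented helper, and no IDNA mapping step that a registry or client might
    apply is performed. It only shows that a raw byte comparison, canonical
    normalisation (NFC), and plain Punycode encoding all keep the two strings
    distinct.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

# Hypothetical labels: MYANMAR LETTER KA + MYANMAR SIGN VIRAMA,
# once without and once with a trailing ZWNJ.
plain = "\u1000\u1039"
with_zwnj = plain + ZWNJ


def raw_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two labels byte-by-byte as UTF-8."""
    return a.encode("utf-8") == b.encode("utf-8")


# The extra character makes the byte sequences differ.
print(raw_equal(plain, with_zwnj))

# NFC leaves the ZWNJ in place, so it does not reconcile the two strings.
print(unicodedata.normalize("NFC", with_zwnj) == with_zwnj)

# Plain Punycode (no mapping step beforehand) also yields two encodings.
print(plain.encode("punycode"), with_zwnj.encode("punycode"))
```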

    ZWNJ is really needed for IDNs and is useful for visible virama rendering.
    Should we plan to add it at the end of every cluster?

    > In Khmer this would be more problematic, as the ZWNJ is mostly used to
    > break font ligatures (such as LETTER UO + VOWEL I in moul style fonts),
    > but the word is exactly the same.
    >
    > Javier
    >
    >
    >
    Regards,

    Ngwe Tun



    This archive was generated by hypermail 2.1.5 : Tue Nov 29 2005 - 08:35:19 CST