Re: Zero Width Word Boundary

From: Doug Ewell (doug@ewellic.org)
Date: Thu Jan 29 2009 - 22:59:26 CST

    ɹɐzlnƃ ɟıʇɐ <atif dot gulzar at gmail dot com> wrote:

    > I have checked and could not find any Unicode character for word
    > separator (zero width space as WORD separator). This character/code is
    > needed for languages where space is not used as word separator. The
    > available zero width characters are incapable of addressing this issue.
    > e.g.
    >
    > U+200B Zero Width Space: This character is intended for line break
    > control (In Lao language lines can be broken at syllable levels, Lao
    > uses U+200B to mark syllable boundaries).
    > ...

    According to Section 11.1 on Thai in TUS 5.0 (p. 376), and Section 16.2
    on layout controls (p. 535), U+200B ZERO WIDTH SPACE is the right
    character for marking word boundaries in languages like Thai which don't
    use visible spaces between words. I don't see why this would be
    different for Lao.
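
As a rough illustration of that usage, here is a minimal Python sketch of storing word boundaries with U+200B and recovering them again. The helper names `join_words` and `split_words` are hypothetical, and the two-word Thai sample is hand-segmented for the example rather than produced by a real word breaker.

```python
# U+200B ZERO WIDTH SPACE, used here as an invisible word-boundary marker.
ZWSP = "\u200b"


def join_words(words):
    """Join already-segmented words with U+200B between them (hypothetical helper)."""
    return ZWSP.join(words)


def split_words(text):
    """Recover the words by splitting on U+200B, dropping empty pieces."""
    return [word for word in text.split(ZWSP) if word]


# Hand-segmented sample (an assumption for illustration): "language" + "Thai".
words = ["ภาษา", "ไทย"]
marked = join_words(words)

# The marked text displays like the unmarked text, but carries the boundary.
recovered = split_words(marked)
```

Note that Python's `str.split()` with no argument does not treat U+200B as whitespace, which is why the separator is passed explicitly here.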

    --
    Doug Ewell  *  Thornton, Colorado, USA  *  RFC 4645  *  UTN #14
    http://www.ewellic.org
    http://www1.ietf.org/html.charters/ltru-charter.html
    http://www.alvestrand.no/mailman/listinfo/ietf-languages
    

