Zero Width Word Boundary

From: Atif Gulzar (atif.gulzar@gmail.com)
Date: Thu Jan 29 2009 - 22:30:06 CST


    Hi,

    I have checked and could not find any Unicode character that serves as
    a word separator (a zero width space used as a WORD separator). Such a
    character is needed for languages where the space is not used as a word
    separator. The available zero width characters cannot address this
    issue, e.g.:

    U+200B Zero Width Space: This character is intended for line break
    control (in Lao, lines can be broken at the syllable level, and Lao
    uses U+200B to mark syllable boundaries).

    U+200C Zero Width Non-Joiner: Used to prevent ligatures and joining in
    cursive scripts.

    U+200D Zero Width Joiner: Used in cursive scripts to produce joining
    shape forms.

    U+2060 Word Joiner: A zero width non-breaking space (used where a line
    must not be broken).
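
    The four characters listed above can be inspected with a short Python
    sketch. It relies only on the standard unicodedata module; the names
    CANDIDATES and describe are made up for this illustration. It also
    shows that a plain whitespace split does not treat U+200B as a
    separator, so text that carries it still comes back as a single token.

```python
import unicodedata

# The zero width characters discussed in the message.
CANDIDATES = ["\u200b", "\u200c", "\u200d", "\u2060"]

def describe(ch):
    """Return (code point, Unicode name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

rows = [describe(ch) for ch in CANDIDATES]
for code, name, category in rows:
    print(code, name, category)

# str.split() with no argument splits on whitespace only; U+200B is a
# format character, not whitespace, so the text is not split at it.
tokens = "ab\u200bcd".split()
print(len(tokens))
```

    All four report the general category Cf (format), which is one reason
    generic tokenizers do not treat any of them as a word separator.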

    Algorithms can be devised for word segmentation, but it is a laborious
    task that has to be performed every time before any language processing
    algorithm such as spell checking, next word, or finding an exact word.
    There should be some character that can be inserted (once) at word
    boundaries by an algorithm.
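
    The "segment once, then reuse the markers" idea can be sketched in
    Python as follows. Everything here is an assumption made for
    illustration: MARK is a private-use code point standing in for the
    requested character, the segmenter is a toy greedy longest-match over
    a made-up lexicon, and English text with the spaces removed stands in
    for a script that does not use spaces.

```python
# Hypothetical stand-in for the requested word-boundary character.
# U+E000 is a private-use code point, not a standard separator.
MARK = "\ue000"

def segment(text, lexicon):
    """Toy greedy longest-match segmentation; unknown characters become
    one-character words. Real segmenters are far more involved."""
    max_len = max((len(w) for w in lexicon), default=1)
    words, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if piece in lexicon or n == 1:
                words.append(piece)
                i += n
                break
    return words

def mark_boundaries(text, lexicon):
    """Run the expensive segmentation once and store the result in the text."""
    return MARK.join(segment(text, lexicon))

def words(marked):
    """Later processing recovers the words with a cheap split."""
    return marked.split(MARK)

LEXICON = {"the", "cat", "sat", "on", "mat"}  # made-up lexicon
marked = mark_boundaries("thecatsatonthemat", LEXICON)
print(words(marked))
```

    Once the markers are stored, a spell checker or an exact-word search
    only needs words(marked) and never calls the segmenter again.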

    --
    Best Regards,
    Atif Gulzar
    I ◘◘◘◘ Unicode, ɹɐzlnƃ ɟıʇɐ
    

