RE: issues storing ZWSP in docs, files and databases

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon Aug 27 2007 - 21:54:53 CDT


    It seems that regular spaces are sometimes used as word separators in
    Burmese, simply because they are visible and therefore easier to enter and
    edit correctly.

    Text editors do not always automatically replace the regular space between
    two Burmese letters with ZWSP, but this could be automated. Alternatively,
    the space may not be replaced at all when the text is saved, deferring that
    transformation until the text is actually used; in that case, the spaces
    may even be stripped out completely for rendering.
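    Here is a minimal sketch of that render-time stripping in Python. It
    assumes that "Burmese letter" means any code point in the basic Myanmar
    block U+1000..U+109F, and the function name is hypothetical. The stored
    text keeps its visible spaces; only the copy handed to the renderer
    loses them:

```python
import re

# Assumption: the basic Myanmar block U+1000..U+109F stands in for
# "Burmese letter" (Extended-A/B blocks are ignored for brevity).
BURMESE = "\u1000-\u109f"

# One or more regular SPACEs (U+0020) with a Burmese character on each side.
_SPACES_BETWEEN = re.compile(f"(?<=[{BURMESE}]) +(?=[{BURMESE}])")

def strip_spaces_for_rendering(text: str) -> str:
    """Return a copy without word-separating SPACEs between Burmese characters.

    Spaces next to non-Burmese text are kept; the input is not modified.
    """
    return _SPACES_BETWEEN.sub("", text)
```

    Once the spaces are dropped, the renderer no longer knows where the word
    boundaries were, so line breaking would need its own segmentation.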

    Still, I suppose fonts that support Burmese correctly are so rare that
    they will certainly contain an explicit mapping for ZWSP. Burmese texts can
    then be stored directly with ZWSP: a SPACE between Burmese letters is
    automatically replaced by ZWSP when saving, and a ZWSP between Burmese
    letters is automatically replaced by a regular SPACE when loading the text
    for further editing.
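    The save/load substitution just described could look like the following
    Python sketch. `to_storage` and `to_editing` are hypothetical names, and
    the basic Myanmar block again stands in for "Burmese letter". Only a
    single SPACE with a Burmese character on each side is converted. That
    keeps the round trip exact and leaves doubled spaces for a human to
    review:

```python
import re

# Assumption: the basic Myanmar block U+1000..U+109F stands in for "Burmese letter".
BURMESE = "\u1000-\u109f"
SPACE, ZWSP = " ", "\u200b"

_SPACE_BETWEEN = re.compile(f"(?<=[{BURMESE}]) (?=[{BURMESE}])")
_ZWSP_BETWEEN = re.compile(f"(?<=[{BURMESE}]){ZWSP}(?=[{BURMESE}])")

def to_storage(edited: str) -> str:
    """On save: a single SPACE between Burmese characters becomes ZWSP."""
    return _SPACE_BETWEEN.sub(ZWSP, edited)

def to_editing(stored: str) -> str:
    """On load: a ZWSP between Burmese characters becomes a visible SPACE."""
    return _ZWSP_BETWEEN.sub(SPACE, stored)
```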

    A Burmese word processor could use two modes:
    * an editing mode, where each ZWSP is shown as if it were a SPACE. The
    SPACE bar still works: it inserts a SPACE, which is automatically replaced
    internally by a ZWSP when it falls between two Burmese letters, so the
    internal backing buffer contains only ZWSP.
    * a WYSIWYG (or "Print Preview") mode, where the same ZWSPs are invisible.
    The SPACE bar works the same way; only the rendering differs.
    * in both modes, the spell checker may automatically warn the user about
    positions in the backing buffer that still contain a regular SPACE (visible
    in both modes). A key sequence such as <Ctrl+SPACE bar> could force the
    input of a regular SPACE even where it is incorrect: it disables the
    automatic substitution for that SPACE and marks it as explicitly desired,
    using an out-of-band style instruction if the document is saved in a
    rich-text format. That mark is lost if the document is saved as plain text
    only; when such a file is loaded again, the spell checker will flag these
    spaces.
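    The editor behaviour above can be modelled with a small backing buffer.
    This is only a Python sketch under simplifying assumptions. The class
    and method names are hypothetical. The cursor always sits at the end of
    the buffer. The basic Myanmar block stands in for "Burmese letter". The
    `forced` set plays the role of the out-of-band style mark that records
    <Ctrl+SPACE bar>:

```python
from dataclasses import dataclass, field

SPACE, ZWSP = " ", "\u200b"

def is_burmese(ch: str) -> bool:
    # Assumption: the basic Myanmar block U+1000..U+109F stands in for "Burmese letter".
    return "\u1000" <= ch <= "\u109f"

@dataclass
class BurmeseBuffer:
    """Hypothetical backing buffer for the two-mode editor (cursor always at the end)."""
    chars: list[str] = field(default_factory=list)
    forced: set[int] = field(default_factory=set)  # indices of explicitly desired SPACEs

    def type_space(self, force: bool = False) -> None:
        """SPACE bar; force=True models <Ctrl+SPACE bar>."""
        if force:
            self.forced.add(len(self.chars))
        self.chars.append(SPACE)

    def type_char(self, ch: str) -> None:
        """Append a character; a pending unforced SPACE between Burmese letters becomes ZWSP."""
        i = len(self.chars) - 1  # index of the previous character
        if (is_burmese(ch) and i >= 1 and self.chars[i] == SPACE
                and i not in self.forced and is_burmese(self.chars[i - 1])):
            self.chars[i] = ZWSP
        self.chars.append(ch)

    def render(self, mode: str) -> str:
        """'edit' shows each ZWSP as a SPACE; any other mode (WYSIWYG) hides it."""
        return "".join(self.chars).replace(ZWSP, SPACE if mode == "edit" else "")

    def spell_check(self) -> list[int]:
        """Indices of regular SPACEs between Burmese letters not marked as desired."""
        return [i for i in range(1, len(self.chars) - 1)
                if self.chars[i] == SPACE and i not in self.forced
                and is_burmese(self.chars[i - 1]) and is_burmese(self.chars[i + 1])]

    def save_plain_text(self) -> str:
        return "".join(self.chars)  # the 'forced' marks are lost here

    @classmethod
    def load_plain_text(cls, text: str) -> "BurmeseBuffer":
        return cls(chars=list(text))  # no style info, so nothing is marked as forced
```

    For example, typing two Burmese words separated by the SPACE bar stores
    a ZWSP between them. A SPACE entered with `type_space(force=True)` stays
    a SPACE and is not flagged. After a plain-text save and reload, that
    same SPACE is reported by `spell_check()`.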

    A plain-text-only editor will just use the visible mode on screen, but
    when printing it will try to remove these regular SPACEs between Burmese
    letters where there is no line break, or replace them with newlines where
    there is one. This replacement does not affect the edit buffer, only what is
    sent to the print processor when preparing the plain-text document for
    printing.
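    The print-time replacement could be approximated as follows. This is a
    Python sketch, and `prepare_for_print` is a hypothetical name. It
    assumes a single paragraph and counts line width in code points. That
    is a crude stand-in, because real printing measures rendered glyphs and
    Burmese combining marks take no width of their own. Each SPACE between
    Burmese characters is treated as a break opportunity and either dropped
    or turned into a newline by greedy line filling:

```python
import re

# Assumption: the basic Myanmar block U+1000..U+109F stands in for "Burmese letter".
BURMESE = "\u1000-\u109f"
_BREAK = re.compile(f"(?<=[{BURMESE}]) (?=[{BURMESE}])")

def prepare_for_print(text: str, width: int) -> str:
    """Return a print-ready copy of one paragraph; the edit buffer is untouched.

    A SPACE between Burmese characters is dropped when the next word still
    fits on the current line, and replaced by a newline when it does not.
    """
    lines: list[str] = []
    current = ""
    for word in _BREAK.split(text):
        if current and len(current) + len(word) > width:
            lines.append(current)  # break here: the SPACE becomes a newline
            current = word
        else:
            current += word  # no break: the SPACE is simply removed
    lines.append(current)
    return "\n".join(lines)
```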

    > -----Original Message-----
    > From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On
    > Behalf Of Doug Ewell
    > Sent: Sunday, August 26, 2007 01:56
    > To: Unicode Mailing List
    > Cc: Ngwe Tun
    > Subject: Re: issues storing ZWSP in docs, files and databases
    >
    > Ngwe Tun wrote:
    >
    > > We have to use ZWSP for word breaking in our language, so we also need
    > > to use ZWSP for line-breaking purposes. Every Burmese word might be
    > > followed by a ZWSP, added automatically or by the operator.
    > >
    > > Please give me one last clarification. Do we need to store ZWSP in
    > > documents, files and databases for the purpose of word
    > > segmentation/breaking? Or is it possible to add it automatically in
    > > other ways?
    >
    > Burmese text will either have ZWSP between words, which means electronic
    > processes can automatically determine word boundaries, or it will not,
    > which means they cannot. Unicode does not tell you that you must use
    > ZWSP in Burmese text, only that "if word boundary indications are
    > desired" then ZWSP is the right character for the job.
    >
    > A program could probably be written to add ZWSP to existing Burmese
    > text. Such a program would almost certainly be dictionary-based and
    > would need to allow a human to review the text and fix any possible
    > errors or ambiguities.
    >
    > --
    > Doug Ewell · Fullerton, California, USA · RFC 4645 · UTN #14
    > http://users.adelphia.net/~dewell/
    > http://www1.ietf.org/html.charters/ltru-charter.html
    > http://www.alvestrand.no/mailman/listinfo/ietf-languages
    >
    >
    >
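    Doug Ewell's suggested dictionary-based tool could start from a greedy
    longest-match sketch like the following Python code. The function name
    and the toy dictionary are hypothetical. A real tool would more likely
    segment into syllables first and use a stronger matching strategy. The
    function returns the offsets a human reviewer should check: runs that
    match no dictionary word, and positions where words of more than one
    length matched.

```python
ZWSP = "\u200b"

def insert_zwsp(text: str, dictionary: set[str]) -> tuple[str, list[int]]:
    """Greedy longest-match segmentation of an unspaced run of text.

    Returns the text with ZWSP between segments, plus the start offsets of
    segments needing human review: runs not covered by any dictionary word,
    and positions where words of more than one length matched.
    """
    max_len = max((len(w) for w in dictionary), default=0)
    pieces: list[str] = []
    review: list[int] = []
    unknown_start = None  # start of the current run of unmatched characters

    def flush_unknown(end: int) -> None:
        nonlocal unknown_start
        if unknown_start is not None:
            pieces.append(text[unknown_start:end])  # keep the unknown run as one segment
            review.append(unknown_start)
            unknown_start = None

    i = 0
    while i < len(text):
        lengths = [n for n in range(min(max_len, len(text) - i), 0, -1)
                   if text[i:i + n] in dictionary]  # longest first
        if not lengths:
            if unknown_start is None:
                unknown_start = i
            i += 1
            continue
        flush_unknown(i)
        if len(lengths) > 1:
            review.append(i)  # ambiguous: a shorter word also fits here
        pieces.append(text[i:i + lengths[0]])  # longest match wins
        i += lengths[0]
    flush_unknown(len(text))
    return ZWSP.join(pieces), review
```

    With placeholder letters instead of Burmese words,
    `insert_zwsp("abcdxy", {"ab", "abc", "d"})` returns
    `("abc\u200bd\u200bxy", [0, 4])`. Offset 0 is flagged because both "ab"
    and "abc" match there, and offset 4 because "xy" is not in the
    dictionary.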



    This archive was generated by hypermail 2.1.5 : Mon Aug 27 2007 - 21:57:03 CDT