From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon Aug 27 2007 - 21:54:53 CDT
It seems that regular spaces are sometimes seen as word separators in
Burmese, simply because they are easier to enter and edit correctly (as they
are visible).
Text editors do not always automatically replace the regular space between
two Burmese letters with ZWSP, but this can be automated. Alternatively, the
space may not be replaced at all when the text is saved, leaving that
transformation for the time when the text is used, in which case the spaces
may even be stripped out completely for rendering...
Well, I suppose that fonts supporting Burmese correctly are so rare that
they will certainly contain an explicit mapping for ZWSP, so Burmese text
can be stored directly with ZWSP (automatically replacing SPACE between
Burmese letters with ZWSP when saving, and automatically replacing ZWSP
between Burmese letters with regular SPACE when loading, before editing
again).
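
The save/load replacement described above can be sketched in Python as
follows. This is only an illustration under stated assumptions: a "Burmese
letter" is approximated as any code point in the basic Myanmar block
(U+1000..U+109F), and the names to_storage and to_editing are hypothetical.

```python
import re

ZWSP = "\u200b"  # ZERO WIDTH SPACE

# Assumption: "Burmese letter" is approximated as any code point in the
# basic Myanmar block U+1000..U+109F.
MYANMAR = "\u1000-\u109f"

# A regular SPACE (or a ZWSP) with a Myanmar code point on each side.
_SPACE_BETWEEN = re.compile(f"(?<=[{MYANMAR}]) (?=[{MYANMAR}])")
_ZWSP_BETWEEN = re.compile(f"(?<=[{MYANMAR}]){ZWSP}(?=[{MYANMAR}])")


def to_storage(text: str) -> str:
    """On save: replace SPACE between two Burmese letters with ZWSP."""
    return _SPACE_BETWEEN.sub(ZWSP, text)


def to_editing(text: str) -> str:
    """On load: replace ZWSP between two Burmese letters with SPACE."""
    return _ZWSP_BETWEEN.sub(" ", text)
```

Spaces that are not surrounded by Burmese letters on both sides (for
example, between a Burmese word and a Latin word) are left untouched in both
directions.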
A Burmese word processor could use two modes:
* one that eases editing, where ZWSP characters are made visible as if they
were SPACE instances; the SPACE bar is still allowed, and internally inserts
a SPACE that is automatically replaced by a ZWSP between two Burmese
letters. The internal backing buffer will then contain only ZWSP.
* one for the WYSIWYG mode (or "Print Preview" mode), where the same ZWSP
characters are invisible (but the SPACE bar still works the same way; only
the rendering is different).
* in both modes, the spell checker may automatically signal to the user any
positions in the backing buffer that still contain a regular SPACE (still
visible in both modes). There should also be a way to force the input of a
regular SPACE with a key sequence like <Ctrl+SPACE bar>, even if this is
incorrect: it disables the automatic substitution of the SPACE generated on
input and marks this SPACE instance as explicitly desired, using some
out-of-band style instruction if the document is saved in a rich-text
format. This information is lost if the document is saved in plain-text
format only; when it is loaded again, the spell checker will signal these
spaces.
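
As a rough sketch of the behaviour listed above (not an actual word
processor design), the two rendering modes, the SPACE bar handling and the
spell-checker scan could look like this in Python. All function names, the
mode strings and the forced_positions set (standing in for the out-of-band
"explicitly desired" marking) are hypothetical, and a "Burmese letter" is
again assumed to be any code point in U+1000..U+109F.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE


def is_burmese(ch: str) -> bool:
    # Assumption: any code point in the basic Myanmar block counts.
    return "\u1000" <= ch <= "\u109f"


def render(buffer: str, mode: str) -> str:
    """Return the text to display for the given mode."""
    if mode == "edit":
        # Editing mode: ZWSP is shown as if it were a SPACE.
        return buffer.replace(ZWSP, " ")
    if mode == "preview":
        # Preview mode: ZWSP stays in place and remains zero-width.
        return buffer
    raise ValueError(f"unknown mode: {mode}")


def type_space(buffer: str, pos: int, forced: bool = False) -> str:
    """Insert what the SPACE bar produces at index pos.

    Between two Burmese letters this is a ZWSP, unless the user forced a
    regular SPACE (the <Ctrl+SPACE bar> case).
    """
    between = (0 < pos < len(buffer)
               and is_burmese(buffer[pos - 1])
               and is_burmese(buffer[pos]))
    ch = ZWSP if between and not forced else " "
    return buffer[:pos] + ch + buffer[pos:]


def suspicious_spaces(buffer: str, forced_positions=frozenset()) -> list:
    """Indexes of regular SPACEs between Burmese letters that were not
    explicitly marked as desired."""
    return [i for i in range(1, len(buffer) - 1)
            if buffer[i] == " "
            and is_burmese(buffer[i - 1])
            and is_burmese(buffer[i + 1])
            and i not in forced_positions]
```

In a plain-text round trip the forced_positions information is not stored,
so every forced SPACE would be reported again after reloading, as described
above. A real editor would also have to handle a SPACE typed at the end of
the buffer, where the following letter is not yet known.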
A plain-text-only editor will just use the visible mode on screen, but will
try, when printing, to remove these regular SPACEs between Burmese letters
where there's no line break, or replace them with newline markers (this
replacement will not affect the edit back buffer, only what is sent to the
print processor when preparing the plain-text document for printing).
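
The print preparation described above might be sketched like this in
Python. It is a simplified illustration: the function name
prepare_for_print is hypothetical, line width is counted in code points
(which is only an approximation for Burmese, where combining marks do not
advance the line), and a "Burmese letter" is assumed to be any code point
in U+1000..U+109F.

```python
def is_burmese(ch: str) -> bool:
    # Assumption: any code point in the basic Myanmar block counts.
    return "\u1000" <= ch <= "\u109f"


def prepare_for_print(text: str, width: int) -> str:
    """Build the copy sent to the printer; the edit buffer is not changed.

    A SPACE between two Burmese letters is dropped when the line continues
    and becomes a newline when the line is full. Other spaces stay visible
    unless a line break falls on them.
    """
    # Split into (separator, word) pairs; the separator is "" for a SPACE
    # between Burmese letters and " " for any other SPACE.
    pieces = []
    sep, word = "", ""
    for i, ch in enumerate(text):
        if ch == " ":
            if word:
                pieces.append((sep, word))
                word = ""
            soft = (0 < i < len(text) - 1
                    and is_burmese(text[i - 1])
                    and is_burmese(text[i + 1]))
            sep = "" if soft else " "
        else:
            word += ch
    if word:
        pieces.append((sep, word))

    # Greedy line filling: break only at the recorded separators.
    lines, line = [], ""
    for sep, word in pieces:
        if not line:
            line = word
        elif len(line) + len(sep) + len(word) <= width:
            line += sep + word
        else:
            lines.append(line)
            line = word
    if line:
        lines.append(line)
    return "\n".join(lines)
```

For example, with a width of 4, three two-letter Burmese words separated by
regular spaces come out as two lines, with the first space removed and the
second turned into a newline.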
> -----Original Message-----
> From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On
> Behalf Of Doug Ewell
> Sent: Sunday, August 26, 2007 01:56
> To: Unicode Mailing List
> Cc: Ngwe Tun
> Subject: Re: issues storing ZWSP in docs, files and databases
>
> Ngwe Tun wrote:
>
> > We have to use ZWSP for the word breaking in our language. So, We need
> > to use ZWSP for line breaking purpose too. Every Burmese word might
> > follow ZWSP when automatically adding or operator.
> >
> > Please let me have last clarification. Do We need to store ZWSP in
> > documents, files and database for the purpose of word
> > segmentation/breaking? Or Is it possible to add automatically in
> > others way?
>
> Burmese text will either have ZWSP between words, which means electronic
> processes can automatically determine word boundaries, or it will not,
> which means they cannot. Unicode does not tell you that you must use
> ZWSP in Burmese text, only that "if word boundary indications are
> desired" then ZWSP is the right character for the job.
>
> A program could probably be written to add ZWSP to existing Burmese
> text. Such a program would almost certainly be dictionary-based and
> would need to allow a human to review the text and fix any possible
> errors or ambiguities.
>
> --
> Doug Ewell · Fullerton, California, USA · RFC 4645 · UTN #14
> http://users.adelphia.net/~dewell/
> http://www1.ietf.org/html.charters/ltru-charter.html
> http://www.alvestrand.no/mailman/listinfo/ietf-languages
This archive was generated by hypermail 2.1.5 : Mon Aug 27 2007 - 21:57:03 CDT