From: mpsuzuki@hiroshima-u.ac.jp
Date: Tue Feb 22 2011 - 11:56:32 CST
On Tue, 22 Feb 2011 17:26:16 +0100
Philippe Verdy <verdy_p@wanadoo.fr> wrote:
>Yes there's currently a sync problem with 2-byte encoded characters
>(if one byte gets deleted), but they occur in a Unicode range
>(0x80..0x407F) where they extremely rarely occur in overlong sequences
>(this range is used by scripts that also abundantly use spaces and
>ASCII punctuation, in addition to controls and line-breaks), so the
>need to resynchronize on newlines is already satisfied.
Thank you for pointing it out.
Resynchronization on newlines (or on ASCII punctuation)
is needed, but I think it is gradually becoming
insufficient. Most writing systems that use characters in
0x80..0x407F also use ASCII punctuation, but some of them
do not put ASCII punctuation or spaces between words:
Chinese and Japanese often use Latin-derived but non-ASCII
punctuation codepoints, and the Thai writing system puts an
ASCII space between sentences but not between words.
Nowadays I often receive mail messages in which a newline
appears only at the end of a paragraph. In such writing
systems, without interword ASCII spaces, resynchronizing on
ASCII cannot prevent a whole sentence from being corrupted.
Sometimes an entire paragraph could be lost.
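To make the failure mode concrete, here is a minimal Python sketch. The byte layout is an assumption for illustration only, not the actual proposal under discussion: code points below 0x80 take one byte, and 0x80..0x407F (exactly 2^14 code points) take two bytes that each carry 7 payload bits in 0x80..0xFF. Because lead and trail bytes look alike, the hypothetical decode() can only resynchronize on an ASCII byte. Deleting one byte from a Thai sentence with no interword spaces garbles the rest of that sentence, and decoding recovers only at the ASCII space between sentences.

```python
# HYPOTHETICAL 2-byte scheme (an assumption, for illustration only):
#   U+0000..U+007F -> 1 byte  0x00..0x7F
#   U+0080..U+407F -> 2 bytes, both in 0x80..0xFF, 7 payload bits each
# Lead and trail bytes are indistinguishable, so the decoder can only
# resynchronize when it meets an ASCII byte.

REPLACEMENT = "\ufffd"

def encode(text: str) -> bytes:
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(cp)
        elif cp <= 0x407F:
            v = cp - 0x80                  # 14-bit payload
            out.append(0x80 | (v >> 7))    # high 7 bits
            out.append(0x80 | (v & 0x7F))  # low 7 bits
        else:
            raise ValueError(f"U+{cp:04X} is outside this sketch's range")
    return bytes(out)

def decode(data: bytes) -> str:
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # ASCII byte: always a complete character, i.e. a sync point
            out.append(chr(b))
            i += 1
        elif i + 1 < len(data) and data[i + 1] >= 0x80:
            # Pair this byte with the next one; after a lost byte this
            # silently pairs the wrong bytes and yields wrong characters
            v = ((b & 0x7F) << 7) | (data[i + 1] & 0x7F)
            out.append(chr(v + 0x80))
            i += 2
        else:
            # Dangling byte just before an ASCII byte or end of data
            out.append(REPLACEMENT)
            i += 1
    return "".join(out)

# Thai: ASCII space between sentences, none between words
s1 = "ภาษาไทยไม่เว้นวรรคระหว่างคำ"
s2 = "ประโยคถัดไป"
original = s1 + " " + s2

data = encode(original)
damaged = data[:1] + data[2:]  # lose one byte inside the first sentence
recovered = decode(damaged)
print(recovered)
```

In this sketch every character of the first sentence after the lost byte is decoded wrongly (often as other plausible-looking Thai letters), and only the text after the ASCII space comes back intact. With a newline only at the end of each paragraph and no ASCII bytes inside it, the same loss would corrupt the whole paragraph.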
Regards,
mpsuzuki