On Tue, 21 Jul 2015 18:10:14 +0800
gfb hjjhjh <c933103_at_gmail.com> wrote:
> When you write text in modern Chinese, there will not be any break
> between different words, and thus if you segment characters according
> to the ideographic characters, what gets grouped together would
> be either a clause or a sentence, or even a whole paragraph if you
> are handling some older text without punctuation.
I had another look at Chinese word breaking algorithms today and saw
that their practical purposes were mostly indexing and machine
translation. Consequently, I suspect that authors have little
incentive to mark word boundaries in the texts they originate. This
differs from the Thai situation where marking word boundaries improves
layout and spell-checking.
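
To illustrate the point in the quoted text, here is a minimal sketch of
what grouping by runs of ideographic characters produces. The sample
sentence and the name han_runs are made up for illustration, and the
character class covers only the basic CJK Unified Ideographs block
(U+4E00..U+9FFF), which is a simplifying assumption.

```python
import re

# Assumption: only the basic CJK Unified Ideographs block is matched;
# extension blocks and compatibility ideographs are ignored for brevity.
HAN_RUN = re.compile(r"[\u4e00-\u9fff]+")


def han_runs(text):
    """Return the maximal runs of consecutive Han characters in text."""
    return HAN_RUN.findall(text)


# Hypothetical sample: "I like studying Chinese, because it is interesting."
sample = "我喜欢学习中文，因为它很有趣。"

for run in han_runs(sample):
    print(run)
```

Each run here ends only at punctuation, so the pieces are whole clauses
rather than words. Text with no punctuation at all would come back as a
single run.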
Richard.
Received on Tue Jul 21 2015 - 18:34:53 CDT