Mark-up to Indicate Words
Martin J. Dürst
duerst at it.aoyama.ac.jp
Wed Jul 15 06:18:09 CDT 2015
On 2015/07/15 16:49, Richard Wordingham wrote:
> What mark-up schemes exist to show that a sequence of letters and
> combining marks constitutes a single word?
> Such mark-up would be useful when using spell checkers. At present, I
> use U+2060 WORD JOINER (WJ) to indicate the absence of a word boundary.
> (Systematic marking of boundaries using ZWSP is not popular with
> users, and is normally not used in Thai - it's not supported in
> their national or Windows 8-bit encodings.) However, it seems likely
> that when Unicode 8.00 is defined in August, WJ will suppress line
> breaks but not word breaks. There would still be the limitation that
> mark-up is not available in plain text.
> It appears that, for example, Open Document Format has no mark-up to
> indicate word boundaries, relying instead on the overrides of
> the word boundary detection algorithms being stored at character level.
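For readers unfamiliar with the character-level approach described in the quoted message, here is a minimal sketch of how U+2060 WORD JOINER and U+200B ZERO WIDTH SPACE might be handled before text is passed to a spell checker. The function names mark_word and words_for_spellcheck are hypothetical. Treating WJ as "no word boundary here" follows the poster's own convention and is an assumption of this sketch, not behaviour guaranteed by the Unicode word-break rules.

```python
WJ = "\u2060"    # WORD JOINER: used here to mean "no word boundary" (poster's convention)
ZWSP = "\u200b"  # ZERO WIDTH SPACE: an invisible explicit word boundary


def mark_word(parts):
    """Join the pieces of a single word with WJ so they are kept together."""
    return WJ.join(parts)


def words_for_spellcheck(text):
    """Split text into words at whitespace and ZWSP, dropping WJ from each word.

    Python's str.split() does not treat ZWSP as whitespace, so it is
    replaced with an ordinary space first.
    """
    normalized = text.replace(ZWSP, " ")
    return [word.replace(WJ, "") for word in normalized.split()]


if __name__ == "__main__":
    sample = mark_word(["ab", "cd"]) + ZWSP + "ef gh"
    print(words_for_spellcheck(sample))  # ['abcd', 'ef', 'gh']
```

This works on plain text only. It does not address the question of what mark-up a document format could use to record the same information.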
I'd suggest looking at higher-end formats such as DITA or TEI (Text