Mark-up to Indicate Words
richard.wordingham at ntlworld.com
Wed Jul 15 02:49:13 CDT 2015
What mark-up schemes exist to show that a sequence of letters and
combining marks constitutes a single word?
Such mark-up would be useful when using spell checkers. At present, I
use U+2060 WORD JOINER (WJ) to indicate the absence of a word boundary.
(Systematic marking of boundaries using U+200B ZERO WIDTH SPACE (ZWSP)
is not popular with users, and is normally not used in Thai - it's not
supported in the Thai national or Windows 8-bit encodings.) However, it seems likely
that when Unicode 8.0.0 is defined in August, WJ will suppress line
breaks but not word breaks. There would still be the limitation that
mark-up is not available in plain text.
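
To make the WJ convention described above concrete, here is a minimal
Python sketch. It is only an illustration of a private convention, not
the Unicode word boundary algorithm and not the behaviour of any
particular spell checker. The helper names mark_word and
words_for_spellcheck are hypothetical, and the sketch assumes that
words are otherwise separated by white space or ZWSP.

```python
import re

WJ = "\u2060"    # U+2060 WORD JOINER
ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE

def mark_word(*parts: str) -> str:
    """Join the parts with WJ to mark them as one word (hypothetical helper)."""
    return WJ.join(parts)

def words_for_spellcheck(text: str) -> list[str]:
    """Split on white space and ZWSP, then drop WJ from each chunk.

    Assumption: WJ only ever means 'no word boundary here', so removing
    it yields the form that would be looked up in a dictionary.
    """
    chunks = re.split(r"[\s\u200b]+", text)
    return [chunk.replace(WJ, "") for chunk in chunks if chunk]

# Usage: two pieces marked as a single word, followed by two more words.
sample = mark_word("ab", "cd") + ZWSP + "ef gh"
print(words_for_spellcheck(sample))
```

A real checker for Thai would normally find boundaries by dictionary
look-up rather than by splitting on separators; the sketch only shows
where WJ would override such a decision.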
It appears that, for example, Open Document Format has no mark-up to
indicate word boundaries, relying instead on overrides of the
word boundary detection algorithms being stored at the character level.