Olle> I don't agree with the Unicoders that different characters should be
Olle> used for apostrophe and the closing single quotation mark (U+02BC
Olle> MODIFIER LETTER APOSTROPHE and U+2019 RIGHT SINGLE QUOTATION
Olle> MARK). They are visually identical, so very few persons that enter
Olle> text (and no OCR programs) can be trusted to consistently choose the
Olle> correct character. The distinction between these characters is
Olle> useless in practice, and one of them should be classified as a
Olle> compatibility character; I would prefer U+02BC to be so classified.
Although the two characters are visually identical, some text processing tasks
can make use of the distinction between them. An English parser often has to
decide automatically whether a given mark is being used as an apostrophe or as
a closing single quote, which is not always easy.
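
To illustrate why the parser's job is hard when only one codepoint is used, here
is a minimal sketch of a context heuristic. The function name `classify_marks`
and its labels are hypothetical, invented for this example; it only looks at
the neighbouring characters and is not meant as a real parser.

```python
RIGHT_SINGLE_QUOTE = "\u2019"  # U+2019 RIGHT SINGLE QUOTATION MARK


def classify_marks(text):
    """Return (index, label) for every U+2019 found in text.

    Hypothetical heuristic: a mark with a letter on both sides is
    treated as an apostrophe; everything else is left as "ambiguous",
    because context alone cannot settle it.
    """
    results = []
    for i, ch in enumerate(text):
        if ch != RIGHT_SINGLE_QUOTE:
            continue
        prev_is_letter = i > 0 and text[i - 1].isalpha()
        next_is_letter = i + 1 < len(text) and text[i + 1].isalpha()
        if prev_is_letter and next_is_letter:
            label = "apostrophe"
        else:
            label = "ambiguous"
        results.append((i, label))
    return results
```

A word such as "don’t" is easy, but in "the dogs’ bowls" the mark follows a
letter and precedes a space, exactly as a closing quote would, so the sketch
has to give up. If the text had been entered with U+02BC for the apostrophe,
no guessing would be needed.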
On the other hand, nobody expects OCR software to be smart enough to determine
the appropriate code for the visually identical glyphs, but these kinds of
programs can simply default to one consistent codepoint.
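
Defaulting to one codepoint could look roughly like the sketch below. The
function name `normalize_apostrophes` and the choice of U+2019 as the default
target are assumptions made for the example, not a recommendation from any
standard.

```python
MODIFIER_APOSTROPHE = "\u02bc"  # U+02BC MODIFIER LETTER APOSTROPHE
RIGHT_SINGLE_QUOTE = "\u2019"   # U+2019 RIGHT SINGLE QUOTATION MARK


def normalize_apostrophes(text, target=RIGHT_SINGLE_QUOTE):
    """Map both visually identical characters to a single codepoint.

    `target` is whichever codepoint the program has chosen as its
    consistent default (an assumption of this sketch).
    """
    table = str.maketrans({
        MODIFIER_APOSTROPHE: target,
        RIGHT_SINGLE_QUOTE: target,
    })
    return text.translate(table)
```

Calling `normalize_apostrophes(ocr_text)` on recognized text would leave
everything else untouched and make the output consistent, at the cost of
discarding whatever distinction the source might have carried.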
-----------------------------------------------------------------------------
mleisher@crl.nmsu.edu
Mark Leisher
Computing Research Lab
New Mexico State University
Box 30001, Dept. 3CRL
Las Cruces, NM 88003

"The trick is not gaining the knowledge, but surviving the lessons."
    -- "Svaha," Charles de Lint