Mark Leisher wrote:
>On the other hand, nobody expects OCR software to be smart enough to determine
>the appropriate code for the visually identical glyphs, but these kinds of
>programs can simply default to one consistent codepoint.

This point may not be of interest to most Unicoders, but I darn well
hope OCR software can determine the code for visually identical glyphs
-- the same way you or I would, in context. An O and a 0 are, for
practical purposes, the same glyph. If you see one alone in the middle
of a page

    O

you have no reasonable basis for deciding which character it is. OCR
programs use Markov (character-transition) probabilities and dictionaries
with great success in resolving what is an Oh and what is a zero.
Similarly, I can imagine an OCR program that would look for matching `'
pairs to decide that the ' is a right quote, or to realize that in
xxxx's the ' is probably an apostrophe.
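
To make the idea concrete, here is a minimal sketch of the two context heuristics described above. It is not how any particular OCR product works; real systems use trained Markov models and dictionaries, and this toy version swaps in simple neighbor counting. The function names resolve_o_zero and classify_apostrophes are hypothetical, made up for illustration.

```python
def resolve_o_zero(token):
    """Pick O or 0 for the ambiguous glyphs in one token, using its other characters.

    Assumption: a majority vote of unambiguous neighbors stands in for the
    Markov probabilities and dictionary lookups a real OCR engine would use.
    """
    others = [c for c in token if c not in "O0"]
    digits = sum(c.isdigit() for c in others)
    letters = sum(c.isalpha() for c in others)
    if digits > letters:
        return token.replace("O", "0")
    if letters > digits:
        return token.replace("0", "O")
    # No context either way (e.g. a lone glyph on the page): leave it alone.
    return token


def classify_apostrophes(text):
    """Label each ' in text as an apostrophe or as the right quote of a `' pair."""
    labels = []
    open_quotes = 0  # number of ` marks still waiting for a closing '
    for i, ch in enumerate(text):
        if ch == "`":
            open_quotes += 1
        elif ch == "'":
            prev_alpha = i > 0 and text[i - 1].isalpha()
            next_alpha = i + 1 < len(text) and text[i + 1].isalpha()
            if prev_alpha and next_alpha:
                # Letters on both sides, as in xxxx's: probably an apostrophe.
                labels.append((i, "apostrophe"))
            elif open_quotes > 0:
                open_quotes -= 1
                labels.append((i, "right quote"))
            else:
                labels.append((i, "apostrophe"))
    return labels


print(resolve_o_zero("2OO1"))
print(resolve_o_zero("C0DE"))
print(resolve_o_zero("O"))
print(classify_apostrophes("`Mark's idea'"))
```

The lone O is returned unchanged, which matches the point that an isolated glyph gives no basis for a decision. In the last call the ' inside "Mark's" is labeled an apostrophe and the final ' is paired with the opening `.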