best character for apostrophe -- limitations of OCR

From: Tom Fruchterman (maverick@raf.com)
Date: Thu Jul 04 1996 - 18:33:00 EDT

Next message: Jake Morrison: "Re: Re: Re: Re: Need help on Unicode Databases."
Previous message: Mark Davis: "Re: Best 10646/Unicode chara"

Mark Leisher wrote:
>On the other hand, nobody expects OCR software to be smart enough to determine
>the appropriate code for the visually identical glyphs, but these kinds of
>programs can simply default to one consistent codepoint.

This point may not be of interest to most Unicoders, but I darn well
hope OCR software can determine the code for visually identical glyphs
the same way you or I would: in context. An O and a 0 are, for
practical purposes, the same glyph. If you see one by itself in the
middle of a page, you have no reasonable basis for deciding which
character it is. OCR programs use Markov probabilities and dictionaries
with great success to resolve which is an Oh and which is a zero.
Similarly, I can imagine an OCR program that would look for matching `'
pairs to decide when a ' is really a left quote, or to realize that in
xxxx's the ' is probably an apostrophe.
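
The two context rules above can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any particular OCR product works: DICTIONARY, resolve_o_zero and classify_single_quotes are hypothetical names, the tiny word list stands in for a real dictionary, and the O/0 rule is a plain dictionary-plus-neighboring-digits test rather than the Markov models mentioned in the post. It also assumes the apostrophe is encoded as U+2019 RIGHT SINGLE QUOTATION MARK, the character the Unicode Standard now prefers for a punctuation apostrophe.

```python
# Toy context rules for two OCR ambiguities: O vs 0, and ' as quote vs apostrophe.
# DICTIONARY and both function names are hypothetical, for illustration only.

DICTIONARY = {"good", "door", "zoo", "cool"}  # stand-in for a real word list

LEFT_QUOTE = "\u2018"   # LEFT SINGLE QUOTATION MARK
RIGHT_QUOTE = "\u2019"  # RIGHT SINGLE QUOTATION MARK
APOSTROPHE = "\u2019"   # assumption: apostrophe encoded as U+2019, not U+0027


def resolve_o_zero(token: str) -> str:
    """Choose letter O or digit 0 for every O/0 glyph in one token."""
    others = [c for c in token if c not in "0oO"]
    # Surrounded only by digits: read the ambiguous glyphs as zeros.
    if others and all(c.isdigit() for c in others):
        return token.replace("o", "0").replace("O", "0")
    # Otherwise try reading them as letters, matching the token's case.
    letters = token.replace("0", "O" if token.isupper() else "o")
    if letters.lower() in DICTIONARY:
        return letters
    # No evidence either way: keep what the recognizer produced.
    return token


def classify_single_quotes(text: str) -> str:
    """Map each straight ' to a left quote, right quote, or apostrophe."""
    out = []
    quote_open = False  # True between a left quote and its matching right quote
    for i, ch in enumerate(text):
        if ch != "'":
            out.append(ch)
            continue
        prev = text[i - 1] if i > 0 else ""
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if prev.isalpha() and nxt.isalpha():
            out.append(APOSTROPHE)   # inside a word, as in xxxx's
        elif not quote_open and nxt and not nxt.isspace():
            out.append(LEFT_QUOTE)   # starts a quoted span
            quote_open = True
        else:
            out.append(RIGHT_QUOTE)  # closes the span, or a trailing apostrophe
            quote_open = False
    return "".join(out)


if __name__ == "__main__":
    for t in ["g00d", "1o5", "G00D", "x0y"]:
        print(t, "->", resolve_o_zero(t))
    print(classify_single_quotes("She said 'it's Tom's book' today"))
```

The quote_open flag is what carries the "matching pairs" idea: a ' that starts a word opens a quoted span only when no span is already open, and the next non-apostrophe ' closes it. Rules this simple are easy to fool; an elided form such as 'tis comes out as a left quote, which is where a dictionary of such forms would presumably help.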

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:31 EDT