From: Don Osborn (dzo@bisharat.net)
Date: Thu Oct 25 2007 - 01:26:32 CDT
A quick search on Google Books within a book in Fula (Fulfulde Tales of North
Cameroon, Paul Kazuhisa Eguchi) returned no hits for words containing extended
Latin characters. The browser and Google handled the characters in the query
as expected, but the scanned, searchable text of the book apparently does not
contain the extended characters as such.
I suspect this is a general problem stemming from a lack of OCR software that
recognizes extended characters; at the very least, the scan of this
particular book did not recognize them.
Is anyone aware of an OCR system that recognizes extended Latin characters
from, say, the Latin Extended-A and Extended-B, IPA Extensions, and Latin
Extended Additional ranges? That is, for any language (orthography) that
uses these characters?
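For anyone who wants to check whether a scan's text layer kept such
characters at all, here is a minimal sketch in Python. It only counts
characters per Unicode block in text that has already been extracted; it
does not do any OCR itself. The block boundaries are those defined by the
Unicode Standard; the names EXTENDED_LATIN_BLOCKS, block_of and
count_extended are made up for this illustration.

```python
import unicodedata

# Block ranges as defined by the Unicode Standard.
EXTENDED_LATIN_BLOCKS = {
    "Latin Extended-A": (0x0100, 0x017F),
    "Latin Extended-B": (0x0180, 0x024F),
    "IPA Extensions": (0x0250, 0x02AF),
    "Latin Extended Additional": (0x1E00, 0x1EFF),
}


def block_of(ch):
    """Return the name of the extended Latin block containing ch, or None."""
    cp = ord(ch)
    for name, (lo, hi) in EXTENDED_LATIN_BLOCKS.items():
        if lo <= cp <= hi:
            return name
    return None


def count_extended(text):
    """Count characters of text that fall in each extended Latin block.

    The text is normalized to NFC first, so a base letter followed by a
    combining mark is counted as the precomposed letter where one exists.
    """
    counts = {name: 0 for name in EXTENDED_LATIN_BLOCKS}
    for ch in unicodedata.normalize("NFC", text):
        name = block_of(ch)
        if name is not None:
            counts[name] += 1
    return counts


if __name__ == "__main__":
    # U+0253 (hooked b), U+014B (eng), U+01B4 (hooked y)
    sample = "\u0253 \u014b \u01b4"
    for name, n in count_extended(sample).items():
        print(f"{name}: {n}")
```

If the text layer of a book in an orthography that relies on these letters
reports zero in every block, that would suggest the OCR step replaced them
with basic Latin look-alikes or dropped them.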
I've been discussing the scanning of African language materials as part of
online book programs. The good news is that a little of this has started, but
it is definitely not good news if the scanning is being done (in some or all
cases) without the right OCR.
TIA for any feedback.
Don