From: Kenneth Whistler (kenw@sybase.com)
Date: Fri Feb 11 2005 - 19:46:54 CST
John Burger said:
> I presume you're asking about my head-start comment, rather than the
> OCR idea. I just meant that the equivalence mappings, etc., provide a
> clue that some characters are similar in appearance to others.
>
> To my surprise the code charts do not seem to indicate that Cyrillic
> and Latin "a" are related. There is clearly an embarrassing gap in my
> understanding here - what am I missing?
The fact that cross-references in the code charts are intended
to provide help for people in finding the correct character
in instances where there might be some confusion as to which
is the correct one, based on shape alone. (Cross-references
serve other functions as well.)
In the case of the Latin, Greek, and Cyrillic scripts, which
character to use for "A" in your script of interest is obvious,
and larding up the names list with cross-references to all
the similar-looking letters in these cases would just be
pointless.
Cross-references in the names lists for the code charts were
*never* intended to be a mechanism for identifying all
lookalikes or confusables in the standard.
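As an illustration only: the normalization forms do not relate
these letters either. The sketch below uses Python's standard
unicodedata module; the helper name same_after_nfkc is made up
for this example. Compatibility normalization folds a fullwidth
Latin "a" into the ordinary one, but it leaves Cyrillic "a"
untouched, because the two are distinct characters rather than
equivalents.

```python
import unicodedata

LATIN_A = "\u0061"      # LATIN SMALL LETTER A
CYRILLIC_A = "\u0430"   # CYRILLIC SMALL LETTER A
FULLWIDTH_A = "\uFF41"  # FULLWIDTH LATIN SMALL LETTER A


def same_after_nfkc(x: str, y: str) -> bool:
    """Return True if x and y are identical after NFKC normalization.

    Hypothetical helper for this example; not part of any standard API.
    """
    return unicodedata.normalize("NFKC", x) == unicodedata.normalize("NFKC", y)


if __name__ == "__main__":
    for ch in (LATIN_A, CYRILLIC_A, FULLWIDTH_A):
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

    # Fullwidth "a" has a compatibility mapping to Latin "a".
    print(same_after_nfkc(FULLWIDTH_A, LATIN_A))

    # Cyrillic "a" has no such mapping, so normalization cannot
    # be used to detect this lookalike pair.
    print(same_after_nfkc(CYRILLIC_A, LATIN_A))
```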
People thrashing this topic around may be interested to know
that the UTC, which met just this week, is considering the
possibility of defining a "confusables mapping". That *would*
be something aimed at being a comprehensive mechanism for
dealing with the issue of confusable glyph shapes between
scripts (or others, for that matter) that everyone is obsessing
over regarding this spoofing issue. Or at least, if not
really dealing with the issue, defining it much more
precisely than tends to be done by people picking up the
book and looking up random instances of lookalikes.
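To make the idea concrete, here is a minimal sketch of what
using such a mapping might look like, in Python. Everything in
it is an assumption for illustration: the table is a tiny
hand-picked sample, not data from the UTC, and the names
PROTOTYPES, skeleton, and confusable are invented here. Each
lookalike is mapped to one representative character, and two
strings are flagged when they differ but map to the same result.

```python
# Hypothetical, hand-picked sample table. A real confusables mapping
# would be far larger and would be defined by the standard, not here.
PROTOTYPES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043E": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u03BF": "o",  # GREEK SMALL LETTER OMICRON
}


def skeleton(text: str) -> str:
    """Replace each character by its representative, if it has one."""
    return "".join(PROTOTYPES.get(ch, ch) for ch in text)


def confusable(x: str, y: str) -> bool:
    """True if x and y are different strings with the same skeleton."""
    return x != y and skeleton(x) == skeleton(y)


if __name__ == "__main__":
    genuine = "paypal"
    spoof = "p\u0430ypal"  # second letter is Cyrillic, not Latin

    print(confusable(spoof, genuine))    # differs, same skeleton
    print(confusable(genuine, genuine))  # identical strings are not flagged
```

The table only covers the pairs someone thought to list, which
is why a comprehensive, precisely defined mapping would matter.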
My concern is whether a confusables mapping would be of
more help to the anti-spoofers seeking ways to check for
and disallow spoofing and confusion, or to the scammers
seeking to automate obscure spoofs.
--Ken