Re: visual glyph search

From: Neil Harris (neil@tonal.clara.co.uk)
Date: Fri Feb 18 2011 - 07:20:26 CST


    On 18/02/11 11:35, Andrew West wrote:
    > On 18 February 2011 08:23, Chris Weber<chris@casabasecurity.com> wrote:
    >> I would normally use Babelmap instead of browsing the collation maps, but those are helpful, thank you Peter. That's right Jukka, when I saw Detexify at http://detexify.kirelabs.org/classify.html I thought how useful something like it could be for visually finding Unicode characters, identifying confusables, and maybe other uses.
    > What you want is a pan-Unicode OCR / handwriting recognition tool,
    > which would be the most awesome thing ever if it worked reasonably
    > well. It is the sort of thing that you should put in as a feature
    > request for BabelMap. It can't be that hard to add a simple drawing
    > pad, and as BabelMap can already extract bitmaps for all Unicode
    > characters that are mapped to a font, all it needs is for the software
    > to iterate through all 109,242 graphic characters looking for matches
    > for the user input glyph (about 20 minutes at 10ms a character, which
    > may be a bit of a problem) ... unfortunately I have no idea how to do
    > the last bit.
    >
    > Andrew
    >

    Fortunately, there are fast algorithms for searching within
    high-dimensional feature spaces which can work many orders of magnitude
    faster than brute-force linear-time search for this kind of problem.

    There's a huge literature on applying these algorithms to character
    recognition.

    Providing you only want to find a shortlist of a few dozen potential
    matches, which can then be chosen from by eye, it shouldn't be too
    difficult to code something useful, as demonstrated by how well detexify
    already works.
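
    As a rough illustration of the shortlist idea, here is a minimal
    sketch of one such technique, random-hyperplane locality-sensitive
    hashing. It assumes each glyph bitmap has already been reduced to a
    fixed-length, roughly zero-centred feature vector; the names
    GlyphIndex, add and shortlist are hypothetical, and nothing here is
    claimed to be how detexify or BabelMap work.

```python
import random
from collections import defaultdict


def make_planes(dim, n_bits, rng):
    """Draw n_bits random hyperplanes (as normal vectors) in dim dimensions."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec, planes):
    """Hash a vector to an integer: one bit per hyperplane, set by which side it falls on."""
    bits = 0
    for plane in planes:
        dot = sum(p * v for p, v in zip(plane, vec))
        bits = (bits << 1) | (1 if dot >= 0.0 else 0)
    return bits


class GlyphIndex:
    """Hypothetical approximate nearest-neighbour index over glyph feature vectors."""

    def __init__(self, dim, n_tables=8, n_bits=10, seed=0):
        rng = random.Random(seed)
        # Several independent hash tables raise the chance that true
        # neighbours share a bucket with the query in at least one table.
        self.tables = [
            (make_planes(dim, n_bits, rng), defaultdict(list))
            for _ in range(n_tables)
        ]
        self.features = {}

    def add(self, label, vec):
        """Store one glyph's feature vector under its label (e.g. a code point)."""
        self.features[label] = vec
        for planes, buckets in self.tables:
            buckets[signature(vec, planes)].append(label)

    def shortlist(self, query, k=24):
        """Return up to k candidate labels, nearest first, without scanning every glyph."""
        candidates = set()
        for planes, buckets in self.tables:
            candidates.update(buckets.get(signature(query, planes), ()))

        def sq_dist(label):
            return sum((a - b) ** 2 for a, b in zip(self.features[label], query))

        # Exact distances are computed only for the few bucket-mates.
        return sorted(candidates, key=sq_dist)[:k]
```

    Each query hashes the drawn glyph once per table and computes exact
    distances only for glyphs that land in the same bucket, so the cost
    depends on bucket sizes rather than on the total number of
    characters. The result is approximate: a true match can be missed,
    which is tolerable when the user picks from the shortlist by eye.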

    Accurate OCR is a completely different matter.

    -- Neil


