Re: Japanese text handling problem in Unicode Collation Algorithm

From: Andrew West (andrewcwest@gmail.com)
Date: Wed Oct 14 2009 - 04:36:32 CDT


    2009/10/14 Kent Karlsson <kent.karlsson14@comhem.se>:
    >
    >> These two queries show completely different results, while [konig] and
    >> [König] return the same results.
    >
    > I don't get exactly the same results, but ö and o do get mixed up
    > (as do k and K).
    >
    > And I think that is a major problem in web search these days.
    > The letters ö and o are "completely different letters", and in my
    > everyday usage they are as related as e and u. In my usage, ö
    > also collates at the end of the alphabet, not at all near o.
    >
    > The case is similar for other (apparent) diacritics.
    >
    > This may be fine if you don't know how to spell a certain
    > word. But if you do know the spelling, the current approach
    > gives a lot of false positives for these kinds of searches.

    Googling for König returns "König", "Koenig" and "Konig", but googling
    for "König" in quotation marks returns "König" only. Likewise,
    googling for "Konig" in quotation marks does not return any hits for
    "König".
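
The difference between the unquoted and quoted searches can be illustrated with a minimal sketch. This is not how Google actually implements matching; it only assumes that the loose search compares strings after case folding and removal of combining marks, while the quoted search compares the canonically normalized strings. The function names `fold`, `loose_match` and `exact_match` are hypothetical, chosen for this example.

```python
import unicodedata


def fold(text: str) -> str:
    """Build a loose comparison key: case-fold, decompose, drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def loose_match(query: str, candidate: str) -> bool:
    """Assumed behaviour of an unquoted search: diacritics and case are ignored."""
    return fold(query) == fold(candidate)


def exact_match(query: str, candidate: str) -> bool:
    """Assumed behaviour of a quoted search: only canonical equivalence is ignored."""
    nfc_query = unicodedata.normalize("NFC", query)
    nfc_candidate = unicodedata.normalize("NFC", candidate)
    return nfc_query == nfc_candidate


if __name__ == "__main__":
    for word in ("König", "Konig", "konig", "Koenig"):
        print(word, loose_match("König", word), exact_match("König", word))
```

Under these assumptions "König" loosely matches "Konig" and "konig" but exactly matches only "König". The sketch does not reproduce the "Koenig" hits, which would need a language-specific expansion of ö to oe rather than simple mark stripping.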

    Andrew


