Re: Japanese text handling problem in Unicode Collation Algorithm

From: Andrew West ([email protected])
Date: Wed Oct 14 2009 - 04:36:32 CDT

Next message: Colin Taylor: "Origin of the abbreviation "I18N" - the real story"

Previous message: Kent Karlsson: "Re: Japanese text handling problem in Unicode Collation Algorithm"
In reply to: Kent Karlsson: "Re: Japanese text handling problem in Unicode Collation Algorithm"

2009/10/14 Kent Karlsson <[email protected]>:
>
>> These two queries show completely different results, while [konig] and
>> [König] return the same results.
>
> I don't get exactly the same results, but ö and o do get mixed up
> (as do k and K).
>
> And I think that is a major problem in web search these days.
> The letters ö and o are "completely different letters", and in my
> everyday usage they are as related as e and u. In my usage, ö
> also collates at the end of the alphabet, not at all near o.
>
> The case is similar for other (apparent) diacritics.
>
> This may be fine if you don't know how to spell a certain
> word. But if you do know the spelling, the current approach
> gives a lot of false positives for these kinds of searches.

Googling for König returns "König", "Koenig" and "Konig", but googling
for "König" in quotation marks returns "König" only. Likewise,
googling for "Konig" in quotation marks does not return any hits for
"König".
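
The difference between the two kinds of query can be sketched roughly as follows. This is only an illustration of the general idea, not a description of how Google actually works: the function names fold, loose_match and exact_match are made up for this example, and the "Koenig" equivalence is deliberately left out because it needs a language-specific rule rather than simple diacritic stripping.

```python
import unicodedata


def fold(text: str) -> str:
    """Hypothetical loose-match key: ignore case and combining marks."""
    # Decompose so that "ö" becomes "o" followed by U+0308 COMBINING DIAERESIS.
    decomposed = unicodedata.normalize("NFD", text.casefold())
    # Drop every combining mark, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


def loose_match(query: str, candidate: str) -> bool:
    """Unquoted-style comparison: "konig" and "König" are treated as equal."""
    return fold(query) == fold(candidate)


def exact_match(query: str, candidate: str) -> bool:
    """Quoted-style comparison: only canonically equivalent strings match."""
    return unicodedata.normalize("NFC", query) == unicodedata.normalize("NFC", candidate)


if __name__ == "__main__":
    print(loose_match("konig", "König"))   # folded comparison
    print(exact_match("Konig", "König"))   # exact comparison
```

Under these assumptions the loose comparison conflates ö with o (and k with K), which is the behaviour Kent objects to, while the exact comparison keeps them apart.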

Andrew

This archive was generated by hypermail 2.1.5 : Wed Oct 14 2009 - 04:39:30 CDT