Re: Japanese text handling problem in Unicode Collation Algorithm

From: Kent Karlsson (kent.karlsson14@comhem.se)
Date: Wed Oct 14 2009 - 04:14:33 CDT

Next message: Andrew West: "Re: Japanese text handling problem in Unicode Collation Algorithm"

Previous message: karl williamson: "Default values for Bidi_Mirroring_Glyph"
In reply to: Satoshi Nakagawa: "Re: Japanese text handling problem in Unicode Collation Algorithm"
Next in thread: Andrew West: "Re: Japanese text handling problem in Unicode Collation Algorithm"
Reply: Andrew West: "Re: Japanese text handling problem in Unicode Collation Algorithm"

On 2009-10-13 16:48, "Satoshi Nakagawa" <psychs@limechat.net> wrote:

> My point is the difference between small kana letters and big kana
> letters is weaker than the difference between uppercase and lowercase

Do you mean "stronger"?

> in latin alphabets.
>
> You can see the fact in Google search.
>
> [あつた]
> http://www.google.com/search?q=%E3%81%82%E3%81%A4%E3%81%9F
>
> [あった]
> http://www.google.com/search?q=%E3%81%82%E3%81%A3%E3%81%9F

ok.

> These two queries show completely different results, while [konig] and
> [König] return the same results.

I don't get exactly the same results, but ö and o do get mixed up
(as do k and K).
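Below is a minimal sketch of the kind of case- and diacritic-insensitive matching that would produce this mixing. It is an assumption for illustration only and does not describe how Google actually builds its index. The hypothetical fold() function case-folds, decomposes with NFD, and drops combining marks. Under that scheme "König" and "konig" collide. The small kana っ (U+3063) and the full-size つ (U+3064) stay distinct, because Unicode gives っ no decomposition mapping to つ.

```python
import unicodedata

def fold(s: str) -> str:
    """Hypothetical search key: case-fold, then strip diacritics.

    NFD splits 'ö' into 'o' + U+0308 COMBINING DIAERESIS; dropping
    nonspacing marks (category Mn) leaves only the base letter.
    """
    decomposed = unicodedata.normalize("NFD", s.casefold())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

# Latin: case and diacritic differences disappear under this fold.
print(fold("König") == fold("konig"))
# Kana: small tsu has no canonical or compatibility decomposition,
# so the two spellings remain different keys.
print(fold("あった") == fold("あつた"))
print(repr(unicodedata.decomposition("っ")))
```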

And I think that is a major problem in web search these days.
The letters ö and o are "completely different letters", and in my
everyday usage they are as related as e and u. In my usage, ö
also collates at the end of the alphabet, not at all near o.
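To make the ordering difference concrete, here is a minimal sketch comparing a Swedish-style order, in which å, ä and ö follow z, with an order that simply strips diacritics so that ö sorts as o. The alphabet string and both key functions are hypothetical simplifications covering only lowercase letters; a real application would use a tailored collation (for example, CLDR/ICU Swedish rules) rather than these hand-rolled keys.

```python
import unicodedata

# Hypothetical, simplified Swedish alphabet: å, ä, ö come after z.
SWEDISH = "abcdefghijklmnopqrstuvwxyzåäö"
RANK = {ch: i for i, ch in enumerate(SWEDISH)}

def swedish_key(word: str):
    """Map each letter to its position in SWEDISH.

    Characters not in SWEDISH sort after every letter (a simplification).
    """
    return [RANK.get(ch, len(SWEDISH)) for ch in word.lower()]

def folded_key(word: str) -> str:
    """Diacritic-insensitive key: NFD, then drop combining marks (ö -> o)."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

words = ["öl", "ost", "zebra", "oxe", "apa"]
print(sorted(words, key=swedish_key))  # ö at the end of the alphabet
print(sorted(words, key=folded_key))   # ö treated as a variant of o
```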

The case is similar for other (apparent) diacritics.

This may be fine if you don't know how to spell a certain
word. But if you do know the spelling, the current approach
gives a lot of false positives for these kinds of searches.

/kent k

This archive was generated by hypermail 2.1.5 : Wed Oct 14 2009 - 04:19:09 CDT