Re: Lenient search engine

From: Michael (michka) Kaplan (michka@trigeminal.com)
Date: Sun Jun 10 2001 - 11:17:54 EDT


From: "てんどう瘢雹りゅう瘢雹じ" <11@onna.com>

> A search engine regards the words "stone" and "STONE" as identical.
> So why isn't いし treated the same as イシ? The difference can be
> quite marked, such as レイプ versus れいぷ or such.

Well, nothing stops them from doing this -- some database engines let you
collate in a "kana-insensitive" way. But because hiragana and katakana are
so rarely used to write the same words, some people prefer to keep them
separate.
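
For illustration only, here is a rough Python sketch of what a kana-insensitive
comparison could look like. It is not how any particular database engine does
it, and the names kana_fold and lenient_equal are made up for this example. It
relies on the fact that katakana U+30A1..U+30F6 sit exactly 0x60 code points
above the corresponding hiragana, and it folds Latin case as well so that
"stone" and "STONE" also match.

```python
import unicodedata

# Katakana U+30A1..U+30F6 map one-to-one onto hiragana U+3041..U+3096,
# which sit exactly 0x60 code points lower in the Unicode code space.
_KATAKANA_TO_HIRAGANA = {cp: cp - 0x60 for cp in range(0x30A1, 0x30F7)}


def kana_fold(text: str) -> str:
    """Hypothetical helper: build a key that ignores kana type and Latin case."""
    # NFKC turns half-width katakana into full-width katakana first.
    normalized = unicodedata.normalize("NFKC", text)
    # Shift katakana down to hiragana, then fold Latin case.
    return normalized.translate(_KATAKANA_TO_HIRAGANA).casefold()


def lenient_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings by their folded keys."""
    return kana_fold(a) == kana_fold(b)
```

With this, lenient_equal("いし", "イシ") and lenient_equal("stone", "STONE") are
both true. Whether a search engine should do this by default is a separate
question, for the reason given above.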

>
> Something I noticed in the Unicode 3.0 book:
>
> Case. (1) Feature of certain alphabets where the letters have two
> distinct forms. These variants, which may differ markedly in shape and
> size, are called the uppercase letter (also known as capital or majuscule)
> and the lowercase letter (also known as small or minuscule).
> (2) るるるるるる

Case is not really the same thing as the hiragana/katakana distinction. Only
differences that are *called* case really fall into this category.

> So kana have case.

Not really. See above.

MichKa

Michael Kaplan
Trigeminal Software, Inc.
http://www.trigeminal.com/


