Re: Regular expressions in Unicode (Was: Ethiopic text)

From: Kolbjørn Aambø (k.h.aambo@ub.uio.no)
Date: Fri Mar 13 1998 - 07:59:23 EST


Would not something like:

Aa:á:Àà:â:Ãã:Ææ:Ää:Åå,Bb,Cc:Çç,Dd,Ee:Ééèêë,Ff,Gg,Hh,I:¡iíìîï,Jj,Kk,Ll,Mm,Nn:Ññ,
Oo:óòô:Õõ:‘¦:Øø:Öö,Pp,Qq,Rr,Ss,Tt,Uu:úùû,Vv,Ww,Xx,Yy:Üü,Zz.

be appropriate for English searching?

Then you would find Ångstrøm by searching for Angstrom.
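
A minimal sketch of how such a relation could drive a search. It assumes that
commas separate equivalence classes, that colons only group members within a
class, and that the first character of each class is its representative. The
relation string is a trimmed copy of the one above, leaving out entries that
look damaged in the archive, and the function names are hypothetical.

```python
# Trimmed copy of the relation above (assumption: damaged entries omitted).
RELATION = (
    "Aa:á:Àà:â:Ãã:Ææ:Ää:Åå,Bb,Cc:Çç,Dd,Ee:Ééèêë,Ff,Gg,Hh,Ii:íìîï,Jj,Kk,Ll,Mm,"
    "Nn:Ññ,Oo:óòô:Õõ:Øø:Öö,Pp,Qq,Rr,Ss,Tt,Uu:úùû,Vv,Ww,Xx,Yy,Zz."
)


def build_fold_table(relation):
    """Map every character of a class to the first character of that class."""
    table = {}
    for cls in relation.rstrip(".").split(","):
        members = cls.replace(":", "")  # colons only group members of a class
        for ch in members:
            table[ch] = members[0]
    return table


def fold(text, table):
    """Replace each character by its class representative, if it has one."""
    return "".join(table.get(ch, ch) for ch in text)


def matches(query, candidate, table):
    """Two strings match when they fold to the same representatives."""
    return fold(query, table) == fold(candidate, table)


FOLD = build_fold_table(RELATION)
print(matches("Angstrom", "Ångstrøm", FOLD))
```

Because every character maps to exactly one representative, this only handles
one-to-one equivalences.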

One small problem, though: I cannot match KVÆRNER by searching for
KVAERNER using the above relation. Any suggestions?
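
One possible way around this, offered only as a sketch: a character-to-character
relation cannot equate one letter with two, so the single letter Æ could first
be expanded into the two-letter string AE before comparing. The EXPANSIONS
table and the function names are assumptions made for illustration, and case is
ignored with a plain upper() call rather than a full relation.

```python
# Assumed one-to-many equivalents; extend as needed (hypothetical table).
EXPANSIONS = {"Æ": "AE", "æ": "ae"}


def expand(text):
    """Replace each listed character by its multi-letter equivalent."""
    return "".join(EXPANSIONS.get(ch, ch) for ch in text)


def matches_expanded(query, candidate):
    """Compare two strings after expansion, ignoring case."""
    return expand(query).upper() == expand(candidate).upper()


print(matches_expanded("KVAERNER", "KVÆRNER"))
```

The expansion step would run before any one-to-one folding, so the two
mechanisms can be combined.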

By the way, I have seen this way of expressing relations among characters in
several other people's work.

Peter Westlake <peter@harlequin.co.uk> wrote:
:
>Now, if I want to find a word beginning with A in a list of
>scientific words used in English, then I would hope to find
>"Ångstrøm". But if I were searching for names beginning with
>A in the Danish telephone directory, it would be a mistake to
>find "Ångstrøm". So I need to say what I mean. If I want to
>match A-F in English, I need a short way of saying whether to
>include accents and case and of saying that I mean English.
>Something like [A-F::u,a,uk] where u means upper case, a means
>any accent, uk is from a standard list of codes. The range is
>interpreted in the context of the UK collating sequence. To
>omit Ångstrøms, I would ask for ^[A::u,a,dk]* meaning "a string
>beginning with a letter that matches A in Danish". In this context,
>"Danish" and "English" can be seen as equivalence relations that
>partition the character set into equivalence classes. Kolbjørn
>gave an example of such a relation.
>
:
:
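
The idea in the quoted text, that "Danish" and "English" are different
equivalence relations over the same character set, can be sketched as one
relation per language code. The two tiny relations below are illustrative
assumptions, not standard data: "uk" puts Å in the same class as A, while "dk"
keeps Å as a letter of its own. The codes are borrowed from the quoted example,
and the function names are hypothetical.

```python
# Illustrative per-language relations (assumed, deliberately incomplete).
RELATIONS = {
    "uk": "Aa:Àà:Ää:Åå,Oo:Øø:Öö",
    "dk": "Aa,Åå,Oo,Øø",
}


def build_fold_table(relation):
    """Map every character of a class to the first character of that class."""
    table = {}
    for cls in relation.split(","):
        members = cls.replace(":", "")
        for ch in members:
            table[ch] = members[0]
    return table


TABLES = {lang: build_fold_table(rel) for lang, rel in RELATIONS.items()}


def begins_with(word, letter, lang):
    """True if the first character of word is equivalent to letter in lang."""
    if not word:
        return False
    table = TABLES[lang]
    return table.get(word[0], word[0]) == table.get(letter, letter)


print(begins_with("Ångstrøm", "A", "uk"))
print(begins_with("Ångstrøm", "A", "dk"))
```

Under these assumed tables the first call matches and the second does not,
which mirrors the scientific word list versus the Danish telephone directory.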



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:39 EDT