Re: UCD 3.1, Final Beta - Case folding

From: Antoine Leca (Antoine.Leca@renault.fr)
Date: Tue Mar 06 2001 - 11:24:21 EST



Carl W. Brown wrote:
>
> From: Antoine Leca [mailto:Antoine.Leca@renault.fr]
>
> >Carl W. Brown wrote:
> >>
> >> The case folding is locale-less, so it seems to me that it is probably
> >> better to remove the COMBINING DOT ABOVE after all 'i' / 'I'
> >> regardless of locale to make it work for Lithuanian. I doubt that this
> >> will cause serious problems with caseless compares for other locales.
>
> >Please consider a Turkish text, fully decomposed: there, a dot_above
> >U+0307 following an uppercase I U+0049 should certainly *not* be dropped.
>
> This works for Turkish as well. Case folding folds dotted and dotless i
> into 'i'.
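Carl's locale-independent proposal can be sketched in Python roughly as below. This is a minimal illustration under stated assumptions, not any library's API: `fold_drop_dot_above` is a hypothetical name, and Python's `str.casefold()` (full, non-Turkic case folding from a recent Unicode version, not the 3.1 beta data under discussion) stands in for the UCD case folding.

```python
import unicodedata

COMBINING_DOT_ABOVE = "\u0307"

def fold_drop_dot_above(s: str) -> str:
    """Hypothetical caseless key following the proposal quoted above:
    case-fold without regard to locale, then drop U+0307 COMBINING DOT
    ABOVE wherever it directly follows 'i' (the folded form of 'i' and 'I')."""
    # NFD first, so that precomposed letters such as U+0130 expose their
    # combining marks; str.casefold() stands in for UCD case folding.
    folded = unicodedata.normalize("NFD", s).casefold()
    return folded.replace("i" + COMBINING_DOT_ABOVE, "i")
```

With these default mappings both I and İ end up as plain i, so İstanbul, ISTANBUL and the Lithuanian i̇̀ (i + U+0307 + U+0300) behave as Carl intends. Lowercase dotless ı, however, folds to itself, so the Turkish uppercase ILIK gets the same key as ilik rather than ılık, which is the kind of mix-up questioned below.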

This is where I do not understand.

You are saying that, for a Turkish user, a caseless comparison will fully
intermix ı/I and i/İ.

My understanding was that they expect all ı/I (regardless of case) to come
before all i/İ. Did I miss something?
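The ordering expectation above can be illustrated with a toy sort key. This is only a sketch: `turkish_i_key` and its rank table are hypothetical, it tailors nothing but the four letters ı/I/i/İ (real Turkish collation also places ç, ğ, ö, ş, ü), and every other character simply falls back to its case-folded code point.

```python
import unicodedata

# Hypothetical ranks for the Turkish i-letters only: the dotless pair
# (ı, I) sorts before the dotted pair (i, İ), regardless of case.
I_RANK = {"\u0131": 0, "I": 0, "i": 1, "\u0130": 1}

def turkish_i_key(word: str) -> list[tuple[str, int]]:
    """Toy caseless sort key that tailors only ı/I/i/İ."""
    key = []
    # NFC composes a decomposed I + U+0307 back into U+0130.
    for ch in unicodedata.normalize("NFC", word):
        if ch in I_RANK:
            # Both i-letters sit at the position of 'i', dotless first.
            key.append(("i", I_RANK[ch]))
        else:
            key.append((ch.casefold(), 1))
    return key

words = ["ilik", "Işık", "ılık", "İnce"]
print(sorted(words, key=turkish_i_key))  # ['ılık', 'Işık', 'ilik', 'İnce']
print(sorted(words, key=str.casefold))   # ['ilik', 'Işık', 'İnce', 'ılık']
```

With the tailored key every ı/I word precedes every i/İ word; with plain locale-independent folding, Işık lands between ilik and İnce and ılık drifts to the end.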

Or, viewed from another angle, I was not sure that İstanbul should match
Istanbul in a _Turkish_ caseless search.
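A Turkish-tailored caseless match could look roughly like the sketch below. `turkic_fold` is a hypothetical helper; it applies the special mappings that today's CaseFolding.txt marks with status T (I to ı, İ to i) before ordinary folding, and composes to NFC first so that fully decomposed text such as I + U+0307 is treated as İ rather than having its dot dropped.

```python
import unicodedata

def turkic_fold(s: str) -> str:
    """Hypothetical Turkish caseless key: I -> ı and İ -> i first,
    then ordinary case folding for every other character."""
    # Compose I + U+0307 into U+0130 so decomposed input is not misread.
    s = unicodedata.normalize("NFC", s)
    s = s.replace("I", "\u0131").replace("\u0130", "i")
    return s.casefold()
```

Under this key İstanbul and İSTANBUL fold to istanbul while Istanbul and ISTANBUL fold to ıstanbul, so the two spellings do not match. For comparison, plain `"İstanbul".casefold()` yields "i\u0307stanbul", which differs from "istanbul" only by the leftover U+0307 that Carl proposed to drop.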

OTOH, I am neither a Turkish expert nor an i18n expert, so perhaps caseless
comparisons should ignore all accents and the like (e.g. grouping c with č,
и with й, and so on). Perhaps I am overemphasizing, but I hope you get the idea.
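The "ignore all accents" alternative can be approximated by folding case, decomposing, and discarding nonspacing marks. This is a crude stand-in for a primary-strength collation comparison, not what the UCD case folding itself does, and `accent_insensitive_key` is a hypothetical name.

```python
import unicodedata

def accent_insensitive_key(s: str) -> str:
    """Crude caseless, accent-insensitive key: case-fold, decompose to NFD,
    then drop every nonspacing combining mark (general category Mn)."""
    decomposed = unicodedata.normalize("NFD", s.casefold())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
```

This groups c with č and и with й, and also makes İstanbul match Istanbul, because the dot above is just another combining mark here. Dotless ı has no decomposition, though, so even this key keeps ılık apart from ilik.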

Antoine



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:20 EDT