RE: Case mapping of dotless lowercase letters

From: Arcane Jill (arcanejill@ramonsky.com)
Date: Wed Dec 17 2003 - 08:30:55 EST

Next message: Michael Everson: "RE: [OT] CJK -> CJC (Re: Corea?)"

Previous message: Michael Everson: "RE: [OT] CJK -> CJC (Re: Corea?)"
Maybe in reply to: Philippe Verdy: "Case mapping of dotless lowercase letters"
Next in thread: Christopher John Fynn: "Re: Case mapping of dotless lowercase letters"
Reply: Christopher John Fynn: "Re: Case mapping of dotless lowercase letters"
Reply: Peter Kirk: "Re: Case mapping of dotless lowercase letters"

Far be it from me to stir things up even further, but...

QUESTION - Is the rendering of {U+0069} {U+0302} (that's <i, combining
circumflex accent>) locale-dependent?
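
To pin down exactly which sequence is being asked about, here is a minimal Python sketch using only the standard unicodedata module. It assumes the intended base letter is LATIN SMALL LETTER I (U+0069); the names `seq` and `composed` are just illustrative.

```python
import unicodedata

# Assumption: the sequence under discussion is <i, combining circumflex>.
seq = "\u0069\u0302"

# List each code point with its name and canonical combining class.
for ch in seq:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} ccc={unicodedata.combining(ch)}")

# NFC composes the pair into the single precomposed letter U+00EE.
composed = unicodedata.normalize("NFC", seq)
print(f"NFC: U+{ord(composed):04X} {unicodedata.name(composed)}")
```

This says nothing about how a font draws the dot; it only shows that the character data itself carries no locale information.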

I may have got this totally wrong, but it occurs to me that in
non-Turkic fonts, U+0069 is "soft-dotted". That is, the dot disappears
in the presence of any COMBINING....ABOVE modifier. But in Turkic,
U+0069 is "hard-dotted", so the dot must not be removed if a circumflex
is added. I freely admit I don't know whether Turkic uses circumflex or
not, but the question will work just as well with /any/
COMBINING....ABOVE modifier.
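
The rule described above could be modelled roughly as follows. This is only a sketch of the idea, not how any renderer actually works: the `hard_dotted` flag stands in for the supposed Turkic behaviour, the function name is made up, and the `SOFT_DOTTED` set is deliberately cut down to i and j (Python's unicodedata does not expose the Soft_Dotted property).

```python
import unicodedata

# Assumption: only i and j are modelled as letters that carry a dot.
SOFT_DOTTED = {"i", "j"}
ABOVE = 230  # canonical combining class "Above"

def dot_is_drawn(text: str, hard_dotted: bool = False) -> bool:
    """Return True if the base letter's own dot would still be rendered.

    text is one base letter followed by zero or more combining marks.
    hard_dotted models the hypothetical locale in which the dot is never dropped.
    """
    base, marks = text[0], text[1:]
    if base not in SOFT_DOTTED:
        raise ValueError("base letter has no dot in this model")
    if hard_dotted:
        return True
    # Soft-dotted: any mark positioned above the letter replaces the dot.
    return not any(unicodedata.combining(m) == ABOVE for m in marks)

print(dot_is_drawn("i\u0302"))                    # circumflex, soft-dotted
print(dot_is_drawn("i\u0302", hard_dotted=True))  # circumflex, hard-dotted
print(dot_is_drawn("i\u0323"))                    # dot below, soft-dotted
```

The point of the flag is that the same code point sequence gives two different answers, which is exactly the locale dependence being questioned.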

If this is so, how can a character be considered "soft-dotted" in one
locale and "hard-dotted" in another?

Would it not make more sense to have not two, but /three/ different
kinds of lowercase i: <non-dotted i>, <soft-dotted i> and <hard-dotted
i>? (And similarly for uppercase). Of course, then you might as well
invent COMBINING SOFT DOT ABOVE so we can use it elsewhere.
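
For comparison, this is what the four existing i-like letters do today under Python's built-in, locale-independent case mapping. It is a small sketch for orientation only; the helper names `describe` and `default_mappings` are invented here.

```python
import unicodedata

# The four existing i-like letters: i, dotless i, I, and I with dot above.
LETTERS = ["\u0069", "\u0131", "\u0049", "\u0130"]

def describe(s: str) -> str:
    """Render a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)

def default_mappings() -> dict:
    """Map each letter to its (lowercase, uppercase) under str.lower/str.upper."""
    return {ch: (ch.lower(), ch.upper()) for ch in LETTERS}

for ch, (lo, up) in default_mappings().items():
    print(f"{unicodedata.name(ch):40} lower={describe(lo):14} upper={describe(up)}")
```

Python's str.lower and str.upper take no locale argument, so no Turkic tailoring is applied here: capital I lowercases to U+0069, never to U+0131.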

It gets better. (You're gonna hate me). If we then make the set {
soft-dotted-i, soft-dotted-I, non-dotted-i, non-dotted-I } a casefold
equivalence class which lowercases to <soft-dotted-i> (except in the
Turkic locale, where it lowercases to non-dotted-i), and uppercases to
<non-dotted-I> in all locales; and if we similarly make { hard-dotted-i,
hard-dotted-I } a separate casefold equivalence class lowercasing to
<hard-dotted-i> and uppercasing to <hard-dotted-I> (in all locales),
then all of the problems outlined by Philippe would go away. And we
could do the same with j too.
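
The scheme just described can be written down as a toy model. None of these characters exist; the enum members, the `turkic` flag and the function names are all hypothetical, chosen only to mirror the wording above.

```python
from enum import Enum

class I(Enum):
    """The six hypothetical letters of the proposed scheme."""
    SOFT_LOWER = "soft-dotted-i"
    SOFT_UPPER = "soft-dotted-I"
    NON_LOWER = "non-dotted-i"
    NON_UPPER = "non-dotted-I"
    HARD_LOWER = "hard-dotted-i"
    HARD_UPPER = "hard-dotted-I"

# The two proposed casefold equivalence classes.
SOFT_CLASS = frozenset({I.SOFT_LOWER, I.SOFT_UPPER, I.NON_LOWER, I.NON_UPPER})
HARD_CLASS = frozenset({I.HARD_LOWER, I.HARD_UPPER})

def to_lower(ch: I, turkic: bool = False) -> I:
    """Lowercase: the only locale-dependent step in the proposal."""
    if ch in HARD_CLASS:
        return I.HARD_LOWER
    return I.NON_LOWER if turkic else I.SOFT_LOWER

def to_upper(ch: I) -> I:
    """Uppercase: identical in every locale."""
    return I.HARD_UPPER if ch in HARD_CLASS else I.NON_UPPER

def casefold_equal(a: I, b: I) -> bool:
    """Two letters match caselessly iff they sit in the same class."""
    return (a in HARD_CLASS) == (b in HARD_CLASS)

for ch in I:
    print(ch.value, "->", to_lower(ch).value, "/",
          to_lower(ch, turkic=True).value, "/", to_upper(ch).value)
```

Note that in this model lowercasing then uppercasing never crosses from one class to the other, whatever the locale.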

Of course - it would have one nasty side-effect. The Turks would then
have to use <hard-dotted-i> instead of <soft-dotted-i>, but since the
characters (in this new scheme) now have completely different meanings,
that's fair enough. Hey ho.

Just musing....
Jill

