Re: Case-folding dotted i

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Thu, 24 Jan 2013 12:26:35 +0100

2013/1/24 Richard Wordingham <richard.wordingham_at_ntlworld.com>:
> On Wed, 23 Jan 2013 23:46:33 +0100
> Philippe Verdy <verdy_p_at_wanadoo.fr> wrote:
>
>> For this reason Turkic
>> texts *should* encode the hard-dotted lower case i as i+dot above, and
>> not just as i alone. But when the language used in the text is clear,
>> the extra encoding of the explicit "hard" dot above is almost always
>> forgotten and for legacy reasons, most Turkic texts do not use this
>> extra dot above, but it does not mean that its presence is incorrect
>> (it will be needed in multilingual documents, or when using some
>> Medieval-style fonts that do NOT display any dot above U+0069 and
>> U+006A and that require the explicit U+0307 to render the hard dot
>> needed for Turkish).
>
> If text is going to be processed, i+dot is wrong for Turkish, as the
> Unicode casing rules for Turkish will capitalise it to I+dot+dot, which
> should display with two dots. If you're going to use an explicit dot,
> I'd have said <U+0131, U+0307> would be better, though I still think
> using an explicit dot is wrong in general.

Probably yes: the ASCII i/I should be avoided in all cases in Turkish,
preferring the dotless i/I every time, with or without the extra dot
above.
But the legacy use of the ASCII i/I is still prevalent everywhere
(notably among those who used the legacy 8-bit encodings that did NOT
have the combining dot above).

My opinion is that capitalizing the ASCII i followed by a combining
dot above should NEVER produce two dots (that it does is a limitation
of the current simple case mappings, even when using the Turkish
rules). A correct capitalization for Turkish should produce just a
single dot, by mapping not character by character but at the grapheme
level.
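
To make the difference concrete, here is a rough Python sketch. It is
only an illustration under my own assumptions, not anything defined by
the standard: both function names are hypothetical, only i, dotless i
and an immediately following U+0307 are treated specially, and
everything else falls back to Python's locale-independent str.upper().

```python
import unicodedata

DOT_ABOVE = "\u0307"   # COMBINING DOT ABOVE
CAP_I_DOT = "\u0130"   # LATIN CAPITAL LETTER I WITH DOT ABOVE
DOTLESS_I = "\u0131"   # LATIN SMALL LETTER DOTLESS I


def upper_tr_per_char(text):
    """Turkish-style uppercasing, one code point at a time (hypothetical)."""
    out = []
    for ch in text:
        if ch == "i":
            out.append(CAP_I_DOT)      # i -> capital I with dot above
        elif ch == DOTLESS_I:
            out.append("I")            # dotless i -> plain capital I
        else:
            out.append(ch.upper())     # combining marks pass through unchanged
    return "".join(out)


def upper_tr_grapheme(text):
    """Same mapping, but <i, U+0307> is handled as one unit (hypothetical)."""
    out = []
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch == "i" and text[pos + 1:pos + 2] == DOT_ABOVE:
            out.append(CAP_I_DOT)      # absorb the explicit dot: one dot only
            pos += 2
            continue
        out.append(upper_tr_per_char(ch))
        pos += 1
    return "".join(out)


explicit = "i" + DOT_ABOVE
print([hex(ord(c)) for c in upper_tr_per_char(explicit)])   # U+0130 U+0307
print([hex(ord(c)) for c in upper_tr_grapheme(explicit)])   # U+0130 only
```

With the per-character mapping, <i, U+0307> becomes <U+0130, U+0307>,
which is the doubled dot Richard describes. The second function yields
U+0130 alone. Note also that <U+0131, U+0307> uppercases per character
to <I, U+0307>, which is canonically equivalent to U+0130, so that
spelling of the explicit dot does not have the problem.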
Received on Thu Jan 24 2013 - 05:31:10 CST
