From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Tue Dec 16 2003 - 10:48:58 EST
Michael Everson wrote:
> At 11:03 +0100 2003-12-16, Philippe Verdy wrote:
> >Doug Ewell <dewell@adelphia.net> writes:
> > > > Wrong here: I have found occurrences of dotless lowercase i, used
> > > > instead of soft-dotted lowercase i, as base letters for diacritics
> > > > added above it (it was an acute accent...)
> > >
> > > Don't do that.
> >
> >What? It is VALID UNICODE to have texts coded like this.
>
> In Irish, it is INCORRECT to spell "físeán"
> 'video' with a DOTLESS I + COMBINING ACUTE. It is
> a spelling error, and will fail in
> spell-checking. The correct spelling is either I
> + COMBINING ACUTE or precomposed I WITH ACUTE.
Spelling was not the issue there; only Unicode validity was.
> >For whatever reason, encoded texts exist before correct fonts are used to
> >render them. So texts do exist which use dotless lowercase i
> >before a diacritic above, simply because the author of the text did not
> >want it to be rendered with a superposed dot.
>
> Texts which contain spelling errors. Or old IPA
> texts using any number of ad-hoc IPA font
> solutions. Those texts have to be transcoded to
> proper Unicode at some stage. What you suggest is
> Not Recommended.
Not recommended, but still valid (and actually used in Turkish as well!).
It is also used on some occasions to work around defects in fonts that
have no precomposed glyph for the letter i with the diacritic, but do have
separate glyphs for the combining diacritic and for the dotted and dotless
letters i, or in renderers that are unable to remove the soft dot. The
IPA-93 font is one such font: it allows good typesetting, but needs glyph
processing to select the appropriate base letter.
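To make the "valid but not equivalent" point concrete, here is a minimal
Python sketch (my own illustration, using only the standard unicodedata
module). NFC composes i + COMBINING ACUTE into the precomposed letter, but
leaves DOTLESS I + COMBINING ACUTE as two code points, since Unicode has no
precomposed dotless i with acute:

```python
import unicodedata

def nfc_codepoints(s: str) -> list:
    """Return the NFC form of s as a list of U+XXXX labels."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFC", s)]

samples = {
    "dotless i + acute": "\u0131\u0301",  # U+0131 U+0301
    "i + acute": "i\u0301",               # U+0069 U+0301
    "precomposed": "\u00ed",              # U+00ED LATIN SMALL LETTER I WITH ACUTE
}

for label, s in samples.items():
    print(f"{label:18} -> {' '.join(nfc_codepoints(s))}")
```

The last two samples both normalize to U+00ED, while the dotless sequence
stays U+0131 U+0301: a well-formed string, but not canonically equivalent
to the correctly spelled letter.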
My main issue, however, is with Turkish names found in environments where
language identification is not possible (for example a simple filename, a
locale-neutral database field, or an international HTML form which requests
user names and uses them as case-insensitive identifiers); lowercase dotless
i does not work appropriately there.
I think it is completely illogical to match together, with case-insensitive
comparisons, the three letters:
LATIN SMALL LETTER I (dotted)
LATIN CAPITAL LETTER I (dotless)
LATIN CAPITAL LETTER I WITH DOT ABOVE
but not:
LATIN SMALL LETTER DOTLESS I
when using locale-neutral comparisons, given that the normative uppercase mapping
of this fourth letter is the second letter above.
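The asymmetry can be observed with Python's str.casefold(), which applies
Unicode default (non-Turkic) full case folding, next to str.upper(). This is
only a minimal sketch assuming Python 3; note that under full folding U+0130
becomes U+0069 U+0307 (i followed by COMBINING DOT ABOVE), while the dotless
i case is the unambiguous one:

```python
import unicodedata

LETTERS = ["i", "I", "\u0130", "\u0131"]

def labels(s: str) -> str:
    """Format a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)

for ch in LETTERS:
    # str.casefold() applies Unicode full default case folding (no Turkic
    # T mappings); str.upper() applies the default uppercase mapping.
    print(f"{unicodedata.name(ch):38} fold={labels(ch.casefold()):14} "
          f"upper={labels(ch.upper())}")

dotless = "\u0131"
# Uppercasing dotless i yields plain CAPITAL I ...
assert dotless.upper() == "I"
# ... yet default case folding keeps it apart from CAPITAL I.
assert dotless.casefold() != "I".casefold()
```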
I'm sorry that nobody wants to admit it, but this is a security issue: it
causes problems in applications which expect that, when two strings differ
case-insensitively, converting both to lowercase, uppercase, or titlecase
will preserve that difference.
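A hypothetical illustration of that failure mode, as a Python sketch: the
register() function and the user names are invented for the example. A
sign-up check keyed on default case folding accepts both spellings as
distinct users, but any later component that canonicalizes by uppercasing
merges them:

```python
def register(registry: dict, user: str) -> bool:
    """Hypothetical sign-up check: reject a name if its case-folded key is taken."""
    key = user.casefold()  # locale-neutral case-insensitive key
    if key in registry:
        return False
    registry[key] = user
    return True

users: dict = {}
print(register(users, "ilker"))       # True
print(register(users, "\u0131lker"))  # True: "ılker" folds to a different key

# A later component that canonicalizes names by uppercasing sees a collision.
upper_names = {name.upper() for name in users.values()}
print(len(users), len(upper_names))   # 2 1
```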
This archive was generated by hypermail 2.1.5 : Tue Dec 16 2003 - 11:35:15 EST