From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sun Mar 13 2011 - 15:47:00 CST
2011/2/11 Doug Ewell <doug@ewellic.org>:
> QSJN 4 UKR <qsjn4ukr at gmail dot com> wrote:
>
>> There are several different applications of the letter cases. They
>> are used stylistically, for example, the using a capital or title
>> letters in the headers, grammatically, when the capital letter
>> identifies the beginning of the sentence, the proper name, any name
>> in German, and semantically, for example, in SI units or chemical
>> symbols.
>
> This is exactly why it is inappropriate to apply case-change operations
> indiscriminately to arbitrary snippets of text. This is not unique to
> SI prefixes (or units) or Unicode compatibility characters; it's not
> even really a computer problem. It would be just as inappropriate, as
> Jukka pointed out, to uppercase a symbol like "ms" which consists of
> ordinary letters, whether in Unicode or in handwriting.
>
>> To support all these cases, it would be nice to use special control
>> characters in the text, which would indicate where the change in the
>> case is admissible and where is not. Or to use for the SI, chemical
>> and mathematical notation and - for capitalization of proper names
>> (???) - those characters who have no case mapping, U+1D400 etc.
>
> Modifying all existing electronic text to include such an invisible
> control character,
Why "all" texts? This was not in the proposal.
> and requiring all users and processes to enter it
> reliably,
Why "all" users? Again, this was not in the proposal. In fact, every
character is encoded for an undefined number of users, possibly
small, never for all users. The character would exist for those
users to whom the difference matters.
> and modifying all keyboards to include a key for this new
> character, doesn't seem particularly likely at this time.
Why modify "all" keyboards? Keyboards are very likely to be
extended, possibly by users themselves or through helper tools,
without modifying any keyboard physically or even changing the
software in its driver.
Just consider French keyboards: they cannot enter all the
characters that are preferred for French, but this does not prevent
anyone from installing and using add-ons that allow entering all the
characters needed for French (notably capital letters with accents,
or guillemets). Various solutions have been developed and are in use
today, even though no standard mapping has yet been adopted
universally by French typists.
> Better to teach users to use common sense when applying text-transformation
> operations like uppercasing.
You can just as well teach them how to enter the character in those
situations, and then the rest of the software will VERY likely support
the correct case mappings, for rendering or for transforms.
>> What the hell good on the stability of the Unicode standard, if it
>> excludes the possibility of using it.
>
> Using a character encoding standard does require a modicum of knowledge
> about how plain text works.
It's definitely not a problem of stability of the standard, because
nothing needs to be changed in the existing characters. Adding a
combining character will not break compatibility. Admittedly,
software updates will be needed to support the new character, but
that is exactly the same situation as when any new character, or
even a complete script, is encoded.
Unicode already has invisible characters for mathematical formulas,
such as the invisible multiplication sign (U+2062 INVISIBLE TIMES),
the invisible function application (U+2061 FUNCTION APPLICATION) and
the invisible index separator (U+2063 INVISIBLE SEPARATOR). Given the
contexts in which the invisible combining character would be used
(such as units of measurement), it has a limited scope, in the same
technical domain as the applications where those existing characters
are used.
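The existing invisible operators mentioned above can be inspected with
Python's standard unicodedata module. This is only a small illustration:
it shows that they are format characters (general category Cf) and that
an uppercasing operation leaves them untouched.

```python
import unicodedata

# The three invisible mathematical operators referred to above.
INVISIBLE_OPERATORS = ["\u2061", "\u2062", "\u2063"]

def describe(ch):
    """Return (code point, character name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

for ch in INVISIBLE_OPERATORS:
    print(describe(ch))
    # A format character has no case mapping, so uppercasing is a no-op.
    assert ch.upper() == ch
```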
Then, the new character will make texts easier to process: even if
some software that ignores the new combining character applies a case
mapping blindly, the character will remain in place and will still
allow a renderer to display the base letter plus the new combining
character in the expected case, even though the base letter has been
remapped to another case. No semantics will be lost.
And texts could still be canonicalized at any time by replacing a
combination of CAPITAL LETTER plus INVISIBLE LOWERCASE with SMALL
LETTER plus INVISIBLE LOWERCASE.
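A minimal sketch of that processing model, in Python. Everything here
is hypothetical: no INVISIBLE LOWERCASE character exists in Unicode, so
the private-use code point U+E000 stands in for it, and the names
canonicalize and render are invented for illustration only.

```python
# HYPOTHETICAL stand-in: Unicode has no INVISIBLE LOWERCASE character.
# A private-use code point is used because it has no case mapping.
INVISIBLE_LOWERCASE = "\uE000"

def canonicalize(text):
    """Replace <letter> + INVISIBLE LOWERCASE by <small letter> + INVISIBLE LOWERCASE."""
    out = []
    for ch in text:
        if ch == INVISIBLE_LOWERCASE and out:
            # The marker follows its base letter, as a combining character would.
            out[-1] = out[-1].lower()
        out.append(ch)
    return "".join(out)

def render(text):
    """Show what a marker-aware renderer would display (marker itself is invisible)."""
    return canonicalize(text).replace(INVISIBLE_LOWERCASE, "")

# "5 ms" with both letters of the unit symbol tagged as case-sensitive.
tagged = "5 m" + INVISIBLE_LOWERCASE + "s" + INVISIBLE_LOWERCASE

# Software that ignores the marker uppercases blindly; the marker survives.
shouted = tagged.upper()

print(render(shouted))                  # the unit symbol is displayed in lowercase again
print(canonicalize(shouted) == tagged)  # the original encoding is recovered
```

Untagged text gets no such protection: "ms".upper() still yields "MS",
which is the situation the earlier replies describe.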