Why an incomplete subscript/superscript alphabet?

Frédéric Grosshans frederic.grosshans at gmail.com
Wed Oct 5 12:02:51 CDT 2016

On 05/10/2016 at 15:57, Marcel Schneider wrote:
> On Wed, 5 Oct 2016 14:27:44 +0900, Martin J. Dürst wrote:
>> On 2016/10/04 19:35, Marcel Schneider wrote:
>>> On Mon, 3 Oct 2016 13:47:09 -0700, Asmus Freytag (c) wrote:
>>>> Later, the beta and gamma were encoded for phonetic notation, but not the
>>>> alpha.
>>>> As a result, you can write basic formulas for select compounds, but not all.
>>>> Given that these basic formulae don't need full 2-D layout, this still seems
>>>> like an arbitrary restriction.
>>> When itʼs about informatics, arbitrary restrictions are precisely what gets me
>>> upset. Those limitations are—as I wrote the other day—a useless worsening
>>> of the usability and usefulness of a product.
>> This kind of "let's avoid arbitrary limitations" argument works very
>> well for subjects that are theoretical, straightforward, and rigid in
>> nature. Many (but not all) subjects in computer science (informatics)
>> are indeed of such a nature.
>> The Unicode Consortium (or more specifically, the UTC) does a lot of
>> hard work to create theories where appropriate, and to explain them
>> where possible. But they recognize (and we should do so, too) that in
>> the end, writing is a *cultural* phenomenon, where straightforward,
>> rigid theories have severe limitations.
>> From a certain viewpoint (the chemist's in the example above), the
>> result may look arbitrary, but from another viewpoint (the
>> phoneticist's), it looks perfectly fine. At first, it looks like it
>> would be easy to fix such problems, but each fix risks introducing new
>> arbitrariness when seen from somebody else's viewpoint. Getting upset
>> won't help.
> Iʼve got the point, thanks. Phonetics need to write running text that is
> immediately legible, while a chemistry database may use particular notational
> conventions that work with baseline letters to be parsed on semantics or light
> markup for proper display in the UI. The UTC decision thus questioned the design
> principle of using plain text for chemical formulae. No doubt it was understood
> that validating this choice would have opened the door to encoding more special
> characters for upgrading or similar purposes.
I think there is a big difference between adding a few characters for a
new use (chemistry formulae) and completing an obvious, almost complete
set. People are used to seeing the 26 basic Latin letters
(abcdefghijklmnopqrstuvwxyz) treated preferentially by computers,
and are always surprised when just one of them is treated differently.
Initially, superscript letters were restricted to a few letters, and it
made sense to resist the temptation to complete the set. But now that
all modifier small Latin letters except q are encoded, the restriction
makes little sense. Many people use these characters (arguably wrongly)
for many purposes beyond IPA, and they are invariably surprised when they
need q. The special status of the basic Latin alphabet means that almost
no one would be surprised not to find a superscript α, è, or ∞, and adding
q, the last missing basic Latin letter, would not open the door to any more

> At this point Iʼd like to mention what I have been thinking about since this
> thread was launched. The French language makes extensive use of superscripts
> to note abbreviations. [...] Therefore I suggest granting
> the French language full support by making superscript forms available for all
> lowercase letters, so that the SUPERSCRIPT dead key recommended by the French
> standards body will work for all abbreviations. There is no need for superscripts
> beyond the basic alphabet, as no French abbreviation goes beyond this range
> (despite what I believed in 2014, like many other people).
Whether è (and í) are needed or not is another question. Even if they were
useful (as argued by others in this thread), they bring non-trivial
technical difficulties in terms of NFC/NFD. But since people are used to
seeing these characters treated differently, I think the “problem” of
the lack of precomposed superscript characters is less obvious than the lack
of *MODIFIER LETTER SMALL Q, in the sense that the first absence is
perceived (by the Unicode-naive user) as more normal than the second.
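As a rough illustration of the normalization side of this (a sketch only, not necessarily the specific difficulties meant above), Python's standard `unicodedata` module shows how a "superscript è" behaves today. Without a precomposed character, it can only be spelled as U+1D49 MODIFIER LETTER SMALL E followed by U+0300 COMBINING GRAVE ACCENT. NFC leaves that pair alone, while NFKC folds the superscript away entirely:

```python
import unicodedata

# U+1D49 MODIFIER LETTER SMALL E has a <super> compatibility mapping to "e".
sup_e = "\u1d49"
print(unicodedata.name(sup_e))           # MODIFIER LETTER SMALL E
print(unicodedata.decomposition(sup_e))  # <super> 0065

# A "superscript è" can only be written as ᵉ + U+0300 COMBINING GRAVE ACCENT.
seq = sup_e + "\u0300"

# NFC: no precomposed character exists, so the sequence stays two code points.
nfc = unicodedata.normalize("NFC", seq)

# NFKC: ᵉ is first folded to plain e, then e + grave composes to U+00E8,
# so the superscript distinction is lost.
nfkc = unicodedata.normalize("NFKC", seq)

print(len(nfc), [hex(ord(c)) for c in nfc])    # 2 ['0x1d49', '0x300']
print(len(nfkc), [hex(ord(c)) for c in nfkc])  # 1 ['0xe8']
```

Any precomposed superscript è would have to fit into this picture, with a canonical decomposition and a decision about whether it composes under NFC. That is one concrete place where such an addition stops being trivial.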

