Why incomplete subscript/superscript alphabet ?

Marcel Schneider charupdate at orange.fr
Wed Oct 5 08:57:48 CDT 2016

On Wed, 5 Oct 2016 14:27:44 +0900, Martin J. Dürst wrote:
> On 2016/10/04 19:35, Marcel Schneider wrote:
>> On Mon, 3 Oct 2016 13:47:09 -0700, Asmus Freytag (c) wrote:
>>> Later, the beta and gamma were encoded for phonetic notation, but not the
>>> alpha.
>>> As a result, you can write basic formulas for select compounds, but not all.
>>> Given that these basic formulae don't need full 2-D layout, this still seems
>>> like an arbitrary restriction.
>> When itʼs about informatics, arbitrary restrictions are precisely what gets me
>> upset. Those limitations are—as I wrote the other day—a useless worsening
>> of the usability and usefulness of a product.
> This kind of "let's avoid arbitrary limitations" argument works very
> well for subjects that are theoretical, straightforward, and rigid in
> nature. Many (but not all) subjects in computer science (informatics)
> are indeed of such a nature.
> The Unicode Consortium (or more specifically, the UTC) does a lot of
> hard work to create theories where appropriate, and to explain them
> where possible. But they recognize (and we should do so, too) that in
> the end, writing is a *cultural* phenomenon, where straightforward,
> rigid theories have severe limitations.
> From a certain viewpoint (the chemist's in the example above), the
> result may look arbitrary, but from another viewpoint (the
> phoneticist's), it looks perfectly fine. At first, it looks like it
> would be easy to fix such problems, but each fix risks to introduce new
> arbitrariness when seen from somebody else's viewpoint. Getting upset
> won't help. 

Iʼve got the point, thanks. Phoneticians need to write running text that is 
immediately legible, while a chemistry database may use particular notational 
conventions that work with baseline letters, to be parsed on semantics or given 
light markup for proper display in the UI. The UTC decision thus questioned the design 
principle of using plain text for chemical formulae. No doubt it was understood 
that validating this choice would have opened the door to encoding more special 
characters for upgrading or similar purposes.

At this point Iʼd like to mention what I have been thinking about since this 
thread was launched. The French language makes extensive use of superscripts 
to write abbreviations. This is not a mere styling issue, as it is in English. 
E.g. without superscripts, the abbreviation ‘nos’ [numbers] cannot be told apart 
from the pronoun ‘nos’ [our]. The only one that can easily be disambiguated is 
‘n°’ [number], using the degree sign available on the common French keyboard layout.
As an anecdote: when a technician pointed me to the field 
‘no centre mess’ in the UI of my cellphone, it took me several seconds to read it as 
‘number of the SMS center/centre’, which is the actual meaning; but here, some 
additional confusion resulted from the interlanguage homograph ‘no’.

Written words that are ambiguous with one another are a common phenomenon in 
natural languages. Disambiguation is widely achieved by adding 
vowel signs (Hebrew) or diacritics (languages using the Latin script). 
French was at a disadvantage in computing practice (applied informatics) for a 
certain time, when diacritics were unavailable, and for longer on uppercase 
letters than on lowercase. 
AFAIK, Latin letters like ‘ij’ and ‘œ’ first gained binary existence thanks 
to the ISO 6937 charset, while a Dutch standards author asked his compatriots 
to always write ‘ij’ with two ASCII letters, and two Frenchmen prevented the ‘œ’ 
from being encoded in Latin-1 at the intended code points because it did not 
exist on computer printers.

But today, thanks to Unicode, thatʼs all over. Therefore I suggest granting 
the French language full support by enabling superscript lowercase letters, 
so that the SUPERSCRIPT dead key that the French standards body recommends 
will work for all abbreviations. There is no need for superscript letters beyond 
the basic alphabet, as no French abbreviation exceeds this range (despite 
what I believed in 2014, like many other people). 
Additionally Iʼm proposing a modifier key combination (using a new modifier key on 
the 105th key of ISO keyboards) to access the lowercase superscripts on live keys:
Shift + Num + [letter key] ➔ [superscript lowercase].
I can easily type ‘on the 105ᵗʰ key’, and so will all users in France, at least 
with the dead key.

The missing letter is superscript q == MODIFIER LETTER SMALL Q.
Currently, when Shift + Num + Q is pressed in the draft layouts, 
‘ ↑q_n’existe_pas’ [superscript ‘q’ does not exist] is inserted.
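
To check which basic lowercase letters have a superscript form, one can scan 
the Unicode character database for characters whose compatibility decomposition 
is tagged <super>. The following is only a minimal sketch in Python; the function 
name superscript_table is made up for illustration, and the result depends on the 
Unicode version bundled with the Python runtime, which is why it is printed too.

```python
import string
import unicodedata

def superscript_table():
    """Map each basic lowercase letter to the characters whose compatibility
    decomposition is '<super>' followed by that letter alone.
    (Hypothetical helper, not part of any keyboard layout project.)"""
    table = {letter: [] for letter in string.ascii_lowercase}
    for cp in range(0x110000):
        fields = unicodedata.decomposition(chr(cp)).split()
        # A matching entry looks like "<super> 0074" (tag plus one code point).
        if len(fields) == 2 and fields[0] == "<super>":
            base = chr(int(fields[1], 16))
            if base in table:
                table[base].append(chr(cp))
    return table

TABLE = superscript_table()
MISSING = [letter for letter, forms in TABLE.items() if not forms]

print("Unicode data version:", unicodedata.unidata_version)
print("letters without a superscript form:", MISSING)
```

A dead key or a Shift + Num level could be filled from such a table, with a 
placeholder string for any letter listed as missing.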

Karl Pentzlin had the merit of proposing the missing superscript q 
for use in French abbreviations, but the UTC must have refused it by arguing 
from English usage and from French recommendations. These are now changing. 
Moreover, as I tried to demonstrate above, one cannot always rely on such 
low-profile recommendations, which express the humility and modest demands 
of their author more than real practical needs and linguistic requirements.

As for searchability, Google even includes the mathematical alphabets in its 
equivalence classes, so that any query written e.g. in double-struck letters 
is read as if it had been entered in plain ASCII.
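
How Google builds its equivalence classes is not known to me. As an assumption, 
a comparable folding can be sketched with Unicode compatibility normalization 
(NFKC) followed by case folding; the function name fold_for_search is made up 
for illustration.

```python
import unicodedata

def fold_for_search(text):
    """Fold compatibility variants (superscripts, mathematical alphabets)
    to their base letters, then case-fold. This is an assumed stand-in,
    not the method of any actual search engine."""
    return unicodedata.normalize("NFKC", text).casefold()

# "Unicode" written in mathematical double-struck letters (U+1D54C, U+1D55F, ...)
double_struck = "\U0001D54C\U0001D55F\U0001D55A\U0001D554\U0001D560\U0001D555\U0001D556"
print(fold_for_search(double_struck))        # unicode
print(fold_for_search("105\u1D57\u02B0"))    # 105th
```

With such a folding, superscript abbreviations remain findable by a query typed 
in baseline letters.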

Best regards,

