Re: combining: half, double, triple et cetera ad infinitum

From: QSJN 4 UKR <qsjn4ukr_at_gmail.com>
Date: Mon, 6 Feb 2012 12:28:49 +0200

> 2011/11/14 Philippe Verdy <verdy_p_at_wanadoo.fr>:
>> And arguably, I have also wanted this for a long time, instead of the hacks
>> introduced by the so called "double" diacritics and "half" diacritics
>> that break the character identity of those diacritics and also
>> introduce encoding ambiguities.
>>
>> In fact, those things would have been encoded long ago if Unicode
>> and ISO 10646 had extended their character model to cover a broader
>> range of "structured character clusters".
>>
>> Two format characters (with combining class 0 for the purpose of
>> normalizations) would have been enough for most applications:
>> - U+xxx0 BEGIN EXTENDED CLUSTER (BEC)
>> - U+xxx1 END EXTENDED CLUSTER (EEC)
>> And then you would have encoded the standard diacritics after the
>> sequence enclosed by these characters, for example cartouches (using
>> an enclosing diacritic).
>>
>> A third format control would have been used as well to specify that
>> two clusters (simple letters or letters with simple diacritics, and
>> including extended clusters) would stack vertically instead of
>> horizontally. With this third one, the basic structure would be
>> encodable really as plain-text.
>>
>> Yes, this would not have worked with today's OpenType specifications,
>> but this would have been the place for extending those specifications
>> rather than something blocking the encoding process. I am still convinced
>> that this should not be part of an "upper-layer standard", which is
>> not interoperable and complicates the integration of those
>> pseudo-encoded texts.
>>
>> Once the structure is encoded as such, there is still the possibility
>> to create a linear graphical representation as a reasonably readable
>> fallback exhibiting the structure unambiguously, even if the text
>> renderer cannot produce the 2D layout (you just need to make those
>> format controls visible by themselves with a glyph, or by some other
>> means offered in the text renderer, including colors or various
>> style effects).
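A minimal sketch of the BEC/EEC idea quoted above and its visible linear fallback. The proposal leaves the code points unassigned ("U+xxx0", "U+xxx1"), so this example assumes two Private Use Area code points (U+E000, U+E001) as hypothetical stand-ins, and the bracket glyphs used for the fallback are an arbitrary illustrative choice. The helper names `extended_cluster` and `linear_fallback` are hypothetical.

```python
# Hypothetical stand-ins: the proposal does not assign real code points,
# so Private Use Area characters are used here purely for illustration.
BEC = "\uE000"  # BEGIN EXTENDED CLUSTER (hypothetical)
EEC = "\uE001"  # END EXTENDED CLUSTER (hypothetical)

# Visible substitutes used only when the renderer cannot do the 2D layout.
FALLBACK_GLYPHS = {
    BEC: "\u27E6",  # MATHEMATICAL LEFT WHITE SQUARE BRACKET
    EEC: "\u27E7",  # MATHEMATICAL RIGHT WHITE SQUARE BRACKET
}

TITLO = "\u0483"  # COMBINING CYRILLIC TITLO


def extended_cluster(run: str, *diacritics: str) -> str:
    """Enclose a run in BEC/EEC, then append diacritics that apply to the whole run."""
    return BEC + run + EEC + "".join(diacritics)


def linear_fallback(text: str) -> str:
    """Replace the structural controls with visible glyphs so the
    structure stays unambiguous in a purely linear display."""
    return "".join(FALLBACK_GLYPHS.get(ch, ch) for ch in text)


# Cyrillic numeral 123 (Р = 100, К = 20, Г = 3) with one titlo over the whole run.
encoded = extended_cluster("\u0420\u041A\u0413", TITLO)
print(linear_fallback(encoded))
```

In the fallback the titlo lands on the closing bracket, which is not the intended 2D rendering, but the bracketed run still shows unambiguously which characters the diacritic covers.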

We don't need new special characters, new half-characters, or the new
canonical combining classes (ccc) I proposed earlier. No!
We already have the interlinear annotation characters!
It is possible to use a sequence like U+FFF9 INTERLINEAR ANNOTATION
ANCHOR, РКГ, U+FFFA INTERLINEAR ANNOTATION SEPARATOR, U+0483 COMBINING
CYRILLIC TITLO, U+FFFB INTERLINEAR ANNOTATION TERMINATOR for the Cyrillic
number 123 (РКГ under a single titlo, with Р = 100, К = 20, Г = 3). This
way, titlos with supralinear letters (such as SLOVO TITLO and TVERDO TITLO;
see http://ru.wikipedia.org/wiki/%d0%a2%d0%b8%d1%82%d0%bb%d0%be) also
become implementable.
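A minimal sketch of the sequence described above, built with Python's standard `unicodedata` module so each code point can be checked by name. The helper name `annotate` is hypothetical; it simply places the annotated run after the anchor and each annotation chunk after a separator.

```python
import unicodedata

ANCHOR = "\uFFF9"      # INTERLINEAR ANNOTATION ANCHOR
SEPARATOR = "\uFFFA"   # INTERLINEAR ANNOTATION SEPARATOR
TERMINATOR = "\uFFFB"  # INTERLINEAR ANNOTATION TERMINATOR
TITLO = "\u0483"       # COMBINING CYRILLIC TITLO


def annotate(base: str, *annotations: str) -> str:
    """Hypothetical helper: ANCHOR base (SEPARATOR chunk)* TERMINATOR."""
    body = base
    for chunk in annotations:
        body += SEPARATOR + chunk
    return ANCHOR + body + TERMINATOR


# Cyrillic number 123: Р (100) + К (20) + Г (3), with one titlo as the annotation.
number_123 = annotate("\u0420\u041A\u0413", TITLO)

# List every code point with its Unicode name to confirm the encoding.
for ch in number_123:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Note that the annotation chunk here is a bare combining mark with no base character of its own, which is exactly the open question raised next.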
The only open question is the correct processing of annotation chunks
that start with a nonstarter, that is, with a combining character that
has no base character of its own. In the best implementation, such a
chunk of a multi-line annotation should use the previous chunk as its base.
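A minimal sketch of the base-resolution rule proposed above, assuming a single, non-nested annotation sequence. As an assumption, "starts with a nonstarter" is approximated here by "starts with a combining mark" (general category Mn, Mc or Me), which also catches enclosing marks whose canonical combining class is 0. The function names are hypothetical.

```python
from __future__ import annotations

import unicodedata

ANCHOR, SEPARATOR, TERMINATOR = "\uFFF9", "\uFFFA", "\uFFFB"


def starts_with_combining_mark(chunk: str) -> bool:
    # Assumption: treat any mark (Mn/Mc/Me) as lacking its own base character.
    return bool(chunk) and unicodedata.category(chunk[0]).startswith("M")


def split_annotation(text: str) -> list[str]:
    """Split ANCHOR base SEPARATOR chunk ... TERMINATOR into [base, chunk, ...].
    Nested annotations are not handled in this sketch."""
    if not (text.startswith(ANCHOR) and text.endswith(TERMINATOR)):
        raise ValueError("not a single interlinear annotation sequence")
    return text[1:-1].split(SEPARATOR)


def resolve_bases(text: str) -> list[tuple[str, str | None]]:
    """Pair each chunk with the base it applies to: a chunk that begins with
    a combining mark uses the previous chunk as its base; others stand alone."""
    chunks = split_annotation(text)
    resolved: list[tuple[str, str | None]] = [(chunks[0], None)]
    for previous, chunk in zip(chunks, chunks[1:]):
        base = previous if starts_with_combining_mark(chunk) else None
        resolved.append((chunk, base))
    return resolved


# РКГ with a titlo chunk: the titlo chunk resolves to РКГ as its base.
seq = ANCHOR + "\u0420\u041A\u0413" + SEPARATOR + "\u0483" + TERMINATOR
for chunk, base in resolve_bases(seq):
    print([f"U+{ord(c):04X}" for c in chunk], "base:", base)
```

A renderer following this rule would draw the titlo chunk over the whole РКГ run instead of showing an isolated mark on a dotted circle.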
Received on Mon Feb 06 2012 - 04:35:52 CST
