From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Fri Apr 15 2005 - 11:56:39 CST
From: <Lorna_Priest@sil.org>
>> The problem is that I am not sure that this is a normal acute accent. May be
>> this is a double-wide acute accent (sorry for the name but there's also a
>> "double acute" accent, where double means "repeated twice side-by-side")
>> which may be encoded separately, with the combining class 234, and for which
>> no CGJ would be needed (additionaly, it would be possible to put this accent
>> above two letters without the double-wide inverted breve.
>
> However, such a thing (double-wide acute accent) does not exist in Unicode,
> does it?
No, it doesn't. I never said it existed: my sentence clearly says it would
need to be encoded separately, with the combining class 234 used by other
"double-wide" accents.
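The combining classes involved can be checked against the Unicode Character Database. The following is only a minimal sketch using Python's standard unicodedata module; it uses U+0361 COMBINING DOUBLE INVERTED BREVE as a representative existing double-wide mark, and the helper name classes() is made up for this illustration. It also shows why a CGJ is needed today when a class-230 acute must stay after a class-234 mark.

```python
import unicodedata

ACUTE = "\u0301"                  # COMBINING ACUTE ACCENT
DOUBLE_INVERTED_BREVE = "\u0361"  # COMBINING DOUBLE INVERTED BREVE
CGJ = "\u034f"                    # COMBINING GRAPHEME JOINER

def classes(text):
    """Return the canonical combining class of each character (helper for this sketch)."""
    return [unicodedata.combining(ch) for ch in text]

print(classes(ACUTE + DOUBLE_INVERTED_BREVE + CGJ))  # [230, 234, 0]

# Without CGJ, canonical ordering sorts the acute (230) before the
# double inverted breve (234), so the encoded order is lost.
plain = "a" + DOUBLE_INVERTED_BREVE + ACUTE + "e"
reordered = unicodedata.normalize("NFD", plain)

# CGJ has class 0, so it blocks the reordering across it.
with_cgj = "a" + DOUBLE_INVERTED_BREVE + CGJ + ACUTE + "e"
kept = unicodedata.normalize("NFD", with_cgj)

print(reordered == "a" + ACUTE + DOUBLE_INVERTED_BREVE + "e")  # True
print(kept == with_cgj)                                        # True
```

A double-wide acute encoded with class 234 would share its class with the double inverted breve, so the two would keep their relative order without any CGJ.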
Sorry, but I really don't like the term "double" applied to diacritics that
cover two sub-graphemes. My opinion is that they should not have been encoded
as separate characters, but rather represented with the standard diacritics
placed above a zero-width linking base character similar to ZWJ. That
character would combine several grapheme clusters into a single default
grapheme cluster, and could have been named "grapheme joiner", like this for
example:
- to create a combined grapheme of letters a and e, without ligaturing them,
encode:
<a>, <GJ>, <e>
- one can create longer combined graphemes if needed, for example to place an
inverted breve above all of them:
<a>, <GJ>, <y>, <GJ, combining inverted breve above>, <e>
which creates a combined grapheme for the three letters <a,y,e> and places a
linking mark ("inverted breve") above all of them.
So to encode the example given previously, we would have coded:
<a>, <GJ, combining inverted breve above, combining acute accent>, <e>
because the normal combining accents share the same combining class 230 and
their relative order is preserved by normalization.
(In this example, there are 3 "combining sequences": 2 for the base letters,
1 for the complex diacritics, but they are creating a single default
grapheme cluster)
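The two claims above (same-class accents keep their order, and the example contains three combining sequences) can be sketched in Python. This is only an illustration under stated assumptions: the proposed GJ does not exist, so U+200D ZERO WIDTH JOINER stands in for it purely because it also has combining class 0, and combining_sequences() is a simplified helper written for this example, not a standard segmentation algorithm.

```python
import unicodedata

GJ = "\u200d"              # ASSUMPTION: ZWJ used as a stand-in for the hypothetical GJ
INVERTED_BREVE = "\u0311"  # COMBINING INVERTED BREVE, class 230
ACUTE = "\u0301"           # COMBINING ACUTE ACCENT, class 230

def combining_sequences(text):
    """Split text into runs of one non-mark character plus the marks following it."""
    sequences = []
    for ch in text:
        if unicodedata.category(ch).startswith("M") and sequences:
            sequences[-1] += ch   # attach the mark to the current base
        else:
            sequences.append(ch)  # start a new sequence at a non-mark
    return sequences

proposed = "a" + GJ + INVERTED_BREVE + ACUTE + "e"
swapped = "a" + GJ + ACUTE + INVERTED_BREVE + "e"

# Both marks have class 230, so normalization leaves either order untouched
# and the two encodings stay distinct.
print(unicodedata.normalize("NFC", proposed) == proposed)  # True
print(unicodedata.normalize("NFC", swapped) == swapped)    # True

print(len(combining_sequences(proposed)))  # 3
```

Whether those three combining sequences form a single default grapheme cluster would depend on the properties given to the new character, which this sketch does not model.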
The other solution would have been to create separate invisible open/close
base joining characters, so that several encapsulation levels of graphemes
could be created. These would have acted like "meta" punctuation (similar to
parentheses, except that they don't break the words within which they are
inserted), so that these meta-notations could be easily filtered out by
processes that want to ignore the diacritics applied to these combinations.
This would have been useful to embed notations like those used in grammar
books for children.

This would also have worked like "interlinear annotations" (or "ruby layout"
in Asian texts), by specifying explicitly in the plain text which sets of
encoded graphemes the annotations or diacritics apply to.

Renderers unable to render those annotations or diacritics over more than a
single grapheme, since that requires a 3D capable layout engine, could have
been allowed not to render the annotation, or to link it some other way in the
final rendered document. For example, the "ruby layout" can be substituted by
a linking anchor and a note rendered in a separate paragraph, possibly with a
smaller font and some indentation, or in the page footer.
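
The filtering idea can be sketched as follows. Everything here is hypothetical: no such open/close characters exist, so two private-use code points stand in for them, and the convention that the group's diacritics directly follow the closing character is an assumption made only for this example.

```python
import unicodedata

OPEN = "\ue000"   # ASSUMPTION: private-use placeholder for an invisible "open group"
CLOSE = "\ue001"  # ASSUMPTION: private-use placeholder for an invisible "close group"

def strip_group_marks(text):
    """Drop the group delimiters and any combining marks attached to CLOSE."""
    out = []
    skipping = False  # True while consuming marks that follow a CLOSE
    for ch in text:
        if ch == CLOSE:
            skipping = True
            continue
        if skipping and unicodedata.category(ch).startswith("M"):
            continue  # a diacritic applied to the whole group
        skipping = False
        if ch != OPEN:
            out.append(ch)
    return "".join(out)

# "b", then the group (a with acute, e) carrying an inverted breve, then "t"
annotated = "b" + OPEN + "a\u0301e" + CLOSE + "\u0311" + "t"
print(strip_group_marks(annotated) == "ba\u0301et")  # True
```

The acute on the letter a survives because it belongs to a single letter inside the group; only the mark attached to the group as a whole is removed.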