From: Peter Kirk (peterkirk@qaya.org)
Date: Thu Nov 27 2003 - 11:11:55 EST
On 27/11/2003 05:00, Philippe Verdy wrote:
> ...
>
> encircle(<DIGIT 9, DIGIT 2, DIGIT 3, DOT, DIGIT 0>)
> == <DIGIT 9, DOUBLE ENCLOSING CIRCLE,
> DIGIT 2, DOUBLE ENCLOSING CIRCLE,
> DIGIT 3, DOUBLE ENCLOSING CIRCLE,
> DOT, DOUBLE ENCLOSING CIRCLE,
> DIGIT 0>
>
>Here you don't have any ZWJ character; it is the double diacritic that
>explicitly creates the ligature between the previous and next base
>character.
>
>None of these solutions is specified in the standard. This is purely a
>convention for using Unicode, and until some enhancement to the Unicode
>character model is published that clearly defines ranges of characters to
>which diacritics can be applied, without the too-simple ZWJ control, the
>interpretation of such encoded text will remain application-dependent.
>
>
>
This is all rather interesting speculation. There are surely many
cases, across various scripts, where some kind of combining mark can be
considered to apply to a sequence of arbitrarily many characters. For
example:
* Enclosing circles, squares and ellipses.
* Continuous underlines and overlines.
* Continuous tildes, slurs, contour tone marks etc. which may apply to
  several characters or whole words.
* The cartouche in Egyptian hieroglyphs, which surrounds a group of
  several characters.
* A number of mathematical functions, e.g. fraction dividers and
  extensions to root signs.
* Combining marks which are supposed to be centred over or under two or
  more characters or even a whole word, like the Hebrew masora circle.

Now I am sure it could be argued that some of these are not plain text
and so should be dealt with by higher-level markup. But maybe some of
these need to be considered part of plain text: it is at least
conceivable, and arguably true of the Egyptian cartouche, that these
marks are required for proper understanding of the text, just as much
as regular letters and combining marks are.

So how should they be represented? Philippe's suggestion of <c1, mark,
c2, mark, c3, mark... mark, cn> would seem to work, but could be very
inefficient, since it needs n-1 marks for n characters. Jill's
alternative <bracket1, c1, c2, c3... cn, bracket2, mark> is more
efficient for long sequences, as its overhead is fixed at three
characters. But perhaps better would be to have paired opening and
closing marks: <mark1, c1, c2, c3... cn, mark2>, although this requires
a new pair of characters for each such case.
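
To make the size comparison concrete, here is a rough sketch of the three
schemes in Python. Everything in it is an assumption for illustration only:
the function names are mine, and the Private Use Area code points are mere
placeholders for characters that do not exist in Unicode. U+20DD COMBINING
ENCLOSING CIRCLE is a real character, but applying it to a bracketed group
is not behaviour defined by the standard.

```python
# Hypothetical placeholders taken from the Private Use Area. None of these
# is a real Unicode assignment; they only stand in for proposed characters.
DOUBLE_MARK = "\uE000"    # a "double enclosing circle" joining two bases
BRACKET_OPEN = "\uE001"   # invisible opening group bracket
BRACKET_CLOSE = "\uE002"  # invisible closing group bracket
MARK_OPEN = "\uE003"      # opening half of a paired mark
MARK_CLOSE = "\uE004"     # closing half of a paired mark

# A real character, used here in a non-standard way (applied to a group).
ENCLOSING_CIRCLE = "\u20DD"  # COMBINING ENCLOSING CIRCLE


def interleaved(chars):
    """Philippe's scheme: <c1, mark, c2, mark, ... mark, cn>."""
    return DOUBLE_MARK.join(chars)


def bracketed(chars):
    """Jill's scheme: <bracket1, c1, ... cn, bracket2, mark>."""
    return BRACKET_OPEN + "".join(chars) + BRACKET_CLOSE + ENCLOSING_CIRCLE


def paired(chars):
    """Paired marks: <mark1, c1, ... cn, mark2>."""
    return MARK_OPEN + "".join(chars) + MARK_CLOSE


def overhead(encode, chars):
    """Number of extra code points the scheme adds to the bare text."""
    return len(encode(chars)) - len(chars)


if __name__ == "__main__":
    text = "923.0"  # the five characters of Philippe's example
    for scheme in (interleaved, bracketed, paired):
        print(scheme.__name__, overhead(scheme, text))
```

For the five-character example the overheads are 4, 3 and 2 code points
respectively. For a 20-character cartouche they would be 19, 3 and 2, which
is why the interleaved form looks unattractive for long sequences.
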
-- 
Peter Kirk
peter@qaya.org (personal)
peterkirk@qaya.org (work)
http://www.qaya.org/