From: Arcane Jill (arcanejill@ramonsky.com)
Date: Thu Nov 27 2003 - 10:14:42 EST
Hmmm.
I still like the "invisible brackets" idea. That would make the
precedence explicit. As in:
INVISIBLE_LEFT_BRACKET + "9" + "2" + INVISIBLE_RIGHT_BRACKET +
COMBINING_ENCLOSING_CIRCLE
Totally unambiguous, and would work for /all/ modifiers, not just
enclosing circle. (You could also use invisible brackets to
unambiguously reorder combiners!)
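The bracketed sequence above can be sketched in Python as follows. This is only an illustration under stated assumptions: no invisible bracket characters exist in Unicode, so two Private Use Area code points stand in for them here, and mark_scopes() is a hypothetical helper that assumes balanced brackets and ignores everything else in the combining model.

```python
import unicodedata

# Hypothetical stand-ins: Unicode has no such characters, so two
# Private Use Area code points are borrowed for illustration only.
INVISIBLE_LEFT_BRACKET = "\uE001"
INVISIBLE_RIGHT_BRACKET = "\uE002"
COMBINING_ENCLOSING_CIRCLE = "\u20DD"  # real: COMBINING ENCLOSING CIRCLE


def mark_scopes(text):
    """Return (scope, mark) pairs: the text each combining mark would apply to.

    Simplified sketch: assumes balanced brackets; a mark applies to the
    last complete unit, which is one character or one bracketed group.
    """
    scopes = []
    open_groups = []  # one buffer per unclosed invisible bracket
    previous = ""     # last complete unit seen so far
    for ch in text:
        if ch == INVISIBLE_LEFT_BRACKET:
            open_groups.append("")
        elif ch == INVISIBLE_RIGHT_BRACKET:
            previous = open_groups.pop()
            if open_groups:
                # A closed inner group becomes part of the enclosing group.
                open_groups[-1] += previous
        elif unicodedata.category(ch).startswith("M"):
            scopes.append((previous, ch))
        else:
            previous = ch
            if open_groups:
                open_groups[-1] += ch
    return scopes


bracketed = (INVISIBLE_LEFT_BRACKET + "9" + "2" + INVISIBLE_RIGHT_BRACKET
             + COMBINING_ENCLOSING_CIRCLE)
plain = "9" + "2" + COMBINING_ENCLOSING_CIRCLE

bracketed_scopes = mark_scopes(bracketed)
plain_scopes = mark_scopes(plain)
```

With the brackets the circle's scope is "92"; without them it is only "2".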
Of course, it would mean the addition of two new characters to Unicode.
Would it work? Are there problems I haven't thought of?
Jill
-----Original Message-----
From: Philippe Verdy [mailto:verdy_p@wanadoo.fr]
Sent: Thursday, November 27, 2003 1:01 PM
To: Arcane Jill
Cc: Unicode@Unicode.Org
Subject: RE: numeric properties of Nl characters in the UCD
Arcane Jill writes:
> Gotcha. It's all starting to make sense now. Including the opposition
> to hex.
>
> Maybe one could make "circled 92" in two stages:
> (1) create a glyph representing 92, then (2)
> apply an enclosing circle modifier to it.
>
> Except of course, that wouldn't work!
> Because a modifier only affects a single base character.
This is true if the base character is not linked with other preceding
characters by something like ZWJ, which creates a ligature opportunity
(but ZWJ offers no guarantee that the ligature or junction will
actually be applied on rendering, and it does not affect the semantics
of the text, as it is just a formatting control).
> Basically, you'd need to do: encircle( "9" + "2")
> instead of: "9" + encircle("2")
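The difference between the two quoted expressions can be made concrete with a small Python sketch. The encircle() helper simply mirrors the notation used in this thread, and combining_sequences() is a deliberately naive splitter (a base character followed by any marks) written for illustration; it is not a full grapheme cluster algorithm and ignores ZWJ entirely.

```python
import unicodedata

ENCLOSING_CIRCLE = "\u20DD"  # COMBINING ENCLOSING CIRCLE


def encircle(text):
    """Append the enclosing circle, as in the thread's notation."""
    return text + ENCLOSING_CIRCLE


def combining_sequences(text):
    """Naively split text into base characters with their trailing marks."""
    sequences = []
    for ch in text:
        if sequences and unicodedata.category(ch).startswith("M"):
            sequences[-1] += ch  # a mark attaches to the preceding base
        else:
            sequences.append(ch)
    return sequences


wanted = encircle("9" + "2")
actual = "9" + encircle("2")

# Both expressions yield the same code points, so the grouping is lost.
same_encoding = wanted == actual
sequences = combining_sequences(wanted)
```

Both spellings produce the identical code point sequence, and the splitter attaches the circle to "2" alone, which is the problem being described.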
You're right here: the simple concatenation with + is not intended to
extend the semantic of the separate encircle() transformation function.
i.e. if ZWJ was effectively creating a "semantic" ligature:
encircle(<DIGIT NINE, DIGIT TWO>)
~~ encircle(<DIGIT NINE, ZWJ, DIGIT TWO>)
~~ <DIGIT NINE, ZWJ, DIGIT TWO, COMBINING ENCLOSING CIRCLE>
or more consistently (more complicated to implement in an encircle()
function, but probably simpler to parse and render correctly, by noting
that the two combining sequences on each side of ZWJ share a common
"encircled" rendering property, which could then be factored out when
looking up the range of characters to which the enclosing property
should be applied):
== <DIGIT NINE, COMBINING ENCLOSING CIRCLE, ZWJ, DIGIT TWO,
COMBINING ENCLOSING CIRCLE>
But I note that this is not the way the character model was defined.
Particularly, we have the case of "double" diacritics, currently coded
as (for example):
<base letter 1, DOUBLE TILDE, base letter 2>
and not simply as:
<base letter 1, TILDE, ZWJ, base letter 2, TILDE>
as if it was the result of the function:
tilde(<base letter 1> + <base letter 2>)
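As a rough illustration of the two encodings just contrasted, the Python sketch below builds both. U+0360 COMBINING DOUBLE TILDE is the real double diacritic; the ZWJ-based spelling is the hypothetical alternative discussed above, not something the standard defines as equivalent. The letters "n" and "g" are arbitrary example bases.

```python
import unicodedata

DOUBLE_TILDE = "\u0360"  # COMBINING DOUBLE TILDE
TILDE = "\u0303"         # COMBINING TILDE
ZWJ = "\u200D"           # ZERO WIDTH JOINER

# How a double diacritic is actually coded: between the two bases.
encoded = "n" + DOUBLE_TILDE + "g"

# The hypothetical ZWJ-based spelling (not defined by the standard).
hypothetical = "n" + TILDE + ZWJ + "g" + TILDE

encoded_names = [unicodedata.name(c) for c in encoded]
hypothetical_names = [unicodedata.name(c) for c in hypothetical]

# Double diacritics carry their own canonical combining class,
# distinct from that of the ordinary tilde.
double_class = unicodedata.combining(DOUBLE_TILDE)
single_class = unicodedata.combining(TILDE)
```

Printing encoded_names and hypothetical_names shows three code points for the real encoding against five for the ZWJ-based one.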
So for arbitrary encircled numbers, what would be needed is a "DOUBLE
ENCLOSING CIRCLE" diacritic (currently not encoded in Unicode, except
with PUA) like this:
encircle(<DIGIT 9, DIGIT 2>)
== <DIGIT 9, DOUBLE ENCLOSING CIRCLE, DIGIT 2>
Or for arbitrary numbers:
encircle(<DIGIT 9, DIGIT 2, DIGIT 3, DOT, DIGIT 0>)
== <DIGIT 9, DOUBLE ENCLOSING CIRCLE,
DIGIT 2, DOUBLE ENCLOSING CIRCLE,
DIGIT 3, DOUBLE ENCLOSING CIRCLE,
DOT, DOUBLE ENCLOSING CIRCLE,
DIGIT 0>
Here there is no ZWJ character at all: it is the double diacritic that
explicitly creates the ligature between the preceding and following
base characters.
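The interleaving scheme above could be sketched like this in Python. Since no DOUBLE ENCLOSING CIRCLE is encoded, a Private Use Area code point is assumed as a placeholder, and encircle_run() is a hypothetical helper that treats every character of its input as a base character.

```python
# Hypothetical: DOUBLE ENCLOSING CIRCLE is not encoded in Unicode, so a
# Private Use Area code point is used as a placeholder.
DOUBLE_ENCLOSING_CIRCLE = "\uE000"


def encircle_run(bases):
    """Place the double diacritic between each pair of adjacent bases."""
    return DOUBLE_ENCLOSING_CIRCLE.join(bases)


def decircle_run(text):
    """Recover the run of base characters from the interleaved form."""
    return text.replace(DOUBLE_ENCLOSING_CIRCLE, "")


two_digits = encircle_run("92")
longer = encircle_run("923.0")
```

A run of n base characters needs n - 1 copies of the double diacritic, so the encoded length grows to 2n - 1 code points.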
None of these solutions is specified in the standard. They are purely
conventions of use of Unicode, and until some enhancement is published
in the Unicode character model that clearly defines ranges of
characters to which diacritics can be applied, without relying on the
too-simple ZWJ control, the interpretation of text encoded this way
will remain application-dependent.