From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Fri Nov 28 2003 - 21:20:49 EST
Peter Kirk writes:
> On 28/11/2003 01:57, Andrew C. West wrote:
> >These are all specialised cases that are strictly necessary in order to
> >represent the respective scripts. General text formatting such as
> >underlining or arbitrary encirclement of characters (or cartouchement of
> >ideographs which is common in traditional Chinese texts) is considered to
> >be "rich text" and beyond the scope of Unicode. Whenever I read threads
> >like this one (and they resurface with monotonous regularity) I do wonder
> >whether the participants have ever read TUS Section 2.2 "Unicode Design
> >Principles".
>
> Andrew, I agree with Jill that there is no need to get ad hominem. You
> will see that I anticipated your objection. I listed several cases where
> a combining mark might need to be associated with a group of characters,
> and suggested that some might be dealt with as "rich text". You have
> confirmed what I wrote. Some of my cases have already been encoded in
> Unicode, and in just the way I suggested; others are considered (by the
> UTC, or just by you?) as "rich text". Like Jill, I see some possible
> inconsistency. One point of this discussion is perhaps to determine if
> we ought to try to make things more consistent.
I don't think it is a matter of consistency here: the only thing that
matters is whether the absence of such grouping for diacritics can produce
semantically incorrect text, or text whose semantics are ambiguous. I agree
with Peter here that we have good examples where the simple model of a
single base character with one or more diacritics is too limited to
represent the text correctly.
For mathematics, we can still use parentheses to group the items to which an
operator applies, but this only adds to the complexity of reading a
formula (readers must count the parentheses themselves in a plain-text file,
simply because there is no formatting, and a renderer has no way to render
plain text without these parentheses). So there are cases where visible
parentheses are not desirable, but where invisible parentheses would allow
operators to be grouped correctly.
For cartouches, the solution used in music is possible, but there are other
cases, such as the need to apply a combining diacritic to more than one
character (how do you note a vector by its two points? How do you correctly
surround a hieroglyphic cartouche? How do you mark in the text the coloring
diacritic, or the thick upper bar that denotes it and that HAS textual
semantics?)
I think it would be simple to have invisible parentheses in that case, and
to be able to apply the diacritic to the group:
<invisible open bracket><diacritics><one or more characters><invisible close bracket>.
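To make the proposed sequence concrete, here is a minimal Python sketch. It
rests on loud assumptions: Unicode defines no invisible bracket characters,
so the two Private Use Area code points below are hypothetical stand-ins
chosen only for illustration, and the helper name group_with_mark is
likewise invented. The one real character is U+20D7 COMBINING RIGHT ARROW
ABOVE, used here for the vector case.

```python
# Hypothetical stand-ins: no such characters exist in Unicode, so two
# Private Use Area code points are borrowed purely for illustration.
INVISIBLE_OPEN = "\uE000"   # assumed <invisible open bracket>
INVISIBLE_CLOSE = "\uE001"  # assumed <invisible close bracket>

# Real character: U+20D7 COMBINING RIGHT ARROW ABOVE.
COMBINING_RIGHT_ARROW_ABOVE = "\u20D7"


def group_with_mark(text: str, mark: str) -> str:
    """Build <open><diacritic><one or more characters><close>."""
    if not text:
        raise ValueError("a group needs at least one character")
    return INVISIBLE_OPEN + mark + text + INVISIBLE_CLOSE


# A vector noted by its two end points, with one arrow meant to span both.
vector_ab = group_with_mark("AB", COMBINING_RIGHT_ARROW_ABOVE)
```

In this ordering the diacritic directly follows the opening bracket, so a
renderer that knows nothing about grouping would simply attach the mark to
whatever glyph it shows for the bracket.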
Renderers then have several options to display it: either use a 2D layout
in which the diacritic is drawn over the whole group as if it were a single
base character, or apply the diacritic only to an open bracket glyph such as
a dotted parenthesis glyph, or a dotted square containing a parenthesis
glyph (this second solution would probably be used in 1D font-based
renderers, the first one being supported only in some cases, or by more
advanced 2D layout engines).
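The two rendering options could be sketched as follows. This is only an
illustration under assumptions: the bracket code points are the same kind of
hypothetical Private Use stand-ins, the names Group, parse_group and
render_1d are invented, nested groups are not handled, and ordinary
parentheses stand in for the dotted parenthesis glyph, which has no code
point of its own.

```python
import unicodedata
from typing import NamedTuple

# Hypothetical stand-ins for the proposed characters (Private Use Area).
INVISIBLE_OPEN = "\uE000"
INVISIBLE_CLOSE = "\uE001"


class Group(NamedTuple):
    marks: str    # diacritics that apply to the whole group
    content: str  # the grouped base characters


def parse_group(seq: str) -> Group:
    """Split <open><marks><content><close> for a 2D layout engine."""
    if not (seq.startswith(INVISIBLE_OPEN) and seq.endswith(INVISIBLE_CLOSE)):
        raise ValueError("not a bracketed group")
    inner = seq[1:-1]
    n = 0
    # Leading combining marks (general category M*) belong to the group.
    while n < len(inner) and unicodedata.category(inner[n]).startswith("M"):
        n += 1
    return Group(marks=inner[:n], content=inner[n:])


def render_1d(seq: str) -> str:
    """Fallback: show the brackets and let the marks sit on the opening one."""
    group = parse_group(seq)
    return "(" + group.marks + group.content + ")"


example = INVISIBLE_OPEN + "\u20D7" + "AB" + INVISIBLE_CLOSE
parsed = parse_group(example)
fallback = render_1d(example)
```

A 2D engine would take the parsed marks and draw them across the extent of
the content; the fallback keeps the text readable when no such engine is
available.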
In fact, within the Unicode charts, the invisible open/close pair should show
the dotted square with the parenthesis as its representative glyph, and this
glyph would be mirrored in BiDi environments.
This is still consistent with the current encoding of music notation, and
with the current encoding of parenthesis pairs: the only difference is that
these characters have no defined width, and are preferably invisible. It is
also consistent with the semantic notation of invisible mathematical
operators in Unicode. I don't think it's a hack (it is much less of a hack
than the double-width diacritics that have been encoded, or even the halves
of double-width diacritics, which are also encoded).
It's true that encoding them would seem to allow some rich-text styles such
as underlining to be encoded, but I think that giving these characters the
properties of punctuation would discourage their use to underline arbitrary
sequences of characters within words (better achieved by rich-text
formatting), and that allowing the "invisible parentheses" to be displayed
with a glyph if needed would in practice prevent their use just for
underlining text. For example, to apply a macron below to grouped text, one
could think of using:
<open invisible parenthesis><combining macron below>text to underline<close invisible parenthesis>.
But this could be rendered in a compliant way either by underlining the
surrounded text, or by underlining only the leading dotted parenthesis glyph.
So it would not reliably have the desired underlining effect. However, it
would still be correctly interpreted as a semantic notation.
There are already diacritics that could be used to create a cartouche:
notably the "combining enclosing screen" character, the "combining enclosing
keycap" or the "combining enclosing square".
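For reference, the three enclosing marks named above can be inspected with
Python's standard unicodedata module. The code points (U+20DE, U+20E2,
U+20E3) are not given in the message itself; they are filled in here from
the character names. The helper describe is an invented name. The last two
lines illustrate the limitation under discussion: as encoded, an enclosing
mark follows, and applies to, a single base character.

```python
import unicodedata

# Code points looked up from the names mentioned in the message.
ENCLOSING_MARKS = ["\u20DE", "\u20E2", "\u20E3"]


def describe(mark: str) -> tuple:
    """Return (code point, character name, general category) for a mark."""
    return (f"U+{ord(mark):04X}",
            unicodedata.name(mark),
            unicodedata.category(mark))


descriptions = [describe(mark) for mark in ENCLOSING_MARKS]
for entry in descriptions:
    print(*entry)

# An enclosing mark applies to the one base character it follows ...
boxed_single = "A" + "\u20DE"
# ... so appending it to two letters encloses only the second one.
boxed_attempt = "AB" + "\u20DE"
```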