Thanks a lot for the explanations.
KW> There is no good reason to invent composite combining marks
KW> involving two accents together. (In fact, there are good reasons
KW> *not* to do so.) The few that exist, e.g. U+0344, cause
KW> implementation problems and are discouraged from use.
What are those problems? As long as they have canonical
decompositions, won't such precomposed characters be discarded at
normalisation time, hopefully during I/O?
(I'm not arguing in favour of precomposed characters; I'm just saying
that my gut instinct is that we have to deal with normalisation
anyway, and hence they don't complicate anything further; I'd be
curious to hear why you think otherwise.)
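To make the point concrete, here is a minimal sketch of what I mean,
using Python's standard unicodedata module (the choice of Python is
only for illustration). It assumes the Unicode data shipped with the
interpreter, in which U+0344 has a canonical decomposition and is
excluded from recomposition, so both canonical forms replace it.

```python
import unicodedata

precomposed = "\u0344"  # COMBINING GREEK DIALYTIKA TONOS

# The canonical decomposition recorded in the character database.
print(unicodedata.decomposition(precomposed))

# Normalise to both canonical forms and show the resulting codepoints.
nfd = unicodedata.normalize("NFD", precomposed)
nfc = unicodedata.normalize("NFC", precomposed)
print([f"U+{ord(c):04X}" for c in nfd])
print([f"U+{ord(c):04X}" for c in nfc])
```

After either normalisation the text contains U+0308 U+0301 and no
U+0344, which is why I would expect the precomposed mark to be
invisible to any code that runs after normalisation.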
>> As far as I can tell, there is nothing in the Unicode database that
>> relates a ``modifier letter'' to the associated punctuation mark.
KW> Correct. They are viewed as distinct classes.
>> does anyone [have] a map from mathematical characters to the
>> Geometric Shapes, Misc. symbols and Dingbats that would be useful
>> for rendering?
KW> As opposed to the characters themselves? I'm not sure what you
KW> are getting at here.
The user invokes a search for ``f o g'' (the composite of g with f),
and she enters U+25CB WHITE CIRCLE. The document does contain the
required formula, but encoded with U+2218 RING OPERATOR. The user's
input was arguably incorrect, but I hope you'll agree that the search
should match.
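A rough sketch of the kind of search folding I have in mind follows.
The table SEARCH_FOLD and the helpers fold and loose_find are
hypothetical names of my own; the single entry is hand-made, precisely
because I know of no published data from which to generate it.

```python
# Hypothetical hand-made fold table; not derived from the Unicode database.
SEARCH_FOLD = {
    "\u25CB": "\u2218",  # WHITE CIRCLE -> RING OPERATOR
}
_FOLD_TABLE = str.maketrans(SEARCH_FOLD)


def fold(text: str) -> str:
    """Replace each look-alike character by its search representative."""
    return text.translate(_FOLD_TABLE)


def loose_find(needle: str, haystack: str) -> int:
    """Like str.find, but compares the folded forms of both strings.

    Every mapping is one character to one character, so the returned
    offset is also valid in the unfolded haystack.
    """
    return fold(haystack).find(fold(needle))


document = "Consider f \u2218 g."  # encoded with RING OPERATOR
query = "f \u25CB g"               # typed with WHITE CIRCLE

print(document.find(query))         # exact search: no match
print(loose_find(query, document))  # folded search: match
```

Folding is applied to both the query and the document, so the stored
text is never altered; only the comparison is loosened.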
I'm rendering a document that contains U+2218. The current font
doesn't contain a glyph associated with this codepoint, but it has a
perfectly good glyph for U+25CB. The rendering software should
silently use the latter.
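The rendering case could be sketched along the same lines. Here
GLYPH_FALLBACK and pick_codepoint are hypothetical, and the font is
modelled simply as the set of codepoints it covers; a real renderer
would query the font's character map instead.

```python
from typing import Optional, Set

# Hypothetical fallback table: for each codepoint, the codepoints whose
# glyphs are acceptable substitutes, in order of preference.
GLYPH_FALLBACK = {
    0x2218: [0x25CB],  # RING OPERATOR -> WHITE CIRCLE
}


def pick_codepoint(cp: int, font_coverage: Set[int]) -> Optional[int]:
    """Return the codepoint whose glyph should be drawn for cp.

    Prefers cp itself, then the listed substitutes; returns None when
    the font covers none of them.
    """
    if cp in font_coverage:
        return cp
    for alt in GLYPH_FALLBACK.get(cp, ()):
        if alt in font_coverage:
            return alt
    return None


# A toy font that has f, g and WHITE CIRCLE but no RING OPERATOR.
font_coverage = {0x0066, 0x0067, 0x25CB}

chosen = pick_codepoint(0x2218, font_coverage)
print(None if chosen is None else f"U+{chosen:04X}")
```

Note that the substitution is in the opposite direction from the
search example, which is one reason I would like a relation between
the characters rather than a one-way mapping.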
Analogous examples can be made for the ``modifier letters''.
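For instance, taking U+02BC MODIFIER LETTER APOSTROPHE and U+2019
RIGHT SINGLE QUOTATION MARK as one such pair (my choice of example),
a quick check with Python's unicodedata module shows that the database
puts them in different general categories and gives no decomposition
linking one to the other, so any relation has to come from a separate
table.

```python
import unicodedata

modifier = "\u02BC"     # MODIFIER LETTER APOSTROPHE
punctuation = "\u2019"  # RIGHT SINGLE QUOTATION MARK

for ch in (modifier, punctuation):
    print(
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        repr(unicodedata.decomposition(ch)),
    )

# Neither canonical nor compatibility normalisation relates the two.
related = (
    unicodedata.normalize("NFKD", modifier)
    == unicodedata.normalize("NFKD", punctuation)
)
print(related)
```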
I'll mention that I do understand why these are encoded separately[1],
and I do understand why and how they will behave differently in a
number of situations. I am merely noting that there are applications
(useful-in-practice search, rendering) where they may be identified or
at least related, and I am wondering whether people have already
compiled the data necessary to do so.
Thanks again,
Juliusz
[1] Offtopic: I have mixed feelings on the inclusion of STIX. On the
one hand it's great to at last have a standardised encoding for math
characters, on the other I feel it is based on very different encoding
principles than the rest of Unicode.
This archive was generated by hypermail 2.1.2 : Wed Feb 06 2002 - 10:48:17 EST