Re: A few questions about decomposition, equivalence and rendering

From: Juliusz Chroboczek (jec@dcs.ed.ac.uk)
Date: Wed Feb 06 2002 - 11:17:26 EST

Previous message: Juliusz Chroboczek: "Re: A few questions about decomposition, equivalence and rendering"
Maybe in reply to: Juliusz Chroboczek: "A few questions about decomposition, equivalence and rendering"
Next in thread: Kenneth Whistler: "Re: A few questions about decomposition, equivalence and rendering"

Thanks a lot for the explanations.

KW> There is no good reason to invent composite combining marks
KW> involving two accents together. (In fact, there are good reasons
KW> *not* to do so.) The few that exist, e.g. U+0344, cause
KW> implementation problems and are discouraged from use.

What are those problems? As long as they have canonical
decompositions, won't such precomposed characters be discarded at
normalisation time, hopefully during I/O?

(I'm not arguing in favour of precomposed characters. My gut instinct
is simply that we have to deal with normalisation anyway, so they
don't complicate anything further; I'd be curious to hear why you
think otherwise.)
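To make the question concrete, the sketch below uses Python's standard
`unicodedata` module (chosen here purely for illustration; results follow
whatever Unicode version the interpreter ships with). It shows that
U+0344 COMBINING GREEK DIALYTIKA TONOS has a canonical decomposition and,
being a composition exclusion, is replaced by its two marks under both
NFC and NFD, while a preceding iota still composes to a precomposed letter.

```python
import unicodedata

DIALYTIKA_TONOS = "\u0344"  # COMBINING GREEK DIALYTIKA TONOS

# The UCD gives U+0344 a canonical decomposition into two combining marks.
print(unicodedata.decomposition(DIALYTIKA_TONOS))

# U+0344 is excluded from composition, so NFC decomposes it just as NFD does.
for form in ("NFC", "NFD"):
    out = unicodedata.normalize(form, DIALYTIKA_TONOS)
    print(form, [f"U+{ord(c):04X}" for c in out])

# After a base letter, NFC recomposes the sequence into a precomposed letter
# (U+0390 GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS).
composed = unicodedata.normalize("NFC", "\u03b9" + DIALYTIKA_TONOS)
print(f"U+{ord(composed):04X}")
```

In other words, after normalisation the precomposed mark never survives,
which is the sense in which it would be "discarded".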

>> As far as I can tell, there is nothing in the Unicode database that
>> relates a ``modifier letter'' to the associated punctuation mark.

KW> Correct. They are viewed as distinct classes.
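A quick check with Python's `unicodedata` module (used here only as a
convenient view of the UCD) bears this out for two look-alike pairs picked
as examples: the modifier letters and their punctuation or symbol
counterparts fall in different general categories and carry no
decomposition linking them, so even NFKC keeps them apart.

```python
import unicodedata

# Example pairs of (modifier letter, look-alike punctuation or symbol).
PAIRS = [("\u02bc", "\u2019"), ("\u02c6", "\u005e")]

for mod, punct in PAIRS:
    for ch in (mod, punct):
        # Name, general category and (empty) decomposition field from the UCD.
        print(f"U+{ord(ch):04X}", unicodedata.name(ch),
              unicodedata.category(ch), repr(unicodedata.decomposition(ch)))
```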

>> does anyone [have] a map from mathematical characters to the
>> Geometric Shapes, Misc. symbols and Dingbats that would be useful
>> for rendering?

KW> As opposed to the characters themselves? I'm not sure what you
KW> are getting at here.

Suppose a user searches for ``f o g'' (the composite of g with f),
typing the operator as U+25CB WHITE CIRCLE, while the document contains
the required formula encoded with U+2218 RING OPERATOR. The user's
input was arguably incorrect, but I hope you'll agree that the search
should match.
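Normalisation alone does not make such a search match: neither U+2218 nor
U+25CB has a decomposition, so even NFKC leaves them distinct. A search
tool would need an extra, application-level folding step. The sketch below
assumes a small hand-made table; `SEARCH_FOLD`, `fold_for_search` and
`matches` are hypothetical names, and the entries are illustrative
assumptions, not data from the Unicode Character Database.

```python
import unicodedata

# Hypothetical, hand-built folding table (NOT UCD data): each look-alike
# character maps to the representative used for matching.
SEARCH_FOLD = {
    "\u25cb": "\u2218",  # WHITE CIRCLE -> RING OPERATOR
    "\u2019": "\u02bc",  # RIGHT SINGLE QUOTATION MARK -> MODIFIER LETTER APOSTROPHE
}

def fold_for_search(text: str) -> str:
    """NFKC-normalise, then apply the look-alike folding table."""
    normalised = unicodedata.normalize("NFKC", text)
    return "".join(SEARCH_FOLD.get(ch, ch) for ch in normalised)

def matches(query: str, document: str) -> bool:
    """Substring search on folded text."""
    return fold_for_search(query) in fold_for_search(document)

document = "h = f \u2218 g"  # formula encoded with RING OPERATOR
query = "f \u25cb g"          # user typed WHITE CIRCLE
print(matches(query, document))
```

Folding both sides to one representative keeps the table small; a real
table would have to be compiled by hand, which is exactly the data being
asked about.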

Or suppose I'm rendering a document that contains U+2218. The current
font has no glyph for this code point, but it has a perfectly good
glyph for U+25CB. The rendering software should silently use the
latter.
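A renderer could consult a similar table when the current font lacks a
glyph. In this sketch the font's coverage is modelled as a plain set of
code points (`font_cmap` is a hypothetical stand-in for the font's
character map), and `GLYPH_FALLBACK` and `choose_char` are illustrative
names with hand-picked entries, not an existing API or data file.

```python
from typing import Dict, List, Optional, Set

# Hypothetical fallback table: character -> substitutes to try, in order.
GLYPH_FALLBACK: Dict[str, List[str]] = {
    "\u2218": ["\u25cb", "\u25e6"],  # RING OPERATOR -> WHITE CIRCLE, WHITE BULLET
    "\u02bc": ["\u2019", "'"],       # MODIFIER LETTER APOSTROPHE -> quote marks
}

def choose_char(ch: str, font_cmap: Set[int]) -> Optional[str]:
    """Return ch if the font covers it, else the first covered substitute."""
    if ord(ch) in font_cmap:
        return ch
    for alt in GLYPH_FALLBACK.get(ch, []):
        if ord(alt) in font_cmap:
            return alt
    return None  # caller would try another font or draw a missing-glyph box

# A font that has WHITE CIRCLE but no RING OPERATOR.
font_cmap = {ord(c) for c in "fgh =\u25cb"}
rendered = "".join(choose_char(c, font_cmap) or "\ufffd" for c in "f \u2218 g")
print(rendered)
```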

Analogous examples can be made for the ``modifier letters''.

I'll mention that I do understand why these are encoded separately[1],
and I do understand why and how they will behave differently in a
number of situations. I am merely noting that there are applications
(useful-in-practice search, rendering) where they may be identified or
at least related, and I am wondering whether people have already
compiled the data necessary to do so.

Thanks again,

Juliusz

[1] Off-topic: I have mixed feelings about the inclusion of STIX. On the
one hand, it's great at last to have a standardised encoding for math
characters; on the other, I feel it is based on encoding principles
very different from those of the rest of Unicode.
