Re: A few questions about decomposition, equivalence and rendering

From: John Cowan (jcowan@reutershealth.com)
Date: Tue Feb 05 2002 - 08:58:41 EST


Juliusz Chroboczek wrote:

> The two that are in ASCII don't decompose. Is that because they're
> overloaded?

It's pretty much a given that a normalization form that meddles with
plain ASCII text isn't going to get used. It was I (ahem) who
spotted this discrepancy a while back, and the compatibility
decompositions of ASCII characters were quickly removed.
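
A quick way to check this against current data is sketched below. It
assumes Python's standard unicodedata module, which reflects whatever
Unicode version the interpreter ships with rather than the 2002 tables,
and the variable names are illustrative only. It looks for ASCII
characters that carry a decomposition mapping and compares them with a
few spacing marks just outside ASCII.

```python
import unicodedata

def decomposition_of(ch):
    """Return the raw decomposition field for ch ('' if it has none)."""
    return unicodedata.decomposition(ch)

# Collect any ASCII code points that have a decomposition mapping.
ascii_with_mapping = [hex(cp) for cp in range(0x80) if decomposition_of(chr(cp))]

# ASCII text should pass through every normalization form unchanged.
ascii_text = "".join(chr(cp) for cp in range(0x80))
ascii_stable = all(
    unicodedata.normalize(form, ascii_text) == ascii_text
    for form in ("NFD", "NFC", "NFKD", "NFKC")
)

# Spacing marks in the Latin-1 range, by contrast, carry <compat> mappings
# (SPACE followed by the corresponding combining mark).
latin1_marks = {hex(cp): decomposition_of(chr(cp)) for cp in (0xA8, 0xAF, 0xB4, 0xB8)}

print("ASCII with a mapping:", ascii_with_mapping)
print("ASCII stable under normalization:", ascii_stable)
for cp, mapping in latin1_marks.items():
    print(cp, "->", mapping)
```

If the ASCII list comes back empty while the Latin-1 marks show
<compat> mappings, that matches the situation described above.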

> A number of combining characters (e.g. U+0340, U+0341, U+0343) have
> canonical equivalents, i.e. canonical decompositions that are a single
> character. In other words, we have pairs of codepoints that are bound
> to behave in exactly the same manner under all circumstances. What's
> the deal?

The first two are deprecated. They were originally intended to deal
with the special treatment of acute and grave in Vietnamese, which
are kerned next to rather than above the circumflex accent when they
are used together. (Acute and grave are tone marks; circumflex marks
a distinct vowel.) However, this is properly a font issue, not a
character issue.

I don't know the exact story for the CORONIS (U+0343), but I bet it's
some kind of political issue.
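
The singleton decompositions asked about can be inspected with a short
sketch like the following. It again assumes Python's unicodedata module
and whatever Unicode version it bundles; singleton_target is a
hypothetical helper name, not part of any library. A decomposition field
with no <tag> prefix and a single code point is a canonical singleton,
and normalization replaces such characters in both NFD and NFC.

```python
import unicodedata

def singleton_target(cp):
    """Return the code point that cp canonically decomposes to,
    or None if cp has no single-character canonical decomposition."""
    fields = unicodedata.decomposition(chr(cp)).split()
    # Compatibility mappings start with a <tag>; canonical ones do not.
    if len(fields) == 1 and not fields[0].startswith("<"):
        return int(fields[0], 16)
    return None

for cp in (0x0340, 0x0341, 0x0343):
    target = singleton_target(cp)
    nfd = unicodedata.normalize("NFD", chr(cp))
    nfc = unicodedata.normalize("NFC", chr(cp))
    print(
        f"U+{cp:04X} {unicodedata.name(chr(cp))} -> U+{target:04X}; "
        f"NFD gives U+{ord(nfd):04X}, NFC gives U+{ord(nfc):04X}"
    )

# Canonically equivalent sequences normalize to the same string.
with_tone_mark = unicodedata.normalize("NFC", "a\u0341")
with_plain_acute = unicodedata.normalize("NFC", "a\u0301")
print(with_tone_mark == with_plain_acute)
```

Because singletons are never recomposed, the original code points do not
survive any normalization form, which is consistent with treating them
as deprecated duplicates.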

 
> Unicode contains a number of precomposed spacing diacritical marks for
> Greek (e.g. U+1FC1). However, and unless I've missed something, with
> the exception of U+0385, they do not have combining (non-spacing)
> versions. What's the rationale here?

Eh? U+1FC1 *is* nonspacing. The U+1Fxx ones are the spacing
compatibility equivalents, except for this one.

 

-- 
John Cowan <jcowan@reutershealth.com>     http://www.reutershealth.com
I amar prestar aen, han mathon ne nen,    http://www.ccil.org/~cowan
han mathon ne chae, a han noston ne 'wilith.  --Galadriel, _LOTR:FOTR_


