From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed Jul 23 2003 - 23:19:09 EDT
On Thursday, July 24, 2003 1:24 AM, John Hudson <tiro@tiro.com> wrote:
> At 03:49 PM 7/23/2003, Peter Kirk wrote:
>
> > (Yerushala(y)im with CGJ) with different versions of Uniscribe (on
> > Windows 2000). In each case CGJ is rendered as a square box in each
> > of several fonts. This behaviour indicates that actually Uniscribe
> > treats CGJ as a regular paintable character, but it is not
> > implemented in the specific fonts. So, it seems that if the font
> > designer makes the very simple changes which John Hudson mentioned,
> > "ligating" CGJ with the preceding character, the CGJ solution to
> > the Hebrew problem can be implemented very simply, with no changes
> > to rendering software and simple changes to fonts.
> >
> > So where is the serious problem with this solution? I don't see
> > one. Nor do the President and the Technical Director of the Unicode
> > Consortium. Perhaps the only problem was a misunderstanding of the
> > properties of CGJ, which I hope has now been resolved.
>
> That would be nice indeed. I'm going to test this, but will need to
> add CGJ to my font first. I'll report back in a few days.
>
> As Peter Constable noted, though, we need to be sure that the use of
> CGJ in this context is clearly defined and, most importantly, is not
> going to conflict with other possible uses. Uniscribe may, in fact,
> handle the character in a way that works now, but if so we need to
> confirm that this is intentional and is not going to change.

There is an interesting case with the precomposed combining character
U+0344 <COMBINING GREEK DIALYTIKA TONOS>. Its canonical decomposition
is <U+0308, U+0301>, and it is excluded from canonical recomposition
(so in effect it behaves like a *compatibility character* that should
not be present in any normalized form).
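
The decomposition and recomposition behaviour of U+0344 can be checked
with a minimal Python sketch using the standard `unicodedata` module.
The results depend on the Unicode data bundled with the interpreter,
and GREEK SMALL LETTER IOTA (U+03B9) as a base letter is an assumption
chosen only for illustration.

```python
import unicodedata

DIALYTIKA_TONOS = "\u0344"  # COMBINING GREEK DIALYTIKA TONOS
DIAERESIS = "\u0308"        # COMBINING DIAERESIS
ACUTE = "\u0301"            # COMBINING ACUTE ACCENT

# Canonical decomposition mapping recorded in the character database.
decomposition = unicodedata.decomposition(DIALYTIKA_TONOS)

# Normalize the bare mark under all four normalization forms.
forms = {
    form: unicodedata.normalize(form, DIALYTIKA_TONOS)
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}

# On an illustrative base letter (iota), NFC composes the whole
# sequence into a precomposed letter rather than restoring U+0344.
iota_nfc = unicodedata.normalize("NFC", "\u03b9" + DIALYTIKA_TONOS)

def code_points(text):
    """Format a string as a list of U+XXXX labels for display."""
    return [f"U+{ord(ch):04X}" for ch in text]

print(decomposition)
print({form: code_points(text) for form, text in forms.items()})
print(code_points(iota_nfc))
```

In this sketch U+0344 survives in none of the four forms, which is the
sense in which it "should not be present in any normalized form".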

However, its canonical decomposition into <COMBINING DIAERESIS,
COMBINING ACUTE ACCENT>, which are both of combining class 230
(Above), has an impact on renderers: they are supposed to stack the
marks one above the other, so the ACUTE ACCENT (oxia, tonos) should
appear *above* the DIAERESIS (dialytika). But usage in Greek shows
that they do not stack: the two above diacritics are really combined
(the tonos accent is written between the two dots of the dialytika).
Similar cases occur in Vietnamese, where Latin letters carry two
above diacritics.

So this is already a case where diacritics can (and should) ligate by
default, and where a CGJ might be used to remove (?) this ligature of
accents and use the vertical stack instead. If this is wrong, then
how do you combine a macron with a diaeresis? Normally they are
shown one above the other, and using CGJ might make them appear
side by side. If CGJ is always to be ignored by renderers, I don't
understand its role in encoding sequences where the position of
diacritics is important (for example <acute, CGJ, grave> and <grave,
CGJ, acute>, which look similar to an "open circumflex" and an "open
caron", but not to an "open greater-than sign above" and an
"open less-than sign above").

More generally, the relative placement of multiple diacritics with the
same combining class is currently not precisely defined, and I wonder
whether this could cause problems for some languages.
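
What the encoding layer does with such sequences can at least be
checked, independently of rendering. The following is a minimal
Python sketch with the standard `unicodedata` module; the base letter
"x" is an arbitrary assumption, chosen because it has no precomposed
form with an acute or grave accent. It only shows that normalization
keeps the order of marks of the same combining class and does not
remove CGJ; it says nothing about how a font positions the marks.

```python
import unicodedata

ACUTE = "\u0301"  # COMBINING ACUTE ACCENT
GRAVE = "\u0300"  # COMBINING GRAVE ACCENT
CGJ = "\u034f"    # COMBINING GRAPHEME JOINER

# Canonical combining class of each character involved.
classes = {
    unicodedata.name(ch): unicodedata.combining(ch)
    for ch in (ACUTE, GRAVE, CGJ)
}

BASE = "x"  # hypothetical base letter, for illustration only

sequences = [
    BASE + ACUTE + GRAVE,
    BASE + GRAVE + ACUTE,
    BASE + ACUTE + CGJ + GRAVE,
    BASE + GRAVE + CGJ + ACUTE,
]

# Normalize each sequence; same-class marks are never reordered,
# and CGJ is an ordinary character that normalization preserves.
normalized = [unicodedata.normalize("NFC", seq) for seq in sequences]
unchanged = all(n == s for n, s in zip(normalized, sequences))
all_distinct = len(set(normalized)) == len(sequences)

print(classes)
print(unchanged, all_distinct)
```

So the four sequences remain four different strings after
normalization; whether they also look different is left to the
renderer and the font.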

An interesting case is COMBINING DOUBLE ACUTE ACCENT (U+030B),
which is not canonically decomposed into a pair of acute accents...
as if it were needed to remove the assumption that multiple combining
diacritics above should stack, with this character making them appear
side by side (and even ligated a bit by horizontal kerning in most
fonts). I wonder what the effect would be of using <ACUTE, CGJ, ACUTE>
as opposed to <ACUTE, ACUTE>.
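
That U+030B has no decomposition, and that a pair of acute accents is
never composed into it, can be confirmed with a short Python sketch
using the standard `unicodedata` module. The base letter "o" is an
assumption made for illustration (it has precomposed forms with both
marks).

```python
import unicodedata

DOUBLE_ACUTE = "\u030b"  # COMBINING DOUBLE ACUTE ACCENT
ACUTE = "\u0301"         # COMBINING ACUTE ACCENT

# An empty string means the character has no decomposition mapping.
has_decomposition = unicodedata.decomposition(DOUBLE_ACUTE) != ""

# "o" + double acute composes to LATIN SMALL LETTER O WITH DOUBLE ACUTE.
with_double = unicodedata.normalize("NFC", "o" + DOUBLE_ACUTE)

# "o" + acute + acute composes only the first acute; the second stays.
with_pair = unicodedata.normalize("NFC", "o" + ACUTE + ACUTE)

# Fully decomposed, the two spellings are still different strings.
same_after_nfd = (
    unicodedata.normalize("NFD", with_double)
    == unicodedata.normalize("NFD", with_pair)
)

print(has_decomposition, same_after_nfd)
print([f"U+{ord(ch):04X}" for ch in with_double])
print([f"U+{ord(ch):04X}" for ch in with_pair])
```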

If correct placement of diacritics must be specified, could we use the
ideographic description characters to build those combining
sequences with a more descriptive composition rule? I know it seems
tricky, but the current handling of Greek and Vietnamese requires some
compromises in the way some combining characters are precomposed
before being placed on a base letter according to their combining
class.

So what is the current use of the CGJ character, and why was it
introduced? If a character is to be ignored in searches (since the
combining sequences with and without it should be treated as equal,
being indistinguishable to actual readers), I don't see the point of
using a CGJ. I think it was introduced explicitly to override the
default placement of combining characters according to their standard
combining class, and so to make a distinction visible to the reader,
which means it should not be handled as fully ignorable. What, then,
is this distinction?
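
One property that bears on this question can be checked mechanically:
because CGJ has combining class 0, it stops canonical reordering of
the marks on either side of it. The Python sketch below uses the
standard `unicodedata` module; the choice of LAMED, PATAH and HIRIQ is
an assumption about the relevant part of the Yerushala(y)im spelling
quoted above, and the sketch does not address rendering or searching.

```python
import unicodedata

LAMED = "\u05dc"  # HEBREW LETTER LAMED
PATAH = "\u05b7"  # HEBREW POINT PATAH
HIRIQ = "\u05b4"  # HEBREW POINT HIRIQ
CGJ = "\u034f"    # COMBINING GRAPHEME JOINER

# Combining classes of the two vowel points (they differ, so canonical
# ordering may swap the points when they are adjacent).
patah_class = unicodedata.combining(PATAH)
hiriq_class = unicodedata.combining(HIRIQ)

plain = LAMED + PATAH + HIRIQ           # intended order: patah, hiriq
with_cgj = LAMED + PATAH + CGJ + HIRIQ  # same, with CGJ in between

plain_nfd = unicodedata.normalize("NFD", plain)
with_cgj_nfd = unicodedata.normalize("NFD", with_cgj)

# Without CGJ the points are swapped; with CGJ the order is kept.
reordered_without_cgj = plain_nfd != plain
preserved_with_cgj = with_cgj_nfd == with_cgj

print(patah_class, hiriq_class)
print(reordered_without_cgj, preserved_with_cgj)
```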

-- Philippe.
Spam not tolerated: any unsolicited message will be reported to your
Internet service providers.
This archive was generated by hypermail 2.1.5 : Wed Jul 23 2003 - 23:59:40 EDT