From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Tue May 13 2008 - 18:48:13 CDT
Philippe Verdy wrote on Tuesday, May 13, 2008 6:52 PM
> Russ Stygall wrote:
>> From UnicodeData.txt, 'prosgegrammeni' is equated to 'small letter iota',
>> see below.
> Note: Unicode does not "equate" characters, it defines canonical and
> compatibility equivalence mappings and string canonicalization processes;
> canonical equivalence is based on those mappings, but it does not mean
> that the characters are "equal".
The relevant conformance condition, in TUS 5.0 at least, is C6: "A process
shall not assume that the interpretations of two canonical-equivalent
character sequences are distinct". Quite how a *process* assumes something
I have yet to work out. Part of the commentary for this requirement states,
"Ideally, an implementation would always interpret two canonical-equivalent
character sequences identically. There are practical circumstances under
which implementations may reasonably distinguish them."
These circumstances appear to include:
(1) Converting back from Unicode to another character encoding. This is the
rationale for most of the compatibility characters with singleton
decompositions.
(2) Conformant case conversion - casing does not preserve canonical
equivalence.
(3) Generating character charts. Compatibility ideographs may be rendered
differently to their canonical decompositions, e.g. U+FA30 and U+2F805, which
are canonically equivalent to U+4FAE.
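Circumstances (2) and (3) can be checked mechanically. The following is only a sketch: it assumes Python's standard unicodedata module and built-in str.upper() as one implementation of normalization and default case mapping, and the helper name canonically_equivalent is my own, not anything from the standard.

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    """Treat two strings as canonically equivalent iff their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# (3) Both compatibility ideographs decompose canonically to U+4FAE.
ideographs_equivalent = (
    canonically_equivalent("\uFA30", "\u4FAE")
    and canonically_equivalent("\U0002F805", "\u4FAE")
)

# (2) Alpha with the same two marks in either order: acute has combining
# class 230 and ypogegrammeni 240, so canonical reordering makes them equal.
s1 = "\u03B1\u0345\u0301"
s2 = "\u03B1\u0301\u0345"
equivalent_before = canonically_equivalent(s1, s2)

# Upper-casing maps U+0345 to U+0399, which is a starter, so the acute can
# no longer be reordered across it.
equivalent_after = canonically_equivalent(s1.upper(), s2.upper())

print(ideographs_equivalent, equivalent_before, equivalent_after)
```

With a character-by-character upper-casing such as this one, the two strings are equivalent before casing and distinct afterwards, which is the sense in which casing fails to preserve canonical equivalence.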
>> 1FBE;GREEK PROSGEGRAMMENI;Ll;0;L;03B9;;;;N;;;0399;;0399
>> 03B9;GREEK SMALL LETTER IOTA;Ll;0;L;;;;;N;;;0399;;0399
>> From the Greek Extended table, see below, the following three characters
>> are equated to ALPHA/ETA/OMEGA plus 0345, not plus 1FBE or even 03B9!
>>
>> 1FBC;GREEK CAPITAL LETTER ALPHA WITH PROSGEGRAMMENI;Lt;0;L;0391 0345;;;;N;;;;1FB3;
>> 1FCC;GREEK CAPITAL LETTER ETA WITH PROSGEGRAMMENI;Lt;0;L;0397 0345;;;;N;;;;1FC3;
>> 1FFC;GREEK CAPITAL LETTER OMEGA WITH PROSGEGRAMMENI;Lt;0;L;03A9 0345;;;;N;;;;1FF3;
>>
>> Why is 'iota subscript' (below) used as a substitute for 'iota adscript'
>> in the above cases?
>> 0345;COMBINING GREEK YPOGEGRAMMENI;Mn;240;NSM;;;;;N;*;;0399;;0399
Iota adscript is an optional contextual variant of an iota subscript. It's
up to the font what style it adopts - and I'm treating
language/register-sensitive variations within a font as font differences.
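The UnicodeData.txt fields quoted above can be read back from any implementation that ships the character database. A minimal sketch, assuming Python's unicodedata module (whose data may be from a later Unicode version than 5.0); decomposition_of is a hypothetical helper name.

```python
import unicodedata


def decomposition_of(cp: int) -> str:
    """Return the raw decomposition mapping field for a code point."""
    return unicodedata.decomposition(chr(cp))


# The characters discussed in this thread.
for cp in (0x1FBE, 0x1FBC, 0x1FCC, 0x1FFC, 0x0345, 0x037A):
    ch = chr(cp)
    print(
        f"U+{cp:04X} {unicodedata.name(ch)}: "
        f"decomposition={decomposition_of(cp)!r} "
        f"ccc={unicodedata.combining(ch)}"
    )
```

An empty decomposition string means the character has no decomposition mapping; a leading tag such as `<compat>` marks a compatibility rather than a canonical mapping.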
> Both U+1FBE and U+03B9 are spacing characters, not combining characters;
> the equivalence between them considers this because U+1FBE is effectively
> an adscript, and definitely not a subscript; the letters with iota
> subscripts are different. Note that the iota adscript is not necessarily
> below the baseline; in fact in many texts it appears on the baseline as
> well, and when capitalized it is treated like a standard iota and still
> becomes a capital iota.
> U+1FBE is then just a minor graphic variant of a regular iota letter and
> not even guaranteed to be different. By contrast, the combining subscript
> does not change when the text is capitalized.
U+1FBE is a broken character and is best avoided. As a spacing clone of
U+0345 it has been replaced by U+037A GREEK YPOGEGRAMMENI. Like all spacing
clones, U+037A is a little impaired - its *compatibility* decomposition
should be <U+00A0, U+0345> rather than <U+0020, U+0345>, but no-one dare
change it.
The combining subscript may change when the text is capitalised. If you
want an adscript or subscript iota, the font should do it as a contextual
variant under the influence of the base letter. Alternatively, you may use
the Unicode default upper casing, and upper-case it to U+0399 GREEK CAPITAL
LETTER IOTA.
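To see what default upper casing does with the subscript, here is a small sketch. It assumes Python's str.upper() and str.title(), which apply the unconditional full case mappings; the describe helper is a name I have made up for printing code points.

```python
import unicodedata


def describe(s: str) -> str:
    """Render a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


precomposed = "\u1FB3"  # GREEK SMALL LETTER ALPHA WITH YPOGEGRAMMENI
decomposed = unicodedata.normalize("NFD", precomposed)  # alpha + U+0345

upper_precomposed = precomposed.upper()
upper_decomposed = decomposed.upper()

print("upper(precomposed):", describe(upper_precomposed))
print("upper(decomposed): ", describe(upper_decomposed))
print("title(precomposed):", describe(precomposed.title()))
```

In both upper-casing calls the subscript comes out as a full U+0399 GREEK CAPITAL LETTER IOTA following the capital alpha, so the combining mark does change under capitalisation.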
> What can be said is that U+1FBE (the iota adscript) is a compatibility
> character provided only for roundtrip compatibility with other encodings;
> the name may be misleading for you, but "ypogegrammeni" (the combining
> subscript iota) is NOT equivalent to "prosgegrammeni" (the non-combining
> small letter iota that normally follows another letter but may be treated
> as a plain letter itself).
On the contrary, all the decomposable letters containing 'PROSGEGRAMMENI'
have U+0345 COMBINING GREEK YPOGEGRAMMENI in their canonical decompositions.
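That claim is easy to test against the character database. The sketch below assumes Python's unicodedata module and an illustrative scan range of U+0370..U+1FFF (the Greek and Greek Extended blocks and everything between); the function name prosgegrammeni_letters is hypothetical. Requiring 'LETTER' in the name deliberately excludes U+1FBE GREEK PROSGEGRAMMENI itself.

```python
import unicodedata

YPOGEGRAMMENI = "\u0345"  # COMBINING GREEK YPOGEGRAMMENI


def prosgegrammeni_letters():
    """Yield decomposable letters whose names mention PROSGEGRAMMENI."""
    for cp in range(0x0370, 0x2000):
        ch = chr(cp)
        name = unicodedata.name(ch, "")  # "" for unassigned code points
        if (
            "LETTER" in name
            and "PROSGEGRAMMENI" in name
            and unicodedata.decomposition(ch)
        ):
            yield ch


letters = list(prosgegrammeni_letters())

# Full canonical decomposition (NFD) of each letter should contain U+0345.
all_contain = all(
    YPOGEGRAMMENI in unicodedata.normalize("NFD", ch) for ch in letters
)

print(len(letters), all_contain)
```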
For a history of the confusion, I recommend
http://www.tlg.uci.edu/~opoudjis/unicode/unicode_adscript.html (by Nick
Nicholas) and http://www.tlg.uci.edu/~opoudjis/unicode/ken_adscripts.html
(by Ken Whistler).
Richard.