Re: Prosgegrammeni

From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Tue May 13 2008 - 18:48:13 CDT

    Philippe Verdy wrote on Tuesday, May 13, 2008 6:52 PM
    > Russ Stygall wrote:

    >> From UnicodeData.txt, 'prosgegrammeni' is equated to 'small letter iota',
    >> see below.

    > Note: Unicode does not "equate" characters, it defines canonical and
    > compatibility equivalence mappings and string canonicalization processes;
    > canonical equivalence is based on those mappings, but it does not mean
    > that the characters are "equal".

    The relevant conformance condition, in TUS 5.0 at least, is C6: "A process
    shall not assume that the interpretations of two canonical-equivalent
    character sequences are distinct". Quite how a *process* assumes something
    I have yet to work out. Part of the commentary for this requirement states,
    "Ideally, an implementation would always interpret two canonical-equivalent
    character sequences identically. There are practical circumstances under
    which implementations may reasonably distinguish them."

    These circumstances appear to include:

    (1) Converting back from Unicode to another character encoding. This is the
    rationale for most of the compatibility characters with singleton
    decompositions.

    (2) Conformant case conversion - casing does not preserve canonical
    equivalence.

    (3) Generating character charts. Compatibility ideographs may be rendered
    differently to their canonical decompositions, e.g. U+FA30 and U+2F805, which
    are canonically equivalent to U+4FAE.
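The singleton decompositions mentioned in (1) and (3) can be checked with a short sketch. It assumes Python's standard `unicodedata` module and treats "canonically equivalent" as "identical after NFD normalization"; the helper name `canonically_equivalent` is made up for illustration.

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent if their NFD forms are identical."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# (character with a singleton canonical decomposition, its decomposition)
pairs = [
    ("\u1fbe", "\u03b9"),      # GREEK PROSGEGRAMMENI -> GREEK SMALL LETTER IOTA
    ("\ufa30", "\u4fae"),      # compatibility ideograph -> U+4FAE
    ("\U0002f805", "\u4fae"),  # compatibility ideograph -> U+4FAE
]

for source, target in pairs:
    print(
        f"U+{ord(source):04X} vs U+{ord(target):04X}:",
        "distinct code points," if source != target else "same code point,",
        "canonically equivalent" if canonically_equivalent(source, target) else "not equivalent",
    )

# Singletons do not survive normalization in either direction, which is why a
# round trip through a normalizing process loses the original distinction.
round_trip = unicodedata.normalize("NFC", "\ufa30")
print(f"NFC(U+FA30) = U+{ord(round_trip):04X}")
```

A process that normalizes its input therefore cannot convert U+FA30 back to a legacy encoding that distinguishes it from U+4FAE.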

    >> 1FBE;GREEK PROSGEGRAMMENI;Ll;0;L;03B9;;;;N;;;0399;;0399
    >> 03B9;GREEK SMALL LETTER IOTA;Ll;0;L;;;;;N;;;0399;;0399

    >> From the Greek Extended table, see below, the following three characters
    >> are equated to ALPHA/ETA/OMEGA plus 0345, not plus 1FBE or even 03B9!
    >>
    >> 1FBC;GREEK CAPITAL LETTER ALPHA WITH PROSGEGRAMMENI;Lt;0;L;0391 0345;;;;N;;;;1FB3;
    >> 1FCC;GREEK CAPITAL LETTER ETA WITH PROSGEGRAMMENI;Lt;0;L;0397 0345;;;;N;;;;1FC3;
    >> 1FFC;GREEK CAPITAL LETTER OMEGA WITH PROSGEGRAMMENI;Lt;0;L;03A9 0345;;;;N;;;;1FF3;
    >>
    >> Why is 'iota subscript' (below) used as a substitute for 'iota adscript'
    >> in the above cases?
    >> 0345;COMBINING GREEK YPOGEGRAMMENI;Mn;240;NSM;;;;;N;*;;0399;;0399
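For readers unfamiliar with the UnicodeData.txt layout quoted above, here is a minimal sketch that splits one record into its semicolon-separated fields. It assumes the usual 15-field layout, in which field 5 is the decomposition mapping and fields 12 to 14 are the simple upper, lower and title case mappings; `UcdRecord` and `parse_record` are illustrative names, not part of any library.

```python
from typing import NamedTuple

FIELD_COUNT = 15  # assumed number of fields per UnicodeData.txt record


class UcdRecord(NamedTuple):
    code_point: int
    name: str
    general_category: str
    combining_class: int
    decomposition: str   # may carry a <tag> prefix for compatibility mappings
    simple_upper: str    # hex code point, or "" when there is no mapping
    simple_lower: str
    simple_title: str


def parse_record(line: str) -> UcdRecord:
    """Split one UnicodeData.txt line into the fields discussed in this thread."""
    fields = line.strip().split(";")
    if len(fields) != FIELD_COUNT:
        raise ValueError(f"expected {FIELD_COUNT} fields, got {len(fields)}")
    return UcdRecord(
        code_point=int(fields[0], 16),
        name=fields[1],
        general_category=fields[2],
        combining_class=int(fields[3]),
        decomposition=fields[5],
        simple_upper=fields[12],
        simple_lower=fields[13],
        simple_title=fields[14],
    )


capital_alpha = parse_record(
    "1FBC;GREEK CAPITAL LETTER ALPHA WITH PROSGEGRAMMENI;Lt;0;L;0391 0345;;;;N;;;;1FB3;"
)
ypogegrammeni = parse_record(
    "0345;COMBINING GREEK YPOGEGRAMMENI;Mn;240;NSM;;;;;N;*;;0399;;0399"
)

print(capital_alpha.decomposition, capital_alpha.simple_lower)
print(ypogegrammeni.combining_class, ypogegrammeni.simple_upper)
```

Read this way, the U+1FBC record decomposes to U+0391 followed by the combining U+0345, and U+0345 itself has combining class 240 and a simple uppercase mapping to U+0399.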

    Iota adscript is an optional contextual variant of an iota subscript. It's
    up to the font what style it adopts - and I'm treating
    language/register-sensitive variations within a font as font differences.

    > Both U+1FBE and U+03B9 are spacing characters, not combining characters;
    > the equivalence between them considers this because U+1FBE is effectively
    > an adscript, and definitely not a subscript; the letters with iota
    > subscripts are different. Note that the iota adscript is not necessarily
    > below the baseline; in fact in many texts it appears on the baseline as
    > well, and when capitalized it is treated like a standard iota and still
    > becomes a capital iota.

    > U+1FBE is then just a minor graphic variant of a regular iota letter and
    > not even guaranteed to be different. By contrast, the combining subscript
    > does not change when the text is capitalized.

    U+1FBE is a broken character and is best avoided. As a spacing clone of
    U+0345 it has been replaced by U+037A GREEK YPOGEGRAMMENI. Like all spacing
    clones, it is a little impaired - its *compatibility* decomposition should
    be <U+00A0, U+0345> rather than <U+0020, U+0345>, but no-one dare change
    it.
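The two decomposition mappings contrasted here can be inspected directly. This is a small sketch using Python's standard `unicodedata` module; the values come from whatever Unicode version the interpreter ships with, though decomposition mappings are not expected to change between versions.

```python
import unicodedata

# U+037A GREEK YPOGEGRAMMENI: a spacing clone with a *compatibility* mapping.
spacing_clone = unicodedata.decomposition("\u037a")

# U+1FBE GREEK PROSGEGRAMMENI: a *canonical* singleton mapping (no <tag>).
prosgegrammeni = unicodedata.decomposition("\u1fbe")

print("U+037A:", spacing_clone)
print("U+1FBE:", prosgegrammeni)

# NFKD applies the compatibility mapping: SPACE followed by U+0345,
# not NO-BREAK SPACE followed by U+0345.
nfkd = unicodedata.normalize("NFKD", "\u037a")
print([f"U+{ord(ch):04X}" for ch in nfkd])
```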

    The combining subscript may change when the text is capitalised. If you
    want an adscript or subscript iota, the font should do it as a contextual
    variant under the influence of the base letter. Alternatively, you may use
    the Unicode default upper casing, and upper-case it to U+0399 GREEK CAPITAL
    LETTER IOTA.
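The interaction between default upper casing and canonical equivalence can be illustrated with a sketch. It assumes Python's `str.upper()` as a stand-in for the Unicode default (context-free) upper casing, and uses alpha with an acute accent and an iota subscript in two canonically equivalent orders.

```python
import unicodedata


def nfd(text: str) -> str:
    return unicodedata.normalize("NFD", text)


def code_points(text: str) -> str:
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# Same text, two canonically equivalent spellings: U+0301 has combining
# class 230 and U+0345 has class 240, so canonical ordering puts U+0301 first.
subscript_first = "\u03b1\u0345\u0301"
accent_first = "\u03b1\u0301\u0345"
print("inputs equivalent: ", nfd(subscript_first) == nfd(accent_first))

# Upper casing turns the combining U+0345 into the spacing U+0399, which is a
# starter and therefore blocks any reordering of the accent around it.
upper_a = subscript_first.upper()
upper_b = accent_first.upper()
print("upper(subscript first):", code_points(upper_a))
print("upper(accent first):   ", code_points(upper_b))
print("outputs equivalent:", nfd(upper_a) == nfd(upper_b))
```

In this sketch the accent ends up on the capital iota in one case and on the capital alpha in the other, so a careful implementation would normalize (or move the ypogegrammeni) before casing.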

    > What can be said is that U+1FBE (the iota adscript) is a compatibility
    > character provided only for roundtrip compatibility with other encodings;
    > the name may be misleading for you, but "ypogegrammeni" (the combining
    > subscript iota) is NOT equivalent to "prosgegrammeni" (the non-combining
    > small letter iota that normally follows another letter but may be treated
    > as a plain letter itself).

    On the contrary, all the decomposable letters containing 'PROSGEGRAMMENI'
    have U+0345 COMBINING GREEK YPOGEGRAMMENI in their canonical decompositions.
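This claim can be checked mechanically. The sketch below assumes Python's `unicodedata` module and reads "decomposable letters containing 'PROSGEGRAMMENI'" as characters whose name contains both "LETTER" and "PROSGEGRAMMENI"; that reading deliberately leaves out U+1FBE GREEK PROSGEGRAMMENI itself, whose singleton decomposition is U+03B9.

```python
import sys
import unicodedata

with_prosgegrammeni = []
for cp in range(sys.maxunicode + 1):
    ch = chr(cp)
    name = unicodedata.name(ch, "")  # "" for unnamed or unassigned code points
    if "LETTER" in name and "PROSGEGRAMMENI" in name and unicodedata.decomposition(ch):
        with_prosgegrammeni.append(ch)

# Characters in that set whose full canonical decomposition lacks U+0345.
missing_ypogegrammeni = [
    ch for ch in with_prosgegrammeni
    if "\u0345" not in unicodedata.normalize("NFD", ch)
]

print(len(with_prosgegrammeni), "decomposable letters named ... PROSGEGRAMMENI")
print("without U+0345 in NFD:", [f"U+{ord(ch):04X}" for ch in missing_ypogegrammeni])
```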

    For a history of the confusion, I recommend
    http://www.tlg.uci.edu/~opoudjis/unicode/unicode_adscript.html (by Nick
    Nicholas) and http://www.tlg.uci.edu/~opoudjis/unicode/ken_adscripts.html
    (by Ken Whistler).

    Richard.


