Re: Punctuation character (inverted interrobang) proposed

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed Oct 19 2005 - 22:51:48 CST


    > Denis Jacquerye wrote:
    >
    >> A few months ago on Unicode-Afrique, I asked whether U+0254, 025C,
    >> 0186 and U+0190 could be precomposed with acute, grave, circumflex or
    >> caron. Well-known people on this list replied that it would simply be
    >> impossible, because of the proposal guidelines.
    >
    >> From the proposal guidelines:
    >> Often a proposed character can be expressed as a sequence of one or
    >> more existing Unicode characters. Encoding the proposed character
    >> would be a duplicate representation, and is thus not suitable for
    >> encoding.

    In fact, encoding them would not be theoretically impossible, but they
    could only be encoded as compatibility characters, excluded from
    composition in the normalized forms because of the normalization stability
    rule. This would seriously limit the usefulness of these characters, since
    every conforming process that uses them would also need to support their
    canonically equivalent decomposed sequences.
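
    To illustrate the point about composition, here is a minimal sketch using
    Python's standard unicodedata module. It assumes a reasonably recent
    Unicode database, and it uses U+0958 only as an example of an existing
    character that is excluded from composition; it is not one of the
    characters under discussion.

```python
import unicodedata

def nfc(s):
    """Return s in Normalization Form C."""
    return unicodedata.normalize("NFC", s)

def codepoints(s):
    """Format a string as a list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)

# e + combining acute: a precomposed form exists and NFC composes it.
e_acute = nfc("e\u0301")

# open o + combining acute: no precomposed form exists, so NFC keeps
# the two-character sequence as it is.
open_o_acute = nfc("\u0254\u0301")

# U+0958 has a canonical decomposition but is excluded from composition,
# so even NFC yields the decomposed sequence.
qa = nfc("\u0958")

for label, s in [("e + acute", e_acute),
                 ("open o + acute", open_o_acute),
                 ("U+0958", qa)]:
    print(f"{label}: {codepoints(s)}")
```

    A newly encoded precomposed open o with acute would behave like the third
    case: text containing it would turn back into the sequence as soon as it
    is normalized.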

    So encoding them is not necessary, not even to convince font designers to
    support them. There is now a better way to get font designers to support
    these sequences without encoding the characters separately: list them as
    supported named sequences in the Unicode database. I see only one reason
    that would push Unicode (and in fact ISO/IEC 10646 first) to encode them,
    and so add compatibility characters: the creation of a national character
    encoding standard that requires handling those characters as unbreakable
    units, each with a single code position in that charset.
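
    As a rough illustration of what a named sequence looks like to software,
    the sketch below relies on unicodedata.lookup() in Python 3.3 or later,
    which resolves named sequences as well as character names. The helper
    lookup_sequence is a hypothetical name, and the sequence queried is only
    an example that I believe is present in current versions of the Unicode
    database.

```python
import unicodedata

def lookup_sequence(name):
    """Return the string for a character name or named sequence, or None."""
    try:
        return unicodedata.lookup(name)
    except KeyError:
        return None

# A named sequence resolves to several code points, not to one character.
r_tilde = lookup_sequence("LATIN SMALL LETTER R WITH TILDE")
if r_tilde is not None:
    print([f"U+{ord(c):04X}" for c in r_tilde])

# Names that are not in the database simply come back as None.
missing = lookup_sequence("NO SUCH SEQUENCE NAME")
```

    The sequence gets an official name that fonts and input methods can refer
    to, yet no new code point is added and normalization is unaffected.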

    For such national applications, however, correctly transcoding Unicode to
    the national standard would require denormalizing the canonical decomposed
    sequences. Because this would be only an intermediate state before
    generating the national code positions, it could be achieved by internally
    mapping those characters to PUA code points that have an internal
    canonical decomposition not excluded from recomposition.
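
    A minimal sketch of that intermediate step follows. Everything specific
    in it is an assumption made for illustration: the PUA code points, the
    byte values of the national charset, and the function names are all
    hypothetical, and the simple string replacement stands in for a real
    tailored composition pass.

```python
import unicodedata

# Hypothetical tailoring: canonical sequences mapped to private-use
# placeholders that exist only inside the transcoder.
TAILORED_COMPOSITIONS = {
    "\u0254\u0301": "\uE000",  # open o + acute
    "\u0254\u0300": "\uE001",  # open o + grave
    "\u025B\u0301": "\uE002",  # open e + acute
}

# Hypothetical national charset: one code position per placeholder.
NATIONAL_CODE = {"\uE000": 0xA1, "\uE001": 0xA2, "\uE002": 0xA3}

def tailored_compose(text):
    """Decompose canonically, then recompose the tailored sequences to PUA."""
    text = unicodedata.normalize("NFD", text)
    for sequence, placeholder in TAILORED_COMPOSITIONS.items():
        text = text.replace(sequence, placeholder)
    return text

def encode_national(text):
    """Transcode to the hypothetical charset (ASCII plus the units above)."""
    out = bytearray()
    for ch in tailored_compose(text):
        if ch in NATIONAL_CODE:
            out.append(NATIONAL_CODE[ch])
        elif ord(ch) < 0x80:
            out.append(ord(ch))
        else:
            raise ValueError(f"U+{ord(ch):04X} has no national code position")
    return bytes(out)

print(encode_national("k\u0254\u0301"))
```

    The placeholders never leave the transcoder, so no interchanged text ever
    contains them.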

    But this would require a tailored normalization algorithm, to be used only
    as an intermediate step when transcoding from Unicode to the national
    charset, whose result could in some cases be inconsistent (with respect to
    stability) with the standard normalization forms. This could legitimately
    be a problem only if the inconsistent sequences all have an equivalent in
    the national standard. So I think the national standard would avoid this
    inconsistency by mapping the possibly inconsistent national precomposed
    characters to distinct decomposed Unicode sequences, possibly involving
    Unicode joiner controls, so that the mapping from the national charset to
    Unicode remains invertible even after Unicode normalization.
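
    The invertibility condition can be checked mechanically. The sketch below
    is only an illustration: the two mapping tables and their byte values are
    hypothetical, and U+034F COMBINING GRAPHEME JOINER is my assumption for
    what a "joiner control" could be here. The check is simply that no two
    national code positions map to Unicode strings that become identical
    under NFC.

```python
import unicodedata

def is_invertible_after_nfc(mapping):
    """True if the mapped strings stay pairwise distinct once NFC is applied."""
    normalized = [unicodedata.normalize("NFC", s) for s in mapping.values()]
    return len(set(normalized)) == len(normalized)

# Hypothetical charset with two distinct units that render alike.
# Naive mapping: both targets are canonically equivalent.
NAIVE = {
    0xB1: "\u00E9",    # precomposed e with acute
    0xB2: "e\u0301",   # e + combining acute
}

# Alternative mapping: a joiner (class 0) between base and mark blocks
# canonical composition, so the two targets stay distinct.
WITH_JOINER = {
    0xB1: "\u00E9",
    0xB2: "e\u034F\u0301",
}

print(is_invertible_after_nfc(NAIVE))        # False
print(is_invertible_after_nfc(WITH_JOINER))  # True
```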


