Re: Implementing NFC

From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Sat Mar 17 2007 - 19:07:47 CST


    Doug Ewell wrote on Saturday, March 17, 2007 8:11 PM

    > Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:

    >> Really, normalization is only needed for compatibility with other
    >> processes that do not recognize the canonically equivalent forms (i.e.
    >> non-Unicode-compliant processes, because all compliant processes should
    >> produce consistent results, i.e. canonically equivalent results from any
    >> canonically equivalent input).

    > But that list of "other processes" includes most software products on the
    > market.

    The list also includes Unicode default capitalisation - it does not respect
    canonical equivalence. It works in NFD (with the possible exception of some
    N'ko-Greek hybrids), but can come unstuck in NFC for some unusual Greek
    combinations, such as <U+1FB3 GREEK SMALL LETTER ALPHA WITH YPOGEGRAMMENI,
    U+0359 COMBINING ASTERISK BELOW>. (Actually, you could argue that when this
    is capitalised, both the resulting capital alpha and the capital iota should
    have combining asterisks below them.)
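
The capitalisation mismatch described above can be reproduced with a short Python sketch. It assumes that Python's `str.upper()` applies the default full case mappings (it is not a tailored Greek uppercasing), and the helper names `canonically_equivalent` and `show` are illustrative only.

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def show(s: str) -> str:
    """Render a string as a list of code points for inspection."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


# <GREEK SMALL LETTER ALPHA WITH YPOGEGRAMMENI, COMBINING ASTERISK BELOW>
text = "\u1FB3\u0359"

nfc = unicodedata.normalize("NFC", text)  # stays <U+1FB3, U+0359>
nfd = unicodedata.normalize("NFD", text)  # asterisk (ccc 220) sorts before
                                          # ypogegrammeni (ccc 240)

# Default uppercasing applied to each normalization form.
upper_from_nfc = nfc.upper()
upper_from_nfd = nfd.upper()

print("NFC input:     ", show(nfc))
print("NFD input:     ", show(nfd))
print("upper(NFC):    ", show(upper_from_nfc))
print("upper(NFD):    ", show(upper_from_nfd))
print("inputs equiv.: ", canonically_equivalent(nfc, nfd))
print("outputs equiv.:", canonically_equivalent(upper_from_nfc, upper_from_nfd))
```

Starting from NFC, the asterisk ends up after the capital iota; starting from NFD, it stays attached to the capital alpha. Because the capital iota is a starter, normalization cannot reorder the two results into the same sequence.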

    Eric Muller wrote on Saturday, March 17, 2007 4:20 PM

    > Consider writing a text editor and consider the Windows Vietnamese
    > keyboard. ... If you want to build your editor so that <any key, delete>
    > is a no-op, then you need to compensate for this mismatch, and in fact you
    > need to have a detailed knowledge of the keyboard in your editor. This
    > sounds a bit much to me.

    For some keyboards, making <any key, delete> a no-op is an impossibility! I
    have in mind some Lao keyboards - typewriter-based Duang Jan and Sida Thong
    layouts both have keys for combinations of U+0EC9 LAO TONE MAI THO and
    superscript vowel, and the LaoWord phonetic keyboard includes single
    keystrokes for most of the digraphs (not just the ligatures) using U+0EAB
    LAO LETTER HO SUNG. I can't see characters being added to Unicode just to
    allow <any key, delete> to be a no-op!
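
As a rough illustration of why this cannot work with a delete that removes a single code point, here is a minimal sketch. The key name `HO_SUNG_LO` and its output are hypothetical stand-ins for a digraph key, not taken from any actual keyboard definition; the only way the editor below restores the buffer is by remembering how many code points the keystroke produced.

```python
# Hypothetical keyboard table: one keystroke emits two code points,
# U+0EAB LAO LETTER HO SUNG followed by U+0EA5 LAO LETTER LO LOOT.
KEY_OUTPUT = {"HO_SUNG_LO": "\u0EAB\u0EA5"}


def type_key(buffer: str, key: str) -> tuple[str, int]:
    """Append the key's output; return the new buffer and the output length."""
    output = KEY_OUTPUT[key]
    return buffer + output, len(output)


def delete_last_code_point(buffer: str) -> str:
    """A delete that removes exactly one code point from the end."""
    return buffer[:-1]


before = "\u0E81"  # LAO LETTER KO already in the buffer

after_key, emitted = type_key(before, "HO_SUNG_LO")

# <key, delete> with a code-point delete leaves HO SUNG behind.
after_delete = delete_last_code_point(after_key)
print(after_delete == before)  # False

# Undoing the whole keystroke requires knowing what the key emitted.
after_undo = after_key[:-emitted]
print(after_undo == before)  # True
```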

    Richard.


