RE: Text Editors and Canonical Equivalence (was Coloured diacritics)

From: Marco Cimarosti (marco.cimarosti@essetre.it)
Date: Tue Dec 09 2003 - 13:22:56 EST

    Peter Kirk wrote:
    > > So, should n equal four or five? The answer would appear to
    > > depend on whether or not the source file was saved in NFC
    > > or NFD format.
    > >
    > No, surely not. If the wcslen() function is fully Unicode
    > conformant, it should give the same output whatever the
    > canonically equivalent form of its input.
    > That more or less implies that it should normalise
    > its input.

    Standards and fantasy are both good things, provided you don't mix them up.

    "wcslen" has nothing whatsoever to do with the Unicode standard; it has
    everything to do with the *C* standard. And, according to the C standard,
    "wcslen" must simply count the number of "wchar_t" array elements from the
    location pointed to by its argument up to, but not including, the first
    "wchar_t" element whose value is L'\0'. Full stop.
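
    To make this concrete, here is a minimal sketch. The word "café" is only
    an assumed example (the string from the earlier message is not repeated
    here), and "count_wchars" and "check_lengths" are hypothetical names used
    for illustration; neither is part of any standard. The arrays are spelled
    out element by element so that the encoding of the source file cannot
    change the result.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Assumed example: "café" with precomposed U+00E9 (the NFC spelling). */
const wchar_t CAFE_NFC[] = { L'c', L'a', L'f', 0x00E9, L'\0' };

/* The same word as "e" + U+0301 COMBINING ACUTE ACCENT (the NFD spelling). */
const wchar_t CAFE_NFD[] = { L'c', L'a', L'f', L'e', 0x0301, L'\0' };

/* Hypothetical stand-in for wcslen(): it scans for the terminating L'\0'
   and counts the elements before it. It never looks at what they mean. */
size_t count_wchars(const wchar_t *s)
{
    size_t n = 0;
    while (s[n] != L'\0')
        n++;
    return n;
}

/* Hypothetical self-check: the two canonically equivalent spellings have
   different lengths, and wcslen() agrees with the hand-written loop. */
void check_lengths(void)
{
    assert(wcslen(CAFE_NFC) == 4);
    assert(wcslen(CAFE_NFD) == 5);
    assert(count_wchars(CAFE_NFC) == wcslen(CAFE_NFC));
    assert(count_wchars(CAFE_NFD) == wcslen(CAFE_NFD));
}
```

    So n is four for one spelling and five for the other, and "wcslen" is
    behaving exactly as the C standard requires in both cases.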

    > (One can imagine a second parameter specifying whether NFC or NFD is
    > required.)

    One can imagine whatever (s)he wants, but should please avoid claiming that
    his/her imagination corresponds to any existing standard.

    > This makes the issue one not for the text editor
    > but for the programming language or its string handling library.

    This is correct.

    > The Unicode standard does allow for special display modes in
    > which the exact underlying string, including control
    > characters, is made visible.

    Can you please cite the passage where the Unicode standard would not allow
    this?

    _ Marco


