RE: Text Editors and Canonical Equivalence (was Coloured diacritics)

From: Arcane Jill (arcanejill@ramonsky.com)
Date: Thu Dec 11 2003 - 11:41:47 EST

    I think Marco here has the definitive answer. I've thought about this a
    lot, and it seems to me that he's right.

    A /consequence/ of this appears to be that it DOESN'T MATTER whether or
    not a text editor normalises C or C++ source code, into either NFC or
    NFD. It shouldn't make the slightest bit of difference /unless/ the
    program has been very sloppily (i.e. badly) written. Whether or not the
    source code is normalised, and into whichever form, it will still
    compile to an executable whose behavior is what the programmer wants.
    (Of course, one can _contrive_ examples where it makes a difference, but
    in general, the primary reason for wanting to know the number of
    wchar_ts in a string is so that you can reserve the right amount of
    storage space, not so that you can control the flow of execution on
    that basis.)
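
    To make that concrete, here is a minimal sketch, assuming a C99
    compiler that accepts universal character names in wide string
    literals. The names CAFE_NFC, CAFE_NFD and dup_wide are made up for
    illustration; they are not from Marco's message or from any library.

```c
#include <stdlib.h>
#include <wchar.h>

/* The same word in both normalisation forms, spelled with universal
   character names so the literals do not depend on how an editor
   happened to save this file. */
static const wchar_t CAFE_NFC[] = L"caf\u00e9";   /* e-acute as one code point */
static const wchar_t CAFE_NFD[] = L"cafe\u0301";  /* 'e' + combining acute accent */

/* Hypothetical helper: duplicate a wide string, sizing the buffer from
   wcslen() rather than from a hard-coded count. The caller frees the
   result; NULL is returned if allocation fails. */
static wchar_t *dup_wide(const wchar_t *s)
{
    size_t n = wcslen(s);                        /* 4 for NFC, 5 for NFD */
    wchar_t *copy = malloc((n + 1) * sizeof *copy);
    if (copy != NULL)
        wmemcpy(copy, s, n + 1);                 /* n + 1 copies the terminator too */
    return copy;
}
```

    Calling dup_wide(CAFE_NFC) and dup_wide(CAFE_NFD) both yield a
    correctly sized copy, even though wcslen reports a different count
    for each form. Only code that hard-codes the count, or branches on
    it, would notice which form the editor saved.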

    Thus, I'm now becoming convinced that normalising Unicode plain text
    would be a reasonable feature for a text editor to offer. (Even in XML
    documents, it would only affect /one character/, if I've understood this
    thread correctly).

    Jill

    > -----Original Message-----
    > From: Marco Cimarosti [mailto:marco.cimarosti@essetre.it]
    > Sent: Tuesday, December 09, 2003 6:14 PM
    > To: 'Arcane Jill'; unicode@unicode.org
    > Subject: RE: Text Editors and Canonical Equivalence (was Coloured
    > diacritics)
    >
    >
    > The answer is:
    >
    > int n = wcslen(L"café");
    >
    > That's why you take the burden to call the "wcslen" library function
    > rather than assuming a hard-coded value such as:
    >
    > int n = 4; // the length of string "café"


