Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

From: Peter Kirk (peterkirk@qaya.org)
Date: Tue Dec 09 2003 - 13:29:42 EST

Next message: jcowan@reutershealth.com: "Re: Text Editors and Canonical Equivalence (was Coloured diacritics)"

Previous message: Gupta, Rohit4: "Unsubscribe"
In reply to: jcowan@reutershealth.com: "Re: Text Editors and Canonical Equivalence (was Coloured diacritics)"
Next in thread: jcowan@reutershealth.com: "Re: Text Editors and Canonical Equivalence (was Coloured diacritics)"
Reply: jcowan@reutershealth.com: "Re: Text Editors and Canonical Equivalence (was Coloured diacritics)"

On 09/12/2003 10:16, jcowan@reutershealth.com wrote:

>Peter Kirk scripsit:
>
>>No, surely not. If the wcslen() function is fully Unicode conformant, it
>>should give the same output whatever the canonically equivalent form of
>>its input.
>
>Not so. Remember, the conformance requirement is not that a process can't
>distinguish between canonically equivalent strings ...
>
Remembered. This is not a conformance requirement, just an "ideally".
See C9 and the posting I just made.

>... (otherwise a normalizer
>would be impossible; it wouldn't know whether to normalize or not!) ...
>
Not so. Normalisation is idempotent, i.e. normalising an already
normalised string (to the same normalisation form) returns that string
unchanged. So the normaliser doesn't need to know in advance whether the
string is normalised. It may be more efficient to test for normalisation
first, but the conformance clause says nothing to stop you making
implementation shortcuts.
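
To illustrate the idempotence point, here is a minimal sketch in Python. It
assumes the standard unicodedata module as the normaliser; the names
normalise and normalise_with_shortcut are hypothetical, chosen only for this
example, and is_normalized needs Python 3.8 or later.

```python
import unicodedata

def normalise(text, form="NFC"):
    # Plain normaliser: it never asks whether the input is already normalised.
    return unicodedata.normalize(form, text)

def normalise_with_shortcut(text, form="NFC"):
    # Optional implementation shortcut: test first, normalise only if needed.
    if unicodedata.is_normalized(form, text):
        return text
    return unicodedata.normalize(form, text)

decomposed = "e\u0301"      # e followed by COMBINING ACUTE ACCENT
once = normalise(decomposed)
twice = normalise(once)     # normalising again changes nothing

assert once == twice
assert normalise_with_shortcut(decomposed) == once
```

Both functions return the same result for every input; the shortcut is only
a possible optimisation, not something the normaliser depends on.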

>... but that
>a process can't assume that *other* processes will distinguish between
>canonically equivalent strings. Equally, it can't assume that the other
>process will fail to distinguish them, either.
>
>In an environment in which C wide characters are Unicode characters, then
>wcslen returns the number of distinct characters in the literal string.
>How many characters it contains depends on how many were placed in the
>source file by the author and what, if anything, has happened to the source
>file since.
>
This implies that wcslen is not doing what C9 says that it "ideally...
would always" do. But see the caveats in my other posting.
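
A small sketch of the difference under discussion, in Python rather than C.
This assumes that len() on a Python string, which counts code points, is a
fair stand-in for wcslen where wide characters are Unicode characters. The
function names raw_length and canonical_length are hypothetical.

```python
import unicodedata

precomposed = "\u00e9"    # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"    # e followed by COMBINING ACUTE ACCENT

def raw_length(text):
    # Counts whatever code points happen to be in the string,
    # so canonically equivalent inputs can give different answers.
    return len(text)

def canonical_length(text, form="NFC"):
    # Normalises first, so canonically equivalent inputs give the same answer.
    return len(unicodedata.normalize(form, text))

assert raw_length(precomposed) == 1
assert raw_length(decomposed) == 2
assert canonical_length(precomposed) == canonical_length(decomposed)
```

The value canonical_length returns still depends on the chosen form (1 under
NFC, 2 under NFD for this example); what no longer matters is which
equivalent form the input arrived in.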

-- 
Peter Kirk
peter@qaya.org (personal)
peterkirk@qaya.org (work)
http://www.qaya.org/

