From: Peter Kirk (peterkirk@qaya.org)
Date: Tue Dec 09 2003 - 13:23:08 EST
On 09/12/2003 10:01, Mark Davis wrote:
>>No, surely not. If the wcslen() function is fully Unicode conformant, it
>>should give the same output whatever the canonically equivalent form of
>>its input. That more or less implies that it should normalise its input.
>
>No, that is not a requirement of Unicode conformance.
>
>BTW, I must confess to an inability to keep up with the level of mail on this
>list. There are so many things in these mails that are simply wrong, and
>insufficient time for knowledgeable people to correct them. I would just caution
>people to first consult the materials on the Unicode site (Standard, TRs, FAQs,
>etc.), and take much of what is on this list with a quite sizable grain of salt.
>
Mark, I understand your problem with the level of mail. But in this
case I have read the relevant section of TUS 4.0, and I quote it here,
from p.59, to support my point:
> C9 A process shall not assume that the interpretations of two
> canonical-equivalent character sequences are distinct.
> ...
> • Ideally, an implementation would always interpret two
> canonical-equivalent character sequences identically. ...
Perhaps my error is that I have raised (or is it lowered?) "ideally
would" to "should". So let me rephrase what I said before:
If the wcslen() function is fully Unicode conformant, ideally it would
give the same output whatever the canonically equivalent form of its input.
Surely that is what C9 is saying. Or is the issue about whether such a
function is "a process"? I didn't say that conformance implies that a
process should normalise its input (I accept that that is not true), but
only that for this particular function, counting the length of a string,
sensible results can be given only if the string is normalised, or at
least transformed in some other way which removes distinctions between
canonically equivalent forms (e.g. normalisation with some kinds of
modified data).
I am tacitly assuming at this point that the function is part of a
general-purpose library for use by users who are not interested in the
details of character coding etc. I can see that different considerations
may apply for an internal function within a Unicode processing and
rendering implementation.
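To make the point concrete, here is a minimal sketch in Python rather than C, since Python's standard unicodedata module provides normalisation directly. It assumes that len() on a Python string (a count of code points) is a fair stand-in for what wcslen() reports; the helper name normalised_length is hypothetical, chosen only for this illustration.

```python
import unicodedata

def normalised_length(text: str, form: str = "NFC") -> int:
    """Count code points after normalising to the given form.

    Hypothetical helper: canonically equivalent inputs yield the same count.
    """
    return len(unicodedata.normalize(form, text))

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# Raw counts differ although the two strings are canonically equivalent.
print(len(precomposed), len(decomposed))

# After normalisation the counts agree, whichever form is chosen.
print(normalised_length(precomposed), normalised_length(decomposed))
print(normalised_length(precomposed, "NFD"), normalised_length(decomposed, "NFD"))
```

The count still depends on which normalisation form is chosen, so this only removes the distinction between canonically equivalent inputs; it does not settle what "length" ought to mean to an end user.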
--
Peter Kirk
peter@qaya.org (personal)
peterkirk@qaya.org (work)
http://www.qaya.org/