From: Doug Ewell (dewell@adelphia.net)
Date: Wed Dec 10 2003 - 01:13:03 EST
Peter Kirk <peterkirk at qaya dot org> wrote:
>> The "wcslen" has nothing whatsoever to do with the Unicode standard,
>> but it has all to do with the *C* standard. And, according to the C
>> standard, "wcslen" must simply count the number of "wchar_t" array
>> elements from the location pointed to by its argument up to the first
>> "wchar_t" element whose value is L'\0'. Full stop.
>
> OK, as a C function handling wchar_t arrays it is not expected to
> conform to Unicode. But if it is presented as a function available to
> users for handling Unicode text, for determining how many characters
> (as defined by Unicode) are in a string, it should conform to Unicode,
> including C9.
wcslen() is very definitely presented as a function for counting
_code_units_. You can't even rely on it to count Unicode characters
accurately if a wchar_t is 16 bits wide, because each supplementary
character requires 2 code units (a high + low surrogate pair).
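To make the difference concrete, here is a minimal sketch in C. It uses uint16_t arrays to stand in for a 16-bit wchar_t so that it behaves the same on every platform. The names utf16_units() and utf16_code_points() are hypothetical helpers invented for this illustration; they are not part of the C standard or of any library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: counts 16-bit code units up to the terminating 0.
   This is what wcslen() reports where wchar_t is 16 bits wide. */
static size_t utf16_units(const uint16_t *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* Hypothetical helper: counts code points. A high surrogate
   (0xD800..0xDBFF) immediately followed by a low surrogate
   (0xDC00..0xDFFF) is counted once; anything else, including an
   unpaired surrogate, counts as one. */
static size_t utf16_code_points(const uint16_t *s)
{
    size_t n = 0;
    while (*s != 0) {
        if (*s >= 0xD800 && *s <= 0xDBFF &&
            s[1] >= 0xDC00 && s[1] <= 0xDFFF)
            s += 2;   /* surrogate pair: one supplementary character */
        else
            s += 1;   /* BMP character or lone surrogate */
        n++;
    }
    return n;
}
```

For the string "A", U+1D538, "B", stored as { 0x0041, 0xD835, 0xDD38, 0x0042, 0 }, the first function reports 4 and the second reports 3. Neither one looks at combining sequences or normalization.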
Programmers rely on primitive functions like wcslen() to do what they do
very rapidly, and not to change their meaning in new versions of the
language standard. It would be very handy to have a suite of C
functions that normalize their input string to any of NFC, NFD, NFKC,
or NFKD, or that compare strings or measure their length taking
normalization into account, but those would have to be all-new functions.
-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/