Re: Counting characters or bytes in UTF-8?

From: Antoine Leca
Date: Tue Sep 12 2000 - 04:33:35 EDT

Yves Arrouye wrote:
> > 2. The original intent of strncpy() was to provide a means of copying both
> > bytes and characters. Since the assumption was 1 byte == 1 char, there was
> > no problem with this. In addition to the problem in #1, though, UTF-8
> > introduces these issues:
> I've always looked at the strxxx() functions as manipulating characters
> (strings of), and the memxxx() ones (memcpy, memcmp, actually bxxx() in my
> time) as manipulating bytes.

Unfortunately, the C Standard legislated it the other way round:
the count values in both the memxxx() *and* the strnxxx()
functions are clearly specified as byte counts, and not (multibyte)
character counts.
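
To make the byte-count point concrete, here is a minimal sketch, assuming
the input is valid UTF-8. The helpers utf8_safe_prefix_len() and
utf8_copy_prefix() are hypothetical names made up for this illustration;
they are not part of the C library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: largest length <= max_bytes at which the string
 * can be cut without splitting a UTF-8 sequence.
 * Assumes s is valid, NUL-terminated UTF-8. */
size_t utf8_safe_prefix_len(const char *s, size_t max_bytes)
{
    assert(s != NULL);
    size_t len = strlen(s);          /* strlen() counts bytes */
    if (len <= max_bytes)
        return len;

    size_t cut = max_bytes;
    /* A continuation byte looks like 10xxxxxx. If the first byte we
     * would drop is one, the cut falls inside a character: back up. */
    while (cut > 0 && ((unsigned char)s[cut] & 0xC0) == 0x80)
        cut--;
    return cut;
}

/* Hypothetical helper: copy as much of src as fits in dst (including
 * the terminating NUL) without splitting a character.
 * Returns the number of bytes copied, excluding the NUL. */
size_t utf8_copy_prefix(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && dst_size > 0);
    size_t n = utf8_safe_prefix_len(src, dst_size - 1);
    memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```

For comparison, strncpy(buf, "h\xC3\xA9llo", 2) copies exactly two bytes:
'h' and the lone lead byte 0xC3 of the two-byte sequence for e-acute, with
no terminating NUL. With a 3-byte buffer, the sketch above copies only "h".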

As far as I know, all implementations with more-than-1-byte characters
(in practice, the East Asian ones and the European ones for the
Videotext codesets and the related T.51/T.61) take the short and easy way
and use byte counts. Some invented special supplementary functions to
deal with multibyte character counts, for example dealing with "widths".
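
As an illustration of what such a supplementary function might look like
for UTF-8, here is a rough sketch. The name utf8_char_count() is
hypothetical, and the code assumes valid UTF-8 input. It counts encoded
characters (code points), which is not the same thing as display "width".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of UTF-8 encoded characters in s.
 * Every character contributes exactly one byte that is NOT a
 * continuation byte (10xxxxxx), so counting those bytes is enough.
 * Assumes s is valid, NUL-terminated UTF-8. */
size_t utf8_char_count(const char *s)
{
    assert(s != NULL);
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For the string "h\xC3\xA9llo", strlen() reports 6 bytes while this
function reports 5 characters. The standard mblen() and mbstowcs()
functions cover similar ground, but their results depend on the current
locale, which is why this sketch decodes the UTF-8 bit patterns directly.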


