Counting characters or bytes in UTF-8?

From: Lars Marius Garshol (larsga@garshol.priv.no)
Date: Mon Sep 11 2000 - 04:36:42 EDT


I'm working on a C/C++ application that runs on many different
platforms and supports Unicode, mostly using the old C string library
functions. This application can be compiled to either support Unicode
internally using UTF-16 or to not support Unicode at all. However, for
some platforms it seems that we may want to compile it to use UTF-8
internally.

We have a uni_strncpy function name that is mapped to some function
that performs the same task as the standard strncpy function. Which
function the name maps to depends on the platform and on the internal
text encoding.
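
To illustrate the kind of mapping meant here, a minimal sketch follows.
The switch UNI_INTERNAL_UTF16 and the type uni_char are hypothetical
names chosen for this example, and the UTF-16 branch assumes a platform
where wchar_t is 16 bits wide, which is not true everywhere.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical build switch; the real configuration names are not
   given in this message. */
#if defined(UNI_INTERNAL_UTF16)
  /* Assumption: wchar_t is 16 bits wide on this platform. */
  typedef wchar_t uni_char;
  #define uni_strncpy wcsncpy   /* n counts wchar_t units */
#else
  /* Non-Unicode build or UTF-8 build: plain char strings. */
  typedef char uni_char;
  #define uni_strncpy strncpy   /* n counts chars, i.e. bytes */
#endif
```

With a mapping like this, a UTF-8 build would end up calling plain
strncpy, so the meaning of 'n' is whatever strncpy says it is.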

The question is what the 'n' argument counts. In 16-bit mode it is
obviously characters and in non-Unicode mode there is no distinction
between bytes and characters. However, what do we count with UTF-8?
My intuition tells me that it will be bytes, since the function will
not be aware that it is processing UTF-8 at all.
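
To make the question concrete, here is a small sketch of the difference
between the two ways of counting. The standard strncpy copies at most n
chars, so on a UTF-8 string it can stop in the middle of a multi-byte
sequence. The helpers utf8_codepoints and utf8_safe_prefix are
hypothetical names invented for this example, and both assume the input
is valid UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* Count code points in a NUL-terminated UTF-8 string by counting every
   byte that is not a continuation byte (10xxxxxx).
   Assumes the input is valid UTF-8. */
size_t utf8_codepoints(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++count;
    }
    return count;
}

/* Return the largest byte length <= max_bytes at which the string can
   be cut without splitting a multi-byte sequence.
   Assumes the input is valid UTF-8. */
size_t utf8_safe_prefix(const char *s, size_t max_bytes)
{
    size_t len = strlen(s);
    size_t cut = max_bytes;

    if (len <= max_bytes)
        return len;

    /* s[cut] is the first byte that would be dropped. If it is a
       continuation byte, the cut falls inside a sequence, so back up
       to the lead byte of that sequence. */
    while (cut > 0 && ((unsigned char)s[cut] & 0xC0) == 0x80)
        --cut;
    return cut;
}
```

For the six-byte string "na\xC3\xAFve" (five code points, with one
two-byte sequence), strncpy(buf, s, 3) copies the bytes 'n', 'a' and
0xC3, leaving half a character at the end of buf. utf8_safe_prefix(s, 3)
returns 2 for the same string, which is the byte count that could be
passed to strncpy instead.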

Can someone confirm or deny this?

--Lars M.


