Counting characters or bytes in UTF-8?

From: Lars Marius Garshol (larsga@garshol.priv.no)
Date: Mon Sep 11 2000 - 04:36:42 EDT

Next message: Marco.Cimarosti@icl.com: "RE: Tamil glyphs"
Previous message: Harald Alvestrand: "Re: Reply-To mess opinion [was Re: Unicode on a non-Unicode web page]"
Next in thread: Michael (michka) Kaplan: "Re: Counting characters or bytes in UTF-8?"
Maybe reply: Michael (michka) Kaplan: "Re: Counting characters or bytes in UTF-8?"
Maybe reply: addison@inter-locale.com: "Re: Counting characters or bytes in UTF-8?"
Maybe reply: Mark Davis: "Re: Counting characters or bytes in UTF-8?"
Maybe reply: Yves Arrouye: "Re: Counting characters or bytes in UTF-8?"
Maybe reply: Antoine Leca: "Re: Counting characters or bytes in UTF-8?"

I'm working on a C/C++ application that runs on many different
platforms and supports Unicode, mostly using the old C string library
functions. This application can be compiled either to support Unicode
internally using UTF-16 or not to support Unicode at all. However, on
some platforms it seems that we may want to compile it to use UTF-8
internally.

We have a function name, uni_strncpy, that is mapped to some function
performing the same task as the standard strncpy function. Which
function the name maps to depends on the platform and the internal
text encoding.
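
To make the setup concrete, here is a rough sketch of the kind of
mapping I mean. The switch name UNI_INTERNAL_UTF16, the uni_char type
and the uni_strncpy16 helper are made up for this illustration and are
not our real names; the sketch also assumes unsigned short is 16 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(UNI_INTERNAL_UTF16)   /* hypothetical build switch */
typedef unsigned short uni_char;  /* assumption: unsigned short is 16 bits */

/* Hypothetical 16-bit counterpart of strncpy: n counts 16-bit units. */
uni_char *uni_strncpy16(uni_char *dst, const uni_char *src, size_t n)
{
    size_t i = 0;
    for (; i < n && src[i] != 0; ++i)
        dst[i] = src[i];
    for (; i < n; ++i)            /* pad with zeros, as strncpy does */
        dst[i] = 0;
    return dst;
}
#define uni_strncpy uni_strncpy16
#else                             /* non-Unicode or UTF-8 build */
typedef char uni_char;
#define uni_strncpy strncpy       /* n counts char elements */
#endif

/* Callers pass n without knowing which function is behind the name. */
void uni_copy_demo(void)
{
    uni_char src[] = { 'a', 'b', 'c', 0 };
    uni_char dst[4];

    uni_strncpy(dst, src, 4);     /* n counts uni_char units in either build */
    assert(dst[0] == 'a' && dst[2] == 'c' && dst[3] == 0);
}
```

In this sketch the 'n' argument always counts uni_char elements, so
its meaning changes with whatever uni_char happens to be in a given
build.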

The question is what the 'n' argument counts. In 16-bit mode it
obviously counts 16-bit characters (UTF-16 code units), and in
non-Unicode mode there is no distinction between bytes and
characters. However, what do we count with UTF-8? My intuition tells
me that it will be bytes, since the function will not be aware that
it is processing UTF-8 at all.
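
A small sketch of what I expect to happen, assuming uni_strncpy ends
up mapped to the plain standard strncpy in the UTF-8 build: strncpy
counts char elements, so a limit of 3 can cut a two-byte sequence in
half. The helpers utf8_char_count and utf8_safe_prefix are
hypothetical names invented for this example, and both assume the
input is valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count UTF-8 characters (code points) by skipping continuation
   bytes, which all have the bit pattern 10xxxxxx. */
size_t utf8_char_count(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;
    }
    return n;
}

/* Largest byte length <= max_bytes that does not end in the middle
   of a multi-byte sequence. */
size_t utf8_safe_prefix(const char *s, size_t max_bytes)
{
    size_t len = strlen(s);
    size_t n = max_bytes;

    if (len <= max_bytes)
        return len;
    /* s[n] is the first byte that would be dropped; back up while it
       is a continuation byte. */
    while (n > 0 && ((unsigned char)s[n] & 0xC0) == 0x80)
        --n;
    return n;
}

void strncpy_counts_bytes_demo(void)
{
    /* "naive" with i-diaeresis (U+00EF): 5 characters, 6 bytes */
    const char *s = "na\xC3\xAF" "ve";
    char dst[8] = { 0 };

    strncpy(dst, s, 3);                    /* copies 'n', 'a', 0xC3 */
    assert((unsigned char)dst[2] == 0xC3); /* lead byte without its trail byte */
    assert(utf8_char_count(s) == 5);
    assert(utf8_safe_prefix(s, 3) == 2);   /* back off to a whole character */
}
```

If that is right, callers in the UTF-8 build would have to treat 'n'
as a byte count and pick a limit such as utf8_safe_prefix(s, n)
themselves when they must not split a character.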

Can someone confirm or deny this?

--Lars M.

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:13 EDT