Re: Counting characters or bytes in UTF-8?

From: Yves Arrouye (yves@realnames.com)
Date: Tue Sep 12 2000 - 02:49:35 EDT

Next message: John Hudson: "Re: Tamil glyphs"
Previous message: toby_phipps@peoplesoft.com: "Lost & Found from IUC17 San Jose"
Maybe in reply to: Lars Marius Garshol: "Counting characters or bytes in UTF-8?"
Next in thread: addison@inter-locale.com: "Re: Counting characters or bytes in UTF-8?"
Reply: addison@inter-locale.com: "Re: Counting characters or bytes in UTF-8?"

> 2. The original intent of strncpy() was to provide a means of copying both
> bytes and characters. Since the assumption was 1 byte == 1 char, there was
> no problem with this. In addition to the problem in #1, though, UTF-8
> introduces these issues:

I've always looked at the strxxx() functions as manipulating characters
(or strings of them), and the memxxx() ones (memcpy, memcmp; they were
actually bxxx() in my time) as manipulating bytes.

I would thus try to avoid having functions like uni_strncpy() handle
anything but characters. Unfortunately, character-oriented APIs in UTF-8 are
not paragons of performance, since finding the nth character means scanning
the string from its start. So it may be better to provide a byte-oriented
API and some way to get the byte offset of the nth character of a string,
along with the opposite operation. I would then not call the function
uni_strncpy(), but maybe uni_bytescopy() or uni_memcpy(), to minimize
confusion.
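
To make the two conversion helpers concrete, here is a minimal sketch in C.
The names u8_offset_of_char() and u8_char_index_of_offset() are hypothetical
and do not come from any existing library. The sketch assumes a
NUL-terminated, well-formed UTF-8 string and treats "character" as a code
point (not a grapheme cluster); it relies only on the fact that UTF-8
continuation bytes have the bit pattern 10xxxxxx.

```c
#include <stddef.h>

/* True if byte b starts a character, i.e. it is not a UTF-8
   continuation byte (10xxxxxx). Assumes well-formed input. */
static int u8_is_lead(unsigned char b)
{
    return (b & 0xC0) != 0x80;
}

/* Hypothetical helper: byte offset of the n-th character (0-based) of s.
   If s has n or fewer characters, returns the byte length of s. */
size_t u8_offset_of_char(const char *s, size_t n)
{
    size_t i = 0;

    while (s[i] != '\0') {
        if (u8_is_lead((unsigned char)s[i])) {
            if (n == 0)
                return i;      /* reached the requested character */
            n--;
        }
        i++;
    }
    return i;                  /* ran off the end of the string */
}

/* Hypothetical helper, the opposite operation: number of characters
   that begin before the given byte offset. For an offset that falls on
   a character boundary, this is the index of the character there. */
size_t u8_char_index_of_offset(const char *s, size_t offset)
{
    size_t count = 0;
    size_t i;

    for (i = 0; i < offset && s[i] != '\0'; i++) {
        if (u8_is_lead((unsigned char)s[i]))
            count++;
    }
    return count;
}
```

With these, a caller who wants "the first n characters" can compute
u8_offset_of_char(src, n) once and pass the resulting byte count to a plain
byte-oriented copy such as memcpy(), after checking that it fits in the
destination buffer. Both helpers are linear scans, which is the performance
cost mentioned above.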

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:13 EDT