From: Mark Leisher (mleisher@crl.nmsu.edu)
Date: Thu Apr 06 2006 - 14:04:09 CST
Tay, William wrote:
> Hi,
>
> I have a C/C++ UNIX application that uses standard UTF-8 as the internal
> text encoding. If it receives a UTF-8 encoded decomposed accented
> character, i.e. base character + accent, from a MacOS X application, it
> would need to be able to detect that the character was decomposed, and
> then compose it prior to further processing. Is there any Solaris/UNIX
> utility or functions that can help my application do the detection and
> character composition?
>
> Now, the application from which the decomposed accented character
> originated may query my application so that the character is returned to
> it. If my application has already composed the character, won't it be a
> problem for the querying application, since it expects to receive the
> character in its decomposed format?
>
> My application interacts with not only MacOS X application but others
> that sit on different platforms. So, I'm not always receiving accented
> characters in their decomposed format.
>
> How do you think I should implement my application so that it takes care
> of decomposed and composed UTF-8 characters effectively?
>
> Can accented characters be decomposed in other encodings, e.g. ISO
> 8859-1, as well?
>
> Btw, what common applications/operating systems generate decomposed
> accented characters?
>
You can try http://crl.nmsu.edu/~mleisher/ucdata.html. Version 2.9
does not support composition/decomposition of UTF-8 strings, but
version 3.0, which should be released soon (probably within the next
few weeks), does.
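
For illustration only, here is a minimal sketch of the detect-then-compose
step asked about above. It does not use the package mentioned in this
message; it assumes Python's standard unicodedata module and Unicode
Normalization Form C (NFC) as the composed form. The function name
compose_utf8 is hypothetical.

```python
import unicodedata

def compose_utf8(raw):
    """Return (NFC-composed UTF-8 bytes, True if the input was changed)."""
    text = raw.decode("utf-8")
    composed = unicodedata.normalize("NFC", text)
    # If normalizing changed the string, the input was not already composed.
    return composed.encode("utf-8"), composed != text

# "e" followed by U+0301 COMBINING ACUTE ACCENT, i.e. base character + accent
decomposed = "e\u0301".encode("utf-8")
composed, changed = compose_utf8(decomposed)
```

Input that is already composed comes back byte-for-byte unchanged with the
flag set to False, so the same call can be applied to text from any sender.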
-- 
------------------------------------------------------------------------
Mark Leisher
Computing Research Lab            They never open their mouths
New Mexico State University       without subtracting from the
Box 30001, MSC 3CRL               sum of human knowledge.
Las Cruces, NM 88003                -- Thomas Bracket Reed (1839-1902)