Re: Decomposed vs Composed accented characters

From: Mark Leisher (mleisher@crl.nmsu.edu)
Date: Thu Apr 06 2006 - 14:04:09 CST

Tay, William wrote:
> Hi,
>
> I have a C/C++ UNIX application that uses standard UTF-8 as the internal
> text encoding. If it receives a UTF-8 encoded decomposed accented
> character, i.e. base character + accent, from a MacOS X application, it
> would need to be able to detect that the character was decomposed, and
> then compose it prior to further processing. Is there any Solaris/UNIX
> utility or functions that can help my application do the detection and
> character composition?
>
> Now, the application from which the decomposed accented character
> originated may query my application so that the character is returned to
> it. If my application has already composed the character, won't it be a
> problem for the querying application, since it expects to receive the
> character in its decomposed format?
>
> My application interacts with not only MacOS X application but others
> that sit on different platforms. So, I'm not always receiving accented
> characters in their decomposed format.
>
> How do you think I should implement my application so that it takes care
> of decomposed and composed UTF-8 characters effectively?
>
> Can accented characters be decomposed in other encodings, e.g. ISO
> 8859-1, as well?
>
> Btw, what common applications/operating systems generate decomposed
> accented characters?
>

You can experiment with http://crl.nmsu.edu/~mleisher/ucdata.html. Version 2.9
does not support composition/decomposition of UTF-8 strings, but version
3.0, which should be released soon (probably within the next few weeks),
does.
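
To make the detection question concrete, here is a rough sketch in plain C.
It is not taken from ucdata and does not use its API; the function names and
the four-entry table are made up for illustration. It assumes valid UTF-8
input and looks only at the Combining Diacritical Marks block
(U+0300..U+036F). Real composition (Unicode NFC) needs the full Unicode data
tables and canonical reordering, which is the job a normalization library
does.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if the (valid) UTF-8 string contains a code point in
   U+0300..U+036F, which encodes as 0xCC 0x80..0xBF or 0xCD 0x80..0xAF. */
static int has_combining_mark(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    for (; *p; p++) {
        if (p[0] == 0xCC && p[1] >= 0x80 && p[1] <= 0xBF)
            return 1;
        if (p[0] == 0xCD && p[1] >= 0x80 && p[1] <= 0xAF)
            return 1;
    }
    return 0;
}

/* Hypothetical, deliberately tiny mapping table: base letter followed by
   a combining mark, and the precomposed equivalent, both in UTF-8. */
struct pair {
    const char *decomposed;
    const char *composed;
};

static const struct pair table[] = {
    { "a\xCC\x80", "\xC3\xA0" },  /* a + U+0300 grave     -> U+00E0 */
    { "e\xCC\x81", "\xC3\xA9" },  /* e + U+0301 acute     -> U+00E9 */
    { "n\xCC\x83", "\xC3\xB1" },  /* n + U+0303 tilde     -> U+00F1 */
    { "o\xCC\x88", "\xC3\xB6" },  /* o + U+0308 diaeresis -> U+00F6 */
};

/* Copies in to out, replacing any sequence found in the table with its
   precomposed form. out must hold at least strlen(in) + 1 bytes; the
   result is never longer than the input. */
static void compose_known_pairs(const char *in, char *out)
{
    while (*in) {
        size_t i;
        int matched = 0;

        for (i = 0; i < sizeof table / sizeof table[0]; i++) {
            size_t n = strlen(table[i].decomposed);

            if (strncmp(in, table[i].decomposed, n) == 0) {
                size_t m = strlen(table[i].composed);

                memcpy(out, table[i].composed, m);
                out += m;
                in += n;
                matched = 1;
                break;
            }
        }
        if (!matched)
            *out++ = *in++;
    }
    *out = '\0';
}
```

A caller would check has_combining_mark() on the incoming string first and
run the composition step only when it returns 1, so text that is already
composed passes through untouched. Anything outside the table is copied
unchanged, so the sketch cannot corrupt text, but it also leaves most
decomposed sequences alone.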

-- 
------------------------------------------------------------------------
Mark Leisher
Computing Research Lab              They never open their mouths
New Mexico State University         without subtracting from the
Box 30001, MSC 3CRL                 sum of human knowledge.
Las Cruces, NM  88003                 -- Thomas Brackett Reed (1839-1902)

