From: Deborah Goldsmith (goldsmit@apple.com)
Date: Mon Aug 23 2004 - 14:53:33 CDT
FYI, by far the largest source of text in NFD (decomposed) form in Mac
OS X is the file system. File names are stored this way (for historical
reasons), so anything copied from a file name is in (a slightly altered
form of) NFD.
Also, a few keyboard layouts generate text that is partly decomposed,
for ease of typing (e.g., Vietnamese).
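
To illustrate the point about file names, here is a minimal sketch in
Python (chosen only because its standard library includes
unicodedata.normalize; a C program would need a Unicode library with
normalization support, such as ICU). The helper names utf8_hex and
same_text are hypothetical, made up for this example. It assumes the
text has already been decoded from UTF-8, and it uses plain NFD/NFC,
not the slightly altered form of NFD that the file system uses.

```python
import unicodedata

# "e" followed by COMBINING ACUTE ACCENT, as text copied from a
# file name might arrive (decomposed form).
decomposed = "e\u0301"
# The single precomposed character U+00E9.
precomposed = "\u00e9"


def utf8_hex(text):
    """Return the UTF-8 bytes of text as space-separated hex pairs."""
    return " ".join(f"{b:02x}" for b in text.encode("utf-8"))


def same_text(a, b):
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(utf8_hex(decomposed))    # 65 cc 81
print(utf8_hex(precomposed))   # c3 a9
print(decomposed == precomposed)          # False: code points differ
print(same_text(decomposed, precomposed)) # True: canonically equivalent
```

Normalizing both sides to the same form before comparing or storing
is the important part; NFD on both sides would work equally well for
the comparison.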
Deborah Goldsmith
Internationalization, Unicode liaison
Apple Computer, Inc.
goldsmit@apple.com
On Aug 23, 2004, at 11:51 AM, Doug Ewell wrote:
> William Tay wrote:
>
>> Can anyone explain why an accented character is sometimes represented
>> as a base character plus its accent? For example, the utf-8
>> representation for é is 65 CC 81, which is the utf-8 representation
>> for e and the accent, instead of C3 A9? I find that this is how MacOS
>> X represents accented characters.
>
> The two characters U+0065 and U+0301 (é) are canonically equivalent to
> the single character U+00E9 (é). That is, the two-character combining
> sequence is supposed to be considered equivalent to the single
> precomposed character. Apparently MacOS X, or at least one application
> running under it, does use the combining sequence.
>
>> How can a C application that receives such utf-8 encoded characters
>> handle them correctly? Appreciate your comments.
>
> It must understand normalization. See TUS 4.0, section 5.6 for more
> information.
>
> -Doug Ewell
> Fullerton, California
> http://users.adelphia.net/~dewell/
This archive was generated by hypermail 2.1.5 : Mon Aug 23 2004 - 14:55:20 CDT