From: Chris Jacobs (chris.jacobs@xs4all.nl)
Date: Wed Oct 26 2005 - 12:41:05 CST
Velasquez, Carlos wrote:
> Hello All,
> I am new to this list and somewhat new to the Unicode standard. I am
> hoping someone can help me understand the difference between ANSI and
> UTF-8 for characters in the domain of x00 and xFF.
>
> Are the 7 bit ASCII characters a subset of the 8 bit ANSI character?
> I understand that the 7 bit ASCII characters are definitely a subset
> of the UTF-8 set but am not sure if ANSI is a subset of UTF-8.
>
> Here is why I ask:
> Our database contains name information for a Spanish population. As
> such, we store names such as "Sérgio Murilo" in our database which is
> set to Unicode UTF-8. However, when we generate files and specify the
> file encoding to be ANSI, we get the character "é" in double byte
> (xC3 and xA9). But looking at the ANSI set, "é" is defined as xE9.
>
> Wouldn't they be one and the same? and in single byte?
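The difference is not between ANSI and Unicode but between Unicode code
points and the UTF-8 encoding scheme. UTF-8 stores only U+0000 to U+007F
(7-bit ASCII) as a single byte; every code point above that, including
é, takes two or more bytes, which is why you see xC3 xA9.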
The Unicode code point for é is indeed the same as its ANSI value, as can be seen here:
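  Character:        é
  ANSI number:      233
  Unicode number:   233
  ANSI hex:         0xE9
  Unicode hex:      U+00E9
  HTML 4.0 entity:  &eacute;
  Unicode name:     Latin small letter e with acute
  Unicode range:    Latin-1 Supplement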
http://www.alanwood.net/demos/ansi.html
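
To see the same thing from code, here is a small Python sketch. It assumes
that "ANSI" in your setup means Windows-1252 (Python codec name "cp1252");
if your system uses a different ANSI code page, substitute that codec. The
variable names are only for illustration.

```python
ch = "é"

# The code point is an abstract number, independent of any byte encoding.
code_point = ord(ch)

# Assumption: "ANSI" means Windows-1252 here.
ansi_bytes = ch.encode("cp1252")

# The same character in the UTF-8 encoding form.
utf8_bytes = ch.encode("utf-8")

print(f"code point: U+{code_point:04X}")    # U+00E9
print(f"ANSI bytes: {ansi_bytes.hex()}")    # e9
print(f"UTF-8 bytes: {utf8_bytes.hex()}")   # c3a9

# 7-bit ASCII characters are one identical byte in both encodings.
assert "S".encode("cp1252") == "S".encode("utf-8") == b"S"
```

So the number 233 (xE9) is shared, but only the ANSI file stores it as the
single byte xE9. A file containing xC3 xA9 was written as UTF-8, whatever
the export setting was called.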
For the difference between code points and UTF-8, see sections 2.5 and 2.6
of chapter 2 of the standard, online here:
http://www.unicode.org/versions/Unicode4.0.0/ch02.pdf
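
As a rough worked example of how UTF-8 arrives at xC3 xA9 from U+00E9: code
points from U+0080 to U+07FF are split into a 5-bit and a 6-bit group and
placed in the byte patterns 110xxxxx 10xxxxxx. The function name
utf8_two_byte below is made up for this sketch and handles only that
two-byte range; in real code you would just call str.encode("utf-8").

```python
def utf8_two_byte(code_point: int) -> bytes:
    """Encode a code point in U+0080..U+07FF as two UTF-8 bytes (sketch only)."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("only the two-byte range U+0080..U+07FF is handled")
    lead = 0b11000000 | (code_point >> 6)          # 110xxxxx: upper 5 bits
    trail = 0b10000000 | (code_point & 0b00111111)  # 10xxxxxx: lower 6 bits
    return bytes([lead, trail])


encoded = utf8_two_byte(0xE9)
print(" ".join(f"{b:02X}" for b in encoded))  # C3 A9

# Cross-check against Python's built-in encoder.
assert encoded == "é".encode("utf-8")
```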
>
> Thank you for any help you may have!
>
> Regards,
> Carlos Velasquez