Re: ANSI and Unicode for x00 - xFF

From: Chris Jacobs (chris.jacobs@xs4all.nl)
Date: Wed Oct 26 2005 - 12:41:05 CST


    Velasquez, Carlos wrote:
    > Hello All,
    > I am new to this list and somewhat new to the Unicode standard. I am
    > hoping someone can help me understand the difference between ANSI and
    > UTF-8 for characters in the domain of x00 and xFF.
    >
    > Are the 7 bit ASCII characters a subset of the 8 bit ANSI character?
    > I understand that the 7 bit ASCII characters are definitely a subset
    > of the UTF-8 set but am not sure if ANSI is a subset of UTF-8.
    >
    > Here is why I ask:
    > Our database contains name information for a Spanish population. As
    > such, we store names such as "Sérgio Murilo" in our database which is
    > set to Unicode UTF-8. However, when we generate files and specify the
    > file encoding to be ANSI, we get the character "é" in double byte
    > (xC3 and xA9). But looking at the ANSI set, "é" is defined as xE9.
    >
    > Wouldn't they be one and the same? and in single byte?

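    The difference is not between ANSI and Unicode, but between Unicode code
    points and the UTF-8 encoding scheme. UTF-8 encodes every code point above
    U+007F as two or more bytes, so é (U+00E9) is written as the two bytes
    xC3 xA9, whereas ANSI writes it as the single byte xE9.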

    The Unicode code point for é is indeed the same as its ANSI value, as can
    be seen here:

    Character:     é
    ANSI:          233 (0xE9)
    Unicode:       233 (U+00E9)
    HTML entity:   &eacute;
    Unicode name:  Latin small letter e with acute
    Unicode range: Latin-1 Supplement
    http://www.alanwood.net/demos/ansi.html

    For the difference between code points and UTF-8, see sections 2.5 and 2.6
    of chapter 2 of the standard, available online here:
    http://www.unicode.org/versions/Unicode4.0.0/ch02.pdf
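
    To make the distinction concrete, here is a minimal Python sketch. It
    assumes that "ANSI" means Windows code page 1252 (Python codec "cp1252"),
    which may not be what your tool uses, and the helper name
    transcode_utf8_to_ansi is made up for illustration.

```python
# Sketch only: "ANSI" is assumed to mean Windows-1252 (codec "cp1252").
ch = "\u00e9"  # é, LATIN SMALL LETTER E WITH ACUTE

code_point = ord(ch)              # the Unicode code point, 0xE9 (233)
utf8_bytes = ch.encode("utf-8")   # two bytes: C3 A9
ansi_bytes = ch.encode("cp1252")  # one byte: E9

# 7-bit ASCII characters are encoded identically in both schemes.
ascii_same = "e".encode("utf-8") == "e".encode("cp1252")

# UTF-8 bytes misread as ANSI show up as two characters.
misread = utf8_bytes.decode("cp1252")


def transcode_utf8_to_ansi(data: bytes) -> bytes:
    """Hypothetical helper: re-encode UTF-8 bytes as cp1252 bytes.

    Raises UnicodeEncodeError if a character has no cp1252 equivalent.
    """
    return data.decode("utf-8").encode("cp1252")


name_utf8 = "S\u00e9rgio Murilo".encode("utf-8")
name_ansi = transcode_utf8_to_ansi(name_utf8)

print(hex(code_point), utf8_bytes.hex(), ansi_bytes.hex())
print(len(name_utf8), len(name_ansi))
```

    If the generated file still contains xC3 xA9, the text was probably
    written out as UTF-8 without being converted, whatever the file is
    labelled as.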

    >
    > Thank you for any help you may have!
    >
    > Regards,
    > Carlos Velasquez


