RE: Unicode, UTF-8 and Extended 8-Bit Ascii - Help Needed

From: Carl W. Brown (cbrown@xnetinc.com)
Date: Tue Jul 10 2001 - 12:16:56 EDT


Stephen,

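Only the 7-bit ASCII characters (0x00 - 0x7F) have the same byte values in
UTF-8. The ISO-8859-1 characters above 0x7F are supported, but each one is
encoded as two bytes. UTF-8 encodes characters so that you can tell how many
bytes a character will take from the value of its first byte.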

0x00 - 0x7F = 1 byte
0xC0 - 0xDF = 2 bytes
0xE0 - 0xEF = 3 bytes
0xF0 - 0xF7 = 4 bytes
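
As a rough illustration of the table above, here is a small Python sketch
that maps a first byte to the sequence length it announces. The function
name utf8_sequence_length is made up for this example, and it follows the
ranges exactly as listed here without any further validity checks.

```python
def utf8_sequence_length(first_byte: int) -> int:
    """Return the sequence length announced by a UTF-8 first byte.

    Follows the ranges in the table above. Returns 0 for any byte
    that cannot start a character (everything outside those ranges).
    """
    if 0x00 <= first_byte <= 0x7F:
        return 1  # plain 7-bit ASCII
    if 0xC0 <= first_byte <= 0xDF:
        return 2
    if 0xE0 <= first_byte <= 0xEF:
        return 3
    if 0xF0 <= first_byte <= 0xF7:
        return 4
    return 0  # not a valid first byte


# U+00E9 (e with acute) is encoded as 0xC3 0xA9, so the first byte announces 2
print(utf8_sequence_length("\u00e9".encode("utf-8")[0]))  # 2
```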

0x80 - 0xBF are used for continuation bytes. These are the 2nd, 3rd or 4th
bytes of a character. This makes it easy to find the beginning of a
character.
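
To show why this helps, here is a minimal Python sketch that backs up from
an arbitrary position in a buffer to the first byte of the character that
contains it. The names is_continuation and char_start are hypothetical, and
the sketch assumes the buffer holds well-formed UTF-8.

```python
def is_continuation(byte: int) -> bool:
    """True for bytes in the continuation range 0x80 - 0xBF."""
    return 0x80 <= byte <= 0xBF


def char_start(data: bytes, index: int) -> int:
    """Return the index of the first byte of the character containing index.

    Assumes data is well-formed UTF-8; steps backwards while the
    current byte is a continuation byte.
    """
    while index > 0 and is_continuation(data[index]):
        index -= 1
    return index


# "a", EURO SIGN (3 bytes: 0xE2 0x82 0xAC), "b"
data = "a\u20acb".encode("utf-8")
print(char_start(data, 3))  # 1, the position of the 0xE2 first byte
```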

0xF8 - 0xFF are not used by Unicode.
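
For the conversion you asked about, the following Python sketch compares the
bytes for one ISO-8859-1 character in both encodings and converts in each
direction. The helper names latin1_to_utf8 and utf8_to_latin1 are invented
for this example, and I am assuming your EDI data really is ISO-8859-1.

```python
text = "\u00e9"  # U+00E9, LATIN SMALL LETTER E WITH ACUTE

# One byte in ISO-8859-1, two bytes in UTF-8
print(text.encode("iso-8859-1").hex())  # e9
print(text.encode("utf-8").hex())       # c3a9


def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8 (always succeeds)."""
    return data.decode("iso-8859-1").encode("utf-8")


def utf8_to_latin1(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes as ISO-8859-1.

    Raises UnicodeEncodeError if the text contains a character
    that has no ISO-8859-1 equivalent.
    """
    return data.decode("utf-8").encode("iso-8859-1")
```

Going from UTF-8 back to ISO-8859-1 can fail, so the outbound EDI side needs
some policy for characters that do not fit.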

Carl

> -----Original Message-----
> From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org]On
> Behalf Of Stephen Cowe - Sun Scotland
> Sent: Tuesday, July 10, 2001 3:53 AM
> To: unicode@unicode.org
> Subject: Unicode, UTF-8 and Extended 8-Bit Ascii - Help Needed
>
>
> Hi Unicoders,
>
> I am new to the list and would be really grateful if you could
> help me out here.
>
> I am trying to discover if the "extended latin" 8-bit ascii (decimal
> values 128-255, Hex A0-FF), i.e. ISO-8859-1 are supported by UTF-8, and
> if so, are the values the same.
>
> The reason why I am asking this is because our EDIFACT EDI system
> requires to send extended latin European characters (using the
> UNOC version 3 syntax identifier) and our global internal messaging
> system is being converted to UTF-8.
>
> I have had a good search of the Unicode web-site but do not seem
> to be able to find the answer, yes or no, that I require.
>
> I look forward to hearing from you, kind regards,
>
> Stephen Cowe.
>
> eCommerce Technologist
> GSO IT EDI/EDE
> +44 (0)1506 672541 (Tel)
> +44 (0)1506 672893 (Fax)
> stephen.cowe@sun.com
>
>


