Re: Does "endian-ness" apply to UTF-8 characters that use multiple bytes?

From: Asmus Freytag via Unicode <unicode_at_unicode.org>
Date: Mon, 4 Feb 2019 11:29:43 -0800
On 2/4/2019 11:21 AM, Costello, Roger L. via Unicode wrote:
Hello Unicode Experts!

As I understand it, endian-ness applies to multi-byte words.

Endian-ness does not apply to ASCII characters because each character is a single byte.

Endian-ness does apply to UTF-16BE (Big-Endian), UTF-16LE (Little-Endian), UTF-32BE and UTF-32LE because each character uses multiple bytes.
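Strictly speaking, byte order applies to each code unit: a 16-bit word in UTF-16 or a 32-bit word in UTF-32. A minimal Python 3.8+ sketch (the codec names are Python's, used here only for illustration) shows how é (U+00E9), which fits in a single code unit, comes out in each byte-order-specific form:

```python
# é is U+00E9. In UTF-16 it is one 16-bit code unit (0x00E9), and in
# UTF-32 one 32-bit code unit (0x000000E9). The BE/LE suffix only fixes
# the order in which the bytes of each code unit are written.
char = "\u00e9"

utf16be = char.encode("utf-16-be")
utf16le = char.encode("utf-16-le")
utf32be = char.encode("utf-32-be")
utf32le = char.encode("utf-32-le")

print("utf-16-be:", utf16be.hex(" "))  # 00 e9
print("utf-16-le:", utf16le.hex(" "))  # e9 00
print("utf-32-be:", utf32be.hex(" "))  # 00 00 00 e9
print("utf-32-le:", utf32le.hex(" "))  # e9 00 00 00
```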

Clearly endian-ness does not apply to single-byte UTF-8 characters. But what about UTF-8 characters that use multiple bytes, such as the character é, which uses two bytes C3 and A9; does endian-ness apply? For example, if a file is in Little Endian would the character é appear in a hex editor as A9 C3 whereas if the file is in Big Endian the character é would appear in a hex editor as C3 A9?

/Roger


UTF-8 is a byte stream: its code units are single bytes, written in one fixed order with the lead byte first. Therefore, the order of bytes in a multiple-byte integer does not come into it.
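A small Python 3.8+ sketch illustrates the point for é. The helper `decode_two_byte` is hypothetical, written only to show how a two-byte sequence is read in stream order. It also shows that swapping the bytes does not produce a "little-endian UTF-8"; it produces invalid UTF-8, because A9 is a continuation byte and cannot start a sequence.

```python
import sys

char = "\u00e9"
utf8 = char.encode("utf-8")
# The result is the same whatever the machine's native byte order is.
print(sys.byteorder, utf8.hex(" "))  # byte order varies; bytes are c3 a9

def decode_two_byte(lead: int, cont: int) -> int:
    """Hypothetical helper: decode a 2-byte UTF-8 sequence 110xxxxx 10yyyyyy."""
    if lead & 0b1110_0000 != 0b1100_0000:
        raise ValueError("not a two-byte lead byte")
    if cont & 0b1100_0000 != 0b1000_0000:
        raise ValueError("not a continuation byte")
    # The lead byte carries the high 5 bits, the continuation byte the low 6.
    return ((lead & 0b0001_1111) << 6) | (cont & 0b0011_1111)

code_point = decode_two_byte(utf8[0], utf8[1])
print(f"U+{code_point:04X}")  # U+00E9

# Reversing the bytes gives A9 C3, which a UTF-8 decoder rejects.
try:
    bytes([0xA9, 0xC3]).decode("utf-8")
    swapped_is_valid = True
except UnicodeDecodeError as err:
    swapped_is_valid = False
    print("A9 C3 is not valid UTF-8:", err.reason)
```

Because the order is fixed by the encoding itself, a hex editor shows C3 A9 for é in any UTF-8 file.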

A./

Received on Mon Feb 04 2019 - 13:29:53 CST
