"Interoperability is getting better" ... What does that mean? from Costello, Roger L. on 2012-12-30 (Unicode Mail List Archive)

From: Costello, Roger L. <costello_at_mitre.org>
Date: Sun, 30 Dec 2012 21:22:12 +0000

Hi Folks,

I have heard it stated that, in the context of character encoding and decoding:

Interoperability is getting better.

Do you have data to back up the assertion that interoperability is getting better?

Below is a summary of my understanding of interoperability. Would you please point out any misunderstandings?

-------------------------------------------------------------------------------
Interoperability of Text (i.e., Character Encoding Interoperability)
-------------------------------------------------------------------------------
Remember when, not long ago, you would visit a web page and see strange characters like this:

â€œGood morning, Daveâ€

You don't see that anymore.

Why?

The answer is this:

Interoperability is getting better.

In the context of character encoding and decoding, what does that mean?

Interoperability means that you and I interpret (decode) the bytes in the same way.

Example: I create a text file, encode all the characters in it using UTF-8, and send the text file to you.

Here is a graphical depiction (i.e., glyphs) of the bytes that I send to you:

López

You receive my text document and interpret the bytes as iso-8859-1.

The ó symbol is a graphical depiction of the "LATIN SMALL LETTER O WITH ACUTE" character (U+00F3), and in UTF-8 it is encoded using these two bytes: C3 B3

But in iso-8859-1, each byte encodes one character, so the two bytes C3 B3 are the encodings of two characters:

C3 is the encoding of the Ã character
B3 is the encoding of the ³ character

Thus you interpret my text as:

LÃ³pez

We are interpreting the same text (i.e., the same sequence of bytes) differently.

Interoperability has failed.
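
The failure above can be reproduced in a few lines. Here is a minimal Python sketch (the language and the variable names are chosen only for illustration). It encodes the name as UTF-8 and then decodes the very same bytes as iso-8859-1:

```python
# Sender: encode the text as UTF-8.
text = "L\u00f3pez"  # "López"; U+00F3 is LATIN SMALL LETTER O WITH ACUTE
sent_bytes = text.encode("utf-8")
print(" ".join(f"{b:02x}" for b in sent_bytes))  # 4c c3 b3 70 65 7a

# Receiver: decode the same bytes as iso-8859-1 (one byte = one character).
received = sent_bytes.decode("iso-8859-1")
print(received)  # LÃ³pez

# Decoding with the encoding the sender actually used recovers the text.
assert sent_bytes.decode("utf-8") == text
```

The bytes never change between sender and receiver; only the decoding step differs.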

So when we say:

Interoperability is getting better.

we mean that the number of incidents in which senders and receivers interpret the same bytes differently is decreasing.

Let's revisit our first example. You go to a web site and see this:

â€œGood morning, Daveâ€

Here's how that happened:

I use Microsoft Word to write this text and save it as a web page encoded in UTF-8:

“Good morning, Dave”

Notice that I wrapped the greeting in Microsoft smart quotes (U+201C LEFT DOUBLE QUOTATION MARK and U+201D RIGHT DOUBLE QUOTATION MARK).

You visit my web page.

Suppose your browser is set to interpret all web pages as Windows-1252.

In UTF-8 the left smart quote is encoded as three bytes, hex: E2 80 9C

In UTF-8 the right smart quote is encoded as three bytes, hex: E2 80 9D

In Windows-1252 every byte is interpreted as a separate character.

So your browser shows the left smart quote as hex E2 (â) followed by hex 80 (€) followed by hex 9C (œ).

And your browser shows the right smart quote as hex E2 (â) followed by hex 80 (€). Hex 9D has no character assigned in Windows-1252, so nothing visible is shown for it.

The result is that you see this on your browser screen:

â€œGood morning, Daveâ€
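
One way to reproduce exactly this display is to encode the greeting as UTF-8 and decode the bytes as Windows-1252. Here is a minimal Python sketch with illustrative variable names. It relies on Python's cp1252 codec, which has no character at byte 9D, so errors="ignore" is used to drop that byte; this only roughly mimics a browser, which shows nothing visible for it:

```python
greeting = "\u201cGood morning, Dave\u201d"  # “Good morning, Dave”

# Page as stored on the server: UTF-8 bytes.
page_bytes = greeting.encode("utf-8")
# Left quote  U+201C -> E2 80 9C
# Right quote U+201D -> E2 80 9D

# Reader's side: the same bytes decoded as Windows-1252.
# Byte 9D is unassigned in Python's cp1252 codec, so it is dropped here.
displayed = page_bytes.decode("cp1252", errors="ignore")
print(displayed)  # â€œGood morning, Daveâ€
```
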
Received on Sun Dec 30 2012 - 15:29:41 CST
