Hi Folks,
I have heard it stated that, in the context of character encoding and decoding:
Interoperability is getting better.
Do you have data to back up the assertion that interoperability is getting better?
Below is a summary of my understanding of interoperability. Would you please point out any misunderstandings?
-------------------------------------------------------------------------------
Interoperability of Text (i.e., Character Encoding Interoperability)
-------------------------------------------------------------------------------
Remember how, not long ago, you would visit a web page and see strange characters like this:
â€œGood morning, Daveâ€
You rarely see that anymore.
Why?
The answer is this:
Interoperability is getting better.
In the context of character encoding and decoding, what does that mean?
Interoperability means that you and I interpret (decode) the bytes in the same way.
Example: I create a text file, encode all the characters in it using UTF-8, and send the file to you.
Here is how the bytes I send you look when decoded as UTF-8 (i.e., the glyphs I see):
López
You receive my text document and interpret the bytes as iso-8859-1.
In UTF-8, the ó glyph depicts the character LATIN SMALL LETTER O WITH ACUTE (U+00F3), which is encoded using these two bytes: C3 B3
But iso-8859-1 uses one byte per character, so the two bytes C3 B3 are the encoding of two characters:
C3 is the encoding of the Ã character (LATIN CAPITAL LETTER A WITH TILDE)
B3 is the encoding of the ³ character
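The byte-level step above can be checked with Python's built-in codecs. This is a small sketch; the variable names are illustrative only. It encodes ó as UTF-8 and then decodes each resulting byte on its own as iso-8859-1:

```python
# Encode one character as UTF-8, then read each byte separately as iso-8859-1.
char = "ó"  # LATIN SMALL LETTER O WITH ACUTE, U+00F3
utf8_bytes = char.encode("utf-8")
print(f"U+{ord(char):04X} in UTF-8:", utf8_bytes.hex(" ").upper())  # U+00F3 in UTF-8: C3 B3

# iso-8859-1 maps every byte to exactly one character, so two bytes give two characters.
latin1_chars = [bytes([b]).decode("iso-8859-1") for b in utf8_bytes]
for b, c in zip(utf8_bytes, latin1_chars):
    print(f"{b:02X} -> {c} (U+{ord(c):04X})")  # C3 -> Ã (U+00C3), then B3 -> ³ (U+00B3)
```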
Thus you interpret my text as:
López
We are interpreting the same text (i.e., the same set of bytes) differently.
Interoperability has failed.
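The definition of interoperability above (sender and receiver decode the same bytes the same way) can be expressed as a tiny check. `same_interpretation` is a hypothetical helper written for illustration, not part of any library. As a simplification, it assumes both decodings succeed:

```python
def same_interpretation(data: bytes, sender_encoding: str, receiver_encoding: str) -> bool:
    """Hypothetical helper: True if sender and receiver decode `data` to the same text.

    Simplification: assumes `data` is decodable in both encodings
    (otherwise bytes.decode raises UnicodeDecodeError).
    """
    return data.decode(sender_encoding) == data.decode(receiver_encoding)


sent = "López".encode("utf-8")  # the bytes I send: 4C C3 B3 70 65 7A
print(sent.decode("utf-8"))      # López   (what I meant)
print(sent.decode("iso-8859-1")) # LÃ³pez  (what you see)
print(same_interpretation(sent, "utf-8", "iso-8859-1"))  # False: interoperability fails

# Pure ASCII bytes mean the same thing in both encodings, so plain "Lopez" survives.
print(same_interpretation("Lopez".encode("utf-8"), "utf-8", "iso-8859-1"))  # True
```

The last line shows why such failures only surface with non-ASCII characters: bytes below hex 80 are identical in UTF-8 and iso-8859-1.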
So when we say:
Interoperability is getting better.
we mean that the number of incidents in which senders and receivers interpret the same bytes differently is decreasing.
Let's revisit our first example. You go to a web site and see this:
â€œGood morning, Daveâ€
Here's how that happened:
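I use Microsoft Word to write this text, and I publish it on a web page encoded as UTF-8:
“Good morning, Dave”
Notice that I wrapped the greeting in Microsoft smart quotes.
You visit my web page.
Suppose your browser is set to interpret all web pages as Windows-1252.
In UTF-8 the left smart quote (U+201C) is encoded as three bytes: E2 80 9C
In UTF-8 the right smart quote (U+201D) is encoded as three bytes: E2 80 9D
Your browser decodes each of those bytes as a separate Windows-1252 character.
So the left smart quote is displayed as â (hex E2) followed by € (hex 80) followed by œ (hex 9C).
And the right smart quote is displayed as â (hex E2) followed by € (hex 80); hex 9D has no printable character in Windows-1252, so it typically shows up as nothing visible.
The result is that you see this on your browser screen:
â€œGood morning, Daveâ€
(In Windows-1252 itself, the left and right smart quotes are the single bytes 93 and 94. If I had saved the page as Windows-1252 and your browser had read it as iso-8859-15, you would not see â€œ at all: in iso-8859-15, hex 80 through 9F are invisible C1 control codes, not printable characters.)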
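The smart-quote scenario can be reproduced the same way. This sketch assumes Python's `cp1252` codec stands in for a browser's Windows-1252 decoder. One difference: Python rejects the unassigned byte 9D, so `errors="ignore"` is used to drop it, which approximates a browser showing nothing visible for that byte:

```python
text = "\u201cGood morning, Dave\u201d"  # “Good morning, Dave” with smart quotes

# I publish the page as UTF-8: each smart quote becomes three bytes.
page_bytes = text.encode("utf-8")
print(page_bytes[:3].hex(" ").upper(), "|", page_bytes[-3:].hex(" ").upper())  # E2 80 9C | E2 80 9D

# Your browser reads those bytes as Windows-1252 ("cp1252" in Python).
# errors="ignore" drops byte 9D, which has no character in Python's cp1252 table.
seen = page_bytes.decode("cp1252", errors="ignore")
print(seen)  # â€œGood morning, Daveâ€

# In Windows-1252 the smart quotes are the single bytes 93 and 94 ...
cp1252_bytes = text.encode("cp1252")
print(cp1252_bytes[:1].hex().upper(), cp1252_bytes[-1:].hex().upper())  # 93 94

# ... and iso-8859-15 decodes those bytes to invisible C1 control characters.
as_latin9 = cp1252_bytes.decode("iso-8859-15")
print(repr(as_latin9[0]), repr(as_latin9[-1]))  # '\x93' '\x94'
```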
Received on Sun Dec 30 2012 - 15:29:41 CST