From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Fri Jun 24 2005 - 16:06:14 CDT
Gregg's analysis is not something we should follow in the Unicode Standard
(I think; I'm having a hard time following it, actually).
HTML is a representation of rich text expressed in a plain text format.
When you view and edit HTML source, you are accessing it as plain text.
However, the information described by HTML is rich text, which in
turn consists of the text between ">" and "<", to which formatting information
(the "<" and ">" and what they enclose) has been added.
Therefore, when presented to an HTML parser, HTML is decidedly *not* plain
text.
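To make the distinction concrete, here is a minimal sketch using Python's standard `html.parser` module. It treats the same string two ways. As plain text, the source is just characters. As HTML, a parser splits it into character data (the text between ">" and "<") and markup (the tags). The `TextExtractor` class and the sample markup are illustrative assumptions. This is not a full HTML renderer (it does no whitespace collapsing, for example).

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Separates character data from tags while parsing HTML source."""

    def __init__(self):
        super().__init__()
        self.text_parts = []  # character data: the text between '>' and '<'
        self.tags = []        # markup: start tags and end tags, in order

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_endtag(self, tag):
        self.tags.append("/" + tag)

    def handle_data(self, data):
        self.text_parts.append(data)


source = "<p>Hello, <b>world</b>!</p>"  # hypothetical sample markup

parser = TextExtractor()
parser.feed(source)
parser.close()  # flush any buffered character data

text = "".join(parser.text_parts)
print(repr(source))  # the HTML source, accessed as plain text
print(repr(text))    # 'Hello, world!' (the text content)
print(parser.tags)   # ['p', 'b', '/b', '/p'] (the added formatting information)
```

The source string and the text content are different strings. The tags survive only as structure that the parser interprets, which is why the parsed HTML is rich text rather than plain text.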
If I take any other (binary) rich text format and dump it with a binary
editor into a string of two-digit ASCII hex numbers, that does turn it into
a plain text serialization of the information, but does not turn the
text contained in the rich text into plain text.
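The hex-dump argument can be illustrated with a short Python sketch. The one-record binary format below is made up purely for illustration: one style byte (0x01 = bold), one length byte, then UTF-8 text. It does not correspond to any real rich text format. The dump is plain ASCII and round-trips losslessly. Even so, the word it carries does not appear in the dump, and recovering it still requires knowing the binary format.

```python
def decode(record: bytes) -> tuple[bool, str]:
    """Decode the hypothetical record: style byte, length byte, UTF-8 text."""
    bold = bool(record[0] & 0x01)
    length = record[1]
    return bold, record[2:2 + length].decode("utf-8")


# Hypothetical binary rich-text record: bold flag, length 5, then "Hello".
record = bytes([0x01, 0x05]) + "Hello".encode("utf-8")

hex_dump = record.hex()  # a plain text serialization: only ASCII hex digits
print(hex_dump)          # 010548656c6c6f

# The serialization is lossless...
assert bytes.fromhex(hex_dump) == record
# ...but the text is not plain text inside it; you still need the format.
assert "Hello" not in hex_dump
print(decode(bytes.fromhex(hex_dump)))  # (True, 'Hello')
```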
The point is that the HTML source is not the same as the HTML text,
even though they are related (by the HTML protocol).
Syntax coloring and content-driven styles are even more of a red herring
in this context.
By the way, the point is well taken that an encoding that encodes text
formatting information on the same level as character codes does not
represent plain text. In other words, if there were a unique character
code for each element (<p>, etc.) or attribute in HTML, that would not
make HTML plain text, any more than writing it in ASCII does.
A./