From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Sun Jun 13 2010 - 22:30:17 CDT
Steve,
When you design your encoding proposal, please bear in mind that even
for a language as well supported as English, it is generally not
possible to fully represent the semantic content (let alone the
appearance) with plain text alone.
Yes, 99% or so of the semantic content can be encoded in plain text
alone, but some texts in English require the use of italics for
disambiguation (removing the emphasis allows more than one reading of
the text).
If you move one level up, to HTML, say, you can capture all these
documents, but also many others where styled text has a weaker semantic
role (headings, generic emphasis, etc.).
With CSS as a next level up you can express the author's choice of
appearance of the text for 99% of all documents, and the continuum
doesn't end there.
Your discussion of sign writing needs to encompass a role for higher
level protocols like HTML or CSS (or their effective equivalents for
other types of writing, such as MathML). Not everything needs to be
carried at the plain text level, but everything needs to be expressible
at a suitable level.
Sometimes, a higher level protocol requires the availability of
"building blocks" that may not make sense in the context of a plain text
"stream", but that together with the higher level protocol allow an
efficient representation of the notation. The mathematical alphabetics
or the musical notation elements are encoded in Unicode for such use
with higher level protocols.
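
To make the "building blocks" point concrete, here is a minimal sketch in
Python, using only the standard unicodedata module. It assumes nothing
beyond the Mathematical Alphanumeric Symbols block itself; the helper name
fold_math is made up for illustration. It shows that a character such as
U+1D400 carries a font distinction that ordinary plain-text processing can
fold away, which is why such characters are mainly useful together with a
higher level protocol.

```python
import unicodedata

def fold_math(text: str) -> str:
    """Fold compatibility characters (e.g. math bold letters) to their
    ordinary counterparts using NFKC normalization.

    fold_math is a hypothetical helper name, not part of any standard.
    """
    return unicodedata.normalize("NFKC", text)

bold_a = "\U0001D400"  # MATHEMATICAL BOLD CAPITAL A

# The character has its own identity in the standard...
name = unicodedata.name(bold_a)

# ...but search-style folding treats it as a plain "A".
folded = fold_math(bold_a)
```

In a math protocol the bold letter and the plain letter may denote
different variables, so the distinction matters there, even though a
plain-text search would usually want to ignore it.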
From the way you describe the requirements (faithfully representing the
minutest details of the author's choice of placements, etc.) and your
claim that the plain text level should not / does not encode semantic
contents, I get the impression that you have not fully thought through
what information should be represented at what level of the text
architecture.
The name "Character Glyph Model" hides the complexity and the layers of
the real-world text and data architectures into which Unicode must fit.
If plain text cannot hope to encode at least a basic representation of
a notation (as in music, or for all but the most trivial mathematical
notation) then the precedent has been to try to abstract the semantic
content so that it is available for data processing (searching,
sorting, etc.) in the plain text layer, while the description of the
visible text in these cases requires the use of a higher level protocol
where notions of placement etc. can be expressed succinctly.
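
The split described above can be sketched as follows. This is only an
illustration under stated assumptions: the PlacedSymbol type, the helper
functions and the letters "H", "M", "C" are hypothetical stand-ins, not
SignWriting characters and not anything taken from the proposal. The
character sequence plays the role of the plain text layer; the
coordinates play the role of what a higher level protocol would carry.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PlacedSymbol:
    """One symbol of a two-dimensional notation (hypothetical model)."""
    char: str  # what the plain text layer would carry
    x: int     # placement: would live in a higher level protocol
    y: int

def plain_text(symbols):
    """Return only the semantic layer: the character sequence."""
    return "".join(s.char for s in symbols)

def contains(symbols, query):
    """Search the plain text layer, ignoring placement entirely."""
    return query in plain_text(symbols)

# Stand-in letters, not real SignWriting code points.
sign = [PlacedSymbol("H", 10, 40), PlacedSymbol("M", 12, 20),
        PlacedSymbol("C", 30, 5)]

# Shifting every symbol changes the appearance, not the plain text.
moved = [PlacedSymbol(s.char, s.x + 5, s.y) for s in sign]
```

Searching and sorting only ever look at plain_text(...), so two
renderings that differ in placement alone compare equal at that level.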
Concretely: do you see the need for, or the existence of, a
SignWritingML? Do you think existing HTML could correctly render
SignWriting if that were presented as part of the plain text data (under
your proposal)? What would the role be for CSS? What happens when a user
agent selects a different font because the one the author used is
unavailable on the system used by the reader?
In some of your answers you've given a few hints, but for someone like
me, who has no firsthand experience of signing and has difficulty
visualizing sign writing, you will probably want to be far more explicit
and concrete in your description and examples, so that it becomes
possible to evaluate whether your choices in the encoding model are the
correct ones, or possibly the only ones, or whether, on the contrary,
they represent an unnecessary departure from the way Unicode deals with
non-linear notations.
A./