From: Kenneth Whistler (kenw@sybase.com)
Date: Tue Jul 05 2005 - 14:40:18 CDT
> Mark E. Shoulson wrote:
> > Gregg Reynolds wrote:
> >
> >> ... but I would observe that Unicode is
> >> capable of accommodating e.g. bidi-override marks and various similar
> >> "characters"; so why not a <subscript> and <popsubscript> mark, for
> >> example.
And the answer to that is that unlike the bidirectional case, the
introduction of subscript formatting operators into Unicode would
do nothing whatsoever to improve upon existing text representation,
and would, in fact, have the contrary impact of introducing confusion
and ambiguity into text.
> >>
> > We could call them "[" and "]", for consistency with existing practice.
> > This is a typesetting issue, not a plaintext one.
> >
> I'm not sure I understand what you mean. "[" and "]" have well-defined
> plaintext semantics. If we overlay "subscript" on those semantics, then
> we no longer have plaintext. Nor do we have markup; we have a
> redefinition of the codepoints.
Actually, none of the above. What we have is different conventions
for usage of (plain) text.
The text "b[i]", represented (in Unicode) by the plain text
sequence <0062, 005B, 0069, 005D>, may, in one context, with
one set of textual conventions, be interpreted as "the i-th element
of the array b". In another context, with another set of
textual conventions, it may be interpreted as "the word 'bi',
with the existence of the letter 'i' inferred, although missing
in the original epigraph (or other physical source)."
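As a minimal sketch (Python and its standard unicodedata module are
assumed here purely for illustration), the plain text sequence can be
inspected directly. The code points and character names are the same
under either set of conventions; only the interpretation differs:

```python
import unicodedata

text = "b[i]"

# Code points of the plain text, as four-digit hex values.
code_points = [f"{ord(ch):04X}" for ch in text]

# Character names from the Unicode Character Database.
names = [unicodedata.name(ch) for ch in text]

for cp, name in zip(code_points, names):
    print(cp, name)
```

Nothing in this output says whether the brackets mark an array index or
an editorial restoration; that is carried by the convention in use.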
In the first context, a commonly used typographical convention
is to represent the same concept with a math italic b and a
subscripted i. In the second context, that presentation would
not be equivalent. At any rate, there is no presumption that
the mathematical presentation should be automatically derived
from plain text without the imposition of *some* level of
markup as well -- since mathematical typesetting is, in general,
complex.
> Also I don't understand the distinction
> between "typesetting issue" and "plaintext issue". Plaintext must be
> typeset.
Plain text must be *rendered* legibly to be read (at least in anything
other than a debugger). That does not mean that all typographical
issues for text become, ipso facto, plain text issues.
There are issues in the representation of text that fall outside
the realm generally considered to be appropriate to the encoding
of the plain text elements represented by "characters" in the
Unicode Standard.
And there is general consensus now within the character encoding
community that superscripting and subscripting fall outside
that boundary. The exceptions you see in the Unicode Standard
are either a) for legacy compatibility with older character
sets or b) special purpose characters for representation of
letters in technical phonetic representations.
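A rough illustration of those exceptions, assuming Python's standard
unicodedata module (the three sample characters are my own choice, not
an exhaustive list): the existing superscript and subscript characters
carry <super> or <sub> compatibility decompositions back to ordinary
characters.

```python
import unicodedata

# Two compatibility characters and one phonetic modifier letter.
samples = ["\u00B2", "\u2082", "\u02B0"]

info = {
    f"U+{ord(ch):04X}": (unicodedata.name(ch), unicodedata.decomposition(ch))
    for ch in samples
}

for code_point, (name, decomposition) in info.items():
    print(code_point, name, decomposition)

# Compatibility normalization folds the superscript into a plain digit.
folded = unicodedata.normalize("NFKC", "x\u00B2")
print(folded)
```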
--Ken