When to use markup: (Was: Introducing the idea of a "ROMAN VARIANT SELECTOR" (was: Re: Proposing Fraktur))

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Thu Jan 31 2002 - 16:09:46 EST


At 09:42 AM 1/30/02 +0100, Karl Pentzlin wrote:
>The question is, are typesetting rules "part of the script"?
>
>(I mean rules in the sense of obligatory regulations, not guidelines).

This distinction is a very German way of approaching the question.

>If yes, (in my opinion) the plain text must carry the information that is
>needed to follow them. If no, their execution can be left to higher level
>protocols (which then have to decide whether a word is a foreign word
>[to be set in Roman letters] or a name [to be set in Fraktur letters],
>such, at least, according to German typesetting rules).

A more productive distinction would be along these lines:

a) is the feature necessary for correctly expressing the content
b) is the feature rule-based, and
b.1) is the rule implementable w/o knowledge of semantics, or
c) when implementing the feature,
c.1) is it necessary to provide scope information, or
c.2) is local context sufficient?

Looking at this list, roughly in reverse order:

Higher-level protocols, understood here in particular as markup languages,
do really well when implementing something requires defining a scope,
since in them all text data and the effects of all syntax are already
scoped.

If a layout feature can be determined algorithmically, it makes little
sense to duplicate in the markup what can already be derived from the
existing text data. Duplicating information always opens the possibility
of the two copies getting out of step.

If semantic knowledge is required to implement a feature, this knowledge
must be supplied. If the extra information can be expressed as point-like,
local context, then it makes much *less* sense to use higher-level markup
than character codes. Character codes, in a way, provide the ideal
representation of point-like context in a data stream.
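To make the contrast concrete, here is a rough sketch in Python. The
<antiqua> element name is made up purely for illustration; only the
character code shown is real:

    # Scoped representation: a higher-level protocol brackets the
    # affected run of text explicitly, so its extent is unambiguous.
    # (German carrier text: "The word 'Computer' is a foreign word.")
    marked_up = "Das Wort <antiqua>Computer</antiqua> ist ein Fremdwort."

    # Point-like representation: a single character code at one position
    # in the data stream, affecting only its immediate neighbourhood.
    # U+00A0 NO-BREAK SPACE is a real example of such a code.
    plain = "z.\u00A0B."   # "z. B." with a space that must never be broken

The markup form needs a begin and an end to state its scope; the
character code needs nothing beyond its own position.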

Finally, we get back to the original argument. Whether a typesetting
rule (and by rule I mean both conventions and legislated rules) is
supported by information added to the plain text does not depend on
whether a national authority promulgates it or whether it merely
represents the consensus of the users of the language.

If, in practice, such a rule can be ignored without changing the meaning
of the text, it is a good candidate for not being implemented via plain
text. However, this is not absolute:

Leaving out italics from a document can not only change the level of
emphasis; in English, for example, there are occasional circumstances
where the use of italics removes a possible ambiguity in interpreting
a sentence. Nevertheless, italics (except in mathematics) were left to
a higher-level protocol (style markup).

Overriding bad hyphenation or bad line breaks is supported by SHY
(U+00AD SOFT HYPHEN) and NBSP (U+00A0 NO-BREAK SPACE), even though
hyphenation is not required at all to express the content of a text,
nor would a bad line break, e.g. after "Dr.", change the meaning of
the text.
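As a toy sketch (in Python; not how any real line breaker is
implemented) of why such point-like codes are convenient for an
algorithm:

    SHY  = "\u00AD"   # U+00AD SOFT HYPHEN: invisible, permitted hyphenation point
    NBSP = "\u00A0"   # U+00A0 NO-BREAK SPACE: a space that must not be broken

    text = "Dr." + NBSP + "Meier liest Buch" + SHY + "druckerkunst."

    # The breaker needs only the character at each position: ordinary
    # spaces and soft hyphens offer break opportunities, NBSP offers none.
    def break_opportunities(s):
        return [i + 1 for i, ch in enumerate(s) if ch in (" ", SHY)]

    print(break_opportunities(text))

No scope has to be declared anywhere; each code does its work at the
single position where it occurs.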

In the latter two cases, character codes were added (fairly early) to
plain text, because using point-like context to support these very
common algorithms (hyphenation and line breaking) is an elegant
solution, while adding markup for the same purpose would be inelegant
in the extreme.

Like everything else in character encoding, there are shades of gray
and levels of gradation, so not everything is clear cut. But it is
useful to recognize up front that character codes may legitimately
serve to support algorithms, even where the feature implemented by the
algorithm is merely common, and not absolutely and minimally required.

A./


