a character for an unknown character
charupdate at orange.fr
Tue Dec 27 10:03:44 CST 2016
On 27/12/16 01:11, Richard Wordingham wrote:
> On Sun, 25 Dec 2016 19:31:28 +0200
> "Jukka K. Korpela" wrote:
> > If some graphic symbol is by convention used to represent a lacuna,
> > then the issue, as regards to Unicode, is simply whether that symbol
> > exists as an encoded character or whether there is need to add that
> > graphic symbol to Unicode. But it would be a matter of encoding
> > graphic characters (irrespectively of their meaning in some content),
> > not about encoding abstract ideas like “an unrecognized character”.
> Unicode encodes pictograms, directives and abstract characters, not
> glyphs. There are few, if any characters, that have no semantics,
> though several characters can be ambiguous and context-sensitive as to
> what semantics they occur. If it was just a matter of appearance,
> then U+26C6 RAIN would be the character to use. It has the graphic
> used for characters in damaged inscriptions.
As far as I currently understand Unicode, I believe that the
“encode abstract characters, not glyphs” principle has a counterpart:
Unicode characters are polysemic by design, as follows from
TUS 3.3, D2. This compromise led to the abandonment of the initially
considered policy of extensive disunification in favor of reasonable
unifications with an acceptable benefit-cost ratio, as Mark Davis
explained on this List:
TUS 3.2, C4 and C5 (Conformance Requirements: Interpretation) seem to me
to specify that the meanings of a given character are free and may be
defined by any human convention, provided that they donʼt conflict with
the Unicode character properties of that character.
> Of course, there is one character that is already widely used in this
> rôle - U+003F QUESTION MARK. Some of its Unicode properties are not
> suitable, and its informal 'unknown character' semantic conflicts with
> its rôle as a punctuation mark.
Indeed, this use of QUESTION MARK is a plague that mangles almost
every Unicode string dropped into an ANSI-encoded document.
The only reason I can see for its use is that, among the ASCII characters,
it is the one that comes closest to the intended meaning.
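To illustrate the mangling described above, here is a minimal Python
sketch. It assumes that “ANSI” means Windows-1252 (cp1252) and that the
conversion uses a lossy “replace” policy; the sample string is made up
for the example.

```python
# Assumption: "ANSI" is taken to mean Windows-1252 (cp1252).
# The sample text is hypothetical: two RAIN characters, then a real "?".
text = "lost: \u26C6\u26C6 ?"

# errors="replace" substitutes "?" for every character cp1252 cannot encode.
ansi_bytes = text.encode("cp1252", errors="replace")
round_trip = ansi_bytes.decode("cp1252")

# After the round trip, the placeholders and the genuine question mark
# can no longer be told apart.
print(round_trip)
```

Once converted, nothing distinguishes a substituted character from a
question mark that was present in the original text.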
RAIN seems to me the best fit for the usage under discussion, and I canʼt
see any problem in using it with these semantics. If Iʼm wrong, how about this:
U+25A8 SQUARE WITH UPPER RIGHT TO LOWER LEFT FILL
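The candidates mentioned in this thread can be compared by their Unicode
properties. The following is a rough sketch using Python's standard
unicodedata module; the helper name describe is made up for the example,
and only the name and General_Category are shown, not every property
that might matter for the choice.

```python
import unicodedata

# The three characters discussed in this thread.
candidates = ["\u003F", "\u26C6", "\u25A8"]

def describe(ch):
    """Return (code point, character name, General_Category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

rows = [describe(ch) for ch in candidates]
for code_point, name, category in rows:
    print(code_point, name, category)
```

QUESTION MARK is classified as punctuation (Po), whereas RAIN and the
filled square are both classified as symbols (So).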
> If I understand correctly, these issues are already addressed by the
> Leiden Conventions. Why do they not suffice?
I believe that they work well in historical texts that donʼt themselves use
the specified metalanguage characters. The Leiden Conventions could be agreed
upon because brackets and parentheses arenʼt found in old sources. Perhaps
modern sources that do use these characters are never damaged and in need of
being restored this way.
On the other hand, editors might wish to avoid mixing ASCII characters into
original scripts. So the RAIN pictograph may be neutral enough.
If so, the Leiden Conventions could eventually be extended to include it.