From: Hans Aberg (haberg@math.su.se)
Date: Thu Feb 17 2005 - 13:00:26 CST
At 10:19 -0500 2005/02/17, Patrick Andries wrote:
> Antoine Leca wrote:
>
> On Wednesday, February 16th, 2005 09:05Z Radovan Garabik wrote:
>
> Just across a street from here, there is a travel agency, having a
>rather huge sign across their windows: "Preßburg reisen" - in all
>capitals, with ß being rather styled and blocky,
>
>Am I alone thinking this looks like a font issue?
>
> I would agree.
>
> PREßBURG is the equivalent of small caps for me of Preßburg. I believe Unicode
> does not regulate small caps forms...
This raises a very interesting issue: the principles for adding characters to
Unicode. One would think that characters should be added if they are
semantically different, and not otherwise. For example, take the word "sin".
In English, writing it in, say, boldface does not change its semantics.
Therefore, English boldface letters should not be added to Unicode. But now
assume that "sin" occurs in math. Then changing to boldface certainly alters
the semantics, because of the rules of mathematical writing. So boldface math
letters should be added, just as has been done.
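To make the "just as has been done" concrete, here is a small Python sketch
using the standard unicodedata module. The helper name math_bold is my own
invention for illustration; the code point offsets assume the contiguous
Mathematical Bold letter range starting at U+1D400. It shows that bold math
"sin" is a different character sequence from plain "sin", and that
compatibility normalization (NFKC) folds the two together again.

```python
import unicodedata

def math_bold(word: str) -> str:
    """Map ASCII letters to Mathematical Bold letters (hypothetical helper)."""
    out = []
    for ch in word:
        if "A" <= ch <= "Z":
            # MATHEMATICAL BOLD CAPITAL A is U+1D400
            out.append(chr(0x1D400 + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            # MATHEMATICAL BOLD SMALL A is U+1D41A
            out.append(chr(0x1D41A + ord(ch) - ord("a")))
        else:
            out.append(ch)
    return "".join(out)

plain = "sin"
bold = math_bold(plain)

# Distinct code points, so the two strings do not compare equal.
print(plain == bold)

# Each bold letter is a separately named character.
print([unicodedata.name(c) for c in bold])

# Compatibility normalization discards the bold distinction.
print(unicodedata.normalize("NFKC", bold) == plain)
```

Whether a given application should treat the two as equal is exactly the
semantic question above; NFKC is appropriate for searching prose, but it
would destroy the distinction in a math formula.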
Now add capitalization to the mix: In natural languages, capitalization
typically does not alter the semantics of a word. This is most apparent in
dictionaries and encyclopedias. For example, in Merriam-Webster's "Webster's
Third New International Dictionary", all look-up words are uncapitalized. In
math, and in computer languages, capitalization changes the semantics. So if
a sentence would start with such a word, the recommendation is to rewrite the
sentence so that it does not start with that word. For example, the sentence
"sh uses..." might be rewritten as "The shell sh uses...".
So, since capitalization does not alter the semantics of the word, it seems
that capital letters should not have been added to Unicode at all. However,
capitalization can be used to communicate certain semantic information:
start of sentence, proper noun, (in German) noun, abbreviation, etc. If one
sticks to the semantic approach, then one should add abstract characters
"start of sentence", "proper noun", etc., strip out, say, the uppercase
letters, and let the rendering machine produce a correct presentation.
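A toy sketch of that idea in Python, purely as a thought experiment: the
Mark values below stand in for the abstract characters, and nothing like
them exists in Unicode. The names Mark and render are hypothetical. The
stored text is caseless, and the "rendering machine" decides where capitals
appear.

```python
from enum import Enum

class Mark(Enum):
    """Hypothetical abstract characters; not part of Unicode."""
    SENTENCE_START = "start of sentence"
    PROPER_NOUN = "proper noun"

def render(tokens):
    """Turn a caseless token stream into conventionally capitalized text."""
    words = []
    capitalize_next = False
    for tok in tokens:
        if isinstance(tok, Mark):
            # A mark carries the semantics; it affects the following word.
            capitalize_next = True
        else:
            words.append(tok[:1].upper() + tok[1:] if capitalize_next else tok)
            capitalize_next = False
    return " ".join(words)

tokens = [Mark.SENTENCE_START, "the", "city", "of", Mark.PROPER_NOUN, "pressburg"]
text = render(tokens)
print(text)
```

A real renderer would need language-specific rules (German nouns,
abbreviations, and so on); this sketch only handles "capitalize the first
letter of the next word".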
But some of these uses are so ingrained that, for now, one must stick to a
mixed approach: sometimes characters are separated based on semantic
differences, and sometimes based on glyph differences. A character set based
on semantic principles alone would probably require a great deal of work, and
probably a wholly new character set, designed from scratch. The members of the
old Unicode set would then have to be expressible as sequences of this new
character set.
Hans Aberg