From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Sun Apr 02 2006 - 05:05:48 CST
James Kass wrote:
>> When a new orthography was announced for German a few years ago,
>> did you go and make two Latin fonts then, one for the old and one for
>> the new orthography? I guess (and hope) not... When one for Finnish
>> started to use ? and ? instead of sh and zh, did you go and make a
>> font that displays sh as ? and zh as ?? I guess and hope not.
>
> Of course not. I've always figured that if anybody wants to represent the
> "sh" sound with a question mark, they should just
> use the question mark character at U+003F.
>
> (My browser settings munged your message.)
No! The question mark U+003F has inappropriate properties for a letter.
Compare LATIN LETTER RETROFLEX CLICK (alias LATIN LETTER EXCLAMATION MARK)
U+01C3 with the EXCLAMATION MARK U+0021. They have different BiDi
properties, for starters. (I'm not sure whether the dire effects on
spell-checkers of using punctuation as letters can be blamed on the Unicode
properties. One of the Unicode annexes agonises over the apostrophe U+0027.)
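
The property difference can be inspected directly. What follows is only a minimal sketch, assuming Python's standard `unicodedata` module; the helper name `describe` is made up for illustration, and the values reported are those of whatever Unicode version the interpreter bundles.

```python
import unicodedata

def describe(ch):
    """Return (code point, name, general category, bidi class) for one character.

    `describe` is a hypothetical helper, not part of any library.
    """
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),       # e.g. 'Po' = punctuation, 'Lo' = letter
        unicodedata.bidirectional(ch),  # e.g. 'ON' = other neutral, 'L' = left-to-right
    )

# EXCLAMATION MARK, the click letter that looks like it, and QUESTION MARK
for ch in ("\u0021", "\u01C3", "\u003F"):
    print(describe(ch))
```

The two punctuation marks should come out as category `Po` with bidi class `ON`, while U+01C3 should come out as a letter (`Lo`) with the strong class `L`.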
>> "Uniqueness Rule"???
>
> "Two different encodings should not render same,
> irrespective of the font or joiners used."
>
> http://varamozhi.blogspot.com/2005/07/unicode-uniqueness-rule-on-encoding.html
That's at best a goal. There are blocks of exceptions, e.g. Arabic
Presentation Forms! At best it can be rescued by adding 'unless they are
compatibility equivalent'. If I write U+0061 U+200D U+0065 I have no idea,
without knowing the rendering system, whether I will get the same as U+0061
U+0065, the same as U+00E6 LATIN SMALL LETTER AE or something different.
The rule would eliminate the second possibility.
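
The 'compatibility equivalent' escape clause can be checked mechanically. This is a rough sketch, assuming Python's standard `unicodedata.normalize` with the NFKC form as the test for compatibility equivalence; `compat_equal` is a name invented here, and the isolated ALEF is just one example picked from the presentation-forms block.

```python
import unicodedata

def compat_equal(a, b):
    """True if two strings are compatibility equivalent (equal under NFKC).

    Hypothetical helper for illustration only.
    """
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)

# An Arabic presentation form versus its ordinary letter:
# U+FE8D ARABIC LETTER ALEF ISOLATED FORM vs U+0627 ARABIC LETTER ALEF
print(compat_equal("\uFE8D", "\u0627"))

# The three candidate encodings from the example above
joined = "a\u200De"   # a, ZERO WIDTH JOINER, e
plain = "ae"
ligature = "\u00E6"   # LATIN SMALL LETTER AE

print(compat_equal(joined, plain))
print(compat_equal(joined, ligature))
print(compat_equal(plain, ligature))
```

Normalisation only says whether the encodings are equivalent. It says nothing about what a particular font and rendering system will draw for the joined sequence, which is the point of the example.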
There are also cases where identical glyphs have been created without any
qualms - the principle of script separation distinguishes the usually
visually identical LATIN SMALL LETTER O, CYRILLIC SMALL LETTER O and GREEK
SMALL LETTER OMICRON without serious worries, though I must admit I found a
(hand-drawn) diagram with both LATIN CAPITAL LETTER M and GREEK CAPITAL
LETTER MU distinctly naughty. (The contrast seemed to be totally oral.)
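
Script separation shows up in the character data even where the glyphs do not differ. A small sketch, again assuming Python's `unicodedata`: the standard module does not expose the Script property, so the hypothetical helper `script_word` takes the first word of the character name as a crude stand-in for it.

```python
import unicodedata

def script_word(ch):
    """Crude stand-in for the Script property: first word of the character name.

    Hypothetical helper; unicodedata has no direct Script lookup.
    """
    return unicodedata.name(ch).split()[0]

# Usually visually identical, but encoded separately per script
lookalikes = {
    "o": ["\u006F", "\u043E", "\u03BF"],   # Latin o, Cyrillic o, Greek omicron
    "M": ["\u004D", "\u039C"],             # Latin M, Greek Mu
}

for shape, chars in lookalikes.items():
    for ch in chars:
        print(shape, f"U+{ord(ch):04X}", unicodedata.name(ch))
    # Normalisation does not merge them: each stays a distinct code point
    folded = {unicodedata.normalize("NFKC", ch) for ch in chars}
    print(shape, "distinct after NFKC:", len(folded) == len(chars))
```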
The use of IPA in orthographies also creates havoc. The glyphs of LATIN
SMALL LETTER ALPHA U+0251 are also glyphs of LATIN SMALL LETTER A U+0061,
and are the glyphs usually used in children's books in England. There are
also cases where glyph variation is constrained by grammatical
considerations.
Richard.