From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Fri Jan 26 2007 - 12:16:24 CST
On 1/25/2007 1:13 PM, Ruszlan Gaszanov wrote:
> There's one thing I don't quite understand. Why do we keep encoding variations and combinations (ligatures) of the same base letters from Latin, Greek, Cyrillic and similar scripts as separate code points, when mechanisms already exist to compose them from base characters?
>
> Consider φ <U+03C6> (GREEK SMALL LETTER PHI) and ϕ <U+03D5> (GREEK PHI SYMBOL) for instance. <U+03D5> only exists to enforce "straight" glyph in mathematical context. Wouldn't it be more sensible to apply, let's say, VS1 <U+FE00> to <U+03C6> to enforce "loopy" glyph and VS2 <U+FE01> to enforce "straight" glyph where distinction is important, while leaving it to the font designer to choose the glyph for plain "VS-less" <U+03C6>?
>
I think you first of all have to consider that whether you use a single
character code or a sequence, it's an encoding either way. By using
variation selectors, you've only made the encoding more complicated.
The use of VS is appropriate when the distinction between two shapes
isn't needed by all users, and when treating the character as if the VS
isn't there is appropriate for most processes (in fact, all but display).
This is largely true for the Mongolian FVS, where text processes other
than display just act on the base character. It is true for those math
symbols for which we designated VS sequences - as far as we've
established, they mean the same thing, and ignoring the VS in
non-display processing is correct.
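To illustrate what "ignoring the VS" means for a non-display process, here is a minimal Python sketch. The helper names are made up for this example, and the ranges cover only VS1-VS16, VS17-VS256 and the three Mongolian FVS; a real implementation would take the set of variation selectors from the Unicode Character Database.

```python
import re

# Variation selectors assumed for this sketch:
#   U+FE00..U+FE0F    VS1-VS16
#   U+180B..U+180D    Mongolian FVS1-FVS3
#   U+E0100..U+E01EF  VS17-VS256
_VARIATION_SELECTORS = re.compile(
    "[\uFE00-\uFE0F\u180B-\u180D\U000E0100-\U000E01EF]"
)

def strip_variation_selectors(text: str) -> str:
    """Return text with all variation selectors removed (hypothetical helper)."""
    return _VARIATION_SELECTORS.sub("", text)

def same_for_text_processes(a: str, b: str) -> bool:
    """Compare two strings the way a search or sort would: VS ignored."""
    return strip_variation_selectors(a) == strip_variation_selectors(b)

intersection = "\u2229"              # INTERSECTION
intersection_serif = "\u2229\uFE00"  # INTERSECTION + VS1 (variant with serifs)

# Different for display, identical for everything else.
assert intersection != intersection_serif
assert same_for_text_processes(intersection, intersection_serif)
```
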
The Greek letterforms used in math are different. They do mean different
things, and treating them in text processes (search) as if they are the
same is definitely not correct. In Greek text, the character codes for
letter forms should not be used, and fonts should supply whatever glyph
they desire for the standard character code.
(That makes some fonts unusable for math, but that's OK).
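The difference shows up as soon as you search. In the following Python sketch (the sample string and variable names are invented for illustration), the two phi characters are distinct to a plain search, and only a deliberate compatibility folding such as NFKC merges them:

```python
import unicodedata

PHI = "\u03C6"         # GREEK SMALL LETTER PHI
PHI_SYMBOL = "\u03D5"  # GREEK PHI SYMBOL

# Hypothetical math text that uses the two letterforms for different things.
text = f"flux {PHI_SYMBOL} versus angle {PHI}"

# A plain search keeps the two apart.
assert text.count(PHI) == 1
assert text.count(PHI_SYMBOL) == 1

# Canonical normalization (NFC) preserves the distinction.
assert unicodedata.normalize("NFC", PHI_SYMBOL) == PHI_SYMBOL

# Compatibility normalization (NFKC) discards it, which is exactly
# what a math search must not do by default.
folded = unicodedata.normalize("NFKC", text)
assert folded.count(PHI) == 2
```
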
> Or, let's take all those spacing/combining subscript/superscript forms and so-called "mathematical alphabets" - couldn't the same thing have been accomplished by specific VS?
No, because then the VS would represent meaning. It's just the same
reason why we don't have a VS for uppercase letters, but code both upper
and lower case separately. I do want to be able to search for vector 'a'
(bold) as distinct from factor a (italic) in a mathematical text. I
don't care that both are based on the first letter of the Latin
alphabet, so I don't want the search to find the word 'a'.
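As a rough Python sketch of that search (the sample sentence and names are invented for this example), the mathematical alphabets give each meaning its own code point, so a search for one never turns up the other or the plain letter:

```python
import unicodedata

BOLD_A = "\U0001D41A"    # MATHEMATICAL BOLD SMALL A (the vector)
ITALIC_A = "\U0001D44E"  # MATHEMATICAL ITALIC SMALL A (the factor)

# Hypothetical sentence: one plain word 'a', one vector, one factor.
text = f"pick a: {BOLD_A} or {ITALIC_A}"

# Each search finds only what was asked for.
assert text.count(BOLD_A) == 1
assert text.count(ITALIC_A) == 1
assert text.count("a") == 1

# Folding to the base letter is possible (NFKC), but it is an explicit,
# lossy step rather than the default behavior.
folded = unicodedata.normalize("NFKC", text)
assert folded.count("a") == 3
```

Had the styles been encoded as 'a' plus a VS, the default for most processes would have been the folded result.
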
Using VS is the wrong choice, because it implies that situations where I
want to ignore the distinction, and where I need to relate to the base
character, are common.
The rules for using characters and character shapes in written languages
on the one hand and technical and scholarly notations on the other are
different. Unicode correctly reflects that distinction.
A./