From: Andrew C. West (andrewcwest@alumni.princeton.edu)
Date: Tue Jan 20 2004 - 07:13:37 EST
On Tue, 20 Jan 2004 00:36:54 -0800, Asmus Freytag wrote:
>
> Currently, Variation Selectors work only one way. You could 'force' one
> particular shape. Leaving the VS off, gives you no restriction, leaving the
> software free to give you either shape. W/o defining the use of two VSs you
> cannot 'force' the 'regular' shape.
Yes, I had forgotten this. Although in practice I would imagine that only the
most perverse font would use an unexpected glyph variant as the standard glyph
for a character. To go back to my simplistic example of the long s (which I hope
no-one is taking too seriously), I think that the user would be justified in
expecting an ordinary short s to be displayed for U+0073 in isolation, and I
doubt that many fonts would map a long s glyph directly to U+0073. Thus, although
you cannot force the "regular" glyph shape, you can select the font's default
glyph shape by omitting the VS, and in most fonts the default glyph would
be the same as the "regular" Unicode code chart glyph.
> Also, the way most VSs are defined, their use does not depend
> on context the same way as the example suggests.
>
Absolutely. My understanding is that the Mongolian Free Variation Selectors (and
the hypothetical long-s FVS) function quite differently from the ordinary
variation selectors currently used for mathematical symbols, and proposed for
Phags-pa, and apparently coming soon for Han ideographs. In the case of
Mongolian the rendering system can determine the expected glyph form based on a
set of deterministic rules, and so an FVS need only be applied when the rules
need to be broken. On the other hand, there are no rules that allow the
rendering system to know which particular Standardised Variant glyph form to use
for an unmarked Unicode character in a particular context, and the VS must be
applied manually by the user or IME.
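As a rough illustration of the first model (deterministic rules plus an explicit override), here is a minimal Python sketch built on the simplistic long-s example. Everything in it is an assumption made for illustration: the rule "long s everywhere except word-finally" is a toy rule, and the use of U+FE00 as an override marker after U+0073 is hypothetical, not a standardised variation sequence.

```python
LONG_S = "\u017F"    # LATIN SMALL LETTER LONG S
OVERRIDE = "\uFE00"  # VARIATION SELECTOR-1, used here as a HYPOTHETICAL override marker


def default_form(text, i):
    """Toy deterministic rule (an assumption): 's' is long unless word-final."""
    nxt = text[i + 1] if i + 1 < len(text) else ""
    if nxt == OVERRIDE:
        # Look past the selector to find the next real character.
        nxt = text[i + 2] if i + 2 < len(text) else ""
    return LONG_S if nxt.isalpha() else "s"


def render(text):
    """Pick a form for each 's' by rule; a following selector flips the choice."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "s":
            form = default_form(text, i)
            if i + 1 < len(text) and text[i + 1] == OVERRIDE:
                # Selector present: break the rule and use the other form.
                form = "s" if form == LONG_S else LONG_S
                i += 1  # consume the selector
            out.append(form)
        elif ch != OVERRIDE:
            out.append(ch)  # stray selectors are dropped
        i += 1
    return "".join(out)
```

With this sketch, render("mass") yields the rule-driven "maſs", while render("mas\uFE00s") yields "mass" because the selector overrides the rule for the first s. Under the second model there is no default_form to consult at all, so every variant has to be marked in the text itself.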
The circumstances in which standard variation selectors are a good idea are, to
my mind, typified by the four proposed Phags-pa standardised variants :
A85B FE00 -- PHAGS-PA LETTER YA with rounded appearance
A860 FE00 -- PHAGS-PA LETTER HA without tail kink
A864 FE00 -- PHAGS-PA LETTER FA with tail kink
A85E FE00 -- PHAGS-PA LETTER SHA with sloping stroke
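In plain text each of these variants is simply the base letter followed by U+FE00. A minimal Python sketch of how software might recognise them is given below; the code points and descriptions are copied exactly as listed above (they are proposal-stage values and may not match what is eventually encoded), and the names PROPOSED_VARIANTS and find_variants are made up for this illustration.

```python
VS1 = "\uFE00"  # VARIATION SELECTOR-1

# Sequences exactly as listed in this message (proposal-stage values).
PROPOSED_VARIANTS = {
    "\uA85B" + VS1: "PHAGS-PA LETTER YA with rounded appearance",
    "\uA860" + VS1: "PHAGS-PA LETTER HA without tail kink",
    "\uA864" + VS1: "PHAGS-PA LETTER FA with tail kink",
    "\uA85E" + VS1: "PHAGS-PA LETTER SHA with sloping stroke",
}


def find_variants(text):
    """Return (index, description) for each listed sequence found in text."""
    hits = []
    for i in range(len(text) - 1):
        pair = text[i:i + 2]  # base character plus the following code point
        if pair in PROPOSED_VARIANTS:
            hits.append((i, PROPOSED_VARIANTS[pair]))
    return hits
```

A base letter that is not followed by U+FE00 produces no hit, which corresponds to the unmarked, default glyph form.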
These are glyph variants of Phags-pa letters that are used with semantic
distinctiveness in a single (but very important) text, _Menggu Ziyun_, a
14th-century rhyming dictionary of Chinese in which Chinese ideographs are listed by
their Phags-pa spellings. In this one text only, variant forms of the letters
FA, SHA, HA and YA are used contrastively in order to represent historical
phonetic differences between Chinese syllables that were pronounced the same in
early 14th century standard Chinese (Old Mandarin). For example :
A. The ideographs SHU [U+66F8] and SHU [U+6B8A] were pronounced the same
in Old Mandarin, but were historically distinct (in the Chinese of the Tang
dynasty), the former with a reconstructed [U+0255] initial, the latter with a
reconstructed [U+0291] initial. In _Menggu Ziyun_ the former SHU is spelled
sheeu and the latter SHU spelled sh'eeu (where sh' is a glyph variant of sh).
B. The ideographs YIN [U+56E0] and YIN [U+5BC5] were pronounced the same
in Old Mandarin (other than tone which is not represented in Phags-pa spelling
of Chinese), but were historically distinct, the former with a reconstructed
null initial, the latter with a reconstructed [j] initial. In _Menggu Ziyun_ the
former YIN is spelled yin and the latter YIN spelled y'in (where y' is a glyph
variant of y).
C. The ideographs XIAN [U+96AA] and XIAN [U+5ACC] were pronounced the same
in Old Mandarin (other than tone), but were historically distinct, the former
with a reconstructed [x] initial, the latter with a reconstructed [U+0263]
initial. In _Menggu Ziyun_ the former XIAN is spelled hyem and the latter XIAN
spelled h'yem (where h' is a glyph variant of h).
D. The ideographs FANG [U+65B9] and FANG [U+623F] were pronounced the same
in Old Mandarin (other than tone), but were historically distinct, the former
with a reconstructed [p] initial, the latter with a reconstructed [b] initial.
In _Menggu Ziyun_ the former FANG is spelled fang and the latter FANG spelled
f'ang (where f' is a glyph variant of f).
However, in actual Phags-pa manuscript/printed texts and epigraphic inscriptions
there is no distinction between pairs of ideographs such as these, and the same
glyph form is used for all occurrences of the letters FA, SHA, HA and YA
respectively.
Thus the Phags-pa letters FA, SHA, HA and YA represent "f", "sh", "h" and "y"
however they are written, but in one certain textual context glyph distinction
is used to carry additional historic phonetic information that you may or may
not want to preserve in electronic texts.
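One practical consequence of keeping the same base character code is that the extra information can be discarded mechanically. The Python sketch below, with the hypothetical helper name fold_variants, removes variation selectors in the range U+FE00..U+FE0F so that a spelling carrying the variant and a spelling without it compare as equal. It is only an illustration of the idea, not a description of any existing implementation.

```python
def fold_variants(text):
    """Drop variation selectors U+FE00..U+FE0F, keeping the base characters."""
    return "".join(ch for ch in text if not ("\uFE00" <= ch <= "\uFE0F"))


def same_base_spelling(a, b):
    """True if two strings differ at most in their variation selectors."""
    return fold_variants(a) == fold_variants(b)
```

For instance, a string containing A85E FE00 and one containing plain A85E would be treated as the same base spelling by same_base_spelling, whereas an exact string comparison would still keep the _Menggu Ziyun_ distinction for those who want to preserve it.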
As Asmus says, "A VS approach is potentially indicated when it's necessary to
manually select non-deterministic variants (or to override deterministic ones)
and at the same time it's desired to use the same base character code to carry
the same base meaning". I think that the proposed Phags-pa standardised variants
exactly meet these criteria.
Like others on this list I would like to hear more about the Han standardised
variants, as I'm more than a little uneasy about the use of variation selectors
to select simple glyph variants (e.g. traditional versus modern glyph forms)
that have no semantic distinctions, as would seem to be the case from what John
was saying. You can pretty much make the same case for needing to represent
glyph forms in plain text for any script, especially if you're an epigrapher or
textual scholar. What's so special about Han ideographs I wonder ?
Andrew