Re: Mongolian Unicoding (was Re: Cuneiform Free Variation Selectors)

From: Andrew C. West (andrewcwest@alumni.princeton.edu)
Date: Tue Jan 20 2004 - 07:13:37 EST


    On Tue, 20 Jan 2004 00:36:54 -0800, Asmus Freytag wrote:
    >
    > Currently, Variation Selectors work only one way. You could 'force' one
    > particular shape. Leaving the VS off, gives you no restriction, leaving the
    > software free to give you either shape. W/o defining the use of two VSs you
    > cannot 'force' the 'regular' shape.

    Yes, I had forgotten this. Although in practice I would imagine that only the
    most perverse font would use an unexpected glyph variant as the standard glyph
    for a character. To go back to my simplistic example of the long s (which I hope
    no-one is taking too seriously), I think that the user would be justified in
    expecting an ordinary short s to be displayed for U+0073 in isolation, and I
    doubt that many fonts would map a long s glyph directly to U+0073. Thus although
    you cannot force the "regular" glyph shape you can force the font's default
    glyph shape by the omission of a VS, and in most fonts the default glyph would
    be the same as the "regular" Unicode code chart glyph.

    > Also, the way most VSs are defined, their use does not depend
    > on context the same way as the example suggests.
    >

    Absolutely. My understanding is that the Mongolian Free Variation Selectors (and
    the hypothetical long-s FVS) function quite differently from the ordinary
    variation selectors currently used for mathematical symbols, and proposed for
    Phags-pa, and apparently coming soon for Han ideographs. In the case of
    Mongolian the rendering system can determine the expected glyph form based on a
    set of deterministic rules, and so an FVS need only be applied when the rules
    need to be broken. On the other hand, there are no rules that allow the
    rendering system to know which particular Standardised Variant glyph form to use
    for an unmarked Unicode character in a particular context, and the VS must be
    applied manually by the user or IME.
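
    To make the contrast concrete, here is a toy sketch in Python built on the
    simplistic long-s example. Everything in it is an assumption made for
    illustration: the positional rule (long s everywhere except word-finally),
    the use of U+FE00 as a selector after "s", and the function names are all
    hypothetical, and none of this is defined by Unicode or taken from a real
    rendering engine. With only two forms, "overriding" the rule simply selects
    the other form.

```python
# Toy model only: the long-s rule and the use of U+FE00 after "s" are
# hypothetical and are not defined by Unicode.
VS1 = "\uFE00"  # VARIATION SELECTOR-1, used here as a stand-in selector


def parse(word):
    """Split a word into (base character, has_selector) pairs."""
    units = []
    for ch in word:
        if ch == VS1 and units:
            # Attach the selector to the preceding base character.
            units[-1] = (units[-1][0], True)
        else:
            units.append((ch, False))
    return units


def glyphs_rule_based(word):
    """FVS-style: a deterministic rule picks the form; a selector breaks the rule."""
    units = parse(word)
    out = []
    for i, (ch, override) in enumerate(units):
        if ch != "s":
            out.append(ch)
            continue
        is_final = i == len(units) - 1
        form = "short-s" if is_final else "long-s"  # hypothetical positional rule
        if override:
            # The selector is only needed when the rule has to be broken.
            form = "long-s" if form == "short-s" else "short-s"
        out.append(form)
    return out


def glyphs_manual(word):
    """Ordinary-VS style: no rule; default glyph unless a selector is present."""
    return [
        ("long-s" if selected else "short-s") if ch == "s" else ch
        for ch, selected in parse(word)
    ]
```

    In the first function an unmarked "sins" already comes out with an initial
    long s, and the selector is needed only to suppress it; in the second the
    unmarked text always gets the default glyph, and every long s has to be
    marked by the user or IME.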

    My understanding of the circumstances under which standard variation selectors
    are a good idea is typified by the four proposed Phags-pa standardised variants :

    A85B FE00 -- PHAGS-PA LETTER YA with rounded appearance
    A860 FE00 -- PHAGS-PA LETTER HA without tail kink
    A864 FE00 -- PHAGS-PA LETTER FA with tail kink
    A85E FE00 -- PHAGS-PA LETTER SHA with sloping stroke

    These are glyph variants of Phags-pa letters that are used with semantic
    distinctiveness in a single (but very important) text, _Menggu Ziyun_, a
    14th-century rhyming dictionary of Chinese in which Chinese ideographs are
    listed by their Phags-pa spellings. In this one text only, variant forms of the
    letters FA, SHA, HA and YA are used contrastively in order to represent
    historical phonetic differences between Chinese syllables that were pronounced
    the same in early 14th-century standard Chinese (Old Mandarin). For example :

    A. The ideographs SHU [U+66F8] and SHU [U+6B8A] were pronounced the same
    in Old Mandarin, but were historically distinct (in the Chinese of the Tang
    dynasty), the former with a reconstructed [U+0255] initial, the latter with a
    reconstructed [U+0291] initial. In _Menggu Ziyun_ the former SHU is spelled
    sheeu and the latter SHU spelled sh'eeu (where sh' is a glyph variant of sh).

    B. The ideographs YIN [U+56E0] and YIN [U+5BC5] were pronounced the same
    in Old Mandarin (other than tone which is not represented in Phags-pa spelling
    of Chinese), but were historically distinct, the former with a reconstructed
    null initial, the latter with a reconstructed [j] initial. In _Menggu Ziyun_ the
    former YIN is spelled yin and the latter YIN spelled y'in (where y' is a glyph
    variant of y).

    C. The ideographs XIAN [U+96AA] and XIAN [U+5ACC] were pronounced the same
    in Old Mandarin (other than tone), but were historically distinct, the former
    with a reconstructed [x] initial, the latter with a reconstructed [U+0263]
    initial. In _Menggu Ziyun_ the former XIAN is spelled hyem and the latter XIAN
    spelled h'yem (where h' is a glyph variant of h).

    D. The ideographs FANG [U+65B9] and FANG [U+623F] were pronounced the same
    in Old Mandarin (other than tone), but were historically distinct, the former
    with a reconstructed [p] initial, the latter with a reconstructed [b] initial.
    In _Menggu Ziyun_ the former FANG is spelled fang and the latter FANG spelled
    f'ang (where f' is a glyph variant of f).

    However, in actual Phags-pa manuscript/printed texts and epigraphic inscriptions
    there is no distinction between pairs of ideographs such as these, and the same
    glyph form is used for all occurrences of the letters FA, SHA, HA and YA
    respectively.

    Thus the Phags-pa letters FA, SHA, HA and YA represent "f", "sh", "h" and "y"
    however they are written, but in one certain textual context glyph distinction
    is used to carry additional historic phonetic information that you may or may
    not want to preserve in electronic texts.
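
    As a rough sketch of what "may or may not want to preserve" means in
    practice, the Python below treats the four proposed sequences as base letter
    plus U+FE00 and shows that dropping the selector folds the _Menggu Ziyun_
    variant back onto the ordinary letter. The code points are copied from the
    proposal quoted above and may not match whatever is eventually standardised;
    the dictionary and function names are my own and purely illustrative.

```python
VS1 = "\uFE00"  # VARIATION SELECTOR-1

# Sequences exactly as listed in the proposal above. Assumption: these code
# points come from that proposal only and are not checked against any
# published code chart.
PROPOSED_VARIANTS = {
    "YA": "\uA85B" + VS1,   # rounded appearance
    "HA": "\uA860" + VS1,   # without tail kink
    "FA": "\uA864" + VS1,   # with tail kink
    "SHA": "\uA85E" + VS1,  # with sloping stroke
}


def strip_variation_selectors(text):
    """Drop VS1..VS16 (U+FE00..U+FE0F), keeping only the base characters."""
    return "".join(ch for ch in text if not ("\uFE00" <= ch <= "\uFE0F"))


def same_base_spelling(a, b):
    """True if two strings differ at most in their variation selectors."""
    return strip_variation_selectors(a) == strip_variation_selectors(b)
```

    A search or collation routine that wants sh and sh' to match would compare
    with same_base_spelling(); an edition of _Menggu Ziyun_ that wants to keep
    the distinction would compare the strings as they are.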

    As Asmus says, "A VS approach is potentially indicated when it's necessary to
    manually select non-deterministic variants (or to override deterministic ones)
    and at the same time it's desired to use the same base character code to carry
    the same base meaning". I think that the proposed Phags-pa standardised variants
    exactly meet these criteria.

    Like others on this list I would like to hear more about the Han standardised
    variants, as I'm more than a little uneasy about the use of variation selectors
    to select simple glyph variants (e.g. traditional versus modern glyph forms)
    that have no semantic distinctions, as would seem to be the case from what John
    was saying. You can pretty much make the same case for needing to represent
    glyph forms in plain text for any script, especially if you're an epigrapher or
    textual scholar. What's so special about Han ideographs I wonder ?

    Andrew


