From: Martin Heijdra (mheijdra@princeton.edu)
Date: Mon Dec 16 2002 - 09:33:10 EST
Andrew:
A small group has been working on these and other questions for a while now,
after the last group of questions raised on Mongolian on this list. I will
get in contact with you separately with some of our work.
For the moment, in short: yes, use the TR170 document, especially its
detailed examples (which are fuller than the textual explanations, and have
implications not explicitly stated); there is also a Chinese book called
Mengguwen bianma which in parts is fuller and more explicit. There are still
some rare cases not covered by either.
Martin Heijdra
----- Original Message -----
From: "Andrew C. West" <andrewcwest@alumni.Princeton.EDU>
To: <unicode@unicode.org>
Sent: Monday, December 16, 2002 8:40 AM
Subject: Mongolian Encoding
> As promised, here are some questions on the encoding of Mongolian that have
> arisen whilst writing an input method for the Mongolian script (the questions
> are relevant to the Todo, Manchu and Sibe scripts as well, but I'll restrict
> myself to Mongolian for the moment). I don't know if anyone is able to answer
> all of my questions, but I hope that someone on the list will be able to give me
> some much needed advice.
>
> 1. Documentation
> Section 11.4 of the Unicode Standard notes that a group of experts from
> Mongolia, China and the West are to publish a document called "User's Convention
> for System Implementation of the International Standard on Mongolian Encoding"
> which will explicitly define Mongolian character shaping behaviour in full. WG2
> document N1980 (http://std.dkuug.dk/jtc1/sc2/WG2/docs/n1980.doc) also states
> that Mongolian, Chinese and English versions of the "User's Convention" will be
> prepared by Mongolia and China. I have been unable to locate this document on
> the internet. Does it exist, and if so can it be made publicly available ?
> Without the aid of such a document it seems almost impossible to correctly
> implement the Unicode encoding of Mongolian.
> In its stead I have been using the document "Traditional Mongolian Script in the
> ISO/IEC 10646 and Unicode Standards" (UNU/IIST Report No. 170, August 1999)
> written by Myatav Erdenechimeg, Richard Moore and Yumbayar Namsrai as a guide to
> Mongolian character shaping behaviour. It seems to provide all the information I
> would expect to see in the "User's Convention", but I am not sure how
> authoritative this paper is, and what its relationship is to the "User's
> Convention" (if any).
>
> 2. Free Variation Selectors
> The Mongolian Free Variation Selectors (U+180B, U+180C and U+180D) are used to
> distinguish variant graphic forms of the same positional forms of a character. I
> would say that there are three categories of variant forms governed by the
> variation selectors :
> A. Non-contextual variants, such as variant forms of letters that are used in
> foreign words (e.g. the use of a "reclining" letter D -- U+1833 + FVS1 -- in
> foreign words), and graphic variations that are due to differences between
> traditional and modern orthography. Such variants must be explicitly encoded by
> use of the appropriate variation selector in order for the correct form to be
> selected by the rendering engine.
> B. Contextual variants that are determined by the overall composition of the
> word in which they are found, such as the use of the long-toothed forms of the
> letters OE and UE (U+1825/1826 + FVS1) in the first syllable of a word only, or
> the use of the feminine form of the letter G (U+182D + FVS3) between consonants
> or the letter I (which is neutral) in a feminine word. In these cases I would
> imagine that it is too much to ask the rendering engine to work out the correct
> variant form, and the correct variant should be explicitly encoded using the
> appropriate variation selector.
> C. Contextual variants that can be determined from their neighbouring letters,
> such as the medial form of the letter G with two dots that is used before a
> vowel (U+182D + FVS2), or the form of the letter A that is written with a
> forward tail when occurring finally after the letters B, P, F and K (U+1820 +
> FVS1). In these cases is it necessary to explicitly encode the variant form with
> the appropriate variation selector ? The Standard says "For cases in which the
> contextual sequence of basic letters is not sufficient for a rendering engine to
> uniquely determine the appropriate glyph for a particular letter, additional
> format characters are provided so that the typist may specify the desired
> rendering". Should we assume that the rendering engine will correctly select the
> dotted form of medial G before a vowel and the dotless form before a consonant,
> or would it be wiser to explicitly encode the appropriate variation selector
> anyway ?
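
The variant sequences named in question 2 can be written out as code points. What follows is only a minimal Python sketch: the dictionary labels and the helper name code_points are illustrative assumptions, and the sequences are simply those cited in the message, not a statement of what any rendering engine requires.

```python
# Free Variation Selectors as given in the message.
FVS1, FVS2, FVS3 = "\u180B", "\u180C", "\u180D"

# Base letter + selector pairs cited in categories A, B and C.
# The labels are illustrative only.
VARIANT_SEQUENCES = {
    "A: reclining D": "\u1833" + FVS1,
    "B: long-toothed OE": "\u1825" + FVS1,
    "B: long-toothed UE": "\u1826" + FVS1,
    "B: feminine G": "\u182D" + FVS3,
    "C: dotted medial G": "\u182D" + FVS2,
    "C: forward-tail final A": "\u1820" + FVS1,
}


def code_points(text):
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


for label, sequence in VARIANT_SEQUENCES.items():
    print(f"{label}: {code_points(sequence)}")
```

Whether the category C sequences need the explicit selector at all is exactly the open question; the sketch only shows what the explicit encoding would look like.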
>
> 3. Mongolian Vowel Separator
> The Mongolian Vowel Separator (U+180E) is used to separate the vowels A and E
> from certain preceding consonants (e.g. ...N + MVS + A = U+1828,180E,1820).
> After MVS the vowels A and E use the forward tail variant which is physically
> offset from the preceding consonant by narrow whitespace. These variant forms of
> A and E are selected by the presence of a preceding MVS, and there appears to be
> no need to otherwise select the variant A or E by means of a variation
> selector.
> However, not only does the MVS affect the following A or E, but the preceding
> consonant may also take a variant form when followed by an offset A or E. This
> is the case for the letters N, Q, G, J, Y and W. The variant forms of these
> letters when preceding an offset A or E are given in Unicode's Standardized
> Variants document (N, Q, G, J and Y are given as medial variants, but W is given
> as a final variant which is perhaps wrong). My question is, should the variant
> form of the consonant preceding the offset A or E be explicitly encoded using
> the appropriate variation selector, or is the presence of the following MVS
> sufficient for the rendering engine to select the correct variant form ?
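
The context described in question 3 can be detected mechanically from the plain character sequence. The following is a rough Python sketch under stated assumptions: the function name describe_mvs and the tuple layout are hypothetical, and it only reports where an MVS sits between one of the six listed consonants and a following A or E. It does not settle whether a variation selector is also required.

```python
MVS = "\u180E"  # U+180E as given in the message

# Vowels that take the offset form after MVS: A and E.
OFFSET_VOWELS = {"\u1820", "\u1821"}

# Consonants the message lists as taking a variant form before an offset A or E.
VARIANT_CONSONANTS = {
    "\u1828": "N",
    "\u182C": "Q",
    "\u182D": "G",
    "\u1835": "J",
    "\u1836": "Y",
    "\u1838": "W",
}


def describe_mvs(text):
    """Return one (index, consonant label or None, vowel_ok) tuple per MVS."""
    findings = []
    for i, ch in enumerate(text):
        if ch != MVS:
            continue
        before = text[i - 1] if i > 0 else ""
        after = text[i + 1] if i + 1 < len(text) else ""
        findings.append((i, VARIANT_CONSONANTS.get(before), after in OFFSET_VOWELS))
    return findings


# N + MVS + A, the example sequence from the message.
print(describe_mvs("\u1828\u180E\u1820"))
```

A rendering engine that can make this check has, in principle, enough context to choose the consonant variant without an explicit selector, which is the possibility the question raises.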
>
> 4. Variant forms of the Mongolian Birga
> Appendix A of "Traditional Mongolian Script in the ISO/IEC 10646 and Unicode
> Standards" lists four variant forms of the Mongolian Birga (U+1800) :
> 1st variant form = U+1800 + FVS1
> 2nd variant form = U+1800 + FVS2
> 3rd variant form = U+1800 + FVS3
> 4th variant form = U+1800 + ZWJ
>
> Unicode's Standardized Variants document
> (http://www.unicode.org/Public/UNIDATA/StandardizedVariants.html) does not list
> any variants for the Mongolian Birga. Moreover, it warns "All combinations not
> listed here are unspecified and are reserved for future standardization; no
> conformant process may interpret them as standardized variants." This clearly
> means that these Birga variants should not currently be recognised. But given
> that the Birga does occur in a number of forms, either Unicode should define
> standardized variants for them, or add some new characters to represent them.
> Nevertheless, assuming that Appendix A of "Traditional Mongolian Script" is
> correct in providing a mechanism for distinguishing four variant forms of the
> Mongolian Birga, is it acceptable to use the ZWJ as a variant selector (as is
> the case for the 4th variant Birga) ? Its usage here seems a little suspect to
> me.
>
> Andrew
>
>