Re: Chinese : Pin Yin characters

From: Tom Emerson (tree@basistech.com)
Date: Mon Nov 20 2000 - 15:46:12 EST


> Is there a set of characters for writing Chinese Pinyin with the
> different tones? I suppose so; I have the Unibook program, but I am
> having difficulty determining this set of characters.

The tone marks themselves can be found at

U+02C9 MODIFIER LETTER MACRON (1st tone)
U+02CA MODIFIER LETTER ACUTE ACCENT (2nd tone)
U+02C7 CARON (3rd tone; formerly MODIFIER LETTER HACEK)
U+02CB MODIFIER LETTER GRAVE ACCENT (4th tone)

corresponding to the Big Five codepoints 0xA3BC, 0xA3BD, 0xA3BE, and
0xA3BF respectively. Big Five does not contain combined versions of
the vowels and tone marks.
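
A minimal Python sketch for checking the list above on your own system.
It assumes that Python's standard "big5" codec uses the same mapping
table as the Big Five variant discussed here; other Big Five tables
(vendor extensions, HKSCS) may differ.

```python
import unicodedata

# Spacing tone marks for tones 1-4, as listed above.
TONE_MARKS = ["\u02C9", "\u02CA", "\u02C7", "\u02CB"]

# Encode each mark with Python's "big5" codec (an assumption: this is
# the codec's table, not necessarily every vendor's Big Five table).
big5_codes = [mark.encode("big5").hex().upper() for mark in TONE_MARKS]

for tone, (mark, code) in enumerate(zip(TONE_MARKS, big5_codes), start=1):
    print(f"tone {tone}: U+{ord(mark):04X} {unicodedata.name(mark)} "
          f"-> Big Five 0x{code}")
```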

GB 2312 does not include separate codepoints for the tones themselves
(or at least not a complete set: it does include U+FF40 FULLWIDTH
GRAVE ACCENT) though it does contain precomposed versions of the
vowels (a, e, i, o, u, and ü) with tone marks in row 8, corresponding
to the Unicode code points

0x0101 LATIN SMALL LETTER A WITH MACRON
0x00E1 LATIN SMALL LETTER A WITH ACUTE
0x01CE LATIN SMALL LETTER A WITH CARON
0x00E0 LATIN SMALL LETTER A WITH GRAVE
0x0113 LATIN SMALL LETTER E WITH MACRON
0x00E9 LATIN SMALL LETTER E WITH ACUTE
0x011B LATIN SMALL LETTER E WITH CARON
0x00E8 LATIN SMALL LETTER E WITH GRAVE
0x012B LATIN SMALL LETTER I WITH MACRON
0x00ED LATIN SMALL LETTER I WITH ACUTE
0x01D0 LATIN SMALL LETTER I WITH CARON
0x00EC LATIN SMALL LETTER I WITH GRAVE
0x014D LATIN SMALL LETTER O WITH MACRON
0x00F3 LATIN SMALL LETTER O WITH ACUTE
0x01D2 LATIN SMALL LETTER O WITH CARON
0x00F2 LATIN SMALL LETTER O WITH GRAVE
0x016B LATIN SMALL LETTER U WITH MACRON
0x00FA LATIN SMALL LETTER U WITH ACUTE
0x01D4 LATIN SMALL LETTER U WITH CARON
0x00F9 LATIN SMALL LETTER U WITH GRAVE
0x01D6 LATIN SMALL LETTER U WITH DIAERESIS AND MACRON
0x01D8 LATIN SMALL LETTER U WITH DIAERESIS AND ACUTE
0x01DA LATIN SMALL LETTER U WITH DIAERESIS AND CARON
0x01DC LATIN SMALL LETTER U WITH DIAERESIS AND GRAVE
0x00FC LATIN SMALL LETTER U WITH DIAERESIS
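
The toned entries in this list can be regenerated rather than typed by
hand. The sketch below is only an illustration: it attaches a combining
mark to each base vowel and lets NFC normalization pick the precomposed
character. The combining marks (U+0304, U+0301, U+030C, U+0300) and the
helper name toned_vowel are my own additions, not part of the list.

```python
import unicodedata

# Combining marks for tones 1-4: macron, acute, caron, grave.
COMBINING_TONES = ["\u0304", "\u0301", "\u030C", "\u0300"]

def toned_vowel(base: str, tone: int) -> str:
    """Return `base` with the mark for `tone` (1-4), precomposed via NFC."""
    return unicodedata.normalize("NFC", base + COMBINING_TONES[tone - 1])

# a, e, i, o, u, and u with diaeresis (U+00FC), four tones each.
for base in ["a", "e", "i", "o", "u", "\u00FC"]:
    for tone in range(1, 5):
        ch = toned_vowel(base, tone)
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```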

It is important to note that Unicode does not include full-width
versions of the precomposed characters: GB 2312 specified full-width
Roman characters. So converting a GB2312 encoded file containing, say,

    dinmn

would probably result in a Unicode version where 'd', 'i', 'n', 'm',
and 'n' are full-width, while the two tone-marked vowels are half-width.
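
To see the mixture for yourself, here is a rough Python sketch. The
two-character sample (a full-width 'm' followed by a-macron) is
hypothetical and chosen only for illustration, and folding with NFKC is
one possible cleanup, not the only one.

```python
import unicodedata

# Hypothetical sample: full-width "m" (U+FF4D) followed by precomposed
# "a with macron" (U+0101), encoded as a GB 2312 file might store them.
gb_bytes = "\uFF4D\u0101".encode("gb2312")
text = gb_bytes.decode("gb2312")

def is_fullwidth(ch: str) -> bool:
    """True for the full-width ASCII variants U+FF01..U+FF5E."""
    return 0xFF01 <= ord(ch) <= 0xFF5E

for ch in text:
    width = "full-width" if is_fullwidth(ch) else "not full-width"
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {width}")

# NFKC maps the full-width letter to plain ASCII and leaves the
# precomposed vowel unchanged, so the result is uniform in width.
folded = unicodedata.normalize("NFKC", text)
print([f"U+{ord(ch):04X}" for ch in folded])
```

Keep in mind that NFKC also folds other compatibility characters, so it
is only appropriate when losing the full-width distinction is acceptable.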

So when converting to and from Unicode text that carries tone marks, you
want to be aware of these issues.

    -tree

-- 
Tom Emerson                                          Basis Technology Corp.
Zenkaku Language Hacker                            http://www.basistech.com
  "Beware the lollipop of mediocrity: lick it once and you suck forever"


