Re: FW: A product compatibility question

From: Kenneth Whistler (kenw@sybase.com)
Date: Wed Oct 17 2001 - 18:06:22 EDT


Sampo Syreeni responded:

> On Wed, 17 Oct 2001, Kenneth Whistler wrote:
>
> >"Traditional Chinese" and "Simplified Chinese" are *not* two different
> >languages.
>
> But they are naturally handled as such, no?

No.

> After all, they employ the
> same Unicode codepoints but are displayed in a different font altogether.

Actually, they employ overlapping sets of Unicode code points. Some
of the code points are obviously shared, e.g., U+4E00 'one', U+4E8C 'two',
U+4E09 'three', and so on -- there are many thousands of such code
points used by both.

What they don't share is legacy code pages. TC is associated with
Windows CP950, while SC is associated with Windows CP936, and so on
for other systems. If you convert all the legacy code points to
Unicode, you get two largely overlapping, but still distinct, sets of
code points, because the repertoires of CP950 and CP936 are not the
same.
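
As a rough illustration of the repertoire point, here is a minimal
Python sketch. It assumes that Python's built-in "cp950" and "cp936"
codecs are close enough to the Windows code pages for this purpose.
The helper name encodable() and the choice of U+8BED (a simplified
form) as a test character are mine, not part of the original
discussion.

```python
# Check which Unicode code points round out of Unicode into each
# legacy code page. A character is "in the repertoire" here if the
# codec can encode it without raising an error.

def encodable(ch: str, codec: str) -> bool:
    """Return True if ch can be encoded in the given legacy codec."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

# U+4E00, U+4E8C, U+4E09 are the shared examples from the text above;
# U+8BED is an illustrative simplified-only form (an assumption).
samples = ["\u4e00", "\u4e8c", "\u4e09", "\u8bed"]

for ch in samples:
    tc = encodable(ch, "cp950")   # code page associated with TC
    sc = encodable(ch, "cp936")   # code page associated with SC
    print(f"U+{ord(ch):04X}  cp950={tc}  cp936={sc}")
```

A code point that encodes in both code pages belongs to the shared
part of the two sets; one that encodes in only one of them belongs to
the part that does not overlap.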

Furthermore, while the traditional TC and SC fonts were distinct,
since they contained just the glyphs tailored to the legacy code
pages, now with Unihan fonts widely available, you can display either
TC-derived data or SC-derived data with the same font just fine.
And on many points, the Chinese-specific characteristics of the
font designs for TC- or SC-derived data share typographic characteristics
in *contrast* to a traditional Japanese font, for example.

>
> >The TC/SC distinction is an artifact of legacy choices made for encoding
> >characters and implementation of text in East Asian computer systems. It
> >is *not* a language distinction, and should not be tagged as such.
>
> But there are distinguishable dialectal differences between the variants
> of the base Chinese language used between the areas which primarily
> utilize Simplified and Traditional Chinese.

True but misleading.

Mandarin is the predominant spoken form of Chinese on Taiwan, which
uses a TC code page and TC font. It is also true that Minnan ("Taiwanese")
and Hakka, quite different spoken forms of Chinese, are also spoken
on Taiwan.

Mandarin is the predominant spoken form of Chinese in northern
China, which uses an SC code page and SC font. And while you
don't often hear true Beijing dialect in Taipei, or the Taiwanese-
influenced Taipei dialect in Beijing, the educated Mandarin
spoken in both capitals is not all that different.

It is also true that considered linguistically, spoken *Chinese*
is not one language, but anywhere from 8 to a dozen languages
(depending on how down and dirty you want to get about language
distinctions in the very diverse SE corner of China).

But *written* Chinese is essentially a standardized single
written form, much like standard written English in its
international usage. There are, of course, instances of dialect
Mandarin novels in the north and dialect Cantonese novels in
the south, and so on, which *are* representations of different
languages. But that is not at all what most technical people
are concerned about when they have to support traditional
versus simplified Chinese.

> Hence, even if they are not
> treated as separate languages, one cannot do a codepoint-for-codepoint
> transformation and end up with legible text.

Incorrect. Again, they are *not* separate languages, but two
orthographic renditions of the same *written* language.

That doesn't mean that there are not legibility problems between
the two orthographies. A person used to traditional characters
finds the simplified forms hard to understand or in some cases
incomprehensible. A person used to simplified characters finds
the traditional forms baroque or in some cases unrecognizable. But
that is not a *language* difference -- it is an *orthography*
difference. It is a blown-up, systematized version of the
kinds of differences we see in spelling between American and
British written English. (Although I don't want to give the
impression that the distinction actually *is* a spelling difference.)

> This sort of distinction
> *should* be tagged as a dialect variant, if I'm not incorrect altogether.

No, it should *not* be tagged as a dialect variant.

--Ken

>
> Sampo Syreeni, aka decoy - mailto:decoy@iki.fi, tel:+358-50-5756111
> student/math+cs/helsinki university, http://www.iki.fi/~decoy/front
> openpgp: 050985C2/025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
>
>
>


