Re: Chinese in VB

From: Doug Ewell (dewell@adelphia.net)
Date: Mon Oct 14 2002 - 01:39:57 EDT


    <violet@time.net.my> wrote:

    > I am trying to write an application that can read input in Traditional
    > Chinese but output (printed on paper) in Simplified Chinese, without
    > any 3rd-party software (e.g. ChineseStar, TwinBridge).
    >
    > How can I implement Unicode in the coding? The programming language
    > I'm using is Ms Visual Basic 6 Professional Edition.

    It depends on how much of the problem you want to solve. Mapping
    between Traditional Chinese (TC) and Simplified Chinese (SC) is *not*
    generally 1-to-1, despite what many people believe. It could be
    1-to-many, many-to-1, or even many-to-many, depending on which
    character(s) are involved.
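    To make this concrete, here is a minimal sketch in Python (used only
    for illustration; the question concerns VB6). It builds mapping
    tables in both directions from a few (TC, SC) pairs, storing a *set*
    of candidates per character, and lists the entries that are not
    1-to-1. The six sample pairs are hand-picked, not taken from any
    official table, and the function names are hypothetical.

```python
from collections import defaultdict

# Hand-picked (traditional, simplified) pairs, for illustration only.
SAMPLE_PAIRS = [
    ("發", "发"),  # "to emit, issue"
    ("髮", "发"),  # "hair": a second TC character with the same SC form
    ("乾", "干"),  # gān, "dry"
    ("乾", "乾"),  # qián: kept as 乾 in SC, so this TC character has two SC forms
    ("幹", "干"),  # gàn, "trunk; to do"
    ("干", "干"),  # gān, "shield; to interfere": the same in both
]


def build_maps(pairs):
    """Build TC->SC and SC->TC tables; each key maps to a set of candidates."""
    tc_to_sc = defaultdict(set)
    sc_to_tc = defaultdict(set)
    for tc, sc in pairs:
        tc_to_sc[tc].add(sc)
        sc_to_tc[sc].add(tc)
    return tc_to_sc, sc_to_tc


def not_one_to_one(mapping):
    """Return only the keys that have more than one candidate."""
    return {key: values for key, values in mapping.items() if len(values) > 1}


if __name__ == "__main__":
    tc_to_sc, sc_to_tc = build_maps(SAMPLE_PAIRS)
    print("TC with several SC forms:", not_one_to_one(tc_to_sc))
    print("SC with several TC forms:", not_one_to_one(sc_to_tc))
```

    Even this tiny sample contains a 1-to-many case (乾) and two
    many-to-1 cases (发, 干), so a plain one-value lookup table would
    silently lose information.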

    Some TC characters have different SC "equivalents" depending on which
    meaning of the word is intended. And not every TC character ever
    invented has an SC equivalent. There is even at least one character A
    that is both the traditional form of some character B *and* the
    simplified form of another character C!

    TC/SC equivalence in the general case is a linguistic problem. The
    Unicode Standard is a character encoding standard, not a linguistic
    standard, so it does not attempt to provide definitive TC/SC mapping
    tables. The official Unicode Han database:

    http://www.unicode.org/Public/UNIDATA/Unihan.txt

    does include fields called "kSimplifiedVariant" and
    "kTraditionalVariant," which may be of some help. But as you will
    see, only 2629 "simplified variants" and 2554 "traditional variants"
    are listed, out of tens of thousands of Han characters.
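    Pulling these two fields out of the file is straightforward. The
    Python sketch below assumes the tab-separated layout
    "U+XXXX<TAB>field<TAB>value", with "#" comment lines and
    space-separated "U+" code points in the value; check the header of
    the copy you download, since details of the file have changed
    between versions. The function name is hypothetical.

```python
def parse_variants(unihan_text, field):
    """Map each character to the list of characters given in `field`.

    Assumes lines like "U+XXXX<TAB>kSimplifiedVariant<TAB>U+YYYY [U+ZZZZ ...]".
    """
    variants = {}
    for line in unihan_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split("\t")
        if len(parts) != 3 or parts[1] != field:
            continue  # a different field, or not a data line
        # "U+53D1" -> 0x53D1 -> "发"
        source = chr(int(parts[0][2:], 16))
        targets = []
        for token in parts[2].split():
            token = token.split("<")[0]  # drop a "<source" annotation, if present
            targets.append(chr(int(token[2:], 16)))
        variants[source] = targets
    return variants
```

    Usage would look something like
    parse_variants(open("Unihan.txt", encoding="utf-8").read(),
    "kSimplifiedVariant"). Keeping a list per character matters, because
    a single entry may name more than one variant.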

    A group of mainland Chinese and Taiwanese industry specialists has
    tried, unsuccessfully, to establish a TC/SC conversion layer within
    the forthcoming internationalized domain name (IDN) architecture.
    Their document includes a list of about 2000 1-to-1 TC/SC pairs taken
    from official Chinese and Taiwanese references. It explicitly does not
    propose a solution for the cases that are not 1-to-1, but dismisses
    them as uncommon. The document (draft-ietf-idn-tsconv-02.txt) has
    expired as an Internet-Draft and is no longer available from the IETF,
    but I can supply a copy if you are still interested.
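    Converting only through a table of 1-to-1 pairs, as that draft does,
    can be sketched in Python as follows. The table contents, the
    "ambiguous" set, and the function names are hypothetical sample data
    and are not taken from the draft. Characters outside the table pass
    through unchanged, and known problem characters are reported rather
    than silently guessed.

```python
def check_one_to_one(table):
    """Raise ValueError if two TC characters in `table` share an SC form."""
    owner = {}
    for tc, sc in table.items():
        if sc in owner:
            raise ValueError(f"{owner[sc]} and {tc} both map to {sc}")
        owner[sc] = tc


def convert_one_to_one(text, table, ambiguous=frozenset()):
    """Convert TC text to SC with a strict 1-to-1 table.

    Characters missing from the table are copied unchanged. Characters listed
    in `ambiguous` are also copied unchanged, and their positions are returned
    so that a person can review them.
    """
    check_one_to_one(table)
    out, flagged = [], []
    for i, ch in enumerate(text):
        if ch in ambiguous:
            flagged.append(i)
            out.append(ch)
        else:
            out.append(table.get(ch, ch))
    return "".join(out), flagged


# Hypothetical sample data, not taken from the IETF draft.
TABLE = {"龍": "龙", "國": "国", "書": "书"}
AMBIGUOUS = {"發", "乾"}
```

    With this data, convert_one_to_one("國發書", TABLE, AMBIGUOUS)
    simplifies 國 and 書 but leaves 發 alone and flags position 1.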

    Of course, if you already have a TC/SC conversion module and just need
    to convert between Unicode and a DBCS encoding (e.g. GB 2312) in order
    to "implement Unicode in the coding," the Unihan.txt file does include
    those mappings.
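    As a sketch of that case: Unihan.txt records GB 2312 positions in a
    field named kGB0 as a four-digit row/cell number, and the common
    EUC-CN byte form of GB 2312 adds 0xA0 to both the row and the cell.
    The Python code below assumes that layout and builds a decoding table
    from it; the function names are hypothetical, and in practice a
    ready-made codec (Python ships one named "gb2312") is usually simpler.

```python
def parse_kgb0(unihan_text):
    """Build {EUC-CN byte pair: character} from kGB0 lines.

    Assumes lines like "U+XXXX<TAB>kGB0<TAB>RRCC", where RR is the GB 2312 row
    and CC the cell, both written as two decimal digits.
    """
    table = {}
    for line in unihan_text.splitlines():
        if not line or line.startswith("#"):
            continue
        parts = line.split("\t")
        if len(parts) != 3 or parts[1] != "kGB0":
            continue
        row, cell = int(parts[2][:2]), int(parts[2][2:4])
        # EUC-CN stores the row and the cell each offset by 0xA0.
        table[bytes([0xA0 + row, 0xA0 + cell])] = chr(int(parts[0][2:], 16))
    return table


def decode_euc_cn(data, table):
    """Decode EUC-CN `data` (a bytes object); unknown pairs become U+FFFD."""
    out, i = [], 0
    while i < len(data):
        if data[i] < 0x80:  # single-byte ASCII
            out.append(chr(data[i]))
            i += 1
        else:  # double-byte GB 2312 character
            out.append(table.get(data[i:i + 2], "\ufffd"))
            i += 2
    return "".join(out)
```

    Inverting the dictionary gives the other direction, from Unicode
    characters back to GB 2312 byte pairs.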

    -Doug Ewell
     Fullerton, California



    This archive was generated by hypermail 2.1.5 : Mon Oct 14 2002 - 02:18:59 EDT