Re: The result of the plane 14 tag characters review.

From: Jungshik Shin
Date: Sat Nov 30 2002 - 06:35:31 EST


    On Sun, 17 Nov 2002, Doug Ewell wrote:

    > John Jenkins was referring to the preference of Japanese speakers for
    > reading Chinese-language text in Japanese-style glyphs. Perhaps an
    > appropriate language tag for this scenario might be "zh-JP", meaning
    > "Chinese as used in Japan." Even then, the language-country model is
    > not perfect; the Japanese speaker in question could be located anywhere
    > in the world, even in China.

      I think the 'country/region' in the 'language-country' model should be
    interpreted as the country/region that a person wants to be 'affiliated
    with', wherever (s)he may live. On the other hand, there are settings like
    'default paper size' and 'measurement units' which are not strongly
    correlated with the locale but which nonetheless can be inferred from
    where a person lives (well, en-US is about the only place where US Letter
    is the default and imperial units are still 'standard'). When the
    'can be inferred from' approach is taken to the extreme, we end up with
    an absurd case.

      A few years ago, JDK 1.0 (or 1.1) mapped locales to time zones.
    When the ja-JP locale was selected, the time zone was always set to UTC
    +0900. For en-US, it was set to UTC -0800 (or UTC -0700). Obviously,
    this couldn't be right, because Japanese speakers can live anywhere in
    the world and the US has several time zones other than Pacific (PST/PDT).
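To make the problem concrete, here is a minimal sketch in Python (not the actual JDK code). The table `LOCALE_TO_OFFSET` and the function `zone_from_locale` are hypothetical names that only mimic the behaviour described above; the New York user is an invented example.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical table mimicking the locale -> time zone mapping described above.
LOCALE_TO_OFFSET = {
    "ja-JP": timedelta(hours=9),
    "en-US": timedelta(hours=-8),
}

def zone_from_locale(locale):
    """Guess a fixed-offset time zone from the locale alone (the flawed approach)."""
    return timezone(LOCALE_TO_OFFSET[locale])

def local_hour(utc_moment, zone):
    """Hour of the day that `utc_moment` corresponds to in `zone`."""
    return utc_moment.astimezone(zone).hour

# An en-US user who lives in New York (UTC -0500 in winter), not on the West Coast.
moment = datetime(2002, 11, 30, 17, 0, tzinfo=timezone.utc)
guessed = zone_from_locale("en-US")
actual = timezone(timedelta(hours=-5))

print(local_hour(moment, guessed))  # hour according to the locale-based guess
print(local_hour(moment, actual))   # hour where the user actually is
```

The two printed hours differ by three, which is why the time zone has to be a setting of its own rather than something derived from the language tag.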

    > > How do Chinese feel about this? They might find it objectionable to
    > > have to read Chinese in Japanese glyphs in a multilingual document.
    > You never hear this situation mentioned. I take that to mean that
    > Chinese speakers do not find it cripplingly objectionable the way some
    > Japanese speakers find the opposite situation.

      I have heard some Chinese speakers complain, although not to the
    extent that some Japanese speakers do.

    > What about the other applications for language tagging mentioned in RFC
    > 3066 and in my Plane 14 paper, like spelling and grammar checking and
    > speech synthesis? Should these be available only for fancy text?

      I just hit upon another use of Plane 14 language tags, although it is
    only a convenience measure that some multilingual applications may want
    to take. (IETF) RFC 2231 (section 5) extended RFC 2047 to allow the
    language to be specified in an RFC 2047 encoded mail header, with an
    optional 'language tag' following the MIME charset and '*', as shown below.

      =?ISO-8859-15*FR?Q?......?= =?ISO-8859-15*DE?Q?.........?=
      =?UTF-8*ZH-CN?B?........?= =?UTF-8*JA?B?............?=
      =?EUC-KR?B?......?= =?ISO-2022-JP?Q?......?=
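As a rough illustration of the syntax above, the following Python sketch pulls the charset, the optional language, and the decoded text out of each encoded word. The regular expression and the function name `decode_words` are my own assumptions for this example; it ignores details a real mail parser must handle, such as line folding and unknown charsets.

```python
import base64
import quopri
import re

# =?charset[*language]?encoding?encoded-text?=
ENCODED_WORD = re.compile(r"=\?([^?*]+)(?:\*([^?]+))?\?([BbQq])\?([^?]*)\?=")

def decode_words(header):
    """Return a list of (text, language) pairs; language is None when absent."""
    result = []
    for charset, lang, encoding, payload in ENCODED_WORD.findall(header):
        if encoding.upper() == "B":
            raw = base64.b64decode(payload)
        else:
            # header=True makes '_' decode to a space, as in the Q encoding
            raw = quopri.decodestring(payload.encode("ascii"), header=True)
        # Language tags are case-insensitive, so normalise to lower case.
        result.append((raw.decode(charset), lang.lower() if lang else None))
    return result

sample = "=?ISO-8859-15*FR?Q?caf=E9?= =?UTF-8*JA?B?5pel5pys?= =?EUC-KR?B?x9E=?="
for text, lang in decode_words(sample):
    print(repr(text), lang)
```

The third word carries no language, so its language comes back as `None` and would have to be implied from the charset (EUC-KR) or left unknown.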

    Some implementations might find it handy to 'decode' a sequence of
    RFC 2047/2231 encoded words with a language specification (explicit or
    implied) to a Unicode string with Plane 14 language tags embedded. Of
    course, this 'internal' use of Plane 14 language tags as a 'convenient'
    means of preserving information otherwise lost in conversion to Unicode
    cannot be used as an argument for or against their deprecation, because
    even if deprecated, they can still be used this way *internally*.
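A sketch of what such an internal representation could look like, in Python. It assumes the decoded header is already available as (text, language) pairs; `plane14_tag`, `embed`, and `strip_tags` are hypothetical helper names. A language tag is U+E0001 LANGUAGE TAG followed by the tag characters U+E0020..U+E007E that mirror the ASCII tag, and U+E0001 U+E007F cancels the language in effect.

```python
LANGUAGE_TAG = "\U000E0001"   # U+E0001 LANGUAGE TAG
CANCEL_TAG = "\U000E007F"     # U+E007F CANCEL TAG
TAG_BASE = 0xE0000            # tag character = TAG_BASE + ASCII code

def plane14_tag(lang):
    """Spell an ASCII language tag such as 'fr' in Plane 14 tag characters."""
    if not all(0x20 <= ord(c) <= 0x7E for c in lang):
        raise ValueError("language tag must be printable ASCII")
    return LANGUAGE_TAG + "".join(chr(TAG_BASE + ord(c)) for c in lang)

def embed(pairs):
    """Join (text, language) pairs into one string with embedded language tags."""
    out = []
    tagged = False
    for text, lang in pairs:
        if lang:
            out.append(plane14_tag(lang))
            tagged = True
        elif tagged:
            # No language for this run: cancel the one currently in effect.
            out.append(LANGUAGE_TAG + CANCEL_TAG)
            tagged = False
        out.append(text)
    return "".join(out)

def strip_tags(s):
    """Drop all Plane 14 tag characters, e.g. before passing text on externally."""
    return "".join(c for c in s if not 0xE0000 <= ord(c) <= 0xE007F)

tagged = embed([("bonjour", "fr"), (" hallo", "de"), (" hi", None)])
print(len(tagged), strip_tags(tagged))
```

Because `strip_tags` recovers the plain text, the tags can stay strictly inside the application and never need to appear in interchanged text.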

      Jungshik Shin
