Re: japanese xml

From: Viranga Ratnaike (viranga@mds.rmit.edu.au)
Date: Mon Sep 03 2001 - 20:52:26 EDT


On Fri, Aug 31, 2001 at 06:16:23PM +0200, Marco Cimarosti wrote:
> I should not try to interpret XML specs. Anyway, my understanding is that
> the XML legislators are simply saying that they adopt Unicode definition of
> "character", and the Unicode *set* (repertoire) of characters. They are not
> saying that they mandate one of Unicode forms as the only encoding for an
> XML source file.

Hi Marco,

        this is just a followup to the thread; I thought people might be
        interested.
        

    This email refers to a description in

        http://www.w3.org/TR/japanese-xml/#AEN35655520

    The status section of the document states the disclaimer

        "This document is a NOTE made available by W3C for discussion only.
         Publication of this Note by W3C indicates no endorsement by W3C,
         the W3C Team, or any W3C Members."

    The Japanese Standards Association wants public discussion and hopes to
    create a JIS sometime in the future. The current relevant JIS is
    [JIS TR X 0015] Japanese Industrial Standards Committee. XML Japanese
        Profile, JIS TR X 0015:1999, Japanese Standards Association, May 1999.

    This TR (technical report) uses the following acronyms:

                CCS - Coded Character Set
                CES - Character Encoding Scheme
                The 'B' just means "Appendix B"
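
    To make the CCS/CES distinction concrete, here is a small Python
    sketch (my own illustration, not something from the TR). It assumes
    the codec names that ship with CPython ("shift_jis", "euc_jp",
    "iso2022_jp") and an arbitrary sample string. The point is only that
    one sequence of characters from the Unicode repertoire can be
    serialised by several different encoding schemes.

```python
# One sequence of characters (code points in the CCS) ...
text = "日本語"
code_points = [f"U+{ord(ch):04X}" for ch in text]

# ... serialised with several encoding schemes (CESs).
# These names are Python codec names, assumed here for illustration.
ENCODINGS = ["utf-8", "utf-16", "shift_jis", "euc_jp", "iso2022_jp"]
encoded = {name: text.encode(name) for name in ENCODINGS}

print("code points:", " ".join(code_points))
for name, data in encoded.items():
    # The bytes differ per scheme, but each decodes to the same characters.
    print(f"{name:>10}: {data.hex(' ')}")
    assert data.decode(name) == text
```

    The byte sequences all differ, yet every one of them decodes back to
    the same three characters, which is how I read Marco's point about
    repertoire versus encoding.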

    So onto the relevant passage in the TR:

       "B Needs for Japanese XML profile (Non-Normative)

        [XML] adopts [ISO/IEC10646] or [Unicode 3.0] as the CCS, which
        contains all Japanese characters. UTF-8 and UTF-16 are the
        recommended CESs, and implementations are required to support
        them. Other existing CESs are optionally allowed, as long as they
        represent characters in [Unicode 3.0] only.

        [XML], however, provides little information on existing CESs already
        in use for the interchange of Japanese characters. Such CESs are
        allowed as mere options among many others. Furthermore, [XML] says
        nothing about the appropriate CESs for each protocol (e.g. SMTP or
        HTTP) and those for information exchange files.

        The mapping between such existing CESs and [ISO/IEC10646]/[Unicode
        3.0] is not specified either. Some mutually different conversions are
        in use, and thus different XML processors may emit different outputs.

        This technical report addresses existing CESs and clarifies open
        issues. Although problems with the use of such CESs are not solved,
        the nature of these problems has become clear."
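
    The remark about "mutually different conversions" is easy to
    reproduce. The sketch below is my own example, not taken from the TR.
    It assumes CPython's "shift_jis" and "cp932" codecs, which as far as
    I know carry two of the conversion tables in common use, and feeds
    the same two bytes to both.

```python
import unicodedata

# The Shift_JIS byte pair for the JIS X 0208 "wave dash" character.
raw = b"\x81\x60"

def decode_with(data, codecs):
    """Decode the same bytes with each codec; return {codec: text}."""
    return {codec: data.decode(codec) for codec in codecs}

# Two conversion tables for what is nominally the same CES
# (Python codec names, assumed here for illustration).
results = decode_with(raw, ["shift_jis", "cp932"])

for codec, ch in results.items():
    print(f"{codec:>9}: U+{ord(ch):04X} {unicodedata.name(ch)}")
```

    With a current CPython the two lines name different Unicode
    characters for identical input bytes, so two XML processors using
    different tables would hand different characters to the application.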

Best Wishes,

        Viranga

P.S. I'm trying to find ways to spend more time on the Japanese project
        without our senior programmer finding out : ) This is turning
        out to be quite interesting.



This archive was generated by hypermail 2.1.2 : Mon Sep 03 2001 - 21:53:37 EDT