RE: japanese xml

From: Addison Phillips [wM] (aphillips@webmethods.com)
Date: Thu Aug 30 2001 - 12:51:24 EDT


Hi Mike,

Perhaps I can rephrase Misha's answer ;-):

1. EUC-JP is an encoding ("charset") that was originally created to encode
Japanese character sets such as JIS X 0208 and JIS X 0212.
2. As such, EUC-JP can be used to encode the subset of Unicode that contains
all of the characters in JIS X 0208 and JIS X 0212, etc.
3. An XML parser uses the Unicode character set internally to represent and
process character data. As such, the most natural encoding to use for an XML
file would be a Unicode encoding such as UTF-8 or UTF-16.
4. However, you can use any other encoding, provided you tag the file
appropriately, for example with an encoding declaration such as
<?xml version="1.0" encoding="EUC-JP"?>, so that the parser knows what the
encoding is and can translate it to its internal representation.
5. You are not required to use EUC-JP for your Japanese XML files: you can
use the Unicode encodings directly. In some cases, though, your file
editing software may make it easier to work with EUC-JP (or
Shift-JIS/Microsoft Code Page 932).
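A minimal sketch of points 3 and 4, in Python: the same short document saved as EUC-JP, Shift_JIS and UTF-8, each tagged with its own encoding label, decodes to identical Unicode characters once the label is honored. declared_encoding and to_unicode are hypothetical helpers, not part of any XML library, and the regular expression is a deliberate simplification: it assumes an ASCII-compatible encoding and no byte order mark, so it does not do the UTF-16 autodetection a real parser would.

```python
import re

# Hypothetical helper (not from any XML library): read the encoding label from
# the XML declaration. Simplified on purpose: assumes an ASCII-compatible
# encoding such as EUC-JP, Shift_JIS or UTF-8, and ignores byte order marks,
# so UTF-16 input is not handled.
_ENCODING_DECL = re.compile(
    rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_encoding(data: bytes) -> str:
    match = _ENCODING_DECL.match(data)
    # With no declaration (and no BOM), XML 1.0 requires UTF-8.
    return match.group(1).decode("ascii") if match else "UTF-8"


def to_unicode(data: bytes) -> str:
    """Translate raw XML bytes into Unicode text using the declared encoding."""
    return data.decode(declared_encoding(data))


BODY = "<greeting>日本語</greeting>"  # "Japanese language"

# The same document saved in three encodings, each tagged with its own label.
saved = {
    label: f'<?xml version="1.0" encoding="{label}"?>{BODY}'.encode(label)
    for label in ("EUC-JP", "Shift_JIS", "UTF-8")
}

for label, raw in saved.items():
    text = to_unicode(raw)
    # The bytes differ per encoding, but the characters after the declaration match.
    print(f"{label:9} {len(raw):3} bytes -> {text.split('?>', 1)[1]}")
```

The byte counts differ because each encoding represents the kanji differently, while the decoded element text is the same in every case, which is the sense in which these encodings interoperate for the characters they share.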

As for an XML parser that handles all of these, I know from extensive
testing that ours does<g>. And it is worth mentioning, because EUC-JP (like
many other encodings) is in fact perfectly interoperable, for the subset of
characters that it can represent. Most XML interchanges (for example,
marketplaces such as CommerceOne or Ariba) tend to prefer that "legacy
encoded" files be converted to UTF-8 for interoperability, but there is no
requirement to do so, and many backend XML systems, *especially* in Japan,
use the non-Unicode encodings.
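The conversion to UTF-8 that such exchanges prefer can be sketched in Python, assuming the source encoding is already known and the document starts with an XML declaration carrying an encoding pseudo-attribute. transcode_to_utf8 is a hypothetical helper, and the <order>/<item> sample is invented for illustration. Converting also helps with parsers that support fewer encodings: Python's expat-based xml.etree, for instance, does not accept multi-byte encodings such as EUC-JP in its byte input, whereas every conforming XML parser must accept UTF-8.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical helper: re-encode an XML document as UTF-8 and relabel its
# declaration to match. Assumes the source encoding is known and that the
# document begins with a declaration containing an encoding pseudo-attribute.
_ENCODING_ATTR = re.compile(r'(<\?xml[^>]*?encoding\s*=\s*)(["\'])[^"\']*\2')


def transcode_to_utf8(data: bytes, source_encoding: str) -> bytes:
    text = data.decode(source_encoding)  # legacy bytes -> Unicode
    # Rewrite only the first declaration so the label matches the new bytes.
    text = _ENCODING_ATTR.sub(r"\1\2UTF-8\2", text, count=1)
    return text.encode("utf-8")  # Unicode -> UTF-8 bytes


legacy = (
    '<?xml version="1.0" encoding="EUC-JP"?>'
    "<order><item>日本茶</item></order>"
).encode("euc_jp")
utf8 = transcode_to_utf8(legacy, "euc_jp")

# The converted bytes carry a UTF-8 label, so any conforming parser accepts them.
root = ET.fromstring(utf8)
print(utf8[:38])
print(root.find("item").text)  # 日本茶
```

Decoding to Unicode first and re-encoding afterwards means no characters are lost, since everything EUC-JP can represent is also in Unicode; the declaration must be rewritten along with the bytes, or a parser would try to read UTF-8 data as EUC-JP.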

Best Regards,

Addison

Addison P. Phillips
Globalization Architect / Manager, Globalization Engineering
webMethods, Inc. 432 Lakeside Drive, Sunnyvale, CA
+1 408.962.5487 (phone) +1 408.210.3659 (mobile)
-------------------------------------------------
Internationalization is an architecture. It is not a feature.

-----Original Message-----
From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org]On
Behalf Of Misha.Wolf@reuters.com
Sent: Thursday, August 30, 2001 8:37 AM
To: Ayers, Mike
Cc: unicode@unicode.org
Subject: RE: japanese xml

I have no idea of what you're talking about.

Misha

On 30/08/2001 16:11:14 "Ayers, Mike" wrote:
> > From: Misha.Wolf@reuters.com [mailto:Misha.Wolf@reuters.com]
> > Sent: Thursday, August 30, 2001 06:06 AM
>
> > IMO, I correctly replied to Viranga's question and I've
> > no idea what you're talking about below.
>
> Let me try to put it another way. What you said may have been
> technically correct, but it was probably not worth mentioning because it
> represents a noninteroperable encoding. Perhaps I am mistaken though - do
> you know of an XML parser that can parse the encoding that you suggested?
>
>
> /|/|ike

-----------------------------------------------------------------
        Visit our Internet site at http://www.reuters.com

Any views expressed in this message are those of the individual
sender, except where the sender specifically states them to be
the views of Reuters Ltd.



This archive was generated by hypermail 2.1.2 : Thu Aug 30 2001 - 13:55:56 EDT