Re: japanese xml

From: 'Viranga Ratnaike' (viranga@mds.rmit.edu.au)
Date: Wed Aug 29 2001 - 21:21:41 EDT


Hi All,

   Thank you to all who replied.

   XML is making more sense to me now : )

   I have a few more questions:

        Is it OK for Unicode code points to be encoded/serialized using EUC?
        I'm not planning on doing this; I'm just wondering what restrictions
        (if any) there are on the choice of transformation format.

        Is the conversion from EUC-JP to UTF-8/UTF-16 simple? Are there
        algorithms and/or converters out there that I can access?

        [Possibly OT] A colleague mentioned that it might be good to
        investigate DoCoMo/WAP2.0 (XHTML) documents. Does anyone know
        a nice site with Japanese Unicode documents for WAP? Or a how-to
        guide on creating my own? http://www.nttdocomo.com/ seems to be down.
        http://www.wap.com/share/osas/cache/artid500438.html and
        http://www.nttdocomo.co.jp/ seem ok, but I thought I'd ask.

        Is there much interest in Unicode in Japan? Most documents
        I find use JIS.

Regards,

        Viranga

On Wed, Aug 29, 2001 at 11:21:43AM +0200, Marco Cimarosti wrote:
> Viranga Ratnaike wrote:
> >I was hunting for examples of japanese xml and came across the
> >following, which looks rather cool. Except that it doesn't seem
> >to actually be unicode. I thought XML had mandated unicode?
> > http://java.sun.com/xml/jaxp-1.1/examples/samples/weekly-euc-jp.xml
>
> Not at all! Any encoding can be used in XML documents. The requirement is
> that the encoding is declared inside each document.
>
> In fact, that document begins with:
>
> <?xml version="1.0" encoding="euc-jp" ?>
>
> "euc-jp" means the Japanese character set (JIS) serialized in EUC ("Extended
> Unix Code"). EUC is what Unicoders would call a "transformation format", and
> it is very popular with the three main CJK character sets (JIS=Japan,
> GB=China, KSC=Korea).
>
> _ Marco
>
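
The point about the declared encoding can be sketched as follows: read the encoding name from the XML declaration, decode with it, and rewrite the declaration when saving as UTF-8. This is only a rough illustration. The function names and the sample document are made up, the regular expression assumes an ASCII-compatible encoding with no byte order mark, and a real XML parser would be the safer choice.

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration at the start
# of the document. Assumes an ASCII-compatible encoding and no BOM.
_DECL = re.compile(
    rb'^(<\?xml[^>]*?encoding\s*=\s*)(["\'])([A-Za-z][A-Za-z0-9._-]*)\2'
)


def declared_encoding(doc: bytes, default: str = "utf-8") -> str:
    """Return the declared encoding, or UTF-8 when none is declared."""
    m = _DECL.match(doc)
    return m.group(3).decode("ascii") if m else default


def to_utf8(doc: bytes) -> bytes:
    """Decode using the declared encoding and re-serialize as UTF-8."""
    text = doc.decode(declared_encoding(doc))
    # Update the declaration so it matches the new byte encoding.
    text = re.sub(
        r'^(<\?xml[^>]*?encoding\s*=\s*)(["\'])[^"\']*\2',
        r'\1\2UTF-8\2',
        text,
        count=1,
    )
    return text.encode("utf-8")


# Made-up sample document, stored as EUC-JP bytes.
sample = '<?xml version="1.0" encoding="euc-jp" ?>\n<title>週報</title>'.encode("euc_jp")
```

Calling to_utf8(sample) gives the same document as UTF-8 bytes with encoding="UTF-8" in the declaration.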


