Re: U+2212 (Minus Sign) and Java's ISO-2022-JP conversion

From: Katsuhiko Momoi (momoi@alumni.indiana.edu)
Date: Sat Apr 02 2005 - 03:06:26 CST


    Markus Scherer wrote:

    >Charsets are a mess.
    >
    >
    Agreed.

    >Japanese charsets are particularly notorious; see "XML Japanese
    >Profile" http://www.w3.org/TR/japanese-xml/
    >
    >
    Thanks for the info. We checked, and it turns out that we mistakenly
    passed one of the lookalike characters, \uFF0D (FULLWIDTH HYPHEN-MINUS)
    rather than \u2212 (MINUS SIGN), to setContent with the target
    encoding ISO-2022-JP.
    So please disregard my query.
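
A minimal sketch of how such a mix-up could be caught, assuming a JDK that ships the ISO-2022-JP charset: it prints each lookalike's code point and Unicode name, then asks the charset's encoder whether it can encode that character. The class and method names (MinusLookalikes, describe, canEncode) are made up for illustration; only the java.nio.charset and java.lang.Character calls are standard API.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper class, not part of any library.
final class MinusLookalikes {

    // Format a character as "U+XXXX (UNICODE NAME)" so lookalikes can be told apart.
    public static String describe(char c) {
        return String.format("U+%04X (%s)", (int) c, Character.getName(c));
    }

    // Ask this JVM's encoder for the named charset whether it can encode c.
    public static boolean canEncode(String charsetName, char c) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        return encoder.canEncode(c);
    }

    public static void main(String[] args) {
        String target = "ISO-2022-JP";
        // The charset is an extended one and may be missing from a trimmed runtime.
        if (!Charset.isSupported(target)) {
            System.out.println(target + " is not available in this runtime");
            return;
        }
        // The two lookalikes from this thread: MINUS SIGN and FULLWIDTH HYPHEN-MINUS.
        char[] candidates = { '\u2212', '\uFF0D' };
        for (char c : candidates) {
            System.out.println(describe(c) + " encodable in " + target
                    + ": " + canEncode(target, c));
        }
    }
}
```

The answer depends on the mapping table the JVM's converter uses, so a check like this is best run on the same JDK that performs the real conversion.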

    >ISO-2022-* are even worse than others because no one publishes
    >comprehensive documentation for how they convert for these.
    >
    >Evidently, in this case the Java 1.4 and 1.5 converters are different.
    >
    >
    As stated above, this was our error and not the fault of Java's converters.

    >On Apr 1, 2005 12:24 AM, Katsuhiko Momoi <momoi@alumni.indiana.edu> wrote:
    >
    >
    >>Using Java's native2ascii conversion utility -- I used the one that came
    >>with SDK 1.5 for Windows, \u2212 converts to ISO-2022-JP. ...
    >>... Java fails to convert \u2212 to ISO-2022-JP. (JDK version 1.4.x.)
    >>
    >
    >>Has anyone experienced this problem? I would appreciate a workaround or
    >>a solution.
    >>
    >>
    >
    >Use UTF-8. Seriously.
    >
    >
    Indeed. If only we could change national mail encoding (de facto)
    standards overnight!

    - Kat

    -- 
    Katsuhiko Momoi
    

