Re: Encoding issue, clues needed

From: Jeroen Ruigrok van der Werven (asmodai@in-nomine.org)
Date: Mon Dec 24 2007 - 00:55:57 CST


    -On [20071224 03:16], George W Gerrity (g.gerrity@gwg-associates.com.au) wrote:
    >The e7 in the correct encoding and the 7e in the incorrect one suggest to me a
    >byte-order and/or bit-order problem. Are all the Little-Endian/Big-Endian flags
    >correct in the entire system build? Are definitions such as Unsigned Integer
    >and the sizes for all integers defined correctly in all source code for the
    >entire build?

    Well, if by entire system build you mean my FreeBSD system, then yes. I've
    been working for many months on various scripts using UTF-8 without any
    problems, including text in Japanese, Korean, Chinese (both simplified and
    traditional) and Thai.
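
    To make the e7/7e observation concrete, here is a minimal sketch that
    applies three different reorderings to the single byte value 0xe7. It
    looks only at the two byte values mentioned in the quote, not at the
    full data; the helper names are hypothetical and exist purely for
    illustration.

```python
def swap_nibbles(b):
    """Exchange the high and low 4-bit halves of one byte."""
    return ((b & 0x0F) << 4) | (b >> 4)


def reverse_bits(b):
    """Reverse the order of the 8 bits in one byte."""
    return int(format(b, "08b")[::-1], 2)


def swap_bytes16(v):
    """Exchange the two bytes of a 16-bit value (an endianness flip)."""
    return ((v & 0xFF) << 8) | (v >> 8)


for name, value in [
    ("nibble swap of 0xe7", swap_nibbles(0xE7)),
    ("bit reversal of 0xe7", reverse_bits(0xE7)),
    ("16-bit byte swap of 0x00e7", swap_bytes16(0x00E7)),
]:
    print("%-28s -> 0x%x" % (name, value))
```

    Of these three, only the nibble swap turns e7 into 7e. The bit pattern
    of 0xe7 reads the same in both directions, and a 16-bit byte swap moves
    the byte without changing its value.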

    Python only gives these encoding/decoding problems with curses
    (_curses.so), even though everything is linked against ncursesw.so:

    /usr/local/lib/python2.5/lib-dynload/_curses.so:
            libncursesw.so.6 => /lib/libncursesw.so.6 (0x28179000)

    As a hypothesis: if one component were not using wchar_t to represent the
    codepoints, would that lead to the behaviour I am seeing? If so, I would at
    least know that some component, some library, is *not* using a wide
    character type and is thus causing these issues.
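
    The hypothesis can be illustrated with a small sketch, assuming a
    component that treats every byte as one character instead of working on
    code points. The sample text and the use of Latin-1 to stand in for the
    byte-per-character view are assumptions made for illustration; this does
    not call curses and does not claim to reproduce the actual fault.

```python
# Sample text (an assumption): U+65E5 U+672C, two CJK characters.
text = "\u65e5\u672c"
utf8 = text.encode("utf-8")

# Wide view: one code point per element, as a wchar_t-based API would see it.
wide_cells = [ord(ch) for ch in text]

# Narrow view: one byte per element, as a char-based API would see it.
narrow_cells = list(utf8)

# What results when each byte is interpreted as its own character.
narrow_rendering = utf8.decode("latin-1")

print("wide cells:  ", ["U+%04X" % cp for cp in wide_cells])
print("narrow cells:", ["0x%02x" % b for b in narrow_cells])
print("characters seen by the narrow component:", len(narrow_rendering))
```

    The two code points become six separate units in the narrow view, so a
    single byte-oriented layer between the script and the terminal would be
    enough to corrupt multi-byte text while plain UTF-8 scripts keep working.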

    -- 
    Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
    イェルーン ラウフロック ヴァン デル ウェルヴェン
    http://www.in-nomine.org/ | http://www.rangaku.org/
    In every stone sleeps a crystal...
    

