From: Jeroen Ruigrok van der Werven (asmodai@in-nomine.org)
Date: Mon Dec 24 2007 - 00:55:57 CST
On [20071224 03:16], George W Gerrity (g.gerrity@gwg-associates.com.au) wrote:
>The e7 in the correct encoding and the 7e in the incorrect one suggest to me a
>byte-order and/or bit-order problem. Are all the Little-Endian/Big-Endian flags
>correct in the entire system build? Are definitions such as Unsigned Integer
>and the sizes for all integers defined correctly in all source code for the
>entire build?
Well, if by entire system build you mean my FreeBSD system then yes. I've been
working for many months on various scripts using UTF-8 without any problems,
including Japanese, Korean, Chinese (both simplified and traditional) and
Thai.
Python is only giving these encoding/decoding problems with curses
(_curses.so), despite everything being linked against ncursesw.so:
/usr/local/lib/python2.5/lib-dynload/_curses.so:
libncursesw.so.6 => /lib/libncursesw.so.6 (0x28179000)
As a hypothesis: if one component were not using wchar_t to represent the
code points, would that lead to the behaviour I am seeing? If so, I would at
least know that some component, some library, is *not* using a wide character
type and is thus causing these issues.
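To make the hypothesis concrete, here is a minimal sketch of how the same text looks to a component that works in code points (as a wchar_t-based one would) versus one that works in single bytes. It assumes Python 3 rather than the Python 2.5 discussed here, and the sample character U+7A0B is a hypothetical choice, picked only because its UTF-8 form starts with 0xe7. It is not necessarily the character that failed, and the function names are illustrative only.

```python
# Hypothetical sample: U+7A0B, chosen only because its UTF-8 encoding
# begins with the byte 0xe7. Not taken from the failing program.
SAMPLE = "\u7a0b"


def wide_view(text):
    """One value per code point, roughly what a wchar_t-based component sees."""
    return [ord(ch) for ch in text]


def narrow_view(text):
    """One value per UTF-8 byte, roughly what a char-based component sees."""
    return list(text.encode("utf-8"))


def misread_as_single_bytes(text):
    """Treat every UTF-8 byte as its own character (Latin-1 used as a stand-in)."""
    return text.encode("utf-8").decode("latin-1")


if __name__ == "__main__":
    print([hex(v) for v in wide_view(SAMPLE)])
    print([hex(v) for v in narrow_view(SAMPLE)])
    print(len(misread_as_single_bytes(SAMPLE)))
```

The wide view holds a single value for the character, while the narrow view holds three. A library that handled those three bytes as three separate characters would produce the kind of mangled output a narrow-character component could be expected to cause, though this sketch does not show that _curses.so actually does so.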
--
Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
イェルーン ラウフロック ヴァン デル ウェルヴェン
http://www.in-nomine.org/ | http://www.rangaku.org/
In every stone sleeps a crystal...