From: Paul Johnston (paj@pajhome.org.uk)
Date: Fri Oct 06 2006 - 10:48:58 CST
Hi,
Thanks for all the helpful responses. To clarify things, 1251 was my
error; I meant Windows-1252. The troublesome character was U+2019, RIGHT
SINGLE QUOTATION MARK.
Using WideCharToMultiByte does exactly what I need. For those
interested, here is the Python code I'm using:
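from ctypes import c_char_p, create_string_buffer, windll

def de_unicode(instr):
    # Convert a unicode string to Windows-1252 bytes (Windows only).
    wide = instr.encode('utf-16le')
    nchars = len(wide) // 2   # length in UTF-16 code units
    # Windows-1252 is single-byte, so nchars bytes plus a NUL suffice.
    outstr = create_string_buffer(nchars + 1)
    # Code page 1252, no flags, system default replacement character.
    windll.kernel32.WideCharToMultiByte(1252, 0,
        c_char_p(wide), nchars,
        outstr, nchars + 1,
        None, None)
    return outstr.value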
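For readers not on Windows, roughly the same conversion can be sketched with Python's built-in cp1252 codec. This is an assumption, not the code used above: the name de_unicode_portable is hypothetical, and the 'replace' error handler turns unmappable characters into '?', which may differ from what WideCharToMultiByte produces.

```python
def de_unicode_portable(instr):
    # Hypothetical helper: encode text to Windows-1252 bytes using
    # Python's built-in codec; unmappable characters become b'?'.
    return instr.encode('cp1252', 'replace')

# U+2019 has a direct Windows-1252 mapping (byte 0x92).
sample = u'It\u2019s'
print(repr(de_unicode_portable(sample)))
```

The codec route avoids ctypes entirely, at the cost of not reproducing any Windows-specific fallback behaviour.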
The suggestion to use HTML entities, e.g. &rsquo; for ’, was a good idea.
Unfortunately, htmldoc doesn't support Unicode at all: such characters
simply do not appear in the output.
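For reference, the entity approach can be sketched in Python with the standard 'xmlcharrefreplace' error handler, which emits numeric character references such as &#8217;. The helper name to_html_entities is hypothetical, and as noted above this does not help with htmldoc; it is only useful for a converter that understands such references.

```python
def to_html_entities(text):
    # Hypothetical helper: keep ASCII as-is and turn every other
    # character into a numeric character reference such as &#8217;.
    return text.encode('ascii', 'xmlcharrefreplace').decode('ascii')

print(to_html_entities(u'It\u2019s'))
```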
In my application, I am generating the PDF files in a CGI script.
Htmldoc is handy as it can do the conversion from the command line. Most
PDF generators are printer drivers, and I haven't (so far) managed to
make one work from a CGI script. It's something I may investigate
further in the future as htmldoc has other problems, e.g. not supporting
CSS.
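Calling htmldoc from a CGI script might look roughly like the sketch below. The function names are hypothetical, and the command line is an assumption based on htmldoc's documented options (--quiet, --webpage, -t pdf, with output going to standard output when no -f is given); it is not taken from the actual script.

```python
import subprocess
import sys

def build_htmldoc_command(html_path):
    # Assumed invocation: --webpage treats the input as an unstructured
    # page, -t pdf selects PDF output, --quiet suppresses messages.
    return ['htmldoc', '--quiet', '--webpage', '-t', 'pdf', html_path]

def html_file_to_pdf(html_path):
    # Run htmldoc and return the PDF bytes; raises CalledProcessError
    # if htmldoc exits with a non-zero status.
    return subprocess.check_output(build_htmldoc_command(html_path))

def send_pdf_response(pdf_bytes, out=None):
    # Write a minimal CGI response: headers, blank line, then the body.
    if out is None:
        out = getattr(sys.stdout, 'buffer', sys.stdout)
    out.write(b'Content-Type: application/pdf\r\n')
    out.write(('Content-Length: %d\r\n\r\n' % len(pdf_bytes)).encode('ascii'))
    out.write(pdf_bytes)
    out.flush()
```

A CGI handler would then call send_pdf_response(html_file_to_pdf(path)). On Windows, standard output may also need to be switched to binary mode so the PDF bytes are not altered.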
I am aware of solutions like iText that let you generate PDFs without
using HTML at all, but I've got a feeling that would be hard work. I
already have well established systems for building HTML documents.
Thanks again for all the useful suggestions,
Paul