From: Addison Phillips (addison@yahoo-inc.com)
Date: Wed Oct 04 2006 - 12:24:31 CST
I think you mean "windows-1252", the Western European code page. Code
page 1251 is the Cyrillic code page.
Windows-1252, like many Microsoft code pages, differs from the related
"standard" encoding. In this case, it is a superset of ISO 8859-1 (often
referred to as Latin-1). The difference is that Microsoft added 27
characters in the C1 control range (0x80->0x9F), including the Euro
symbol and a variety of "typesetter's quotes". These often cause
problems for software expecting pure ISO 8859-1.
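To make the difference concrete, here is a minimal Python sketch (Python is my choice for illustration; nothing in it is specific to HTMLDOC). It decodes the same bytes under both encodings and counts how many byte values in 0x80-0x9F windows-1252 actually defines. The sample bytes are invented for the demonstration.

```python
# Bytes 0x80-0x9F are printable characters in windows-1252 but
# C1 control codes in ISO 8859-1 (Latin-1).
raw = b"It\x92s \x80 5"  # made-up sample: 0x92 and 0x80 are in the C1 range

as_cp1252 = raw.decode("cp1252")   # 0x92 -> U+2019, 0x80 -> U+20AC
as_latin1 = raw.decode("latin-1")  # 0x92 -> U+0092, 0x80 -> U+0080 (controls)

print(repr(as_cp1252))
print(repr(as_latin1))

# Count the byte values in 0x80-0x9F that Python's cp1252 codec can decode.
defined = 0
for value in range(0x80, 0xA0):
    try:
        bytes([value]).decode("cp1252")
        defined += 1
    except UnicodeDecodeError:
        # A handful of positions are left unassigned in windows-1252.
        pass

print(defined)
```

Text that decodes to C1 control characters instead of quotes or the Euro sign usually means the data was windows-1252 but was read as ISO 8859-1.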
HTMLDOC has both command-line and GUI options that allow you to select
the appropriate Windows encoding (sorry, not UTF-8) to use when reading
the source files. You should also include a correct <meta> tag declaring
the encoding to be "windows-1252" and *not* "iso-8859-1" in your HTML
documents.
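As a rough sketch of that advice, assuming the pages are generated from Unicode strings in Python: the helper below (`build_page` is a hypothetical name, not part of any tool mentioned here) emits a page whose <meta> tag and actual bytes agree on windows-1252.

```python
import html


def build_page(body_text: str) -> bytes:
    """Return a small HTML page encoded as windows-1252.

    Hypothetical helper for illustration: the declared charset in the
    <meta> tag matches the codec used to produce the bytes.
    """
    page = (
        "<html><head>\n"
        '<meta http-equiv="Content-Type" '
        'content="text/html; charset=windows-1252">\n'
        "</head><body>\n"
        f"<p>{html.escape(body_text)}</p>\n"
        "</body></html>\n"
    )
    # Raises UnicodeEncodeError if a character has no windows-1252 mapping.
    return page.encode("cp1252")


page = build_page("It\u2019s priced at \u20ac5")
```

With the default strict error handling, any character outside windows-1252 raises an error at this point, before the converter ever sees the file.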
If that doesn't work, you can also use HTML entities in your pages to
replace the characters. For example, &#8217; is a right single quote. Or
you can use a transliterating converter (such as the //TRANSLIT option
on libiconv) to approximate the right results. (Caution: you may
experience data degradation with this last "solution".)
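Both fallbacks can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `APPROX` table and the `approximate` function are hypothetical stand-ins for what a transliterating converter does, not a reimplementation of libiconv's //TRANSLIT, and the table covers just a few common punctuation marks.

```python
text = "It\u2019s caf\u00e9"

# Fallback 1: keep what ISO 8859-1 can represent and turn everything else
# into numeric character references (lossless for an HTML consumer).
as_entities = text.encode("latin-1", errors="xmlcharrefreplace").decode("latin-1")
print(as_entities)

# Fallback 2: lossy "visually approximate" replacement.
# Hypothetical, deliberately small table; extend it as needed.
APPROX = {
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2013": "-",    # en dash
    "\u2026": "...",  # horizontal ellipsis
}
_TABLE = str.maketrans(APPROX)


def approximate(s: str) -> str:
    """Replace selected typographic characters with ASCII look-alikes."""
    return s.translate(_TABLE)


print(approximate("It\u2019s \u201cfine\u201d\u2026"))
```

The first approach can be reversed by anything that understands HTML; the second cannot, which is the degradation the caution above refers to.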
Hope that helps.
Addison
--
Addison Phillips
Globalization Architect -- Yahoo! Inc.

Internationalization is an architecture.
It is not a feature.

Paul Johnston wrote:
> Hi,
>
> I am using Unicode throughout my system (a web-based database for
> tracking work). I am forced to use a tool (htmldoc - for html to PDF
> conversion) that does not support unicode in any manner. This should not
> be a significant problem in practice, as all the data is in English.
> However, I am having problems with a few characters, primarily an
> apostrophe-like character (don't know the code offhand; it's not in
> Latin-1).
>
> If I encode the output as Windows-1251, the character causes an error.
> If I used utf-8 it causes visual garbage in the output. What would be
> ideal is to perform a "visually approximate" conversion to Windows-1251,
> which would replace this with a regular apostrophe. I am happy to accept
> the risks that such an approximation carries.
>
> I know Windows can do this, as retrieving values from controls using a
> non-Unicode interface does exactly this conversion. However, I have not
> been able to find out how I can perform the conversion at will. I
> apologise if this is not the most appropriate forum for this question,
> but I have been looking long and hard for this without success.
>
> Many thanks for any help you can offer,
>
> Paul
>
> P.S. If someone can suggest a unicode compatible replacement for
> htmldoc, that would satisfy me too!