From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Fri Aug 19 2005 - 12:13:33 CDT
From: "Dean Harding" <dean.harding@dload.com.au>
To: <unicode@arabink.com>; "'Magda Danish (Unicode)'"
<v-magdad@microsoft.com>
Cc: <unicode@unicode.org>
Sent: Friday, August 19, 2005 1:25 AM
Subject: RE: FW: Subj: Converting from UCS-2 to UTF-8
> Gregg Reynolds wrote:
>> On windows, the easiest thing to do is install cygwin, which comes with
>> a command line iconv implementation. http://www.cygwin.com/
>
> Wouldn't it be easier to use WideCharToMultiByte and pass in CP_UTF8 as
> the code page identifier? No need to download 3rd party libraries then.
For users still running Windows 95/98/ME, this won't work, because those
systems can only do the following:
- WideCharToMultiByte(): can only convert from UTF-16 to the local ANSI or
OEM 8-bit charset. There is no support for converting to UTF-8! When
converting to ANSI or OEM, unsupported characters are silently replaced by
'?'.
- MultiByteToWideChar(): can only convert from the local ANSI or OEM 8-bit
charset, or from UTF-8, to UTF-16. This allows Notepad, for example, to load
and display a UTF-8 file, and even to edit it, but it CANNOT save it
correctly (saving silently converts all characters to the ANSI charset and
replaces missing characters with '?', so the saved file will not be UTF-8
encoded...)
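
To illustrate the silent '?' replacement described above, here is a minimal
sketch in portable C. It is not the Windows implementation: the function name
utf16_to_latin1_lossy is hypothetical, and ISO-8859-1 is assumed as a
stand-in for the local ANSI charset (real ANSI code pages such as
Windows-1252 map some values differently). Counting each unmappable code
unit separately is also a simplification made for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a Win32 API): converts UTF-16 code units to
   ISO-8859-1, which is assumed here as a stand-in for an ANSI charset.
   Every code unit above 0xFF has no Latin-1 equivalent and becomes '?'.
   dst must have room for len bytes. Returns the number of bytes written;
   if lost is not NULL, it receives the number of replaced code units. */
size_t utf16_to_latin1_lossy(const uint16_t *src, size_t len,
                             unsigned char *dst, size_t *lost)
{
    size_t n_lost = 0;
    for (size_t i = 0; i < len; i++) {
        if (src[i] <= 0xFF) {
            dst[i] = (unsigned char)src[i];   /* same value in Latin-1 */
        } else {
            dst[i] = '?';                     /* information is lost here */
            n_lost++;
        }
    }
    if (lost != NULL)
        *lost = n_lost;
    return len;
}
```

Once a character has been written as '?', nothing in the output tells it
apart from a genuine question mark, which is why a file saved this way
cannot be recovered.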
On NT/2000/XP/2003 systems, the conversions offered by the two routines
require that the relevant charsets or codepages be installed in
Windows\System32 (these are the cp*.nls files). The list of supported
codepages seems to be hardcoded within the system and not extensible, and
codepages can only be installed using the Regional Settings control panel
(it's not enough to just copy the *.nls files).
I don't know if it's even possible to add more codepages than those
supported on each version of Windows (and I didn't find any place in the
registry where those codepages are actually registered, as the existing
entries just seem to be there to allow compatibility with other versions of
Windows by mapping the actual filenames used for the codepage mappings).
The restrictions above seem to exist for security reasons (maps should not
be replaceable, as that would affect the compatibility between the Unicode
and ANSI Win32 APIs), and Microsoft does not provide any info about how to
develop and install new codepages...
So there are still applications needing converters based on other routines
and mappings.
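
For UCS-2/UTF-16 to UTF-8 specifically, no mapping table is needed, so a
converter that does not depend on the OS routines can be quite small. The
following is only a sketch in portable C: the function name utf16_to_utf8
and its buffer-size convention are hypothetical, the input is assumed to be
in native byte order with no BOM handling, and replacing unpaired surrogates
with U+FFFD is a choice made for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a Win32 API): converts src_len UTF-16 code
   units (native byte order) to UTF-8 in dst. Returns the number of bytes
   written, or (size_t)-1 if dst_cap is too small. Unpaired surrogates are
   replaced by U+FFFD. No terminating NUL is appended. */
size_t utf16_to_utf8(const uint16_t *src, size_t src_len,
                     unsigned char *dst, size_t dst_cap)
{
    size_t i = 0, o = 0;
    while (i < src_len) {
        uint32_t cp = src[i++];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: combine with the following low surrogate. */
            if (i < src_len && src[i] >= 0xDC00 && src[i] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (src[i] - 0xDC00);
                i++;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;                      /* stray low surrogate */
        }

        /* Number of UTF-8 bytes needed for this code point. */
        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (dst_cap - o < need)
            return (size_t)-1;

        if (need == 1) {
            dst[o++] = (unsigned char)cp;
        } else if (need == 2) {
            dst[o++] = (unsigned char)(0xC0 | (cp >> 6));
            dst[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            dst[o++] = (unsigned char)(0xE0 | (cp >> 12));
            dst[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            dst[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            dst[o++] = (unsigned char)(0xF0 | (cp >> 18));
            dst[o++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            dst[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            dst[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    return o;
}
```

Pure UCS-2 input never contains surrogates, so it only exercises the one-,
two- and three-byte branches; a caller can size dst at 3 bytes per input
code unit to be safe in either case.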