From: Frank da Cruz (fdc@columbia.edu)
Date: Wed Jun 06 2007 - 13:26:30 CDT
Dan Saltel wrote:
> Hi,
> I want a program that will read in a Unicode file and convert it to ASCII so
> that we can upload it into our database.
> Example:
> If the input file looks like:
> Último año de carrera
> It would convert it to
> ultimo ano de carrera
>
> Do I have to write a program to do this? Or are there utilities or
> procedures that will do this for me?
>
In this case, what you want is to convert ISO 8859-1 (Latin-1) to ASCII by
"removing accents". I'm sure there are many options, as well as much
opinion against doing it at all... But one option, often overlooked, that
does exactly what you are asking is the Kermit software:
http://www.columbia.edu/kermit/
which can convert character sets as part of the file-transfer process
(upload or download), and can also convert files locally without
transferring them. Example with file transfer:
On the sending side:
  set file character-set latin1
  set transfer character-set latin1
  send name-of-file
On the receiving side:
  set file character-set ascii
  receive
Example of converting a local file:
  translate name-of-file latin1 ascii name-of-result-file
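For readers who would rather script the local conversion themselves, here is a minimal Python sketch of the same idea. It is not how Kermit does it: Kermit uses its own character-set tables, while this sketch swaps in Unicode compatibility decomposition (NFKD) from the standard unicodedata module and then drops the combining accent marks. The function names strip_accents and translate_file are hypothetical. Characters that have no decomposition (such as "ß" or "Ø") are assumed acceptable to replace with "?".

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Approximate ASCII by removing diacritics (sketch, not Kermit's tables)."""
    # NFKD splits "Ú" into "U" + COMBINING ACUTE ACCENT, "ñ" into "n" + tilde, etc.
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep the base letters and discard the combining marks.
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Anything still outside ASCII (e.g. "ß", "Ø") becomes "?".
    return base.encode("ascii", errors="replace").decode("ascii")


def translate_file(src: str, dst: str) -> None:
    """Read src as ISO 8859-1 (Latin-1) and write dst as plain ASCII."""
    with open(src, encoding="latin-1") as f_in, \
         open(dst, "w", encoding="ascii") as f_out:
        for line in f_in:
            f_out.write(strip_accents(line))


if __name__ == "__main__":
    print(strip_accents("Último año de carrera"))
```

Run on the example from the question, this prints "Ultimo ano de carrera". Lowercasing the first letter, as in the requested output, would be a separate step (for example, calling .lower() on the result).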
Kermit can convert between any two character sets that it supports,
including UTF-8, UCS-2, the ISO 8859 alphabets, the ISO 646 7-bit "national
replacement" sets of yore, numerous PC (DOS) code pages and Windows code
pages, and other proprietary and standard character sets. Of course the
result doesn't always make sense, as when translating Cyrillic into
ASCII, although even there Kermit has a special case that makes the
replacements "by sound".
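To show what replacing "by sound" means, here is a small Python sketch of table-driven transliteration. The mapping below is a hypothetical, informal scheme chosen for illustration. It is not Kermit's actual table and does not follow any particular standard, and the name transliterate is also hypothetical. Unknown non-ASCII characters become "?".

```python
# Hypothetical, informal Cyrillic -> ASCII "by sound" table (lowercase keys).
# This is NOT Kermit's table; it only illustrates the technique.
TRANSLIT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "zh", "з": "z", "и": "i", "й": "j", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": '"', "ы": "y", "ь": "'", "э": "e", "ю": "yu", "я": "ya",
}


def transliterate(text: str) -> str:
    """Replace each Cyrillic letter with an ASCII approximation of its sound."""
    out = []
    for ch in text:
        lower = ch.lower()
        if lower in TRANSLIT:
            rep = TRANSLIT[lower]
            # An uppercase input letter capitalizes the first output letter.
            out.append(rep.capitalize() if ch != lower else rep)
        elif ch.isascii():
            out.append(ch)  # ASCII passes through unchanged
        else:
            out.append("?")  # anything else has no mapping in this sketch
    return "".join(out)


if __name__ == "__main__":
    print(transliterate("Москва"))
```

With this table, "Москва" comes out as "Moskva". Multi-letter replacements such as "shch" are why a simple one-byte-to-one-byte translation table cannot do this job.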
The Kermit software that does this is available for Unix (all versions),
Windows, DOS, VMS, MVS, VM/CMS, and various other operating systems.
Frank da Cruz
Columbia University
This archive was generated by hypermail 2.1.5 : Wed Jun 06 2007 - 13:28:43 CDT