"Frank Sledge" wrote on 1999-04-06 02:14 UTC:
> My partner is in the same boat.  He is planning to write a program
> in Chipmunk Basic that will convert an 8-bit document into UTF-8. 
> I imagine the same could be done in Perl, C, Pascal or whatever,
> fairly simple task (at least in theory): suck in one byte from the 
> input file, look up the UTF-8 equivalent and send it to the 
> output file; repeat until end of input file.
Sounds like an excitingly challenging software-engineering project.
Estimated completion time including testing and documentation:
25 minutes.
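For the record, the byte-by-byte loop described above might look roughly
like the following in Perl. This is only a sketch under one big assumption:
that the 8-bit input is ISO 8859-1 (Latin-1), where each byte value equals
its Unicode code point, so the "lookup" collapses into two shifts. Any other
8-bit character set would need a real 256-entry table. The name
latin1_to_utf8 is made up for this illustration.

```perl
#!/usr/bin/perl
# Sketch: convert ISO 8859-1 (Latin-1) files to UTF-8, one byte at a time.
# ASSUMPTION: the input is Latin-1, so byte value == Unicode code point.
use strict;
use warnings;

# Hypothetical helper: takes a string of Latin-1 bytes, returns UTF-8 bytes.
sub latin1_to_utf8 {
    my ($bytes) = @_;
    my $out = '';
    foreach my $byte (unpack('C*', $bytes)) {
        if ($byte < 0x80) {
            # ASCII range: copied through unchanged.
            $out .= pack('C', $byte);
        } else {
            # 0x80..0xFF: lead byte 110xxxxx plus continuation byte 10xxxxxx.
            $out .= pack('C2', 0xc0 | ($byte >> 6), 0x80 | ($byte & 0x3f));
        }
    }
    return $out;
}

# Convert every file named on the command line and write to standard output.
binmode(STDOUT);
foreach my $file (@ARGV) {
    open(my $in, '<:raw', $file) or die "Cannot open $file: $!";
    while (read($in, my $buf, 65536)) {
        print latin1_to_utf8($buf);
    }
    close($in);
}
```

Saved under a name of your choice, say latin1-to-utf8.pl, it would be run
as "perl latin1-to-utf8.pl input.txt > output.txt".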
Even faster (estimated installation time: 12.5 minutes):
GNU recode already converts between UTF-8 and a wide range of other
character sets:
  http://www.iro.umontreal.ca/contrib/recode/recode-3.4q.tar.gz
Another alternative is this Perl program, which replaces HTML/SGML
numeric character references with the corresponding UTF-8 sequences and
is well suited to quickly entering UTF-8 test documents:
------------------------------------------------------------------
#!/usr/bin/perl
# Convert HTML numeric character identifiers to UTF-8.  M. Kuhn, 1998
sub utf8 ($) {
    my $c = shift(@_);
    if ($c < 0x80) {
        return sprintf("%c", $c);
    } elsif ($c < 0x800) {
        return sprintf("%c%c", 0xc0 | ($c >> 6), 0x80 | ($c & 0x3f));
    } elsif ($c < 0x10000) {
        return sprintf("%c%c%c",
                       0xe0 | ($c >> 12),
                       0x80 | (($c >> 6) & 0x3f),
                       0x80 | ($c & 0x3f));
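    } elsif ($c < 0x110000) {
        # Supplementary planes (U+10000..U+10FFFF) need four bytes.
        return sprintf("%c%c%c%c",
                       0xf0 | ($c >> 18),
                       0x80 | (($c >> 12) & 0x3f),
                       0x80 | (($c >> 6) & 0x3f),
                       0x80 | ($c & 0x3f));
    } else {
        # Beyond the Unicode range: substitute U+FFFD REPLACEMENT CHARACTER.
        return utf8(0xfffd);
    }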
}
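while (<>) {
    # Replace hexadecimal (&#x41;) and decimal (&#65;) references in one
    # pass.  Already converted text is never rescanned, so "&#38;#65;"
    # yields "&#65;" rather than being decoded a second time.
    s/&\#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/utf8(defined $1 ? hex($1) : $2)/ge;
    print;
}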
------------------------------------------------------------------
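To convince yourself that the shifts and masks are right, the encoder can
be spot-checked against an independent implementation. The sketch below
assumes a Perl that ships the Encode module (core since Perl 5.8, so newer
than this script). The name utf8_bytes is made up for the illustration, and
it repeats only the one-, two- and three-byte branches of utf8().

```perl
#!/usr/bin/perl
# Sketch: spot-check a hand-written UTF-8 encoder against the Encode module.
# ASSUMPTION: Encode is available (it is a core module since Perl 5.8).
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper with the same bit layout as the one-, two- and
# three-byte branches of utf8() above; valid for code points below 0x10000.
sub utf8_bytes {
    my ($c) = @_;
    return pack('C', $c) if $c < 0x80;
    return pack('C2', 0xc0 | ($c >> 6), 0x80 | ($c & 0x3f)) if $c < 0x800;
    return pack('C3', 0xe0 | ($c >> 12),
                      0x80 | (($c >> 6) & 0x3f),
                      0x80 | ($c & 0x3f));
}

# One sample per branch: "A", e-acute and the Euro sign.
foreach my $c (0x41, 0xe9, 0x20ac) {
    my $mine = utf8_bytes($c);
    my $ref  = encode('UTF-8', chr($c));
    printf "U+%04X -> %s (%s)\n", $c,
        join(' ', map { sprintf '%02X', $_ } unpack('C*', $mine)),
        $mine eq $ref ? 'matches Encode' : 'differs from Encode';
}
```

For U+20AC the three bytes work out as 0xE0 | 0x2 = E2,
0x80 | 0x02 = 82 and 0x80 | 0x2C = AC.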
You can get a Perl interpreter from
  http://www.perl.com/pace/pub/perldocs/latest.html
and there is even a Mac version at
  http://www.iis.ee.ethz.ch/~neeri/macintosh/perl.html
Markus
-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:45 EDT