RE: UCS-2 to UTF-8 hex values

From: Hietaniemi Jarkko (NRC/Boston) (jarkko.hietaniemi@nokia.com)
Date: Wed Sep 19 2001 - 14:13:22 EDT


> CP UCS
> =============
> 2E 002E
> 2F 002F
> 30 0030
> ...
>
> Has anyone written or found a script which takes 4 digit
> hex representation of UCS-2 as (or similar to) the above
> which outputs the UTF-8 value equivalent in the
> same hex format?

Assuming I parsed your question correctly: this works with any Perl
newer than 5.6.0 (5.6.1 recommended; check what perl -v shows):

$ perl -le 'print join(" ", map { sprintf "%02x", $_ } unpack("C*",
pack("U*", 0x80)))'
c2 80

Unraveling the incantation:
        pack U: pack as Unicode (Perl's internal representation is UTF-8)
        unpack C: unpack as bytes
        sprintf: format each byte as two hex digits
        map: apply the sprintf to every result of the unpack C
        join: separate the results with spaces

And for your input file, assuming two hex numbers separated by
whitespace on their own line:

$ perl -nle 'if (/^[0-9a-f]+\s+([0-9a-f]+)$/i) { print "$_\t", join(" ",
map { sprintf "%02x", $_ } unpack("C*", pack("U*", hex($1)))) }' input.file
