From: Jungshik Shin (jshin@mailaps.org)
Date: Mon Nov 11 2002 - 18:37:02 EST
On Mon, 11 Nov 2002, John Cowan wrote:

> On *ix systems, use the "bc" command; type "obase=16" and "ibase=16".

Thank you for this. I should have read the bc man page more
carefully. (Or perhaps I used to know it and forgot...)

> For this program, you must use capital letters for the hex digits.
> To get the high surrogate, type "(xxxxx-10000)/400+DC00" for the high

s/DC00/D800/

> surrogate ("xxxxx" is the scalar value); to get the low surrogate,
> type "(xxxxx-10000)%400+DC00".

And one can define a function....

> On the Macintosh, I have no clue.

As you know so well, Mac OS X is a Unix, and 'bc' should be available
there, too. If not by default, one can certainly grab the source and
compile it, or get a precompiled binary somewhere.

It seems to me a waste of bandwidth to go all the way across the
Atlantic or the continent just to convert between scalar values and
surrogate pairs (however abundant bandwidth may have become recently;
I have heard several times on this list that it is not abundant in a
certain country in Europe ;-) ). There are several ways to do it
locally, including the two suggested above. On *nix, including Mac OS X
(http://developer.apple.com/internet/macosx/perl.html), one can open a
small terminal window (yes, Mac OS X has a terminal window!) and run a
script like the following, assuming Perl is installed. If a GUI is
desired, one can be made up in Perl/Tk, Tcl/Tk, pdksh, Python+Tk, and
so on. The script should also work in a Windows command prompt.
Alternatively, I guess a local HTML file with ECMAScript should also
work.

------------Cut--------here----------------
#!/usr/bin/perl -w
# use the full path of your perl binary in place of /usr/bin/perl
use strict;

$| = 1;    # unbuffer STDOUT so the prompt appears before we read input

while ( 1 ) {
    print "** Enter Unicode code point in hexadecimal \n" .
          "   (to end, press [enter]) : ";
    my $ucs = <STDIN>;
    last unless defined $ucs;    # end of input (e.g. Ctrl-D)
    chomp $ucs;
    last if $ucs eq "";

    # reject anything that is not a string of hex digits
    if ( $ucs =~ /[^a-f0-9A-F]/ ) {
        printf " Error: %s is invalid. Try again\n", $ucs;
        next;
    }

    my $usv = hex $ucs;
    if ( 0xffff < $usv && $usv < 0x110000 ) {
        # supplementary planes: split into high and low surrogates
        printf "UTF-16: %04x %04x\n",
            int( ($usv - 0x10000) / 0x400 ) + 0xd800,
            ($usv - 0x10000) % 0x400 + 0xdc00;
    }
    elsif ( $usv < 0xd800 || ( 0xdfff < $usv && $usv < 0x10000 ) ) {
        # BMP scalar value: a single 16-bit code unit
        printf "UTF-16: %04x\n", $usv;
    }
    else {
        # a surrogate code point (d800..dfff) or a value above 10ffff
        printf "Your input %s is not valid. Try again\n", $ucs;
    }
}
print "Bye !!\n";
--------------------Cut---------here--------------
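
The script above only goes from a scalar value to UTF-16. Going back
from a surrogate pair to the scalar value uses the same arithmetic in
reverse. Here is a rough sketch of both directions as subroutines. The
names usv_to_utf16 and utf16_to_usv are made up for illustration and do
not come from any module, and returning an empty list or undef for
invalid input is just one possible convention.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helpers; the names are illustrative, not from any module.

# Scalar value -> list of UTF-16 code units (empty list if invalid).
sub usv_to_utf16 {
    my ($usv) = @_;
    return () if $usv > 0x10ffff;                     # beyond Unicode
    return () if 0xd800 <= $usv && $usv <= 0xdfff;    # surrogate code point
    return ($usv) if $usv < 0x10000;                  # BMP: one code unit
    my $v = $usv - 0x10000;                           # 20-bit offset
    return ( 0xd800 + ($v >> 10), 0xdc00 + ($v & 0x3ff) );
}

# Surrogate pair -> scalar value (undef if not a valid high/low pair).
sub utf16_to_usv {
    my ($hi, $lo) = @_;
    return undef unless 0xd800 <= $hi && $hi <= 0xdbff;
    return undef unless 0xdc00 <= $lo && $lo <= 0xdfff;
    return 0x10000 + ( ($hi - 0xd800) << 10 ) + ($lo - 0xdc00);
}

printf "%04x %04x\n", usv_to_utf16(0x10400);    # prints "d801 dc00"
printf "%x\n", utf16_to_usv(0xd801, 0xdc00);    # prints "10400"
```

The shift and mask are equivalent to the division and remainder by 0x400
in the script above. They just avoid the floating-point division.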
Jungshik