Re: Speaking of Plane 1 characters...

From: Jungshik Shin (jshin@mailaps.org)
Date: Mon Nov 11 2002 - 18:37:02 EST

    On Mon, 11 Nov 2002, John Cowan wrote:

    > On *ix systems, use the "bc" command; type "obase=16" and "ibase=16".

      Thank you for this. I should have read the man page of bc more
    carefully. (Or perhaps I used to know it but forgot...)

    > For this program, you must use capital letters for the hex digits.
    > To get the high surrogate, type "(xxxxx-10000)/400+DC00" for the high

      s/DC00/D800/

    > surrogate ("xxxxx" is the scalar value); to get the low surrogate,
    > type "(xxxxx-10000)%400+DC00".

    And one can define a function....

    > On the Macintosh, I have no clue.

      As you know so well, Mac OS X is a Unix, and 'bc' should be available
    there, too. If not by default, one can certainly grab the source and
    compile it or get a precompiled binary somewhere.

      It seems to me a waste of bandwidth (however abundant it may have
    become recently; I have heard several times on this list that it is not
    abundant in a certain country in Europe ;-) ) to go all the way across
    the Atlantic or the continent to convert between UCVs and surrogate
    pairs. There are several ways to do it locally, including the two
    suggested above. On *nix, including Mac OS X
    (http://developer.apple.com/internet/macosx/perl.html), one can open
    a small terminal window (yes, Mac OS X has a terminal window!) and
    run a script like the following, assuming Perl is installed. (If a GUI
    is desired, make one up in Perl/Tk, Tcl/Tk, pdksh, Python+Tk?...) This
    should also work in a Windows command prompt. Alternatively, I guess a
    local HTML file with ECMAScript should also work.

    ------------Cut--------here----------------
    #!/usr/bin/perl -w
    # Use the full path of your perl binary in place of /usr/bin/perl.
    use strict;

    $| = 1;  # flush STDOUT after every print so the prompt shows up at once

    while ( 1 ) {
      print "** Enter Unicode code point in hexadecimal \n" .
            " (to end, press [enter]) : ";
      my $ucs = <STDIN>;
      last unless defined $ucs;   # end of input
      chomp $ucs;

      last if $ucs eq "";

      # Anything other than hex digits is rejected outright.
      if ( $ucs =~ /[^a-f0-9A-F]/ ) {
        printf " Error: %s is invalid. Try again\n", $ucs;
        next;
      }

      my $usv = hex $ucs;
      if ( 0xffff < $usv && $usv < 0x110000 ) {
        # Supplementary planes: high surrogate first, then low surrogate.
        printf "UTF-16: %04x %04x\n",
               int( ($usv-0x10000) / 0x400 ) + 0xd800,
               ($usv-0x10000) % 0x400 + 0xdc00;
      }
      elsif ( $usv < 0xd800 || 0xdfff < $usv && $usv < 0x10000 ) {
        # BMP code point that is not itself a surrogate.
        printf "UTF-16: %04x\n", $usv;
      }
      else {
        # A surrogate code point (D800..DFFF) or a value above 10FFFF.
        printf "Your input %s is not valid. Try again\n", $ucs;
      }
    }

    print "Bye !!\n";
    --------------------Cut---------here--------------
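
    The script above only goes from a code point to its surrogates. The
    other direction can be sketched with plain shell arithmetic, assuming
    a POSIX shell that accepts 0x constants inside $(( )). The function
    name "scalar_from_pair" is made up here for illustration.

```shell
# Hypothetical helper: combine a high and a low surrogate (hex, any case)
# into the scalar value, e.g. "scalar_from_pair D834 DD1E".
scalar_from_pair() {
  hi=$((0x$1))
  lo=$((0x$2))
  # Reject anything that is not a high surrogate followed by a low one.
  if [ "$hi" -lt $((0xD800)) ] || [ "$hi" -gt $((0xDBFF)) ] ||
     [ "$lo" -lt $((0xDC00)) ] || [ "$lo" -gt $((0xDFFF)) ]; then
    echo "not a surrogate pair: $1 $2" >&2
    return 1
  fi
  # Undo the split: high half times 0x400, plus low half, plus 0x10000.
  printf '%04X\n' $(( ($hi - 0xD800) * 0x400 + ($lo - 0xDC00) + 0x10000 ))
}

scalar_from_pair D834 DD1E
```

    For the pair D834 DD1E this prints 1D11E. Two values that do not form
    a high/low pair produce a message on stderr and a non-zero status.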

      Jungshik


