accessing extended ranges

From: Ben Monroe (bendono@attbi.com)
Date: Tue Mar 26 2002 - 03:16:39 EST


I would like to access some of the characters from "CJK Unified Ideographs
Extension B." These are all in the range of 20000-2A6DF. (direct link:
http://www.unicode.org/charts/PDF/U20000.pdf )

"Basic Latin" appears in the 0000-007F range. The original "CJK Unified
Ideographs" all appear within the 4E00–9FAF range. These are all easy to
access with U+xxxx (4 x's). In Java, the format \uxxxx works just fine (and
the same goes for http://www.macchiato.com/unicode/ ). However, how do you
access the characters in the larger ranges (i.e., U+xxxxx or \uxxxxx)?

Directly using the 5-digit format \uxxxxx produces a Unicode character
followed by the 5th x. Here is a quick example:

public class UniStringTest {
  public static void main(String[] args) {
    String s1 = "\u963F"; // displays fine; standard 4-digit escape
    System.out.println(s1);
    String s2 = "\u9FA0"; // also displays fine; standard 4-digit escape
    System.out.println(s2);
    // Biggest character that I know of (5 digits), but it doesn't process:
    String s3 = "\u2A6A5";
    System.out.println(s3);
  }
}
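
For reference, here is a rough sketch of what seems to be going on, and one
possible way around it. It assumes that Java strings are sequences of 16-bit
UTF-16 code units, so a \u escape takes exactly four hex digits and anything
above U+FFFF has to be written as a surrogate pair. The class name
SupplementarySketch and the helper fromCodePoint are made up for illustration.
On Java 5 and later, Character.toChars(int) should do the same conversion;
the hand-written arithmetic is only there so the sketch does not depend on it.

```java
public class SupplementarySketch {

  // Hypothetical helper: builds a String from a code point using the
  // UTF-16 surrogate arithmetic. Lone surrogate values are not rejected.
  static String fromCodePoint(int codePoint) {
    if (codePoint < 0 || codePoint > 0x10FFFF) {
      throw new IllegalArgumentException("not a code point: " + codePoint);
    }
    if (codePoint < 0x10000) {
      // Fits in a single 16-bit code unit.
      return String.valueOf((char) codePoint);
    }
    int offset = codePoint - 0x10000;               // 20 bits remain
    char high = (char) (0xD800 + (offset >> 10));   // top 10 bits
    char low = (char) (0xDC00 + (offset & 0x3FF));  // bottom 10 bits
    return new String(new char[] { high, low });
  }

  public static void main(String[] args) {
    // The escape stops after four hex digits, so this literal is
    // U+2A6A followed by the ordinary digit '5'.
    String fiveDigits = "\u2A6A5";
    System.out.println(fiveDigits.length());                        // 2
    System.out.println(Integer.toHexString(fiveDigits.charAt(0)));  // 2a6a
    System.out.println(fiveDigits.charAt(1));                       // 5

    // U+2A6A5 written by hand as a surrogate pair.
    String viaPair = "\uD869\uDEA5";
    System.out.println(viaPair.equals(fromCodePoint(0x2A6A5)));     // true
  }
}
```

Whether the character actually displays will still depend on the console
encoding and on having a font that covers Extension B.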

I understand this isn't a programming mailing list; I just used the Java
program as an example.
I'd appreciate some input.
Thanks,

Ben Monroe


