Dear Unicoders,
I have 4 questions about character names:
(1) How does one determine the character names of code points that
fall inside the ranges in the UnicodeData.txt file? Is there a
separate file? Can the names be generated automatically and, if so, how?
For example, if I wanted to find the name of code point U+5728,
where would that information be?
I'm auto-generating data structures. Using UnicodeData.txt as
input gets me most of the way (I think). The gaps occur for
the ranges:
3400;<CJK Ideograph Extension A, First>;Lo;0;L;;;;;N;;;;;
4DB5;<CJK Ideograph Extension A, Last>;Lo;0;L;;;;;N;;;;;
4E00;<CJK Ideograph, First>;Lo;0;L;;;;;N;;;;;
9FA5;<CJK Ideograph, Last>;Lo;0;L;;;;;N;;;;;
AC00;<Hangul Syllable, First>;Lo;0;L;;;;;N;;;;;
D7A3;<Hangul Syllable, Last>;Lo;0;L;;;;;N;;;;;
D800;<Non Private Use High Surrogate, First>;Cs;0;L;;;;;N;;;;;
DB7F;<Non Private Use High Surrogate, Last>;Cs;0;L;;;;;N;;;;;
DC00;<Low Surrogate, First>;Cs;0;L;;;;;N;;;;;
DFFF;<Low Surrogate, Last>;Cs;0;L;;;;;N;;;;;
...and also for the private use ranges
(which we'll probably be needing).
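To make the gap concrete, here is a minimal sketch of how the First/Last pairs could be folded into ranges while reading the file. It assumes only what the lines above show: semicolon-separated fields, a hexadecimal code point in the first field, and a name in the second. The names parse_unicode_data and find_range are my own, hypothetical helpers, and the sample holds just three lines. The sketch can say which range a code point belongs to, but not what its character name is, which is exactly my question.

```python
import re

# Placeholder names that mark range endpoints, e.g. "<CJK Ideograph, First>".
RANGE_MARK = re.compile(r"^<(.+), (First|Last)>$")


def parse_unicode_data(lines):
    """Split UnicodeData.txt-style lines into named entries and ranges.

    Returns (names, ranges): names maps code point -> name for ordinary
    lines; ranges is a list of (first, last, label) built from each
    First/Last pair.
    """
    names = {}
    ranges = []
    pending = None  # (first code point, label) waiting for its Last line
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(";")
        cp = int(fields[0], 16)
        name = fields[1]
        m = RANGE_MARK.match(name)
        if m is None:
            names[cp] = name
        elif m.group(2) == "First":
            pending = (cp, m.group(1))
        else:
            if pending is None or pending[1] != m.group(1):
                raise ValueError("unmatched range end at " + fields[0])
            ranges.append((pending[0], cp, m.group(1)))
            pending = None
    return names, ranges


def find_range(ranges, cp):
    """Return the label of the range containing cp, or None."""
    for first, last, label in ranges:
        if first <= cp <= last:
            return label
    return None


# A tiny sample in the same format as the file (illustration only).
SAMPLE = [
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "4E00;<CJK Ideograph, First>;Lo;0;L;;;;;N;;;;;",
    "9FA5;<CJK Ideograph, Last>;Lo;0;L;;;;;N;;;;;",
]

names, ranges = parse_unicode_data(SAMPLE)
print(find_range(ranges, 0x5728))  # prints "CJK Ideograph"
```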
(2) How do I locate the ISO/IEC character naming guidelines?
I looked in "The Unicode Standard Version 3.0" and it refers
me to Informative Annex K of ISO/IEC 10646. Is the information
available electronically? I looked at the ISO site and it said
that "there is no electronic access to the contents of ISO
standards" (http://www.iso.ch/infoe/faq.htm#Standards). It did
mention that this was in the pipeline, but didn't say when.
(3) When surrogates are introduced, will there be mappings from
surrogate pairs to character names? Will they be included
in later versions of UnicodeData.txt? It's not an issue at
the moment, but I'd like to structure my code such that I can
just slot in surrogate code later.
(4) Why are they called "character names" and not "code point names"?
Regards,
Viranga
Email: viranga@mds.rmit.edu.au
Phone: +61 3 9925 4124 (Work)