Clarification of character classes

From: Tobias Hunger (tobias@berlin-consortium.org)
Date: Thu Dec 14 2000 - 07:02:41 EST


Hello!

I have some more questions on the character classes used in chapter 5 of the
book. Here is a list of the classes mentioned and what I take them to be. It
would be great if someone could verify that I got it right :-)

Table 5-3:
CR: U+000D
LF: U+000A
Format: Everything with General Category of C? besides CR/LF
Virama: Every character with Canonical Combining Class of 9
Joining: Every character with Canonical Combining Class != 0 (but not
   a Virama) (?)
L: U+1100-U+115F
V: U+1160-U+11A7
T: U+11A8-U+11FF
Lo: Everything besides those above with General Category of Lo (?)
Other: Every letter (besides those above) with General Category L? (?)
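
The guesses for Table 5-3 can be written down as a small Python sketch, which
may make them easier to check against the data files. This is only an
illustration of the list above, not the book's definition: the function name
table_5_3_class, the order in which the tests are applied, and the None result
for characters the list does not cover are all assumptions. It uses Python's
standard unicodedata module, so results depend on the Unicode version shipped
with the interpreter.

```python
import unicodedata
from typing import Optional


def table_5_3_class(ch: str) -> Optional[str]:
    """Classify one character using the guesses listed for Table 5-3.

    Hypothetical helper: the test order below is an assumption.
    """
    cp = ord(ch)
    if cp == 0x000D:
        return "CR"
    if cp == 0x000A:
        return "LF"

    category = unicodedata.category(ch)   # General Category, e.g. "Lo"
    if category.startswith("C"):          # Cc, Cf, Cs, Co, Cn
        return "Format"

    ccc = unicodedata.combining(ch)       # Canonical Combining Class
    if ccc == 9:
        return "Virama"
    if ccc != 0:
        return "Joining"

    # Hangul conjoining jamo ranges as given in the list.
    if 0x1100 <= cp <= 0x115F:
        return "L"
    if 0x1160 <= cp <= 0x11A7:
        return "V"
    if 0x11A8 <= cp <= 0x11FF:
        return "T"

    if category == "Lo":
        return "Lo"
    if category.startswith("L"):          # remaining letters
        return "Other"
    return None                           # not covered by the list (assumption)
```

For example, table_5_3_class("\u094d") (DEVANAGARI SIGN VIRAMA) falls into
Virama and table_5_3_class("\u0301") (COMBINING ACUTE ACCENT) into Joining.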

Table 5-4:
Sep: Paragraph Separator (U+2029) / Line Separator (U+2028)
TAB: Now which one is this? U+0009? U+000B?
Let: Everything with General Category L?
Com: Same as Joining above? Or Combining Property from PropList.txt?
Hira: U+3040-U+309F
Kata: U+30A0-U+30FF
Han: All the CJK-Ranges (?)
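
The same kind of sketch for the Table 5-4 guesses, again in Python and again
only an illustration. The name table_5_4_class is hypothetical. Testing the
script ranges before the general letter test is an assumption (Hiragana,
Katakana and Han characters are letters too, so the order matters). Detecting
Han by the "CJK UNIFIED IDEOGRAPH" name prefix is a stand-in for "all the CJK
ranges" and will miss compatibility ideographs. TAB and Com are left out
because the questions above are still open.

```python
import unicodedata
from typing import Optional


def table_5_4_class(ch: str) -> Optional[str]:
    """Classify one character using the guesses listed for Table 5-4.

    Hypothetical helper; TAB and Com are intentionally not handled.
    """
    cp = ord(ch)
    if cp in (0x2028, 0x2029):            # Line / Paragraph Separator
        return "Sep"
    if 0x3040 <= cp <= 0x309F:
        return "Hira"
    if 0x30A0 <= cp <= 0x30FF:
        return "Kata"
    # Assumption: approximate "all the CJK ranges" by the character name.
    if unicodedata.name(ch, "").startswith("CJK UNIFIED IDEOGRAPH"):
        return "Han"
    if unicodedata.category(ch).startswith("L"):
        return "Let"
    return None                           # not covered by the list (assumption)
```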

Table 5-5:
ZWSP U+200B
ZWNBSP U+FEFF
Sp: Every character with General Category Zs (?)
Break: LS/PS. What else?
Com: same as above (?)
Ideographic: Same as Han above or everything with Ideographic Property in
   PropList.txt?
Alphabetic: Everything with Alphabetic Property in PropList.txt
Exclam: Now how do I figure this one out? Terminal Punctuation Property?
Syntax: What is a Solidus? Which characters belong here?
Open: General Category Ps
Close: General Category Pe
Quote: General Category Pi and Pf
NonStarter: Which Hiragana and Katakana characters are small?
HyphenMinus: U+002D
Insep: Ellipsis Characters and leaders (?)
Number: General Category Nd
NumericPrefix: How do I figure this one out? With Bidi-Properties?
NumericPostfix: and this one?
NumericInfix: how about this one?
Base: Canonical Combining Class == 0
NonBase: Canonical Combining Class != 0 (?)
All: Everything
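
For the Table 5-5 entries that map directly onto a General Category or a
single code point, a Python sketch of the guesses might look like this. The
names CATEGORY_TO_CLASS, table_5_5_class and is_base are hypothetical, and the
classes whose membership is still an open question above (Exclam, Syntax,
Insep, the Numeric* classes and so on) are deliberately left out.

```python
import unicodedata
from typing import Optional

# General Category -> class name, as guessed in the list above.
CATEGORY_TO_CLASS = {
    "Ps": "Open",
    "Pe": "Close",
    "Pi": "Quote",
    "Pf": "Quote",
    "Nd": "Number",
}

# Classes that were given as single code points.
CODE_POINT_TO_CLASS = {
    0x200B: "ZWSP",
    0xFEFF: "ZWNBSP",
    0x002D: "HyphenMinus",
}


def table_5_5_class(ch: str) -> Optional[str]:
    """Return the guessed Table 5-5 class, or None if the list has no rule."""
    by_code_point = CODE_POINT_TO_CLASS.get(ord(ch))
    if by_code_point is not None:
        return by_code_point
    return CATEGORY_TO_CLASS.get(unicodedata.category(ch))


def is_base(ch: str) -> bool:
    """Base: Canonical Combining Class == 0; NonBase is the complement."""
    return unicodedata.combining(ch) == 0
```

One thing this shows: the ASCII quotation mark U+0022 and apostrophe U+0027
have General Category Po, so a Quote class built only from Pi and Pf would not
contain them.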

Table 5-6:
Sp: Same as in 5-5?
Term: Terminal Punctuation Property?
Dot: U+00B7 (?)
Cap: General Category Lu, Lt, Lo
Lower: General Category Ll
Open: same as in 5-5
Close: Is this the same as in 5-5? This one includes period, comma, ... which
   the one in 5-5 does not.

Thank you for your help. Probably I am just a bit confused and should be able
to figure this out on my own, but I just don't get it.

--
Gruss,
Tobias

-------------------------------------------------------------------
Tobias Hunger The box said: 'Windows 95 or better'
tobias@berlin-consortium.org So I installed Linux.
-------------------------------------------------------------------



