Character properties

From: Marcin 'Qrczak' Kowalczyk (qrczak@knm.org.pl)
Date: Thu Sep 21 2000 - 08:00:31 EDT


I am trying to improve the handling of character properties in
Haskell. What should the following functions return, i.e. what is
the most standard, natural, or preferred mapping between Unicode
character categories and predicates like isalpha? What else should
be provided? Here are the definitions that I currently use:

isControl c  = c < ' ' || c >= '\x7F' && c <= '\x9F'
isPrint c    = category of c is not one of [Zl,Zp,Cc,Cf,Cs,Co,Cn]
isSpace c    = c is one of "\t\n\r\f\v" || category of c is one of [Zs,Zl,Zp]
isGraph c    = isPrint c && not (isSpace c)
isPunct c    = isGraph c && not (isAlphaNum c)
isAlphaNum c = category of c is one of [Lu,Ll,Lt,Nd,Nl,No,Lm,Lo]
isHexDigit c = isDigit c || c >= 'A' && c <= 'F' || c >= 'a' && c <= 'f'
isDigit c    = c >= '0' && c <= '9'
isOctDigit c = c >= '0' && c <= '7'
isAlpha c    = category of c is one of [Lu,Ll,Lt,Lm,Lo]
isUpper c    = category of c is one of [Lu,Lt]
isLower c    = category of c is Ll
isLatin1 c   = c <= '\xFF'
isAscii c    = c < '\x80'
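
For anyone who wants to try these out, here is a rough sketch of how the
definitions above might look as compilable Haskell. It assumes a library
function that returns the general category of a character; the sketch uses
Data.Char.generalCategory as found in current GHC, which is an assumption
on my part and not something the definitions above depend on. The helper
inCats is a hypothetical name introduced only for this illustration.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- Hypothetical helper: True when the general category of the character
-- is one of the listed categories.
inCats :: [GeneralCategory] -> Char -> Bool
inCats cats c = generalCategory c `elem` cats

isControl, isPrint, isSpace, isGraph, isPunct, isAlphaNum :: Char -> Bool
isControl c = c < ' ' || c >= '\x7F' && c <= '\x9F'
-- Everything except Zl, Zp, Cc, Cf, Cs, Co, Cn.
isPrint = not . inCats
  [ LineSeparator, ParagraphSeparator, Control, Format
  , Surrogate, PrivateUse, NotAssigned ]
-- The ASCII space itself is covered by category Zs.
isSpace c = c `elem` "\t\n\r\f\v"
  || inCats [Space, LineSeparator, ParagraphSeparator] c
isGraph c = isPrint c && not (isSpace c)
isPunct c = isGraph c && not (isAlphaNum c)
-- Lu, Ll, Lt, Nd, Nl, No, Lm, Lo.
isAlphaNum = inCats
  [ UppercaseLetter, LowercaseLetter, TitlecaseLetter
  , DecimalNumber, LetterNumber, OtherNumber
  , ModifierLetter, OtherLetter ]

isHexDigit, isDigit, isOctDigit, isAlpha :: Char -> Bool
isHexDigit c = isDigit c || c >= 'A' && c <= 'F' || c >= 'a' && c <= 'f'
isDigit c    = c >= '0' && c <= '9'   -- ASCII digits only
isOctDigit c = c >= '0' && c <= '7'
-- Lu, Ll, Lt, Lm, Lo.
isAlpha = inCats
  [ UppercaseLetter, LowercaseLetter, TitlecaseLetter
  , ModifierLetter, OtherLetter ]

isUpper, isLower, isLatin1, isAscii :: Char -> Bool
isUpper    = inCats [UppercaseLetter, TitlecaseLetter]  -- Lu, Lt
isLower    = inCats [LowercaseLetter]                   -- Ll
isLatin1 c = c <= '\xFF'
isAscii c  = c < '\x80'
```

One consequence worth noticing: with these definitions isPunct is True for
every graphic character that is not alphanumeric, so it also accepts symbols
such as '+' and combining marks, not only the P* categories.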

isDigit intentionally recognizes ASCII digits only. IMHO that is what
is needed more often, and it is what the Haskell 98 Report says. (But
I don't follow the report in some other cases.)
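
To illustrate why an ASCII-only isDigit tends to be the more useful one,
here is a small sketch of a typical caller: a decimal parser whose
arithmetic is only valid for '0'..'9'. The names isDecimalNumber and
parseDecimal are hypothetical, introduced just for this example, and
generalCategory is again assumed to come from Data.Char.

```haskell
import Data.Char (GeneralCategory (DecimalNumber), generalCategory, ord)

-- ASCII-only test, as defined in the post.
isDigit :: Char -> Bool
isDigit c = c >= '0' && c <= '9'

-- Hypothetical companion predicate: any character of category Nd.
isDecimalNumber :: Char -> Bool
isDecimalNumber c = generalCategory c == DecimalNumber

-- Parse a non-empty string of ASCII digits. Subtracting ord '0' gives the
-- digit value only because isDigit has restricted the input to '0'..'9';
-- with a predicate that accepted all of Nd the result would be garbage.
parseDecimal :: String -> Maybe Integer
parseDecimal s
  | not (null s) && all isDigit s = Just (foldl step 0 s)
  | otherwise                     = Nothing
  where
    step acc c = acc * 10 + toInteger (ord c - ord '0')
```

A caller that really wants all decimal digits can use the separate Nd
predicate and map each character to its value explicitly.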

Titlecase could get a predicate of its own too. Even then, I think
isUpper should be True for titlecase letters (so that it can be used to
test whether the first letter of a word is uppercase), and there should
be a separate function for category Lu only (to test whether all
characters are uppercase).
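
A minimal sketch of that split, assuming Data.Char.generalCategory is
available. The names isUpperOnly, isTitle, startsCapitalized and isAllCaps
are hypothetical and only serve the illustration; they are not proposed
library names.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- Lu or Lt: suitable for checking the first letter of a word.
isUpper :: Char -> Bool
isUpper c = generalCategory c `elem` [UppercaseLetter, TitlecaseLetter]

-- Lu only (hypothetical name): suitable for checking whole words.
isUpperOnly :: Char -> Bool
isUpperOnly c = generalCategory c == UppercaseLetter

-- Lt only (hypothetical name).
isTitle :: Char -> Bool
isTitle c = generalCategory c == TitlecaseLetter

-- Does the word begin with an uppercase or titlecase letter?
startsCapitalized :: String -> Bool
startsCapitalized (c:_) = isUpper c
startsCapitalized []    = False

-- Is every character in category Lu? Note that this is True for the
-- empty string and False for any string containing a non-letter.
isAllCaps :: String -> Bool
isAllCaps = all isUpperOnly
```

For instance, U+01C5 (the titlecase digraph Dz with caron) satisfies
isUpper, so a word starting with it counts as capitalized, but it does not
satisfy isUpperOnly, so it does not pass the all-caps test.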

-- 
 __("<  Marcin Kowalczyk * qrczak@knm.org.pl http://qrczak.ids.net.pl/
 \__/
  ^^                      SYGNATURA ZASTĘPCZA
QRCZAK


