Character properties

From: Marcin 'Qrczak' Kowalczyk (qrczak@knm.org.pl)
Date: Thu Sep 21 2000 - 08:00:31 EDT

Next message: Otto Stolz: "Re: Java, unicode and fonts."
Previous message: Marco.Cimarosti@icl.com: "RE: [very OT] Welsch (was: [very OT] "Slavic")"
Next in thread: Roozbeh Pournader: "Re: Character properties"
Maybe reply: Roozbeh Pournader: "Re: Character properties"
Maybe reply: Jonathan Rosenne: "RE: Character properties"
Maybe reply: Roozbeh Pournader: "Re: Character properties"
Maybe reply: Marcin 'Qrczak' Kowalczyk: "Re: Character properties"
Maybe reply: Marco.Cimarosti@icl.com: "RE: Character properties"
Maybe reply: Roozbeh Pournader: "RE: Character properties"
Maybe reply: Kenneth Whistler: "Re: Character properties"
Maybe reply: Mark Davis: "Re: Character properties"
Maybe reply: Marcin 'Qrczak' Kowalczyk: "Re: Character properties"
Maybe reply: Marco.Cimarosti@icl.com: "RE: Character properties"

I am trying to improve the handling of character properties in
Haskell. What should the following functions return, i.e. what is
the most standard, natural, or preferred mapping between Unicode
character categories and predicates like isalpha etc.? What else
should be provided? Here are the definitions that I currently use:

isControl c = c < ' ' || c >= '\x7F' && c <= '\x9F'
isPrint c = category of c is other than [Zl,Zp,Cc,Cf,Cs,Co,Cn]
isSpace c = c is one of "\t\n\r\f\v" || category of c is one of [Zs,Zl,Zp]
isGraph c = isPrint c && not (isSpace c)
isPunct c = isGraph c && not (isAlphaNum c)
isAlphaNum c = category of c is one of [Lu,Ll,Lt,Nd,Nl,No,Lm,Lo]
isHexDigit c = isDigit c || c >= 'A' && c <= 'F' || c >= 'a' && c <= 'f'
isDigit c = c >= '0' && c <= '9'
isOctDigit c = c >= '0' && c <= '7'
isAlpha c = category of c is one of [Lu,Ll,Lt,Lm,Lo]
isUpper c = category of c is one of [Lu,Lt]
isLower c = category of c is Ll
isLatin1 c = c <= '\xFF'
isAscii c = c < '\x80'
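
The category-based definitions above can be sketched as runnable Haskell. This is only an illustration, assuming a current GHC `base` whose `Data.Char` exports `generalCategory` and `GeneralCategory`; the post does not say how the category is obtained. The helper `inCats` is a hypothetical name introduced here, not something from the post.

```haskell
-- Import only the category lookup, so the predicates below do not
-- clash with the ones Data.Char already exports.
import Data.Char (GeneralCategory (..), generalCategory)

-- Hypothetical helper: is the general category of c in the given list?
inCats :: [GeneralCategory] -> Char -> Bool
inCats cats c = generalCategory c `elem` cats

isControl, isPrint, isSpace, isGraph, isPunct, isAlphaNum, isAlpha
  :: Char -> Bool

-- C0 controls, DEL and C1 controls, by code point range.
isControl c = c < ' ' || c >= '\x7F' && c <= '\x9F'

-- Everything except Zl, Zp, Cc, Cf, Cs, Co, Cn.
isPrint = not . inCats
  [ LineSeparator, ParagraphSeparator, Control, Format
  , Surrogate, PrivateUse, NotAssigned ]

-- ASCII whitespace controls plus Zs, Zl, Zp.
isSpace c =
  c `elem` "\t\n\r\f\v"
    || inCats [Space, LineSeparator, ParagraphSeparator] c

isGraph c = isPrint c && not (isSpace c)

isPunct c = isGraph c && not (isAlphaNum c)

-- Lu, Ll, Lt, Lm, Lo.
isAlpha = inCats
  [ UppercaseLetter, LowercaseLetter, TitlecaseLetter
  , ModifierLetter, OtherLetter ]

-- The letter categories plus Nd, Nl, No.
isAlphaNum c =
  isAlpha c || inCats [DecimalNumber, LetterNumber, OtherNumber] c
```

One consequence worth noticing: because isPunct is defined by exclusion, symbols such as '+' and combining marks such as U+0301 also satisfy it, since they are graphic but not alphanumeric.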

isDigit intentionally recognizes ASCII digits only. IMHO the
ASCII-only test is needed more often, and it is what the Haskell 98
Report specifies. (But I don't follow the Report in some other cases.)
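
A small sketch of why the ASCII-only choice matters in practice, assuming a current GHC `base` with `Data.Char.generalCategory`. The names `isDecimalNumber` and `digitValue` are hypothetical and introduced only for this illustration.

```haskell
import Data.Char (GeneralCategory (DecimalNumber), generalCategory, ord)

-- ASCII-only test, as in the definitions above.
isDigit :: Char -> Bool
isDigit c = c >= '0' && c <= '9'

-- Hypothetical alternative: any character in category Nd.
isDecimalNumber :: Char -> Bool
isDecimalNumber c = generalCategory c == DecimalNumber

-- Hypothetical helper: the subtraction is only valid because
-- isDigit accepts nothing outside '0'..'9'.
digitValue :: Char -> Maybe Int
digitValue c
  | isDigit c = Just (ord c - ord '0')
  | otherwise = Nothing
```

With these definitions U+0663 (ARABIC-INDIC DIGIT THREE) satisfies isDecimalNumber but not isDigit, so digitValue rejects it instead of computing a meaningless offset from '0'.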

Titlecase could be handled too. Even then I think that isUpper should
be True for titlecase letters (so that it can be used to test whether
the first letter of a word is uppercase), and there should be a separate
function for category Lu only (to test whether all characters are
uppercase).
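
The two uses described above could look roughly like this. It is a sketch only, assuming a current GHC `base` with `Data.Char.generalCategory`; the names `isUpperOnly`, `startsUpper` and `allUpper` are hypothetical and do not come from the post.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- Lu or Lt: suitable for testing the first letter of a word.
isUpper :: Char -> Bool
isUpper c = generalCategory c `elem` [UppercaseLetter, TitlecaseLetter]

-- Hypothetical name for the separate Lu-only predicate.
isUpperOnly :: Char -> Bool
isUpperOnly c = generalCategory c == UppercaseLetter

-- Does the word begin with an uppercase or titlecase letter?
startsUpper :: String -> Bool
startsUpper (c : _) = isUpper c
startsUpper [] = False

-- Is every character a true uppercase (Lu) letter?
allUpper :: String -> Bool
allUpper = all isUpperOnly
```

For example, U+01C5 (the titlecase digraph in category Lt) makes startsUpper succeed at the start of a word, while allUpper rejects it and accepts only its Lu counterpart U+01C4.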

-- 
 __("<  Marcin Kowalczyk * qrczak@knm.org.pl http://qrczak.ids.net.pl/
 \__/
  ^^                      SYGNATURA ZASTĘPCZA
QRCZAK

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:13 EDT