From: Mark Davis (mark.davis@jtcsv.com)
Date: Tue Apr 22 2003 - 10:30:26 EDT
That's an interesting approach, and legal according to POSIX (upper and
lower can overlap).
Any other comments on the other open issues (see chart for details)?
1. xdigit: there is a narrow interpretation (0..9, A..F, a..f), or a broad
interpretation (Nd + A..F, a..f in both normal and fullwidth forms). We are
leaning towards the broad interpretation, since it appears more consistent.
2. cntrl: add \p{gc=Zl} \p{gc=Zp}, the most control-like of the Cf? Add
other Cf's?
3. graph: exclude some/all Cfs?
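
To make the two xdigit readings in item 1 concrete, here is a rough sketch in C. The function names are hypothetical, and is_decimal_digit_stub() is only a stand-in for a real General_Category=Nd lookup: it knows just the ASCII and fullwidth digits, whereas a real implementation would consult the Unicode Character Database.

```c
#include <stdbool.h>
#include <stdint.h>

/* Narrow interpretation: ASCII 0..9, A..F, a..f only. */
static bool xdigit_narrow(uint32_t cp)
{
    return (cp >= 0x0030 && cp <= 0x0039) ||   /* 0..9 */
           (cp >= 0x0041 && cp <= 0x0046) ||   /* A..F */
           (cp >= 0x0061 && cp <= 0x0066);     /* a..f */
}

/* ASSUMPTION: stand-in for a full gc=Nd test. It covers only
 * ASCII digits and fullwidth digits (U+FF10..U+FF19). */
static bool is_decimal_digit_stub(uint32_t cp)
{
    return (cp >= 0x0030 && cp <= 0x0039) ||
           (cp >= 0xFF10 && cp <= 0xFF19);
}

/* Broad interpretation: Nd plus A..F, a..f in normal and fullwidth forms. */
static bool xdigit_broad(uint32_t cp)
{
    return is_decimal_digit_stub(cp) ||
           (cp >= 0x0041 && cp <= 0x0046) ||   /* A..F */
           (cp >= 0x0061 && cp <= 0x0066) ||   /* a..f */
           (cp >= 0xFF21 && cp <= 0xFF26) ||   /* fullwidth A..F */
           (cp >= 0xFF41 && cp <= 0xFF46);     /* fullwidth a..f */
}
```

Under this sketch, U+FF26 (fullwidth F) is an xdigit only in the broad reading, while U+0047 (G) is excluded by both.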
Mark
(مرقص بن داود)
________
mark.davis@jtcsv.com
IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193
(408) 256-3148
fax: (408) 256-0799
----- Original Message -----
From: "Marco Cimarosti" <marco.cimarosti@essetre.it>
To: "'Mark Davis'" <mark.davis@jtcsv.com>; <unicore@unicode.org>;
<unicode@unicode.org>
Sent: Tuesday, April 22, 2003 04:33
Subject: RE: alpha, print, graph, blank, etc.
> Mark Davis wrote:
> > The POSIX/C-style property names (punct, alpha, lower, upper,
> > digit, xdigit, alnum, cntrl, graph, print, space, blank) are
> > not well specified, and don't really map well to the broader
> > types of characters available in Unicode/10646. For example,
> > there is no provision for titlecase, [...]
>
> My 0.2 euros: IMHO, title-case letters should be treated as *both*
> upper-case and lower-case. I.e., my suggestion is that:
>
> - is[w]lower() returns TRUE for both lower-case and title-case
> letters;
> - is[w]upper() returns TRUE for both upper-case and title-case
> letters;
> - is[w]alpha() returns TRUE for any Unicode letter (general category
> L*).
>
> For applications unaware of the existence of "title-case" letters, this
> saves the basic semantics of is[w]alpha() (namely, "Is it a letter?"), and
> one of the most basic semantics of is[w]lower() and is[w]upper() (namely,
> "Can this character be converted to lower/upper-case?").
>
> For applications aware of the existence of "title-case" letters, the
> is[w]upper(), is[w]lower(), and is[w]alpha() can be used in combination to
> determine the exact "case type" of any letter:
>
> /* Needs <stdio.h> and <wctype.h>; c is a wint_t. Assumes the semantics */
> /* proposed above, where Lt letters count as both upper and lower. */
> if (iswalpha(c))
> {
>     if (iswupper(c) && iswlower(c))
>     {
>         printf("This is a title-case letter (Lt).\n");
>     }
>     else if (iswupper(c) && !iswlower(c))
>     {
>         printf("This is an upper-case letter (Lu).\n");
>     }
>     else if (!iswupper(c) && iswlower(c))
>     {
>         printf("This is a lower-case letter (Ll).\n");
>     }
>     else /* if (!iswupper(c) && !iswlower(c)) */
>     {
>         printf("This is a letter with no case distinctions (Lo or Lm).\n");
>     }
> }
> else
> {
>     printf("This is not a letter.\n");
> }
>
> Unfortunately, there is no corresponding trick to obtain a "to-title-case"
> functionality, apart from a non-portable construct such as:
>
> c1 = towctrans(c2, wctrans("Title-case"));
>
> Anyway, converting to title case is something less fundamental than
> upper/lower-casing, and it only makes sense at the string level.
>
> _ Marco
>
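
For illustration only, the predicates Marco proposes can be sketched over general-category tags rather than real code points. Everything named here (enum gen_cat, the proposed_* functions, case_type) is hypothetical. A real C library would derive the category from the Unicode Character Database, and existing iswupper()/iswlower() implementations are not guaranteed to behave this way.

```c
#include <stdbool.h>

/* ASSUMPTION: simplified general-category tags, not a real library type. */
enum gen_cat { GC_LU, GC_LL, GC_LT, GC_LM, GC_LO, GC_OTHER };

/* Title-case (Lt) counts as both upper and lower under the proposal. */
static bool proposed_isupper(enum gen_cat gc) { return gc == GC_LU || gc == GC_LT; }
static bool proposed_islower(enum gen_cat gc) { return gc == GC_LL || gc == GC_LT; }

/* Any letter category (L*) is alphabetic. */
static bool proposed_isalpha(enum gen_cat gc) { return gc != GC_OTHER; }

/* Recover the exact case type by combining the three predicates. */
static const char *case_type(enum gen_cat gc)
{
    if (!proposed_isalpha(gc))
        return "not a letter";

    bool up = proposed_isupper(gc);
    bool lo = proposed_islower(gc);

    if (up && lo) return "title-case (Lt)";
    if (up)       return "upper-case (Lu)";
    if (lo)       return "lower-case (Ll)";
    return "no case distinction (Lo or Lm)";
}
```

The point of the sketch is that the overlap loses no information: the four combinations of the upper and lower predicates still separate Lt, Lu, Ll, and the uncased letters.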