Re: LC_CTYPE locale category and character sets.

From: Keld Jørn Simonsen (keld@dkuug.dk)
Date: Thu Jul 16 1998 - 14:37:01 EDT


Christophe PIERRET writes:

> Here are some questions regarding character properties and cultural
> preferences:
>
> * Do the character properties defined in a POSIX locale's LC_CTYPE
> category depend only on the character set of the locale?

In principle no; in practice, possibly. It is advocated that
all character properties stay the same across character sets
and across languages, countries, and cultures. But a culture may
have specific recommendations on what is considered, e.g., a letter,
a digit, or a punctuation mark. In some cultures, e.g., the Devanagari
digits are recognised as digits, while in others they may just be
considered some kind of strange special character. Punctuation
marks differ too: quotation marks, e.g., vary widely from culture
to culture.
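
To make the Devanagari point concrete, here is a minimal sketch of a
culture-dependent "digit" class. It is only an illustration, not how any
particular C library implements LC_CTYPE: the culture names "en_US" and
"hi_IN", the DIGIT_CLASS table, and the is_digit() helper are all
hypothetical, and which culture accepts which digits is an assumption
made for the example.

```python
# Hypothetical per-culture digit classes. The culture names and the choice
# of which culture accepts Devanagari digits are illustrative assumptions.
ASCII_DIGITS = {chr(c) for c in range(ord("0"), ord("9") + 1)}
DEVANAGARI_DIGITS = {chr(c) for c in range(0x0966, 0x0970)}  # U+0966..U+096F

DIGIT_CLASS = {
    "en_US": ASCII_DIGITS,
    "hi_IN": ASCII_DIGITS | DEVANAGARI_DIGITS,
}


def is_digit(ch, culture):
    """Return True if ch belongs to the digit class assumed for culture."""
    return ch in DIGIT_CLASS[culture]


DEVANAGARI_THREE = "\u0969"  # DEVANAGARI DIGIT THREE

if __name__ == "__main__":
    for culture in ("en_US", "hi_IN"):
        print(culture, is_digit(DEVANAGARI_THREE, culture))
```

The same character gets a different answer depending only on the culture
key, which is the situation described above.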

> * Is it meaningful to consider that the LC_CTYPE locale category for
> Unicode (considered as a character set) doesn't change with cultural
> preferences?

In some implementations, yes.

> I can't imagine that LATIN CAPITAL LETTER A is not uppercase anymore!
>
> But is there any known example of an LC_CTYPE character property
> (isalpha, isupper, tolower, isdigit, isxdigit, ...)
> that changes or should change from one culture to another?

Case handling for Turkish (toupper/tolower, with its dotted and
dotless "i") is a prime example. Capitalising an initial "ij" in
Dutch, where both letters become uppercase ("IJ"), is another.
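
The two cases above can be sketched as follows. This is a rough
illustration under stated assumptions, not a complete case-mapping
implementation: the helpers upper_for() and capitalize_for() and the
language codes "tr", "nl", and "en" are hypothetical names chosen for
the example, and only the Turkish "i" pairs and the Dutch initial "ij"
are handled.

```python
def upper_for(text, lang):
    """Uppercase text, applying the Turkish i/I pairs when lang is 'tr'."""
    if lang == "tr":
        # Turkish: "i" uppercases to dotted capital I (U+0130), and
        # dotless small i (U+0131) uppercases to plain "I".
        text = text.replace("i", "\u0130").replace("\u0131", "I")
    return text.upper()


def capitalize_for(word, lang):
    """Capitalise the start of word; Dutch treats an initial 'ij' as a unit."""
    if lang == "nl" and word[:2].lower() == "ij":
        return "IJ" + word[2:]
    return upper_for(word[:1], lang) + word[1:]


if __name__ == "__main__":
    print(upper_for("istanbul", "en"))
    print(upper_for("istanbul", "tr"))
    print(capitalize_for("ijsselmeer", "en"))
    print(capitalize_for("ijsselmeer", "nl"))
```

The same input string yields a different result depending only on the
language argument, so the mapping cannot be derived from the character
set alone.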

Keld Simonsen


