Re: Case mapping

From: Mark Davis (markdavis@ispchannel.com)
Date: Sat May 06 2000 - 19:05:06 EDT


Those are some good comments on ftp://ftp.unicode.org/Public/UNIDATA/SpecialCasing.txt. You might also be interested in a file I put up to help visualize the relationships with case folding. It is in draft form now and not 'public', but comments are welcome. See http://www.unicode.org/unicode/reports/tr21/CaseFolding.html.

See below for some responses to your message.

Mark

Patrick Andries wrote:

> I have a few questions regarding TR21 (just trying to understand it).
>
> 1) Why is the titlecase form for 0149 ('n) the decomposed 02BC 006E (' + n)?
> See <Unicode 3.0 CD>/Unidata/SpecialCasing-2.txt. Why could it not be 0149?

This is a bug: it should be:

0149; 0149; 02BC 004E; 02BC 004E; # LATIN SMALL LETTER N PRECEDED BY APOSTROPHE
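The corrected line follows the field layout of SpecialCasing.txt: the code point, then its lowercase, titlecase and uppercase mappings (each a space-separated sequence of hex code points), then an optional condition list, then a comment after '#'. Here is a minimal Python sketch of reading one such line, assuming the layout of the current SpecialCasing.txt. The function name parse_special_casing_line is hypothetical.

```python
def parse_special_casing_line(line):
    """Split one SpecialCasing.txt data line into its fields.

    Assumed layout: code; lower; title; upper; [conditions;] # comment
    """
    data, _, comment = line.partition("#")
    fields = [f.strip() for f in data.split(";")]

    def to_str(hex_seq):
        # "02BC 004E" -> "\u02bcN"
        return "".join(chr(int(h, 16)) for h in hex_seq.split())

    code = to_str(fields[0])
    lower, title, upper = (to_str(f) for f in fields[1:4])
    # A fifth field, if present and non-empty, holds the condition list.
    conditions = fields[4].split() if len(fields) > 4 else []
    return code, lower, title, upper, conditions, comment.strip()
```

For the corrected 0149 line, the titlecase and uppercase fields both decode to U+02BC MODIFIER LETTER APOSTROPHE followed by a capital N, while the lowercase field is 0149 itself.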

> a) Because 0149 is not considered a single letter?
>
> b) Could it be because, if the lowercase letter were equal to the titlecase
> letter, a string beginning with that character could no longer be detected
> as lowercase (see 2.2), or even as titlecase (any lowercase letters must
> follow cased characters), given the current definitions?
>
> 2) Can the Afrikaans titlecase word « ' + n » (indefinite article « a ») be
> detected as titlecase?
> In other words (in pseudo-code), is isTitleCase(toTitleCase("\u0149")) == true?
> I believe not. Is it important?

No, it wouldn't be.

> 3) I wonder whether some subtlety has escaped me in the following description:
> «Detecting Titlecase
> A string is titlecase if all four of the following conditions are true:
>
> a. there is at least one cased character in the string
> b. there are no distinct-uppercase (Lud) characters
> c. any lowercase letters must follow cased characters
> d. there are no titlecase or uppercase letters, except following uncased
> characters»

You are right. The clearest wording would probably be:

d. no titlecase or uppercase letters follow cased characters
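The four conditions, with (d) in the revised wording, can be checked in one left-to-right scan. This is a minimal Python sketch under stated assumptions: "cased" is approximated by the general categories Lu, Ll and Lt; the draft's distinct-uppercase (Lud) class is approximated as an uppercase letter whose titlecase form differs from itself (such as U+01C4, whose titlecase form is U+01C5); and case-ignorable characters such as combining marks get no special treatment. The name is_titlecase is hypothetical, not a TR21 API.

```python
import unicodedata

# Assumption: only general categories Lu, Ll, Lt count as "cased".
# (The full Unicode definition also includes Other_Lowercase and
# Other_Uppercase characters, which unicodedata does not expose.)
CASED = {"Lu", "Ll", "Lt"}


def is_distinct_uppercase(ch):
    """Hypothetical stand-in for the draft 'Lud' class: an uppercase
    letter whose titlecase form differs from itself (e.g. U+01C4)."""
    return unicodedata.category(ch) == "Lu" and ch.title() != ch


def is_titlecase(s):
    """Check conditions a-d, using the revised wording of (d)."""
    has_cased = False
    prev_cased = False  # was the previous character cased?
    for ch in s:
        cat = unicodedata.category(ch)
        if is_distinct_uppercase(ch):           # (b)
            return False
        if cat == "Ll" and not prev_cased:      # (c)
            return False
        if cat in ("Lu", "Lt") and prev_cased:  # (d)
            return False
        if cat in CASED:
            has_cased = True
            prev_cased = True
        else:
            prev_cased = False
    return has_cased                            # (a)
```

Under this sketch, "Hello World" passes, while "hello" fails (c), "HEllo" fails (d), and a string with no cased characters fails (a). Note that a string starting with an uppercase letter passes (d) without any preceding uncased character, which is the case raised in the next paragraph.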

>
> Would it not be clearer if « or at the beginning of the string » were appended
> to the last condition? As far as I understand, a string may contain an
> uppercase letter and no uncased characters at all (e.g. « Hello ») and still
> be titlecase.
>
> 4) Though I do not believe TR21 mentions « sentence casing », curious readers
> may be interested to note that in Afrikaans, when a sentence begins with
> «'n», the next word is titlecased (see http://hapax.iquebec.com). I do not
> know whether this merits mention anywhere in the technical reports, but the
> naïve approach to casing sentences (i.e. applying toTitleCase() to the first
> word) will therefore not work for some locales.

Yes, there should be a note to that effect. In general, the casing of sentences and titles is language-dependent. For another example, "Taming of the Shrew" (not "Taming Of The Shrew") would be the appropriate capitalization for a title in English.
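Both cases can be sketched as small language-dependent rules layered on a per-word capitalization step. This Python sketch is illustrative only: the function names, the set of spellings accepted for 'n, and the short list of English minor words are hypothetical assumptions, not data from TR21 or any Unicode file. Following Patrick's description, it assumes a sentence-initial 'n is left as-is and the following word is capitalized.

```python
# Hypothetical spellings of the Afrikaans article 'n (ASCII apostrophe,
# right single quote, modifier letter apostrophe, precomposed U+0149).
AFRIKAANS_ARTICLE = {"'n", "\u2019n", "\u02bcn", "\u0149"}

# Hypothetical, deliberately short list of English minor words that
# stay lowercase inside a title (real style guides differ).
ENGLISH_MINOR_WORDS = {"a", "an", "and", "of", "the", "in", "on", "to"}


def capitalize_word(word):
    # Titlecase the first character only; leave the rest unchanged.
    return word[:1].title() + word[1:]


def sentence_case(sentence, lang):
    words = sentence.split(" ")
    if lang == "af" and len(words) > 1 and words[0] in AFRIKAANS_ARTICLE:
        # Afrikaans: a sentence-initial 'n stays as-is; the next word
        # is capitalized instead.
        words[1] = capitalize_word(words[1])
    else:
        # Naive rule: capitalize the first word.
        words[0] = capitalize_word(words[0])
    return " ".join(words)


def title_case_en(title):
    words = title.lower().split(" ")
    return " ".join(
        w if i > 0 and w in ENGLISH_MINOR_WORDS else capitalize_word(w)
        for i, w in enumerate(words)
    )
```

With a language other than "af", the naive rule titlecases only the first character of "'n", which is an apostrophe, so the sentence comes back with no capital letter at all. title_case_en("taming of the shrew") yields "Taming of the Shrew".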

>
>
> Patrick Andries
> Dorval (Québec)



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:02 EDT