Case mapping

From: Patrick Andries (pandries@iti.qc.ca)
Date: Fri May 05 2000 - 19:34:59 EDT

Next message: mark.davis@us.ibm.com: "Normalization Charts"
Previous message: Yves Arrouye: "RE: Encoding Bengali Vowel forms (again)"
Next in thread: Mark Davis: "Re: Case mapping"
Maybe reply: Mark Davis: "Re: Case mapping"

I have a few questions about TR21 (I am just trying to grasp it).

1) Why is the titlecase form of 0149 ('n) the decomposed sequence 02BC 006E
(' + n)?
See <Unicode 3.0 CD>/Unidata/SpecialCasing-2.txt. Why could it not be 0149
itself?

a) Because 0149 is not considered a single letter?

b) Could it be because, if the lowercase letter were equal to the
titlecase letter, a string beginning with that character could no longer be
detected as lowercase (see 2.2) or even as titlecase (any lowercase letters
must follow cased characters), given the current definitions?
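The sequence 02BC 006E is exactly the compatibility decomposition of U+0149 in UnicodeData.txt, which Python's standard `unicodedata` module exposes. The sketch below checks that, and also parses a SpecialCasing data line into its lower/title/upper fields. It assumes the `code; lower; title; upper; (condition_list;)? # comment` layout used by current versions of SpecialCasing.txt (the 3.0 file may differ in detail), and `parse_special_casing` is a hypothetical helper. The sample line is the familiar U+00DF entry rather than the 0149 entry, so the example does not depend on the exact 3.0 data.

```python
import unicodedata


def parse_special_casing(line: str) -> dict:
    """Split one SpecialCasing data line into its fields (hypothetical helper).

    Assumed layout: code; lower; title; upper; (condition_list;)? # comment
    """
    data = line.split("#", 1)[0]  # drop the trailing comment
    fields = [f.strip() for f in data.split(";")]

    def to_str(field: str) -> str:
        # Each field is a space-separated list of hex code points.
        return "".join(chr(int(cp, 16)) for cp in field.split())

    code, lower, title, upper = fields[:4]
    conditions = fields[4] if len(fields) > 4 else ""
    return {
        "code": to_str(code),
        "lower": to_str(lower),
        "title": to_str(title),
        "upper": to_str(upper),
        "conditions": conditions,
    }


# U+0149 compatibility-decomposes to MODIFIER LETTER APOSTROPHE + n.
print(unicodedata.decomposition("\u0149"))  # <compat> 02BC 006E

entry = parse_special_casing(
    "00DF; 00DF; 0053 0073; 0053 0053; # LATIN SMALL LETTER SHARP S")
print(entry["title"], entry["upper"])  # Ss SS
```

Lines with a condition list (for example the Final_Sigma entry) carry it in the fifth field, which the helper returns unchanged.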

2) Can the Afrikaans titlecase word « ' + n » (the indefinite article « a »)
be detected as titlecase?
In other words (in pseudo-code), is isTitleCase(toTitleCase("\u0149")) == true?
I believe not. Is it important?
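One way to see why the answer is probably « no » is to look at the general categories involved. The sketch below lists them for both candidate forms using Python's standard `unicodedata` module; `categories` is a hypothetical helper. Assuming that only Lu, Ll and Lt characters count as cased (an assumption, not a quotation from TR21), MODIFIER LETTER APOSTROPHE (Lm) is uncased, so in either form the string starts with a lowercase letter that does not follow a cased character.

```python
import unicodedata

# Assumption: "cased" means general category Lu, Ll or Lt.
CASED_CATEGORIES = {"Lu", "Ll", "Lt"}


def categories(s: str) -> list[tuple[str, str, bool]]:
    """Return (code point, general category, is_cased) for each character."""
    return [
        (f"U+{ord(c):04X}", unicodedata.category(c),
         unicodedata.category(c) in CASED_CATEGORIES)
        for c in s
    ]


print(categories("\u0149"))        # [('U+0149', 'Ll', True)]
print(categories("\u02bc" + "n"))  # [('U+02BC', 'Lm', False), ('U+006E', 'Ll', True)]
```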

3) I wonder whether I have missed some subtlety in the following description:
«Detecting Titlecase
A string is titlecase if all four of the following conditions are true:

  a. there is at least one cased character in the string
  b. there are no distinct-uppercase (Lud) characters
  c. any lowercase letters must follow cased characters
  d. there are no titlecase or uppercase letters, except following uncased
     characters »
Would it not be clearer if the last condition had « or at the beginning of
the string » appended to it? As far as I understand, a string may begin with
an uppercase letter, contain no uncased characters, and still be titlecase
(e.g. "Hello").
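The four conditions can be sketched in Python as follows. This is one reading of the quoted text, not TR21's own code. It assumes « cased » means general category Lu, Ll or Lt, and that « distinct-uppercase » means an Lu character whose titlecase mapping differs from itself (for example U+01C4 DŽ, whose titlecase form is U+01C5 Dž). The hypothetical flag `start_counts_as_uncased` switches between the literal wording of condition d and the amended reading suggested above.

```python
import unicodedata

CASED = {"Lu", "Ll", "Lt"}  # assumption: cased = Lu, Ll or Lt


def is_cased(ch: str) -> bool:
    return unicodedata.category(ch) in CASED


def is_distinct_uppercase(ch: str) -> bool:
    # Assumption: "Lud" = uppercase letter whose titlecase form differs.
    return unicodedata.category(ch) == "Lu" and ch.title() != ch


def is_titlecase(s: str, start_counts_as_uncased: bool = True) -> bool:
    # (a) at least one cased character
    if not any(is_cased(ch) for ch in s):
        return False
    # (b) no distinct-uppercase characters
    if any(is_distinct_uppercase(ch) for ch in s):
        return False
    for i, ch in enumerate(s):
        cat = unicodedata.category(ch)
        prev = s[i - 1] if i > 0 else None
        if cat == "Ll":
            # (c) a lowercase letter must follow a cased character
            if prev is None or not is_cased(prev):
                return False
        elif cat in ("Lu", "Lt"):
            # (d) upper/titlecase letters only after an uncased character;
            # the flag decides whether the start of the string also qualifies
            if prev is None:
                if not start_counts_as_uncased:
                    return False
            elif is_cased(prev):
                return False
    return True


print(is_titlecase("Hello World"))                                 # True
print(is_titlecase("Hello World", start_counts_as_uncased=False))  # False
```

Under the literal reading, even "Hello World" fails condition d, because its first letter follows no uncased character. Under either reading, "\u0149" and "\u02bcn" fail condition c.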

4) Though I do not believe TR21 mentions « sentence casing », curious readers
may be interested to note that in Afrikaans, when a sentence begins with
«'n», it is the next word that is titlecased (see
http://hapax.iquebec.com). I do not know whether this merits mention
anywhere in the technical reports, but the naïve approach to sentence casing
(i.e. applying toTitleCase() to the first word) will therefore not work in
some locales.
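A minimal sketch of the difference, assuming a hypothetical `sentence_case` function that titlecases the first cased letter of the first word, plus an opt-in Afrikaans rule (keyed on a hypothetical locale tag "af") that leaves a leading 'n alone and titlecases the next word instead. The set of spellings accepted for 'n (ASCII apostrophe, U+2019, U+02BC and U+0149) is also an assumption, and splitting on single spaces is a simplification.

```python
import unicodedata

# Assumed spellings of the Afrikaans indefinite article 'n.
AFRIKAANS_ARTICLE = {"'n", "\u2019n", "\u02bcn", "\u0149"}


def titlecase_first_letter(word: str) -> str:
    """Titlecase the first cased letter of a word; leave the rest unchanged."""
    for i, ch in enumerate(word):
        if unicodedata.category(ch) in ("Lu", "Ll", "Lt"):
            return word[:i] + ch.title() + word[i + 1:]
    return word


def sentence_case(sentence: str, locale: str = "en") -> str:
    words = sentence.split(" ")
    target = 0
    # Afrikaans: a sentence-initial 'n stays lowercase; the next word is titlecased.
    if locale == "af" and len(words) > 1 and words[0] in AFRIKAANS_ARTICLE:
        target = 1
    words[target] = titlecase_first_letter(words[target])
    return " ".join(words)


print(sentence_case("'n man"))               # 'N man   (naive rule)
print(sentence_case("'n man", locale="af"))  # 'n Man
```

The locale check is deliberately opt-in: without it, the first cased letter of the sentence is always the one that changes, which is exactly the naïve behaviour described above.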

Patrick Andries
Dorval (Québec)

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:02 EDT