Re: questions regarding Special Casing

From: mark.davis@us.ibm.com
Date: Tue Jan 11 2000 - 15:59:09 EST


I'm cc:ing the unicode and unicore lists, since these questions may be of
general interest.

> To: Mark Davis/Cupertino/IBM@IBMUS
> cc: <john_thomson@sil.org>
> Subject: questions regarding Special Casing
> Hello,
> Thanks for your contributions to spelling out Unicode for us
> developers and users.
> I'm working with a group that's developing linguistic tools. One of our
> goals is to comply with the Unicode 3.0 standard, including its
> specifications for character properties and case mappings. In reading
> your UTR #21 (Case Mappings -- revision 3.0 11/03/1999) there were a
> couple of points that were unclear to me.
>
> Under section 2, "Guidelines", the bullets say,
>     In all of the guidelines given below ... Treat 0345 "combining iota
>     subscript" as a lowercase letter.
> Currently in the Unicode data file UnicodeData.txt (v 3.0), character
> 0345's general category is "Mn" (mark, non-spacing). Is your guideline
> here a correction, i.e. should 0345's general category be changed to "Ll"?

No, what that means is that while for general purposes 0345 is correctly
characterized as Mn, for the purposes of case mappings *in the following
discussion* it should be handled differently.
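
The distinction can be seen in a small sketch. Python is used here only for
illustration, and it is an assumption that the Unicode tables bundled with
Python (a later version than 3.0) agree with 3.0 for this character. The
general category stays Mn, yet the character has an uppercase mapping, which
is why case-mapping code needs to treat it like a lowercase letter.

```python
import unicodedata

IOTA_SUBSCRIPT = "\u0345"  # combining iota subscript

# General-purpose classification: still a nonspacing mark.
category = unicodedata.category(IOTA_SUBSCRIPT)

# Case mapping: the character uppercases to U+0399 even though it is
# not a letter by general category.
upper = IOTA_SUBSCRIPT.upper()

print(category, [f"U+{ord(c):04X}" for c in upper])
```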

> Another bullet in that list says
>     A character is _cased_ if it is marked as uppercase, lowercase, or
>     titlecase (Lu, Ll, Lt).
> If this definition is complete, then is character 0345 considered cased?
> In a similar vein, are characters that have explicit case mappings
> considered cased, even if they are not "letters"? E.g.
> 24B6;CIRCLED LATIN CAPITAL LETTER A;So;0;L;<circle> 0041;;;;N;;;;24D0;
> 2160;ROMAN NUMERAL ONE;Nl;0;L;<compat> 0049;;;1;N;;;;2170;

This is a good point. For non-letters, it is a matter of trying to match
user expectations. Suppose that a user selected a paragraph of text and
lowercased it using a menu command. Would s/he expect to see Roman
numerals and circled letters lowercased? I suspect so.
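
As a rough illustration of that expectation, the sketch below lowercases a
string containing the two characters from the question. It relies on Python's
built-in str.lower(), which is only one possible implementation and is assumed
here to follow the same mappings as the UnicodeData.txt entries quoted above.

```python
import unicodedata

# CIRCLED LATIN CAPITAL LETTER A and ROMAN NUMERAL ONE from the question.
text = "\u24B6 and \u2160"
lowered = text.lower()

# Report every character that changed, with its general category.
for before, after in zip(text, lowered):
    if before != after:
        cat = unicodedata.category(before)
        print(f"U+{ord(before):04X} ({cat}) -> U+{ord(after):04X}")
```

Both characters are non-letters (So and Nl), yet both change under
lowercasing, which matches what a user of a "lowercase" menu command would
probably expect.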

>
> In practical terms, if a string contains U+24B6 and no lowercase
> characters, should it be considered an uppercase string? If this string
> is converted to lowercase, should the 24B6 be converted to 24D0?
>
> It would perhaps be helpful to mention in your document the existence of
> non-letters that have case mappings, and clarify what the correct
> treatment of them would be according to the standard.

Agreed. The document should probably specify _cased_ to include non-letters
that have case mappings. I will bring this up at the next Unicode Technical
Committee meeting.
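
One way such a broadened definition might look is sketched below. This is
not wording from UTR #21 or from the standard; the function name is_cased is
hypothetical, and the rule "has any case mapping" is just one reading of the
proposal above.

```python
import unicodedata

def is_cased(ch: str) -> bool:
    """Hypothetical broadened test: Lu/Ll/Lt, or any character that
    changes under lowercasing, uppercasing, or titlecasing."""
    # The definition quoted from the guidelines.
    if unicodedata.category(ch) in ("Lu", "Ll", "Lt"):
        return True
    # Proposed extension: non-letters that have case mappings.
    return ch.lower() != ch or ch.upper() != ch or ch.title() != ch

for ch in ("A", "\u24B6", "\u2160", "\u0345", "1"):
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), is_cased(ch))
```

Under this reading, 24B6, 2160, and 0345 all count as cased because each has
a case mapping, while a digit such as "1" does not.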

>
> Thanks for your help,
>
> Lars Huttar
> lars_huttar@sil.org
>
>


