From: Kenneth Whistler (kenw@sybase.com)
Date: Thu May 03 2007 - 18:21:39 CST
Mark said,
> In practice, I don't think this new character need cause any particular
> problems for searching. It can compatibly have the relation
>
> lowercase(capital-ß) = ß
>
> That means that we would make it a case-folding variant of ß,
I agree with Mark up to this point.
> and a
> collation variant of ß.
This does not follow as a consequence of that, however.
The UnicodeData.txt entry for U+00DF LATIN SMALL LETTER SHARP S is:
00DF;LATIN SMALL LETTER SHARP S;Ll;0;L;;;;;N;;German;;;
In other words, it has no simple case mapping, nor does it
have a compatibility decomposition to <s, s>. CaseFolding.txt
does provide a full case folding for it.
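As a rough illustration of reading that entry, here is a minimal Python sketch that splits the record on semicolons and checks the relevant fields. The field positions follow the UnicodeData.txt layout (field 5 is the decomposition; fields 12 to 14 are the simple uppercase, lowercase, and titlecase mappings). The helper name `parse_unicode_data_line` is hypothetical and is not part of any UCD tooling.

```python
# 0-based field positions in a UnicodeData.txt record.
DECOMPOSITION = 5
SIMPLE_UPPER, SIMPLE_LOWER, SIMPLE_TITLE = 12, 13, 14


def parse_unicode_data_line(line):
    """Split one UnicodeData.txt record into its 15 semicolon-separated fields."""
    fields = line.strip().split(";")
    if len(fields) != 15:
        raise ValueError(f"expected 15 fields, got {len(fields)}")
    return fields


# The entry exactly as quoted in this message.
entry = "00DF;LATIN SMALL LETTER SHARP S;Ll;0;L;;;;;N;;German;;;"
fields = parse_unicode_data_line(entry)

# An empty field means the property has no value for this character.
has_decomposition = fields[DECOMPOSITION] != ""
has_simple_case_mapping = any(
    fields[i] for i in (SIMPLE_UPPER, SIMPLE_LOWER, SIMPLE_TITLE)
)

print("decomposition present:", has_decomposition)
print("simple case mapping present:", has_simple_case_mapping)
```

Both checks come out false for this record, which is the point made above: nothing in UnicodeData.txt ties U+00DF to <s, s>.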
For collation, a specific weighting is added to the DUCET,
*not* based on UnicodeData.txt, to result in:
00DF ; [.11AF.0020.0004.00DF][.0000.0199.0004.00DF][.11AF.0020.001F.00DF] # LATIN SMALL LETTER SHARP S
The sequence of two <s> weights, plus the constructed secondary
weight for the first <s>, is completely the result of
deliberate introduction of this weight in the DUCET.
The same thing would have to be done, deliberately, to
get the UCA to weight the LATIN CAPITAL LETTER SHARP S as
equivalent to a secondary-weighted <S, S> sequence,
thus resulting in the expected behavior for sorting
and searching.
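To make the structure of that DUCET entry concrete, here is a small Python sketch that parses the quoted line into its collation elements and pulls out the primary and secondary weights. It is not a UCA implementation; it handles only the bracketed element syntax shown above, and the function name `parse_collation_elements` is an assumption made for illustration.

```python
import re

# One collation element: [.PPPP.SSSS.TTTT.QQQQ]
# (a leading '*' instead of '.' would mark a variable element).
ELEMENT = re.compile(
    r"\[([.*])([0-9A-F]{4})\.([0-9A-F]{4})\.([0-9A-F]{4})\.([0-9A-F]{4,6})\]"
)


def parse_collation_elements(entry):
    """Return (primary, secondary, tertiary) integer triples for one table line."""
    # Keep only the text between the first ';' and the '#' comment.
    weights = entry.split(";", 1)[1].split("#", 1)[0]
    return [
        (int(p, 16), int(s, 16), int(t, 16))
        for _, p, s, t, _ in ELEMENT.findall(weights)
    ]


# The entry exactly as quoted in this message.
sharp_s = (
    "00DF ; [.11AF.0020.0004.00DF][.0000.0199.0004.00DF]"
    "[.11AF.0020.001F.00DF] # LATIN SMALL LETTER SHARP S"
)
elements = parse_collation_elements(sharp_s)

# Zero weights are ignored at each level when a sort key is formed.
primary_key = [p for p, _, _ in elements if p != 0]
secondary_key = [s for _, s, _ in elements if s != 0]

print([f"{p:04X}" for p in primary_key])
print([f"{s:04X}" for s in secondary_key])
```

At the primary level only the two 11AF weights survive, so the entry compares like a two-letter <s, s> sequence; the extra 0199 shows up only at the secondary level. An entry for the capital letter would need the same kind of hand-built element sequence.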
> We would still keep the uppercase mapping:
>
> uppercase(ß) = SS
I agree that that would be required for stability.
>
> Mark
>
> On 5/3/07, John Hudson <john@tiro.ca> wrote:
> > [The proposal recommends for discussion a possible compatibility
> > decomposition to 'U+0053 U+0053' to 'provide for the equivalence of
> > the character sequences "capital ß" and "SS" in those applications
> > that use the Normalization Form KD or KC for the detection of
> > sameness of names etc.' How viable is this?]
In response to John on that point, I don't think it is viable
at all. Remember that U+00DF itself doesn't have a compatibility
decomposition either. The equivalence in terms of searching
is handled, instead, by the special treatment in the DUCET
table for UCA (and equivalently in the CTT for ISO 14651, of
course).
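The lack of a decomposition for U+00DF can be checked with Python's standard `unicodedata` module, as in the sketch below. One assumption to note: the module reflects whatever Unicode version ships with the interpreter, not the version under discussion here, but these properties of U+00DF have not changed.

```python
import unicodedata

sharp_s = "\u00df"  # LATIN SMALL LETTER SHARP S

# No decomposition mapping is recorded, so the result is an empty string.
decomposition = unicodedata.decomposition(sharp_s)

# Compatibility normalization therefore leaves the character untouched.
nfkd = unicodedata.normalize("NFKD", sharp_s)
nfkc = unicodedata.normalize("NFKC", sharp_s)

# Full case folding, by contrast, does map it to <s, s>.
folded = sharp_s.casefold()

print(repr(decomposition), nfkd == sharp_s, nfkc == sharp_s, folded)
```

So an application that relies on NFKD or NFKC alone will not treat "ß" and "ss" as the same; that equivalence comes from case folding or from collation weights.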
--Ken