From: Kenneth Whistler (kenw@sybase.com)
Date: Thu Apr 15 2004 - 14:47:56 EDT
> Did you get an answer on this ? Why is there no decomposition associated
> to this character ?
Thanks to Eric and Patrick for digging out my answer on this perennial
question from a couple years back, and saving me the trouble of
having to rummage around to find it. :-)
Also, it should be noted that there *is* a decomposition for
U+0140 in the Unicode Character Database, to wit:
0140;LATIN SMALL LETTER L WITH MIDDLE DOT;Ll;0;L;<compat> 006C 00B7;...
                                                 ^^^^^^^^^^^^^^^^^^
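For anyone who wants to check that field without opening UnicodeData.txt,
here is a minimal sketch using Python's standard unicodedata module. The
choice of Python is an assumption for illustration, not something from
this thread, and the data it reports is whatever UCD version the Python
build happens to ship.

```python
import unicodedata

ch = "\u0140"  # LATIN SMALL LETTER L WITH MIDDLE DOT

# The decomposition field of the UnicodeData.txt record, as exposed by
# the standard library.
decomp = unicodedata.decomposition(ch)
print(unicodedata.name(ch))
print(decomp)

# Canonical normalization (NFD) leaves the character alone, because the
# mapping is tagged <compat> rather than being a canonical decomposition.
nfd = unicodedata.normalize("NFD", ch)

# Compatibility normalization (NFKD) applies the <compat> mapping.
nfkd = unicodedata.normalize("NFKD", ch)

print([f"{ord(c):04X}" for c in nfd])
print([f"{ord(c):04X}" for c in nfkd])
```

The point of the sketch is only that the decomposition is visible through
the compatibility forms and invisible through the canonical ones.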
It is a compatibility decomposition for two reasons: first, the
decomposition into the sequence <006C, 00B7> may result in rendering
differences (both because of potentially different decisions about
where to render the dot and because the introduction of the U+00B7
MIDDLE DOT might impact line break decisions, depending on the
implementation);
secondly, the properties of the characters in the sequence
<006C, 00B7> are distinct from those for <0140> by itself, and
may impact things such as identifier parsing, again depending on
the implementation. And, as I indicated before, U+0140 is itself
basically a compatibility character, introduced for mapping to
ISO 6937, a preexisting standard that was among the list of
character encoding standards intended to be covered by the initial
Unicode repertoire.
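The property difference can be seen in a short sketch, again assuming
Python's unicodedata module. The leading_letters function is a
hypothetical, deliberately naive tokenizer invented for this example; it
does not follow any real identifier specification and only shows how a
property-driven scanner could treat the two spellings differently.

```python
import unicodedata

def props(s):
    """Return (code point, general category, name) for each character."""
    return [(f"{ord(c):04X}", unicodedata.category(c), unicodedata.name(c))
            for c in s]

def leading_letters(s):
    """Hypothetical scanner: take the leading run of characters whose
    general category is a letter (L*), and stop at anything else."""
    out = []
    for c in s:
        if not unicodedata.category(c).startswith("L"):
            break
        out.append(c)
    return "".join(out)

single = props("\u0140")      # one character, a lowercase letter
sequence = props("l\u00b7")   # a letter followed by punctuation

for row in single + sequence:
    print(row)

# The same Catalan word spelled with U+0140 and with <006C, 00B7>.
print(leading_letters("co\u0140lecci\u00f3"))
print(leading_letters("col\u00b7lecci\u00f3"))
```

With U+0140 the scanner consumes the whole word; with the sequence it
stops at the MIDDLE DOT, which is punctuation. A scanner built on other
properties could well behave differently, which is the "depending on the
implementation" caveat.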
The character *was* in ISO 6937 for Catalan. Noting the Catalan
association in the Unicode names list is different from any
recommendation that U+0140 is the preferred character for the
representation of l followed by a middle dot in Catalan text.
Most existing Catalan data (8859-1, Windows 1252, primarily)
would not use it, of course. Converted to Unicode, that data would
also not use it, but be represented as the sequence <006C, 00B7>.
And there is every expectation that new data created in Unicode
would continue to use such a sequence for Catalan.
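A small sketch of that conversion, assuming Python's built-in codecs; the
byte string is an example made up for illustration (the Catalan word
"col·lecció"), not data from any real corpus.

```python
import unicodedata

# 0xB7 is MIDDLE DOT and 0xF3 is o-acute in both ISO 8859-1 and
# Windows 1252.
legacy = b"col\xb7lecci\xf3"

from_latin1 = legacy.decode("iso-8859-1")
from_cp1252 = legacy.decode("cp1252")

print([f"{ord(c):04X}" for c in from_latin1])

# Neither canonical nor compatibility composition introduces U+0140:
# its decomposition is compatibility-only, so nothing recomposes to it.
nfc = unicodedata.normalize("NFC", from_latin1)
nfkc = unicodedata.normalize("NFKC", from_latin1)
print("\u0140" in nfc, "\u0140" in nfkc)
```

Both decodings yield the sequence <006C, 00B7>, and normalization leaves
it that way, so converted data ends up with the sequence rather than
U+0140 unless some process maps to U+0140 explicitly.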
--Ken