From: John Hudson (tiro@tiro.com)
Date: Wed Mar 05 2003 - 14:12:30 EST
At 08:35 AM 3/5/2003, John Cowan wrote:
> > Then why does UnicodeData break them down as (e.g.) 0064 030C rather than
> > 0064 0315?
>
>To keep the upper case and lower case characters in sync for decomposition,
>they always have the same combining characters.
Yes. There is nothing technically or grammatically incorrect about thinking
of d', l' and t' as letters with 'carons': it is only typographically
incorrect to represent them with the typical caron mark. The encoding of
characters and the visual representation of characters do not always
directly correspond.
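
The decomposition data under discussion can be inspected directly. What follows is only a minimal sketch using Python's standard `unicodedata` module; the helper name `decomp` is made up for illustration, and the results reflect whatever version of the Unicode Character Database ships with the interpreter.

```python
import unicodedata

def decomp(ch):
    """Return the canonical decomposition of ch as a list of hex code points."""
    return unicodedata.decomposition(ch).split()

# Lower/upper case pairs for d, l and t with caron.
pairs = [("\u010F", "\u010E"), ("\u013E", "\u013D"), ("\u0165", "\u0164")]

for lower, upper in pairs:
    # Both cases should list the same combining mark, 030C (COMBINING CARON),
    # even though the lower case forms are usually drawn with an
    # apostrophe-like mark.
    print(unicodedata.name(lower), decomp(lower), decomp(upper))
```

The point of the comparison is that the second code point is the same in each pair, whatever shape a given font draws.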
>For another example, G with
>cedilla gets the cedilla on top when it's a capital, but it still decomposes
>to the ordinary combining cedilla. These are essentially font-ligaturing
>issues.
Not quite, in that the font does not necessarily require ligature
substitution data for characters that are encoded in Unicode in precomposed
forms. Systems and applications should take care of canonical composition,
not fonts.
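
As a rough illustration of composition being handled in software rather than in the font, here is a small sketch, again assuming Python's standard `unicodedata` module; the variable names are arbitrary. It composes d + combining caron into the precomposed character, and decomposes both cases of G with cedilla.

```python
import unicodedata

# Canonical composition is a normalization step; no font is involved.
composed = unicodedata.normalize("NFC", "d\u030C")   # d + COMBINING CARON

# G/g with cedilla both decompose to the base letter + COMBINING CEDILLA
# (U+0327), regardless of where a font places the mark.
g_lower = unicodedata.normalize("NFD", "\u0123")
g_upper = unicodedata.normalize("NFD", "\u0122")

print([f"{ord(c):04X}" for c in composed])
print([f"{ord(c):04X}" for c in g_lower])
print([f"{ord(c):04X}" for c in g_upper])
```

How the resulting precomposed character is drawn is then a matter for the font's glyph for that code point.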
By the way, although Unicode calls it a cedilla, the correct form to use
with G is the disconnected, 'under comma' form.
John Hudson
Tiro Typeworks www.tiro.com
Vancouver, BC tiro@tiro.com
It is necessary that by all means and cunning,
the cursed owners of books should be persuaded
to make them available to us, either by argument
or by force. - Michael Apostolis, 1467