From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Fri Apr 16 2004 - 06:11:00 EDT
From: "Peter Kirk" <peterkirk@qaya.org>
> On 15/04/2004 18:16, Philippe Verdy wrote:
> >So U+2027 (as well as the U+00B7 middle dot found in ISO-8859-1/15) is not
> >the exact character to represent this middle dot in all usages, ...
>
> Philippe, before jumping to this conclusion, please can you describe to
> me EXACTLY how the shape and behaviour of the Catalan middle dot differs
> from the behaviour of U+2027 defined in Unicode Standard Annex #14,
> http://www.unicode.org/unicode/standard/reports/tr14/tr14-15.html:
>
> > 2027
> > HYPHENATION POINT
> > A hyphenation point is a raised dot, which is used primarily to
> > visibly indicate syllabification of words. Syllable breaks are
> > potential line break opportunities in the middle of words. It is
> > mainly used in dictionaries and similar works. When an actual line
> > break falls inside a word containing hyphenation point characters, the
> > hyphenation point is rendered as a regular hyphen at the end of the line.
> >
>
> From the descriptions which you and Anto'nio have provided and from
> http://www.tug.org/TUGboat/Articles/tb16-3/tb48vali.pdf, it seems to me
> that the Catalan behaviour is exactly as described for U+2027 in UAX
> #14, perhaps because the Catalan usage has been borrowed from dictionary
> usage or vice versa. This strongly suggests that U+2027 is the
> appropriate character for Catalan.
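The U+2027 behaviour quoted from UAX #14 above can be illustrated with a small sketch. It assumes Python 3.9+ and simplifies line width to a count of characters; `break_opportunities` and `break_word` are hypothetical names, not a real line-breaking API, and a conforming implementation would apply the full UAX #14 algorithm rather than this single rule.

```python
HYPHENATION_POINT = "\u2027"  # U+2027 HYPHENATION POINT

def break_opportunities(word: str) -> list[int]:
    """Indices of U+2027 in `word`, i.e. the places where a line may break."""
    return [i for i, ch in enumerate(word) if ch == HYPHENATION_POINT]

def break_word(word: str, max_chars: int) -> tuple[str, str]:
    """Break `word` at the last hyphenation point whose line fits in `max_chars`.

    Simplification (assumption): width is measured in characters.
    The hyphenation point at the actual break is rendered as a regular
    hyphen at the end of the first line; other points stay as raised dots.
    If no break fits, the word is returned unbroken.
    """
    fitting = [i for i in break_opportunities(word) if i + 1 <= max_chars]
    if not fitting:
        return word, ""
    i = fitting[-1]
    return word[:i] + "-", word[i + 1:]

word = "dic\u2027tion\u2027ar\u2027y"
print(break_word(word, 10))  # → ('dic‧tion-', 'ar‧y')
```

Note that nothing in this rule depends on the glyph position or on the surrounding letters, which is exactly where the Catalan case diverges in the reply below.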
Did you read this PDF carefully? It actually discusses a hack needed to
reposition the middle dot correctly, so that the Catalan dot will:
- not alter the interletter space;
- be drawn at a higher position (approximately at the x-height) than the
ordinary middle dot (which sits midway between the baseline and the x-height),
and be horizontally centered between the vertical stems of the two surrounding
l or L letters (this makes a difference for the uppercase letter).
So the encoded l-with-middle-dot and L-with-middle-dot, if properly designed for
Catalan following these guidelines, will render much better than 'L' or 'l'
followed by U+00B7, and even better than U+2027.
If rendering is not important to you (it matters to anyone who wants to build a
renderer), consider collation and text analysis. My view of the precomposed
ligatures U+013F and U+0140 (L with middle dot) is that their "letter" general
category makes things easier for writers and readers, even though both would
agree that Catalan has no such dotted-L letter, but rather a single L with an
additional, separate phonetic mark.
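The general-category argument can be checked with Python's standard `unicodedata` module. The sketch below (assuming Python 3.9+) prints the categories of the four characters discussed here and shows how a simple tokenizer based on `\w`, which matches only letters, digits and underscore, treats the two spellings of "col·lecció". The names `CANDIDATES`, `categories` and `naive_words` are illustrative only.

```python
import re
import unicodedata

# Characters discussed in this thread, keyed by code point label.
CANDIDATES = {
    "U+013F": "\u013f",  # LATIN CAPITAL LETTER L WITH MIDDLE DOT
    "U+0140": "\u0140",  # LATIN SMALL LETTER L WITH MIDDLE DOT
    "U+00B7": "\u00b7",  # MIDDLE DOT
    "U+2027": "\u2027",  # HYPHENATION POINT
}

def categories() -> dict[str, str]:
    """Map each code point label to its Unicode general category."""
    return {label: unicodedata.category(ch) for label, ch in CANDIDATES.items()}

def naive_words(text: str) -> list[str]:
    """Split text into runs of word characters as Python's re module defines them."""
    return re.findall(r"\w+", text)

print(categories())  # → {'U+013F': 'Lu', 'U+0140': 'Ll', 'U+00B7': 'Po', 'U+2027': 'Po'}
print(naive_words("col\u00b7lecci\u00f3"))  # l + U+00B7 + l → ['col', 'lecció']
print(naive_words("co\u0140lecci\u00f3"))   # U+0140 + l → ['coŀlecció']
```

A segmenter implementing UAX #29 word boundaries would not split the first spelling, since that annex gives U+00B7 and U+2027 the MidLetter property; the split only affects simpler letter-based tokenizers.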
Another point: in Catalan the middle dot seems to be used only between a pair of
L letters. Typographers treat the double L with a middle dot as a ligature, and
Catalan orthography uses the dotted pair to change the pronunciation (and even
the meaning) of a double L: plain "ll" is the palatal "L mouillé" (pronounced
like y between vowels in some dialects), while "l·l" is a geminate (doubled) L.
Last note: Catalan words starting with a double L do exist, but they apparently
never take a middle dot, because that spelling always designates a consonantal
palatal L, sometimes pronounced with some stress, an audible palato-lingual
click or some prenasalisation; this pronunciation varies among the four local
dialects spoken.
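The placement rule described in the last two paragraphs (a dot only between two L letters, never after a word-initial L) can be sketched as a simple check. This is a hypothetical helper, assuming Python 3 and text spelled with U+00B7 rather than the U+013F/U+0140 ligatures; it encodes only the rule as stated here, not a full Catalan spelling checker.

```python
MIDDLE_DOT = "\u00b7"  # U+00B7 MIDDLE DOT

def dots_well_placed(word: str) -> bool:
    """Hypothetical check: every U+00B7 in `word` sits between two l/L letters,
    and the first of those letters is not word-initial (initial "ll" takes no dot).
    Handles only the L + U+00B7 + L spelling, not the ligature characters."""
    for i, ch in enumerate(word):
        if ch != MIDDLE_DOT:
            continue
        # i >= 2 ensures the preceding l is not the first letter of the word.
        prev_ok = i >= 2 and word[i - 1] in "lL"
        next_ok = i + 1 < len(word) and word[i + 1] in "lL"
        if not (prev_ok and next_ok):
            return False
    return True

for w in ["col\u00b7lecci\u00f3", "llum", "l\u00b7lum", "co\u00b7sa"]:
    print(w, dots_well_placed(w))
```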
Medieval Catalan texts did not mark this phonetic distinction for a medial
double L; as in French, no such mark was written. The Catalan middle dot was
introduced later, with the clear intent of not altering the number of letters or
their relative positions in typeset text. Most modern text renderers display
U+00B7 incorrectly for Catalan (notably in user interfaces and in web browsers).
So, from a typographic point of view, the U+013F and U+0140 ligatures are much
better than their compatibility decompositions, and I don't think they should be
described as compatibility characters. The ISO 6937 standard for Videotex was
right to define this ligature in order to respect normal typography, but the
compatibility decompositions to U+00B7 in Unicode are certainly not the best
choice. The L + U+00B7 sequence is widely used today simply because the
ligatures were missing from ISO-8859-1 and Windows-1252, leaving U+00B7 as the
only alternative for that function.
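The compatibility relationship criticized above can be inspected with Python's standard `unicodedata` module. The sketch below (assuming Python 3) shows that U+0140 carries only a `<compat>` mapping, so canonical normalization (NFD/NFC) keeps the ligature while NFKD/NFKC fold it to l + U+00B7. `to_ligatures` is a hypothetical helper, not part of any library, that rewrites the legacy ISO-8859-1/Windows-1252 spelling into the ligature form.

```python
import re
import unicodedata

def compat_mapping(ch: str) -> str:
    """Decomposition mapping recorded in the Unicode Character Database."""
    return unicodedata.decomposition(ch)

def to_ligatures(text: str) -> str:
    """Hypothetical helper: rewrite L/l + U+00B7 (when followed by L/l)
    as U+013F/U+0140, leaving the following L/l in place."""
    return re.sub(
        r"([Ll])\u00b7(?=[Ll])",
        lambda m: "\u013f" if m.group(1) == "L" else "\u0140",
        text,
    )

legacy = "col\u00b7lecci\u00f3"  # spelling forced by ISO-8859-1 / Windows-1252
lig = to_ligatures(legacy)       # "coŀlecció"

print(compat_mapping("\u0140"))                         # → <compat> 006C 00B7
print(unicodedata.normalize("NFD", "\u0140") == "\u0140")  # → True (no canonical mapping)
# Both spellings become identical under compatibility normalization.
print(unicodedata.normalize("NFKD", lig) == unicodedata.normalize("NFKD", legacy))  # → True
```

Because NFKC maps the ligature back to L/l + U+00B7, the conversion is lossy in the other direction only for processes that apply compatibility normalization.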
This archive was generated by hypermail 2.1.5 : Fri Apr 16 2004 - 06:52:56 EDT