Re: Public Review Issue Update: #100, "Giving U+00B7 MIDDLE DOT the ID_Continue Property"

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Thu Jan 11 2007 - 21:07:54 CST


From: "Kenneth Whistler" <kenw@sybase.com>
> Philippe said:
>
>> So I've seen quite frequent occurrences of such use of middle dots in
>> Latin transcriptions of Minnan, notably after a base letter "O" or "o"
>> (which may also have other diacritics on the letter: a
>
> I claimed that for a long time (and historically, I think I
> am correct), but a separate *combining* character has
> been encoded for this entity in Latin transcriptions of
> Minnan, and is claimed by the stakeholders in those
> transcriptions to be correct (current) usage:
>
> U+0358 COMBINING DOT ABOVE RIGHT

Isn't it notable that *most* Minnan documents we find are encoded using U+00B7 MIDDLE DOT and not this combining character? This is probably a legacy of the frequent use of ISO 8859-* charsets, which contain MIDDLE DOT but not the combining dot above right.

Note that the Minnan Wikipedia uses U+00B7 extensively, including in its online editor (look at the character palette below the edit window). Do you mean that U+0358 should have been used? If so, UCA must be tailored for Minnan so that both characters match in searches, at least at the primary level (distinct letters), secondary level (distinct diacritics), and tertiary level (distinct letter case).
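
To illustrate the kind of matching I mean, here is a rough Python sketch. It is not a UCA tailoring, only a pre-normalization step that a search front end could apply before comparing strings. The function name fold_minnan_dot and the rule itself (fold a middle dot that follows "o" or "O", optionally with combining marks in between) are my own assumptions for illustration, not something defined by Unicode or by the Minnan stakeholders.

```python
import re
import unicodedata

MIDDLE_DOT = "\u00b7"        # U+00B7 MIDDLE DOT (legacy encoding)
DOT_ABOVE_RIGHT = "\u0358"   # U+0358 COMBINING DOT ABOVE RIGHT

# Assumption: the dot only matters after a base letter o/O, possibly
# followed by combining marks from U+0300..U+036F (e.g. a tone mark).
_O_DOT = re.compile("([oO][\u0300-\u036f]*)" + MIDDLE_DOT)


def fold_minnan_dot(text: str) -> str:
    """Map 'o' + (marks) + U+00B7 to 'o' + (marks) + U+0358 for matching."""
    # Decompose first so that precomposed letters such as U+00F3 expose
    # their base letter "o" to the pattern.
    decomposed = unicodedata.normalize("NFD", text)
    folded = _O_DOT.sub(lambda m: m.group(1) + DOT_ABOVE_RIGHT, decomposed)
    # Recompose so both spellings end up in the same normalized form.
    return unicodedata.normalize("NFC", folded)


if __name__ == "__main__":
    legacy = "\u00f3\u00b7"        # o-acute followed by MIDDLE DOT
    current = "o\u0301\u0358"      # o + COMBINING ACUTE + DOT ABOVE RIGHT
    print(fold_minnan_dot(legacy) == fold_minnan_dot(current))
```

A middle dot that does not follow "o" (as in Catalan "col·lecció") is left alone by this rule, which is one reason such folding would have to be language-specific.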

But when I look at the rare fonts that define glyphs for both of them, the two characters generally don't have the same glyph: the combining character is most often rendered higher (near the top of the Latin ascenders, or midway between the top of capital letters and the top of small letters) than the middle dot (in the middle of the x-height, i.e. midway between the baseline and the top of small letters). This is a very visible difference, and it is especially critical in Minnan, where the middle dot follows a small letter o (possibly with a combining diacritic above it marking the tone).

>> Are there other languages?
>
> There are many, many orthographies of languages, particularly
> in the Americas, which make use of a raised dot (= U+00B7 MIDDLE DOT)
> to indicate distinctive length (usually for vowels, but
> occasionally for consonants as well). So MIDDLE DOT is
> an issue for many orthographies besides the one for Catalan.

In fact, this makes sense whenever the generally accepted diacritic for marking long vowels (usually the macron above) is already used for something else (such as tone).

But isn't there a colon-like symbol in IPA for long vowels (in fact two small triangles stacked and pointing toward each other)? It's a non-combining modifier letter (U+02D0), so it may be part of words (it is, in some languages) and of identifiers. It is also more common in fonts than the combining U+0358 (for example, even Arial Unicode MS lacks U+0358!).
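
For reference, the general categories of the three characters discussed so far can be checked with Python's unicodedata module. This is only a small sketch; the helper name describe is mine, and the values printed depend on the Unicode version bundled with the Python build.

```python
import unicodedata

# The three characters under discussion.
CANDIDATES = ["\u00b7", "\u0358", "\u02d0"]


def describe(ch: str):
    """Return (code point, name, general category, combining class)."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.combining(ch),
    )


if __name__ == "__main__":
    for ch in CANDIDATES:
        print(describe(ch))
```

U+00B7 is reported as punctuation (Po), U+0358 as a nonspacing mark (Mn), and U+02D0 as a modifier letter (Lm). Only the middle dot falls outside the letter and mark categories, which is why it needs an explicit exception to be allowed in identifiers.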

> However, the particular concern for Catalan arises because of
> the *canonical* equivalence involving MIDDLE DOT for the
> characters U+013F LATIN CAPITAL LETTER L WITH MIDDLE DOT
> and U+0140 LATIN SMALL LETTER L WITH MIDDLE DOT.

I admit this is a good, specific reason for special treatment. It should be noted in the complete text of the proposed update (which is still missing) to the existing document that describes the ID_Continue and XID_Continue properties.
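
The decomposition of the two Catalan letters can be inspected the same way. A minimal Python sketch follows; the helper name decomposes_to_middle_dot is hypothetical. Note that in the data shipped with Python's unicodedata module the mapping is tagged <compat>, so it surfaces under NFKC/NFKD rather than under NFC/NFD.

```python
import unicodedata

MIDDLE_DOT = "\u00b7"
L_WITH_MIDDLE_DOT = ("\u013f", "\u0140")  # capital and small letter


def decomposes_to_middle_dot(ch: str) -> bool:
    """True if compatibility decomposition of ch yields U+00B7."""
    return MIDDLE_DOT in unicodedata.normalize("NFKD", ch)


if __name__ == "__main__":
    for ch in L_WITH_MIDDLE_DOT:
        # Raw decomposition field from the character database.
        print(f"U+{ord(ch):04X}", unicodedata.decomposition(ch))
        # NFC leaves the letter alone; NFKC splits off the middle dot.
        print("  NFC :", [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFC", ch)])
        print("  NFKC:", [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFKC", ch)])
```

An identifier containing U+0140 therefore turns into one containing U+00B7 once it is normalized to NFKC, so the normalized form is only a valid identifier if MIDDLE DOT has ID_Continue.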


