From: Kenneth Whistler (kenw@sybase.com)
Date: Thu Apr 15 2004 - 15:32:22 EDT
Philippe opined:
> If there's something really missing for Catalan, it's a middle-dot letter with
> general category "Lo", and combining class 0 (i.e. NOT combining).
The one thing for sure is that the Unicode Standard does not need
to encode more middle dots:
00B7;MIDDLE DOT;Po;0;ON;;;;;N;;;;;
0701;SYRIAC SUPRALINEAR FULL STOP;Po;0;AL;;;;;N;;;;;
1427;CANADIAN SYLLABICS FINAL MIDDLE DOT;Lo;0;L;;;;;N;;;;;
22C5;DOT OPERATOR;Sm;0;ON;;;;;N;;;;;
2F02;KANGXI RADICAL DOT;So;0;ON;<compat> 4E36;;;;N;;;;;
302E;HANGUL SINGLE DOT TONE MARK;Mn;224;NSM;;;;;N;;;;;
30FB;KATAKANA MIDDLE DOT;Pc;0;ON;;;;;N;;;;;
FE45;SESAME DOT;Po;0;ON;;;;;N;;;;;
FF65;HALFWIDTH KATAKANA MIDDLE DOT;Pc;0;ON;<narrow> 30FB;;;;N;;;;;
10101;AEGEAN WORD SEPARATOR DOT;Po;0;ON;;;;;N;;;;;
1D16D;MUSICAL SYMBOL COMBINING AUGMENTATION DOT;Mc;226;L;;;;;N;;;;;
2027;HYPHENATION POINT;Po;0;ON;;;;;N;;;;;
16EB;RUNIC SINGLE PUNCTUATION;Po;0;L;;;;;N;;;;;
1802;MONGOLIAN COMMA;Po;0;ON;;;;;N;;;;;
318D;HANGUL LETTER ARAEA;Lo;0;L;<compat> 119E;;;;N;HANGUL LETTER ALAE A;;;;
1D01B;BYZANTINE MUSICAL SYMBOL KENTIMA ARCHAION;So;0;L;;;;;N;;;;;
(and that's not considering the lowered dots "FULL STOP" and the raised
dots)
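For readers unfamiliar with the record format above, here is a minimal sketch of how such UnicodeData.txt lines can be split into fields. It assumes the 15-field, semicolon-separated layout shown in the listing; the names `UcdRecord`, `parse_unicode_data_line`, and `SAMPLE_LINES` are illustrative only and not part of any Unicode tool.

```python
from typing import NamedTuple


class UcdRecord(NamedTuple):
    """A few of the 15 UnicodeData.txt fields (hypothetical helper type)."""
    code_point: int
    name: str
    general_category: str
    combining_class: int
    bidi_class: str
    decomposition: str
    unicode_1_name: str


def parse_unicode_data_line(line: str) -> UcdRecord:
    """Split one semicolon-separated UnicodeData.txt record."""
    fields = line.strip().split(";")
    if len(fields) != 15:
        raise ValueError(f"expected 15 fields, got {len(fields)}")
    return UcdRecord(
        code_point=int(fields[0], 16),   # field 0: hex code point
        name=fields[1],                  # field 1: character name
        general_category=fields[2],      # field 2: e.g. Po, Lo, Mn
        combining_class=int(fields[3]),  # field 3: canonical combining class
        bidi_class=fields[4],            # field 4: bidirectional class
        decomposition=fields[5],         # field 5: decomposition mapping
        unicode_1_name=fields[10],       # field 10: Unicode 1.0 name
    )


# Records copied verbatim from the listing above.
SAMPLE_LINES = [
    "00B7;MIDDLE DOT;Po;0;ON;;;;;N;;;;;",
    "1427;CANADIAN SYLLABICS FINAL MIDDLE DOT;Lo;0;L;;;;;N;;;;;",
    "302E;HANGUL SINGLE DOT TONE MARK;Mn;224;NSM;;;;;N;;;;;",
    "318D;HANGUL LETTER ARAEA;Lo;0;L;<compat> 119E;;;;N;HANGUL LETTER ALAE A;;;;",
]

records = [parse_unicode_data_line(line) for line in SAMPLE_LINES]

# Which of the sampled dots already have a letter category (L*)?
letter_dots = [r.name for r in records if r.general_category.startswith("L")]
```

With the four sampled records, `letter_dots` picks out the two entries whose general category is Lo.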
> It's unfortunate that almost all legacy Catalan text transcoded to
> Unicode are based on the middle-dot symbol (the one mapped in
> ISO-8859-1 and ISO-8859-15) which is not seen by Unicode as a
> letter (Lo) but as a symbol only.
Actually, that is *fortunate*, not unfortunate, since it is the
correct conversion from 8859-1 (and Windows 1252) data.
How U+00B7 behaves in Catalan data is then a matter of local
*adaptation* of software for the correct handling of the Catalan
language.
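As one possible illustration of such local adaptation, here is a minimal sketch of a word splitter that keeps U+00B7 inside a word only when it sits between two letters "l". The rule, the pattern, and the names `CATALAN_WORD` and `catalan_words` are assumptions made for this example; real Catalan-aware software would presumably rely on a proper tailoring of word boundaries rather than a single regular expression.

```python
import re

MIDDLE_DOT = "\u00B7"

# A word is a run of letters; U+00B7 is accepted inside the run only when
# it is flanked by "l" or "L" on both sides (the Catalan geminated l).
# This rule is an assumption made for illustration.
CATALAN_WORD = re.compile(
    r"(?:[^\W\d_]|(?<=[lL])" + MIDDLE_DOT + r"(?=[lL]))+"
)


def catalan_words(text: str) -> list[str]:
    """Return the words of `text`, keeping l<U+00B7>l sequences intact."""
    return CATALAN_WORD.findall(text)


words = catalan_words("paral\u00B7lel i col\u00B7legi")
split_elsewhere = catalan_words("a\u00B7b")
```

The character stays U+00B7 in the data; only the software's notion of a word changes, and a middle dot in any other position still separates words.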
Note that while the particular combination <006C, 00B7, 006C> is
a peculiarity of Catalan orthography, U+00B7 MIDDLE DOT (often
called a 'raised period') is very widely used, indeed, in technical
orthographies for many languages, particularly in the Americas,
where it is used much more commonly than the IPA characters
U+02D0 MODIFIER LETTER TRIANGULAR COLON or U+02D1 MODIFIER LETTER
HALF TRIANGULAR COLON to indicate vocalic (or less commonly,
consonantal) length.
Obsessing about the behavior of U+00B7 in Catalan data while
ignoring its use as a vowel length indicator in many, many
other orthographies is rather pointless, it seems to me.
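To make the comparison among the three length marks concrete, here is a small sketch using Python's standard `unicodedata` module to look up their names and general categories. The helper `describe` and the tuple `LENGTH_MARKS` are names invented for this example, and the values reported come from whatever Unicode version the local Python build ships with.

```python
import unicodedata

# The three characters mentioned above as vowel-length indicators.
LENGTH_MARKS = ("\u00B7", "\u02D0", "\u02D1")


def describe(ch: str) -> tuple[str, str, str]:
    """Return (U+XXXX notation, character name, general category)."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


summary = [describe(ch) for ch in LENGTH_MARKS]

for code, name, category in summary:
    print(f"{code}  {category}  {name}")
```

The middle dot is reported as punctuation while the two IPA marks are modifier letters, so any orthography that writes length with U+00B7 depends on the same kind of software adaptation as Catalan does.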
--Ken