From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Thu Apr 15 2004 - 19:22:40 EDT
----- Original Message -----
From: "Peter Kirk" <peterkirk@qaya.org>
To: "Kenneth Whistler" <kenw@sybase.com>
Cc: <unicode@unicode.org>
Sent: Friday, April 16, 2004 12:03 AM
Subject: Re: U+0140
> On 15/04/2004 12:32, Kenneth Whistler wrote:
>
> >Philippe opined:
> >
> >
> >
> >>If there's something really missing for Catalan, it's a middle-dot letter with
> >>general category "Lo", and combining class 0 (i.e. NOT combining).
> >>
> >>
> >
> >The one thing for sure is that the Unicode Standard does not need
> >to encode more middle dots:
> >
> >00B7;MIDDLE DOT;Po;0;ON;;;;;N;;;;;
> >0701;SYRIAC SUPRALINEAR FULL STOP;Po;0;AL;;;;;N;;;;;
> >1427;CANADIAN SYLLABICS FINAL MIDDLE DOT;Lo;0;L;;;;;N;;;;;
> >22C5;DOT OPERATOR;Sm;0;ON;;;;;N;;;;;
> >2F02;KANGXI RADICAL DOT;So;0;ON;<compat> 4E36;;;;N;;;;;
> >302E;HANGUL SINGLE DOT TONE MARK;Mn;224;NSM;;;;;N;;;;;
> >30FB;KATAKANA MIDDLE DOT;Pc;0;ON;;;;;N;;;;;
> >FE45;SESAME DOT;Po;0;ON;;;;;N;;;;;
> >FF65;HALFWIDTH KATAKANA MIDDLE DOT;Pc;0;ON;<narrow> 30FB;;;;N;;;;;
> >10101;AEGEAN WORD SEPARATOR DOT;Po;0;ON;;;;;N;;;;;
> >1D16D;MUSICAL SYMBOL COMBINING AUGMENTATION DOT;Mc;226;L;;;;;N;;;;;
> >2027;HYPHENATION POINT;Po;0;ON;;;;;N;;;;;
> >16EB;RUNIC SINGLE PUNCTUATION;Po;0;L;;;;;N;;;;;
> >1802;MONGOLIAN COMMA;Po;0;ON;;;;;N;;;;;
> >318D;HANGUL LETTER ARAEA;Lo;0;L;<compat> 119E;;;;N;HANGUL LETTER ALAE A;;;;
> >1D01B;BYZANTINE MUSICAL SYMBOL KENTIMA ARCHAION;So;0;L;;;;;N;;;;;
> >
> >(and that's not considering the lowered dots "FULL STOP" and the raised
> >dots)
> >
> >
> >
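The records quoted above use the semicolon-separated layout of UnicodeData.txt, so it is easy to check which of them already have general category "Lo" and combining class 0. The following is only a minimal sketch: the names UcdRecord and parse_record are hypothetical helpers, and only the first five fields of each record are read.

```python
from typing import NamedTuple


class UcdRecord(NamedTuple):
    """First five fields of a UnicodeData.txt-style record (hypothetical helper type)."""
    code_point: int
    name: str
    general_category: str
    combining_class: int
    bidi_class: str


def parse_record(line: str) -> UcdRecord:
    """Split one semicolon-separated record and keep the leading fields."""
    fields = line.strip().split(";")
    return UcdRecord(int(fields[0], 16), fields[1], fields[2], int(fields[3]), fields[4])


# A few of the records quoted in this message, copied verbatim.
SAMPLE = """\
00B7;MIDDLE DOT;Po;0;ON;;;;;N;;;;;
1427;CANADIAN SYLLABICS FINAL MIDDLE DOT;Lo;0;L;;;;;N;;;;;
302E;HANGUL SINGLE DOT TONE MARK;Mn;224;NSM;;;;;N;;;;;
318D;HANGUL LETTER ARAEA;Lo;0;L;<compat> 119E;;;;N;HANGUL LETTER ALAE A;;;;
"""

records = [parse_record(line) for line in SAMPLE.splitlines()]

# Keep only non-combining letters, i.e. the profile asked for above.
letters = [r for r in records if r.general_category == "Lo" and r.combining_class == 0]

for r in letters:
    print(f"U+{r.code_point:04X} {r.name}")
```

Of the sampled records, only the Canadian Syllabics and Hangul entries pass the filter; both belong to other scripts.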
> There are also, including combining middle dots (most of these listed at
> U+00B7):
>
> U+0387 GREEK ANO TELEIA
Wrong form? It's a small square, it is the Greek semicolon, and it therefore
separates words.
> U+05BC HEBREW POINT DAGESH OR MAPIQ
Where would you position it relative to the Catalan letter L, which has a
different directionality and should not inherit the complexity of the Hebrew
script?
Why doesn't your list even include U+0307 COMBINING DOT ABOVE or U+0323
COMBINING DOT BELOW?
> U+2022 BULLET
Too thick, and it is a word-breaking symbol with a candidate line break on
either side. Most often it is a bullet at the beginning of a sub-paragraph, but
it can also be used to separate multiple titles (think of the track titles on
an audio CD) or, in dictionaries and many other publications, as a mark that
serves as the source anchor for a note.
> U+2024 ONE DOT LEADER
This is a spacing character, mostly a punctuation mark, and clearly word-breaking...
> U+2219 BULLET OPERATOR
This is a symbol with an evident word break on either side (think of
mathematical formulas).
> U+2027 HYPHENATION POINT
A good suggestion, if it were not punctuation... What is the exact status of
this character? When I look at its UCD properties I see:
French name: POINT DE COUPURE DE MOT
GC=Po: punctuation, other [not even a "connecting" Pc like the ASCII
underscore], so a separator of words
CC=0: not combining [OK]
BD=ON: order neutral [OK]
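These properties can be checked with Python's standard unicodedata module. The sketch below is illustrative only: the describe helper and the choice of characters to compare are assumptions, and the values reported depend on the Unicode version bundled with the Python interpreter in use.

```python
import unicodedata

# Characters mentioned in this thread; the selection is illustrative only.
CANDIDATES = ["\u00B7", "\u2027", "\u0140", "\u1427"]


def describe(ch):
    """Return (code point, name, general category, combining class, bidi class)."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),       # general category, e.g. "Po" or "Lo"
        unicodedata.combining(ch),      # canonical combining class, 0 = not combining
        unicodedata.bidirectional(ch),  # bidirectional class, e.g. "ON" or "L"
    )


for ch in CANDIDATES:
    print(*describe(ch), sep="; ")
```

For U+2027 this reports the same general category, combining class and bidirectional class as listed above.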
> What is U+2027 intended for? The name suggests that it might be what is
> needed for Catalan.
I think this is better seen as an annotation used in dictionaries to mark
visually the position of candidate syllable breaks (unlike the soft hyphen,
which is normally not rendered except where the candidate line break is
actually taken).
Many dictionaries prefer a thin vertical line that extends from the descender
to the ascender, and in fact there are fonts where this character is drawn like
that; it is not the same as the ASCII vertical line, which is smaller and
often thicker. This notation symbol could be used in addition to, and
immediately after, the Catalan middle dot...
My Larousse Catalan-French pocket dictionary uses a very thin vertical line to
mark word endings and prefixes/suffixes, in combination with the orthographic
middle dot in the Catalan word, which is always shown.
A question here: is the vertical line used by Larousse really the same as U+007C?
In the same context, I note that the ASCII TILDE (a large version aligned on the
baseline) is used to stand for the common radical marked off by the vertical line
symbol that separates prefixes and suffixes from the radical of the entry word...
In the same dictionary, the vertical line is also used, singly or in a pair,
and surrounded by em spaces, as a separator between definition items, to
group them by semantic proximity; in that case, however, the vertical line is
thicker and does not extend below the baseline, so this separator looks more like
a true U+007C, i.e. regular punctuation, with candidate line breaks occurring both
before and after it (in fact at the position of the surrounding em
spaces)...
In a Larousse French-German dictionary, I see the hyphenation point used
between a determining prefix and the radical (for example: "ein*reisen" or
"In*angriff*nahme"): this hyphenation point (written here as '*') is a
notation symbol and is thicker. It is not even a middle dot, because it is drawn
at x-height. Nor is it a bullet: the bullet is also used in the same
Larousse dictionaries, where it introduces a new grammatical sense of a
homonymous word.