Re: U+0140

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Thu Apr 15 2004 - 19:22:40 EDT

    ----- Original Message -----
    From: "Peter Kirk" <peterkirk@qaya.org>
    To: "Kenneth Whistler" <kenw@sybase.com>
    Cc: <unicode@unicode.org>
    Sent: Friday, April 16, 2004 12:03 AM
    Subject: Re: U+0140

    > On 15/04/2004 12:32, Kenneth Whistler wrote:
    >
    > >Philippe opined:
    > >
    > >
    > >
    > >>If there's something really missing for Catalan, it's a middle-dot letter with
    > >>general category "Lo", and combining class 0 (i.e. NOT combining).
    > >>
    > >>
    > >
    > >The one thing for sure is that the Unicode Standard does not need
    > >to encode more middle dots:
    > >
    > >00B7;MIDDLE DOT;Po;0;ON;;;;;N;;;;;
    > >0701;SYRIAC SUPRALINEAR FULL STOP;Po;0;AL;;;;;N;;;;;
    > >1427;CANADIAN SYLLABICS FINAL MIDDLE DOT;Lo;0;L;;;;;N;;;;;
    > >22C5;DOT OPERATOR;Sm;0;ON;;;;;N;;;;;
    > >2F02;KANGXI RADICAL DOT;So;0;ON;<compat> 4E36;;;;N;;;;;
    > >302E;HANGUL SINGLE DOT TONE MARK;Mn;224;NSM;;;;;N;;;;;
    > >30FB;KATAKANA MIDDLE DOT;Pc;0;ON;;;;;N;;;;;
    > >FE45;SESAME DOT;Po;0;ON;;;;;N;;;;;
    > >FF65;HALFWIDTH KATAKANA MIDDLE DOT;Pc;0;ON;<narrow> 30FB;;;;N;;;;;
    > >10101;AEGEAN WORD SEPARATOR DOT;Po;0;ON;;;;;N;;;;;
    > >1D16D;MUSICAL SYMBOL COMBINING AUGMENTATION DOT;Mc;226;L;;;;;N;;;;;
    > >2027;HYPHENATION POINT;Po;0;ON;;;;;N;;;;;
    > >16EB;RUNIC SINGLE PUNCTUATION;Po;0;L;;;;;N;;;;;
    > >1802;MONGOLIAN COMMA;Po;0;ON;;;;;N;;;;;
    > >318D;HANGUL LETTER ARAEA;Lo;0;L;<compat> 119E;;;;N;HANGUL LETTER ALAE A;;;;
    > >1D01B;BYZANTINE MUSICAL SYMBOL KENTIMA ARCHAION;So;0;L;;;;;N;;;;;
    > >
    > >(and that's not considering the lowered dots "FULL STOP" and the raised
    > >dots)
    > >
    > >
    > >
    > There are also, including combining middle dots (most of these listed at
    > U+00B7):
    >
    > U+0387 GREEK ANO TELEIA
    Wrong form? It is a small square, and it is the Greek semicolon, so it
    separates words.

    > U+05BC HEBREW POINT DAGESH OR MAPIQ
    Where would you position it relative to the Catalan letter L, which has a
    different directionality and should not inherit the complexity of the Hebrew
    script?
    Why isn't there even U+0307 COMBINING DOT ABOVE or U+0323 COMBINING DOT BELOW in
    your list?
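
    As a quick check of the properties at stake here, the following is a minimal
    sketch that uses Python's standard unicodedata module to print the name,
    general category, combining class and bidirectional class of the three
    combining dots mentioned above. The helper name describe() is only
    illustrative, and the values reported depend on the Unicode version bundled
    with the Python build.

```python
import unicodedata


def describe(ch):
    """Return (name, general category, combining class, bidi class) for ch."""
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.combining(ch),
        unicodedata.bidirectional(ch),
    )


if __name__ == "__main__":
    # The Hebrew point and the two generic combining dots discussed above.
    for cp in (0x05BC, 0x0307, 0x0323):
        name, gc, cc, bd = describe(chr(cp))
        print(f"U+{cp:04X} {name}: GC={gc} CC={cc} BD={bd}")
```

    All three are non-spacing marks (GC=Mn) with a non-zero combining class, so
    none of them is a spacing letter with combining class 0.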

    > U+2022 BULLET
    Too thick, and it is a word-breaking symbol with a candidate line break on
    either side. Most often it is a bullet at the beginning of a sub-paragraph, but
    it can also be used to separate multiple titles (think of the titles on an
    audio CD), and in dictionaries and many other publications it is a mark used
    as the source anchor for a note.

    > U+2024 ONE DOT LEADER
    This is a spacing character, mostly used as punctuation, and clearly
    word-breaking...

    > U+2219 BULLET OPERATOR
    This is a symbol with an evident word break on either side (think of
    mathematical formulas).

    > U+2027 HYPHENATION POINT
    A good suggestion, if this were not punctuation... What is the exact status of
    this character? When I look into the UCD properties I see that:
    French name: POINT DE COUPURE DE MOT
    GC=Po: punctuation, other [not even a "connecting" Pc like the ASCII
    underscore], so a separator of words
    CC=0: not combining [OK]
    BD=ON: order neutral [OK]
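
    The same three properties can be read programmatically. Below is a minimal
    sketch using Python's standard unicodedata module; the helper name
    ucd_props() and the short keys gc/cc/bd are only illustrative, and the
    French name is not available from that module. It compares U+2027 with
    U+00B7 MIDDLE DOT and with the ASCII underscore U+005F.

```python
import unicodedata


def ucd_props(ch):
    """Return the name, general category, combining class and bidi class of ch."""
    return {
        "name": unicodedata.name(ch),
        "gc": unicodedata.category(ch),
        "cc": unicodedata.combining(ch),
        "bd": unicodedata.bidirectional(ch),
    }


if __name__ == "__main__":
    # Hyphenation point, middle dot, and the underscore used for comparison.
    for cp in (0x2027, 0x00B7, 0x005F):
        p = ucd_props(chr(cp))
        print(f"U+{cp:04X} {p['name']}: GC={p['gc']} CC={p['cc']} BD={p['bd']}")
```

    On this evidence U+2027 and U+00B7 carry the same values (Po, 0, ON), and
    only the underscore is a connector (Pc).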

    > What is U+2027 intended for? The name suggests that it might be what is
    > needed for Catalan.
    I think that this is better seen as an annotation used in dictionaries to mark
    visually the position of candidate syllable breaks (unlike the soft hyphen,
    which is normally not rendered except where the candidate line break is
    actually taken).

    Many dictionaries prefer a thin vertical line which extends from the descender
    to the ascender. (In fact there are fonts where this character is drawn like
    this; it is not the same as the ASCII vertical line, which is smaller and
    often thicker.) This notation symbol could be used in addition to, and
    immediately after, the Catalan middle-dot...
    My Larousse Catalan-French pocket dictionary uses a very thin vertical line to
    mark word terminations and prefixes/suffixes, in combination with an
    orthographic middle-dot in the Catalan word, which is always noted.

    Question here: is that vertical line used in Larousse really the same as U+007C?
    In the same context I note that the ASCII TILDE (a large version aligned on the
    baseline) is used to stand for the common radical delimited by the vertical
    line symbol, which separates prefixes and suffixes from the radical of the
    entry word...

    In the same dictionary, the vertical line is also used, alone or in a pair,
    and surrounded by em spaces, as a separator between definition items, to
    group them by semantic proximity; but in that case the vertical line is thicker
    and does not extend below the baseline, so this separator looks more like a true
    U+007C, i.e. a regular punctuation mark, with candidate line breaks occurring
    both before and after it (in fact at the position of the surrounding em
    spaces)...

    In a Larousse French-German dictionary, I can see the hyphenation point used
    between a determining prefix and the radical (for example: "ein*reisen" or
    "In*angriff*nahme"): this hyphenation point (noted here with a '*') is a
    notation symbol and is thicker. It's not even a middle-dot, because it is drawn
    at the x-height. Nor is it a bullet, which is also used in the same Larousse
    dictionaries, where the bullet introduces a new grammatical sense of a
    homonymous word.


