RE: Hebrew script in IDN (was Exemplar Characters)

From: JR (jr@qsm.co.il)
Date: Fri Nov 18 2005 - 09:30:17 CST


    > -----Original Message-----
    > From: unicode-bounce@unicode.org
    > [mailto:unicode-bounce@unicode.org] On Behalf Of Neil Harris
    > Sent: Friday, November 18, 2005 4:14 PM
    > To: Mark Davis
    > Cc: Michael Everson; Unicode Discussion
    > Subject: Re: Hebrew script in IDN (was Exemplar Characters)
    >
    >
    > Mark Davis wrote:
    > > It is not that clear-cut. Identifiers by their nature cannot include
    > > all words and phrases valid in all languages. For IDN, for example,
    > > one can't express the perfectly reasonable English word "can't", or a
    > > word like "I.B.M.".
    > >
    > > I did introduce a proposal in March for considering the status of some
    > > word characters, which turned into a discussion in the UTC of
    > > whether to add certain items to the identifier definition.
    > >
    > > http://www.unicode.org/L2/L2005/05083-wordprops.txt
    > >
    > > I'll copy that section here for those without access:
    > >
    > > 0027 ; # Po APOSTROPHE
    > > 002D ; # Pd HYPHEN-MINUS
    > > 002E ; # Po FULL STOP
    > > 003A ; # Po COLON
    > > 00B7 ; # Po MIDDLE DOT
    > > 058A ; # Pd ARMENIAN HYPHEN
    > > 05F3 ; # Po HEBREW PUNCTUATION GERESH
    > > 05F4 ; # Po HEBREW PUNCTUATION GERSHAYIM
    > > 200C ; # Cf ZERO WIDTH NON-JOINER // for Indic?
    > > 200D ; # Cf ZERO WIDTH JOINER // for Indic?
    > > 2010 ; # Pd HYPHEN
    > > 2019 ; # Pf RIGHT SINGLE QUOTATION MARK
    > > 2027 ; # Po HYPHENATION POINT
    > > 30A0 ; # Pd KATAKANA-HIRAGANA DOUBLE HYPHEN
    > >
    > >
    > > The UTC decided against adding them to the identifier definition.
    > > If we were to change that for the Hebrew punctuation, we would have to
    > > see a documented case for it.
    > >
    > > Mark
    > >
    >
    > Mark,
    >
    > I think you might meet some opposition to including the
    > following in IDNs:
    >
    > APOSTROPHE (?protocol character)
    > FULL STOP (it's a label separator: so no chance for use in IDN labels)
    > COLON (a definite protocol character in URLs)
    > ZWNJ and ZWJ (unless Indic experts can make a _very_ good case for
    > these being used only in contexts where they cause _visible_ and
    > _unambiguous_ rendering changes)
    > RIGHT SINGLE QUOTATION MARK (spoof of APOSTROPHE)
    > HYPHENATION POINT (spoof of MIDDLE DOT)
    > KATAKANA-HIRAGANA DOUBLE HYPHEN (spoof of EQUALS SIGN,
    > ?protocol character)
    >
    > which leaves only
    >
    > 00B7 ; # Po MIDDLE DOT
    > 058A ; # Pd ARMENIAN HYPHEN
    > 05F3 ; # Po HEBREW PUNCTUATION GERESH
    > 05F4 ; # Po HEBREW PUNCTUATION GERSHAYIM
    >
    > as characters which I would consider possible uncontroversial
    > candidates for IDN.

    Certainly Geresh and Gershayim are not uncontroversial.

    Jony

    >
    > -- Neil
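
For readers who want to check the general categories (Po, Pd, Cf, Pf) given in the quoted list, here is a minimal sketch in Python using the standard `unicodedata` module. The names `CANDIDATES` and `describe` are illustrative only and do not come from L2/05-083. The values reported depend on the Unicode version bundled with the interpreter, so treat the output as a cross-check rather than an authoritative reading of the proposal.

```python
import unicodedata

# Code points copied from the list quoted in this message.
CANDIDATES = [
    0x0027, 0x002D, 0x002E, 0x003A, 0x00B7, 0x058A, 0x05F3,
    0x05F4, 0x200C, 0x200D, 0x2010, 0x2019, 0x2027, 0x30A0,
]


def describe(cp):
    """Return (hex code point, general category, character name)."""
    ch = chr(cp)
    return (f"{cp:04X}", unicodedata.category(ch), unicodedata.name(ch))


if __name__ == "__main__":
    # Print each entry in the same "XXXX ; # Gc NAME" layout as the quoted list.
    for cp in CANDIDATES:
        code, gc, name = describe(cp)
        print(f"{code} ; # {gc} {name}")
```

Run as a script, it prints one line per candidate in the layout used in the quoted list, which makes it easy to compare entry by entry.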


