RE: Hebrew script in IDN (was Exemplar Characters)

From: Jony Rosenne (
Date: Sat Nov 19 2005 - 08:08:03 CST


    Geresh is the one that looks similar to an apostrophe; Gershayim looks like a double quote.

    My proposal is to prohibit both Geresh and Gershayim in IDN, and to have them in the auxiliary exemplar characters.
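
    As a rough illustration of that prohibition, here is a minimal Python sketch. The names PROHIBITED_IN_IDN and label_is_allowed are made up for this example and are not part of any IDN library; the only facts relied on are the code points U+05F3 (Geresh) and U+05F4 (Gershayim).

```python
import unicodedata

GERESH = "\u05F3"     # HEBREW PUNCTUATION GERESH
GERSHAYIM = "\u05F4"  # HEBREW PUNCTUATION GERSHAYIM

# Hypothetical registry policy: both marks are prohibited in labels.
PROHIBITED_IN_IDN = frozenset({GERESH, GERSHAYIM})


def label_is_allowed(label: str) -> bool:
    """Return True if the label contains neither Geresh nor Gershayim."""
    return not any(ch in PROHIBITED_IN_IDN for ch in label)
```

    A real registry would combine such a check with its other label rules; this only shows the exclusion itself.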


    > -----Original Message-----
    > From:
    > [] On Behalf Of Philippe Verdy
    > Sent: Saturday, November 19, 2005 2:40 PM
    > To: Mark Davis; Neil Harris
    > Cc: Michael Everson; Unicode Discussion
    > Subject: Re: Hebrew script in IDN (was Exemplar Characters)
    > From: "Mark Davis" <>
    > > 2. [\- ‐ \: . ' ’ ‧] and [\u200C \u200D] are ineligible for inclusion in
    > > the default identifiers, since they are in pattern-syntax or are normally
    > > invisible, resp.
    > I fully disagree with you about the curly right apostrophe. It is not
    > part of the pattern-syntax, and not invisible. And it is used for normal
    > orthographies of words. I can only agree with you about the ASCII quote
    > which is definitely ambiguous.
    > Those languages that define a syntactic role for the apostrophe are
    > bogus if they exist and if they do not at least contain an escaping
    > mechanism (which is not supposed to be used within identifiers). The
    > only languages I know that need two different characters for left and
    > right in a quote pair are using the ASCII quote and the ASCII backquote,
    > not the apostrophe.
    > For IDN, the apostrophe is definitely not syntactic and does not create
    > confusion with the ASCII quote, which is forbidden anyway. So you don't
    > need to exclude it from identifiers.
    > For IDN you could get possible confusion between curly apostrophes and
    > Gershayim, but it can be avoided very simply, because the use of the
    > apostrophe as a letter and the use of Gershayim are mutually exclusive
    > within any one language, so they cannot be part of the same word token
    > in the IDN label (by token, I mean one of the words in a
    > hyphen-separated list of word tokens that make up a single domain name
    > label). This means that a registry would allow only one or the other
    > in the same token.
    > The apostrophe cannot be used alone in a word token, and the same is
    > true for Gershayim, so the accompanying letters still determine which
    > one is correct and allowed. These accompanying letters also fix the
    > directionality of the token (RTL or LTR), so the Gershayim and the
    > apostrophe can be immediately and correctly interpreted.
    > This also means that Gershayim and the apostrophe could possibly be
    > unified in an IDN registry that wants to support all languages,
    > provided that the IDN client disambiguates the case from the
    > surrounding letters, which reveal the directionality. Under this
    > consideration, given that the single quote is not used in plain ASCII
    > and has a weak (contextual) directionality, it could become the
    > candidate to represent both the apostrophe and Gershayim, even if it
    > is excluded from identifiers (this exclusion means that another
    > character in the same IDN equivalence class must be used).
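
    For comparison, the per-token rule described in the quoted message (only one kind of mark per hyphen-separated token, never alone, and agreeing with the direction of the accompanying letters) might be sketched as follows. This is an illustration under assumptions, not a registry implementation: the function names are hypothetical, the curly apostrophe is taken to be U+2019, both Geresh and Gershayim are treated as the Hebrew marks, and the direction of a token is taken from its first letter.

```python
import unicodedata

APOSTROPHE = "\u2019"              # RIGHT SINGLE QUOTATION MARK (assumed "curly apostrophe")
HEBREW_MARKS = {"\u05F3", "\u05F4"}  # Geresh, Gershayim


def token_direction(token: str):
    """Return "RTL" or "LTR" from the first letter in the token, else None."""
    for ch in token:
        if not unicodedata.category(ch).startswith("L"):
            continue  # skip punctuation, digits, marks
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):
            return "RTL"
        if bidi == "L":
            return "LTR"
    return None


def token_is_consistent(token: str) -> bool:
    """Apply the hypothetical one-mark-per-token rule to a single token."""
    has_apostrophe = APOSTROPHE in token
    has_hebrew_mark = any(ch in HEBREW_MARKS for ch in token)
    if has_apostrophe and has_hebrew_mark:
        return False  # only one or the other in the same token
    direction = token_direction(token)
    if has_apostrophe:
        return direction == "LTR"  # needs accompanying LTR letters
    if has_hebrew_mark:
        return direction == "RTL"  # needs accompanying RTL letters
    return True


def label_is_consistent(label: str) -> bool:
    """Check every hyphen-separated token of a domain name label."""
    return all(token_is_consistent(t) for t in label.split("-"))
```

    Under my proposal above, the Hebrew marks would simply be rejected before any such check is reached.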

    This archive was generated by hypermail 2.1.5 : Sat Nov 19 2005 - 08:09:20 CST