Re: Hebrew script in IDN (was Exemplar Characters)

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sat Nov 19 2005 - 06:39:31 CST

    From: "Mark Davis" <mark.davis@icu-project.org>
    > 2. [\- ‐ \: . ' ’ ‧] and [\u200C \u200D] are ineligible for inclusion in
    > the default identifiers, since they are in pattern-syntax or are normally
    > invisible, resp.

    I fully disagree with you about the curly right apostrophe. It is not part
    of the pattern-syntax, and not invisible. And it is used for normal
    orthographies of words. I can only agree with you about the ASCII quote
    which is definitely ambiguous.

    Those languages that define a syntactic role for the apostrophe are bogus, if
    they exist, unless they at least contain an escaping mechanism (which
    is not supposed to be used within identifiers). The only languages I know
    that need two different characters for left and right in a quote pair are
    using the ASCII quote and the ASCII backquote, not the apostrophe.

    For IDN, the apostrophe is definitely not syntactic and does not create
    confusion with the ASCII quote, which is forbidden anyway. So you don't need
    to exclude it from identifiers.

    For IDN you could get possible confusion between curly apostrophes and
    gershaim, but it can be avoided very simply: the use of the apostrophe as
    a letter and the use of gershaim are mutually exclusive within a language,
    so the two cannot be part of the same word token in the IDN label (by token,
    I mean one of the words in a hyphen-separated list of word tokens that make
    up a single domain name label). This means that a registry would allow only
    one or the other in the same token.

    The apostrophe cannot be used alone in a word token, and the same is true for
    gershaim, so the accompanying letters still mandate which one is correct
    and allowed. These accompanying letters also fix the directionality of the
    token (RTL or LTR), so the gershaim and apostrophe can be immediately
    correctly interpreted.
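
    As a rough illustration of the per-token rule described above, here is a
    minimal Python sketch. It is only an assumption of how a registry might
    check a label, not an existing IDN mechanism: the names check_token and
    check_label are hypothetical, U+2019 stands for the curly apostrophe,
    U+05F4 stands for gershaim, and directionality is taken from the Unicode
    bidi class of the accompanying letters.

```python
import unicodedata

APOSTROPHE = "\u2019"  # RIGHT SINGLE QUOTATION MARK (assumed code point)
GERSHAYIM = "\u05F4"   # HEBREW PUNCTUATION GERSHAYIM (assumed code point)


def token_direction(token):
    """Return 'LTR', 'RTL', 'mixed', or None, from the accompanying letters only."""
    dirs = set()
    for ch in token:
        if ch in (APOSTROPHE, GERSHAYIM):
            continue  # the marks themselves must not decide the direction
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            dirs.add("LTR")
        elif bidi in ("R", "AL"):
            dirs.add("RTL")
    if not dirs:
        return None
    return dirs.pop() if len(dirs) == 1 else "mixed"


def check_token(token):
    """Hypothetical registry rule for one hyphen-separated word token."""
    has_apostrophe = APOSTROPHE in token
    has_gershayim = GERSHAYIM in token
    if has_apostrophe and has_gershayim:
        return False  # only one or the other in the same token
    if not (has_apostrophe or has_gershayim):
        return True   # nothing to check in this sketch
    direction = token_direction(token)
    if has_apostrophe:
        return direction == "LTR"  # also rejects a lone apostrophe (None)
    return direction == "RTL"      # also rejects a lone gershayim (None)


def check_label(label):
    """Apply the token rule to every token of a hyphen-separated label."""
    return all(check_token(token) for token in label.split("-"))
```

    Note that the rule is applied per token, so a label may still contain an
    apostrophe in one token and gershaim in another.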

    This also means that gershaim and apostrophe could eventually be unified in
    an IDN registry that wants to support all languages, provided that the
    IDN client disambiguates each case from the surrounding letters, which
    reveal the directionality. Under this consideration, given that the single
    quote is not used in plain ASCII domain names and has a weak (contextual)
    directionality, it could become the candidate to represent both the
    apostrophe and gershaim, even if it is excluded from identifiers (this
    exclusion means that another character in the same IDN equivalence class
    must be used).
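
    The client-side disambiguation suggested above could look roughly like
    the following Python sketch. It is an assumption for illustration only:
    resolve_quote is a hypothetical helper, the first strongly directional
    character of the token is taken as deciding the direction, and U+2019 and
    U+05F4 are assumed to be the members of the equivalence class that replace
    the ASCII quote.

```python
import unicodedata

ASCII_QUOTE = "'"
APOSTROPHE = "\u2019"  # RIGHT SINGLE QUOTATION MARK (assumed LTR member)
GERSHAYIM = "\u05F4"   # HEBREW PUNCTUATION GERSHAYIM (assumed RTL member)


def resolve_quote(token):
    """Replace the ASCII quote according to the direction of the token's letters."""
    for ch in token:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            return token.replace(ASCII_QUOTE, APOSTROPHE)
        if bidi in ("R", "AL"):
            return token.replace(ASCII_QUOTE, GERSHAYIM)
    # No strongly directional letter: the quote stays ambiguous, leave it as is.
    return token
```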


