From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sat Nov 19 2005 - 06:39:31 CST
From: "Mark Davis" <mark.davis@icu-project.org>
> 2. [\- ‐ \: . ' ’ ‧] and [\u200C \u200D] are ineligible for inclusion in
> the default identifiers, since they are in pattern-syntax or are normally
> invisible, resp.
I fully disagree with you about the curly right apostrophe. It is not part
of the pattern-syntax, and not invisible. And it is used for normal
orthographies of words. I can only agree with you about the ASCII quote
which is definitely ambiguous.
Any language that defines a syntactic role for the apostrophe (if one
exists) is bogus unless it at least provides an escaping mechanism (which
is not supposed to be used within identifiers). The only languages I know
that need two different characters for left and right in a quote pair are
using the ASCII quote and the ASCII backquote, not the apostrophe.
For IDN, the apostrophe is definitely not syntactic and does create confusion
with the ASCII quote, which is forbidden anyway. So you don't need to exclude
it from identifiers.
For IDN you could get possible confusion between curly apostrophes and
gershayim, but it can be avoided very simply because the use of the
apostrophe as a letter and the use of gershayim are mutually exclusive within
any one language, so the two cannot be part of the same word token in the IDN
label (by token, I mean one of the words in a hyphen-separated list of word
tokens that make up a single domain name label). This means that a registry
would allow only one or the other in the same token.
The apostrophe cannot be used alone in a word token, and the same is true for
gershayim, so the accompanying letters still mandate which one is correct
and allowed. These accompanying letters also fix the directionality of the
token (RTL or LTR), so the gershayim and apostrophe can be immediately
correctly interpreted.
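
The per-token rule above could be sketched roughly as follows, in Python.
This is only an illustration under stated assumptions: the function names are
hypothetical, U+2019 and U+05F4 are assumed to be the code points meant by
"curly apostrophe" and "gershayim", and the Unicode bidi class of the
accompanying letters is assumed to be an adequate stand-in for "which
language the token is written in". No actual registry policy is implied.

```python
import unicodedata

APOSTROPHE = "\u2019"  # RIGHT SINGLE QUOTATION MARK (assumed code point)
GERSHAYIM = "\u05F4"   # HEBREW PUNCTUATION GERSHAYIM (assumed code point)


def token_is_allowed(token: str) -> bool:
    """Hypothetical registry check for one hyphen-separated word token."""
    marks = {ch for ch in token if ch in (APOSTROPHE, GERSHAYIM)}
    if not marks:
        return True   # nothing to disambiguate
    if len(marks) > 1:
        return False  # only one or the other in the same token
    letters = [ch for ch in token if unicodedata.category(ch).startswith("L")]
    if not letters:
        return False  # the mark cannot stand alone
    bidi = {unicodedata.bidirectional(ch) for ch in letters}
    if bidi == {"L"}:
        return marks == {APOSTROPHE}  # LTR letters mandate the apostrophe
    if bidi == {"R"}:
        return marks == {GERSHAYIM}   # RTL (Hebrew) letters mandate gershayim
    return False      # mixed or other directionality: reject


def label_is_allowed(label: str) -> bool:
    """A label is a hyphen-separated list of word tokens."""
    return all(token_is_allowed(tok) for tok in label.split("-"))
```

With this sketch, a token such as "l’eau" passes, a Hebrew token containing
gershayim passes, and a token that mixes Latin letters with gershayim is
rejected.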
This also means that gershayim and apostrophe could possibly be unified in
an IDN registry that wants to support all languages, provided that the
IDN client disambiguates the case from the surrounding letters, which reveal
the directionality. Under this consideration, given that the single quote is
not used in plain ASCII and has a weak (contextual) directionality, it
could become the candidate to represent both the apostrophe and gershayim,
even if it's excluded from identifiers (this exclusion means that another
character in the same IDN equivalence class must be used).
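
As a rough illustration of the client-side disambiguation, here is a small
Python sketch. It assumes, hypothetically, that the ASCII single quote acts
as the class representative and that the client picks U+2019 or U+05F4 from
the first strongly directional character in the token; the function name
and the code-point choices are assumptions, not part of any IDN
specification.

```python
import unicodedata

ASCII_QUOTE = "'"
APOSTROPHE = "\u2019"  # assumed code point for the curly apostrophe
GERSHAYIM = "\u05F4"   # assumed code point for gershayim


def resolve_quote(token: str) -> str:
    """Replace the ASCII quote by the mark implied by the token's letters."""
    for ch in token:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":   # strong left-to-right letter found first
            return token.replace(ASCII_QUOTE, APOSTROPHE)
        if bidi == "R":   # strong right-to-left (Hebrew) letter found first
            return token.replace(ASCII_QUOTE, GERSHAYIM)
    return token          # no strong letter: leave the token unresolved
```

A token with no strongly directional letter is returned unchanged, which
matches the point that neither mark can be used alone in a word token.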