From: Mark Davis (mark.davis@icu-project.org)
Date: Thu Nov 17 2005 - 17:51:04 CST
It is not that clear-cut. Identifiers by their nature cannot include all
words and phrases valid in all languages. For IDN, for example, one
can't express the perfectly reasonable English word "can't", or a word
like "I.B.M.".
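To make the point concrete, here is a simplified sketch. It assumes the traditional host-name rule (RFC 952 / RFC 1123) that restricts ASCII in a label to letters, digits and hyphen ("LDH"), the restriction IDN applications normally enforce for ASCII characters. The function name `is_ldh_label` is hypothetical and used only for illustration; this is not a full IDNA implementation.

```python
import string

# Hypothetical helper: letters, digits and hyphen, the "LDH" repertoire
# that host-name labels are traditionally limited to (RFC 952 / RFC 1123).
LDH = set(string.ascii_letters + string.digits + "-")

def is_ldh_label(label):
    """Return True if `label` is a single LDH label: 1-63 characters,
    only letters/digits/hyphen, not starting or ending with a hyphen."""
    return (
        0 < len(label) <= 63
        and set(label) <= LDH
        and not label.startswith("-")
        and not label.endswith("-")
    )

# "can't" fails because APOSTROPHE is outside LDH; "I.B.M." fails
# because FULL STOP is the label separator, so it cannot sit inside a label.
for word in ["cant", "can't", "IBM", "I.B.M."]:
    print(word, is_ldh_label(word))
```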
I did introduce a proposal in March for considering the status of some
word characters, which turned into a discussion in the UTC about whether
to add certain items to the identifier definition.
http://www.unicode.org/L2/L2005/05083-wordprops.txt
I'll copy that section here for those without access:
0027 ; # Po APOSTROPHE
002D ; # Pd HYPHEN-MINUS
002E ; # Po FULL STOP
003A ; # Po COLON
00B7 ; # Po MIDDLE DOT
058A ; # Pd ARMENIAN HYPHEN
05F3 ; # Po HEBREW PUNCTUATION GERESH
05F4 ; # Po HEBREW PUNCTUATION GERSHAYIM
200C ; # Cf ZERO WIDTH NON-JOINER // for Indic?
200D ; # Cf ZERO WIDTH JOINER // for Indic?
2010 ; # Pd HYPHEN
2019 ; # Pf RIGHT SINGLE QUOTATION MARK
2027 ; # Po HYPHENATION POINT
30A0 ; # Pd KATAKANA-HIRAGANA DOUBLE HYPHEN
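The general categories and names in a listing like this can be regenerated from the Unicode Character Database. The sketch below assumes Python's standard `unicodedata` module (whose Unicode version depends on the Python release, though these particular names and categories are stable); `CANDIDATES` and `describe` are hypothetical names for illustration. It only reproduces the listing format; it says nothing about identifier status, which is what the UTC actually decided on.

```python
import unicodedata

# Code points from the word-properties list above (hypothetical variable name).
CANDIDATES = [0x0027, 0x002D, 0x002E, 0x003A, 0x00B7, 0x058A, 0x05F3,
              0x05F4, 0x200C, 0x200D, 0x2010, 0x2019, 0x2027, 0x30A0]

def describe(cp):
    """Format one code point as 'XXXX ; # <category> <NAME>'."""
    ch = chr(cp)
    return "%04X ; # %s %s" % (cp, unicodedata.category(ch), unicodedata.name(ch))

for cp in CANDIDATES:
    print(describe(cp))
```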
The UTC decided against adding them to the identifier definition.
If we were to change that for the Hebrew punctuation, we would have to
see a documented case for it.
Mark
Michael Everson wrote:
> At 17:42 +0100 2005-11-17, Cary Karp wrote:
>
>>> "These punctuation marks may not be available in all fonts (and legacy
>>> encodings), so an implementation should be prepared to degrade
>>> gracefully.
>>> U+0027 APOSTROPHE for GERESH and U+0022 QUOTATION MARK for GERSHAYIM are
>>> acceptable fallbacks."
>>
>>
>> The problem is that these fallbacks are not available in IDN under
>> any circumstances.
>
>
> If that is the case then surely the real characters must be allowed.