From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed Jul 25 2007 - 06:46:00 CDT
I have not been able to submit this using the Unicode bug report form (due
to a technical problem with the form): it does not accept my valid email
address. It incorrectly rejects the underscore (_) in the user-name (local)
part of my address, even though the underscore is perfectly valid there (it
is only forbidden in the Internet domain-name part).
Can someone at the UTC check the code of the bug report form on the Unicode
site once again, so that it parses and validates email addresses correctly,
without assuming that the characters allowed on each side of the "@" are the
same? Per the RFC specifications, they have never been the same subsets.
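As an illustration only, validating the two sides of the "@" separately could
look like the minimal Python sketch below. It assumes the simplified RFC 5322
"dot-atom" syntax for the local part (whose allowed characters include "_")
and hostname-style labels (letters, digits, hyphen) for the domain; the name
`is_plausible_email` is hypothetical, and quoted local parts and address
literals are deliberately not handled.

```python
import re

# Simplified sketch: RFC 5322 "atext" characters for the local part.
# Note that "_" is allowed here.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
# dot-atom: one or more atext runs separated by single dots.
LOCAL_PART = re.compile(rf"{ATEXT}+(?:\.{ATEXT}+)*")
# Hostname label: letters, digits, hyphen; no leading/trailing hyphen,
# at most 63 characters. "_" is NOT allowed here.
LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_plausible_email(address: str) -> bool:
    """Hypothetical checker that validates local part and domain separately."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return False
    if not LOCAL_PART.fullmatch(local):
        return False
    labels = domain.split(".")
    # Assumption for this sketch: require at least two domain labels.
    return len(labels) >= 2 and all(LABEL.fullmatch(part) for part in labels)


if __name__ == "__main__":
    print(is_plausible_email("verdy_p@wanadoo.fr"))   # True
    print(is_plausible_email("verdy_p@wana_doo.fr"))  # False: "_" in domain
```

The point of the split is that one character class is applied to the left of
the "@" and a different, stricter one to the right.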
AND PLEASE, can someone at the UTC (Rick McGowan?) copy and paste the message
below, posting it under his own email address, so that it is considered in
the current public review of the UAX #14 update 20 (closing on July 30)?
------------------------------------------------------------------------
Some comments about cases that appear to have been forgotten.
The line-breaking opportunities do not seem to handle some special cases
where undesirable line breaks are currently allowed.
This occurs, for example, with parentheses, which currently always allow a
line break between them and the text they surround, before or after them.
I can cite an example from officially documented French toponyms:
"Château-Chinon(Ville)" and "Château-Chinon(Campagne)", which designate two
distinct French communes and each form a single compound name. The INSEE
officially writes them WITHOUT a space separator (the term within the
parentheses is then not a common word but part of the toponym, so it takes a
mandatory capital).
In this case, allowing a line break before the opening parenthesis would
permit a rendering where the inserted line break is read as if there were a
space, and the required capital on the term "Ville" or "Campagne" within the
parentheses would then look like a typo.
Note the difference with the French names of a few cantons that are
*qualified* by adding " (ville)" or " (campagne)" with a space separator and
no capital on the specifier (this occurs, for example, in the canton and
arrondissement around the French city (toponym) of "Strasbourg"). The
generated name does NOT form a compound name.
Note also the difference with toponyms (or other proper names) that would
otherwise be written as "...-Ville" or "...-Campagne": in this case a line
break is possible after the hyphen, which remains visible when the line
break occurs and still explicitly marks that this is a compound name.
For some reason, the INSEE reference for French administrative units (and
the IGN, for its official toponyms) uses parentheses instead of a hyphen.
How can this case be handled so that parentheses do not allow a line break
on EITHER side of them when they are attached to the surrounding text
without any space?
Here is another, more common, example where such line breaks are
undesirable:
"un (ou plusieurs) mot(s)"
Note how the plural mark "s" in "mots" is marked as optional; it is not
separable from the word it normally completes. Inserting a line break
between "mot" and "(s)" would be wrong.
Another example arises when writing math formulas: "f(x) = x + 2". Here
again, the term "f(x)" should remain unbreakable. The same should apply to
the term "f[x]" in "f[x] = x + 2".
I propose disallowing line breaks on ***BOTH*** sides of:
* (parentheses), or parenthesis-like characters such as
* [square brackets],
* ‹single angle quotation marks› (this could also be accepted for the
less-than and greater-than signs), or even
* “double 6/9 quotation marks”,
* «double angle quotation marks», or
* ‘single 6/9 quotation marks’,
if and only if the characters on each side of the marks would be
unbreakable in the absence of these marks.
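The condition above can be sketched in Python under a deliberately
simplified model. The baseline rule `toy_base_break` is a hypothetical
stand-in, NOT the UAX #14 algorithm: it only allows a break after a space or
a hyphen. The equally hypothetical `proposed_breaks` removes the listed
marks, asks the baseline whether a break would exist between the characters
that then become adjacent, and only in that case allows a break, placed after
any closing marks and before any opening ones. It does not pair opening and
closing marks, and it leaves out the ASCII quotes and the less-than and
greater-than signs.

```python
OPENERS = "([‹“«‘"
CLOSERS = ")]›”»’"
MARKS = frozenset(OPENERS + CLOSERS)


def toy_base_break(prev: str, nxt: str) -> bool:
    """Toy baseline, NOT UAX #14: break only after a space or a hyphen,
    and never before a space."""
    return not nxt.isspace() and (prev.isspace() or prev == "-")


def proposed_breaks(text: str) -> list[int]:
    """Return the indices i such that a line break is allowed before text[i]."""
    kept = [i for i, ch in enumerate(text) if ch not in MARKS]
    breaks = []
    for a, b in zip(kept, kept[1:]):
        # Decide as if the marks between positions a and b were absent.
        if not toy_base_break(text[a], text[b]):
            continue
        # Put the break after any closing marks but before any opening ones.
        gap = range(a + 1, b)
        breaks.append(next((i for i in gap if text[i] in OPENERS), b))
    return breaks


if __name__ == "__main__":
    print(proposed_breaks("un (ou plusieurs) mot(s)"))  # [3, 7, 18]
    print(proposed_breaks("Château-Chinon(Ville)"))     # [8]
    print(proposed_breaks("f(x) = x + 2"))              # [5, 7, 9, 11]
```

In these toy results, no break is offered before "(s)", around "(Ville)" or
inside "f(x)", while the break after the hyphen in "Château-Chinon" and the
break before "(ou" (which follows a space) are kept.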
Note that I include quotation marks because they are quite often used to
emphasize important parts within a word.
This will also cover the case where ‘single 6/9 quotation marks’ are used as
apostrophes (common in French and English to mark the elision of letters or
in some abbreviated words) or as reversed apostrophes (used in Polynesian
languages as a glottal consonant mark).
Are there known cases where a line break would still be desirable under
these conditions?
Philippe.