From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Thu May 10 2007 - 16:27:00 CDT
Chris Harvey wrote on Thursday, May 10, 2007 4:08 PM
> I’m concerned with the addition of characters which are visually
> identical, and differ only in that one is punctuation and the other is
> meant to be an orthographical letter, as is the case with U+02BC
> MODIFIER LETTER APOSTROPHE and U+2019 RIGHT SINGLE QUOTATION MARK.
> Ojibwa can be typed on a US-English keyboard as long as the apostrophe
> is understood to be U+0027 or U+2019 (for those programs using
> auto-quotes). Introducing U+02BC would be very confusing to Ojibwa
> speakers; why is ' one thing in English but another in Ojibwa? I have
> had no success in communicating to speakers and language educators the
> practical need for two apostrophes, one for English and one for the
> Native language.
I must confess I am puzzled as to why the 'punctuation apostrophe', as in
English "can't", should be U+2019 rather than U+02BC. There must be an
explanation somewhere. It may simply be that it is too much to expect
people to make the correct distinction between U+2019 and U+02BC in
English. There are a few examples, such as "must've", non-standard
"wa'er", and alien names like "Vl'hurg", where the apostrophe is clearly
letter-like, but there are probably not enough of them to make the
distinction worthwhile.
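For anyone who wants to see the difference concretely, here is a small Python illustration (not from the thread; it only queries the Unicode Character Database through the standard unicodedata module). It prints the name and General_Category of the three apostrophe-like characters mentioned above, and shows one practical consequence: Python's str.isalpha() accepts only letters (categories Lu, Ll, Lt, Lm, Lo), so "can't" spelt with U+02BC counts as a single alphabetic string while the U+0027 and U+2019 spellings do not. The helper name describe() is just for this sketch.

```python
import unicodedata

# The three apostrophe-like characters discussed in this thread.
APOSTROPHES = {
    "U+0027": "\u0027",
    "U+2019": "\u2019",
    "U+02BC": "\u02bc",
}


def describe(ch: str) -> tuple[str, str]:
    """Return the character's Unicode name and its General_Category."""
    return unicodedata.name(ch), unicodedata.category(ch)


if __name__ == "__main__":
    for label, ch in APOSTROPHES.items():
        name, category = describe(ch)
        print(f"{label} {name}: {category}")

    # str.isalpha() is true only if every character is a letter (L*),
    # so only the U+02BC spelling of "can't" is treated as all-letters.
    for label, ch in APOSTROPHES.items():
        word = f"can{ch}t"
        print(label, word, word.isalpha())
```

Expect U+0027 to come out as Po (other punctuation), U+2019 as Pf (final punctuation) and U+02BC as Lm (modifier letter), which is the whole "punctuation versus letter" distinction in a nutshell.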
> We could go further: Squamish writes its glottal stop with a 7, Tlingit
> with a period (.), and Arapaho writes /θ/ with the number 3. These
> orthographies were developed so that as few exotic characters as
> possible would be required, and that these languages could be typed on
> an English keyboard. Should new MODIFIER NUMBER SEVEN and MODIFIER NUMBER
> THREE characters be introduced?
In theory, yes. A hypothetical Arapaho *3a3a would be title-cased, by
default, to "3A3a", and, if I have not misinterpreted rule LB24 in UAX #14
Line Breaking Properties (Unicode 5.0.0), standard line breaking would
allow no line break in *3a-3a unless hyphenation rules cut in. (I may
have misunderstood the rules: the automatic line breaking I'm seeing
breaks at the ASCII hyphen without trouble, but also breaks at the
hyphen-minus of '20.0e-3'.) However, the postulated Arapaho hyphen problem
should go away if you use U+2010 HYPHEN for the hyphen function instead of
U+002D HYPHEN-MINUS.
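A rough Python sketch of the two effects described above, under loudly simplifying assumptions. The helpers is_cased, default_titlecase_word and may_break_after_hyphen are hypothetical. The title-casing function follows the outline of the Unicode default algorithm (titlecase the first cased character of a word, lowercase the rest), treats the whole input as one word, and approximates the Cased property. The line-break check models only two facts from UAX #14: hyphen-minus (class HY) does not break before a digit (class NU, approximated here as category Nd), and U+2010 HYPHEN (class BA) allows a break after it. It is not a line-breaking implementation.

```python
import unicodedata

HYPHEN_MINUS = "\u002d"  # UAX #14 line-break class HY
HYPHEN = "\u2010"        # UAX #14 line-break class BA


def is_cased(ch: str) -> bool:
    # Approximation: a character counts as cased if it has a distinct
    # upper- or lowercase form. Digits such as '3' are therefore uncased.
    return ch.lower() != ch or ch.upper() != ch


def default_titlecase_word(word: str) -> str:
    """Titlecase the first cased character of one word, lowercase the rest."""
    for i, ch in enumerate(word):
        if is_cased(ch):
            return word[:i] + ch.title() + word[i + 1:].lower()
    return word  # no cased character at all: leave unchanged


def may_break_after_hyphen(text: str, i: int) -> bool:
    """Toy check for a break opportunity after the hyphen at text[i]."""
    ch = text[i]
    nxt = text[i + 1] if i + 1 < len(text) else ""
    if ch == HYPHEN_MINUS:
        # HY followed by a digit stays together (as in "-3" or "e-3").
        return not (nxt != "" and unicodedata.category(nxt) == "Nd")
    if ch == HYPHEN:
        return True  # BA: a break after it is allowed
    raise ValueError("only U+002D and U+2010 are modelled here")


if __name__ == "__main__":
    print(default_titlecase_word("3a3a"))               # 3A3a
    print("3a3a".title())                                # 3A3A
    print(may_break_after_hyphen("3a-3a", 2))            # False
    print(may_break_after_hyphen("3a\u20103a", 2))       # True
```

Note that Python's own str.title() does not follow the word-based Unicode default; it capitalises every letter that follows an uncased character, so it turns "3a3a" into "3A3A", arguably an even worse result for an orthography that uses digits as letters.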
That said, if it were worth the trouble of implementing extra characters, I
feel it would probably be better to try to move to more conventional letter
shapes than to add characters that will probably cause endless trouble.
> Perhaps I’m alone in thinking this, but users cannot be expected to
> differentiate between two visually identical characters, one for one
> language, one for another.
That's probably easier than distinguishing two identical characters in the
same language, which we often do, albeit unreliably. However, I may be
underestimating the importance of the unpredictable distinctions made by
fonts, e.g. '0' vs. 'O'.
Richard.
This archive was generated by hypermail 2.1.5 : Thu May 10 2007 - 16:28:43 CDT