On 03/25/2002 04:38:08 PM Kenneth Whistler wrote:
>Peter Constable asked:
>
>> U+0027 APOSTROPHE has a general category of Po; U+02BC MODIFIER LETTER
>> APOSTROPHE has a general category of Lm. I haven't checked how they
>> compare with regard to any other properties. I'm wondering what kinds of
>> text processes might be expected to distinguish between these (i.e. give
>> different results / behaviours for the two characters).
>
>Well, for starters: isLetter() and isIdentifier() should give different
>results. U+02BC should be part of identifiers by default -- it is part
>of the alphabet of some languages. On the other hand, U+0027 is very
>often a syntax character, used as a 'quote' mark to indicate delimitation
>of an identifier or other symbol.

OK, both you and John mentioned identifiers. Let me ask a slightly
different question: I'm thinking about all of our linguists who have
existing data containing 0x27 to represent a glottal stop (some possibly
also using it as a quotation mark / apostrophe), and I'm thinking about
getting them to migrate to Unicode. I know that it would be good for
them to encode this orthographic representation of glottal stop as U+02BC,
but if they also use 0x27 for a quotation mark, it may not be so trivial
to get their data converted correctly, and many might be inclined to just
map 0x27 > U+0027. I'm trying to think of reasons to give them as to why
they might not want to do this, and usability for identifiers isn't going
to particularly grab the attention of many of them.
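
To make the conversion difficulty concrete, here is a minimal sketch of one possible contextual rule. The rule itself (treat 0x27 as a glottal stop only when it has a letter on both sides) and the name convert_glottal are assumptions made for illustration, not an established SIL or Unicode procedure.

```python
import re

MODIFIER_APOSTROPHE = "\u02bc"  # U+02BC MODIFIER LETTER APOSTROPHE

# Assumed heuristic: an apostrophe with a letter immediately before and
# after it is a glottal stop. [^\W\d_] matches any Unicode letter.
_BETWEEN_LETTERS = re.compile(r"(?<=[^\W\d_])'(?=[^\W\d_])")


def convert_glottal(text):
    """Replace word-internal U+0027 with U+02BC; leave all others alone."""
    return _BETWEEN_LETTERS.sub(MODIFIER_APOSTROPHE, text)
```

This rule misses word-initial and word-final glottal stops, and it would wrongly convert an English contraction such as "don't", so real data would probably need language-specific rules or manual review.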

So, why might a linguist want to go through the extra effort of mapping 0x27 >
U+02BC in exactly those contexts where it should map to U+02BC and not to U+2019
or something else?
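
As one illustration of the property difference discussed above, the following sketch uses Python's standard unicodedata and re modules to compare the two characters. The helper names describe and words are hypothetical, and the \w+ pattern is only a stand-in for whatever word-splitting a real search, sort, or spell-check tool performs.

```python
import re
import unicodedata

APOSTROPHE = "\u0027"           # U+0027 APOSTROPHE
MODIFIER_APOSTROPHE = "\u02bc"  # U+02BC MODIFIER LETTER APOSTROPHE


def describe(ch):
    """Return (general category, is-letter flag) for one character."""
    return unicodedata.category(ch), ch.isalpha()


def words(text):
    """Split text into runs of word characters, a crude word tokenizer."""
    return re.findall(r"\w+", text)
```

Under this tokenizer, a word written with U+0027 splits in two at the apostrophe, while the same word written with U+02BC stays whole, which is the kind of behaviour that affects word counts, concordances, and search.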
- Peter
---------------------------------------------------------------------------
Peter Constable
Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <peter_constable@sil.org>
This archive was generated by hypermail 2.1.2 : Mon Mar 25 2002 - 19:14:26 EST