From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Mon Sep 27 2010 - 00:37:13 CDT
abysta wrote:
> What is the difference between 00B4 and 02CA?
U+00B4 ACUTE ACCENT is a legacy character with ambiguous semantics.
U+02CA MODIFIER LETTER ACUTE ACCENT is a modifier letter.
The formal properties for these characters, as defined in the Unicode
standard, reflect this difference to some extent. For example, the Unicode
line breaking rules allow a break before U+00B4 but not before U+02CA.
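As a rough illustration of how the formal properties differ, the two characters can be looked up in a copy of the Unicode Character Database. The sketch below uses Python's standard unicodedata module; the names ACUTE, MODIFIER_ACUTE and describe are my own, chosen just for this example. The module does not expose the line breaking class, so that particular difference is not shown here.

```python
import unicodedata

# The two characters discussed in this message.
ACUTE = "\u00b4"           # U+00B4 ACUTE ACCENT
MODIFIER_ACUTE = "\u02ca"  # U+02CA MODIFIER LETTER ACUTE ACCENT

def describe(ch):
    """Return (code point, name, General_Category, decomposition) for one character."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),       # "Sk" = modifier symbol, "Lm" = modifier letter
        unicodedata.decomposition(ch),  # empty string if the character has no decomposition
    )

for ch in (ACUTE, MODIFIER_ACUTE):
    print(describe(ch))
```

U+00B4 should come out as a modifier symbol (Sk) with a compatibility decomposition to SPACE followed by U+0301 COMBINING ACUTE ACCENT, whereas U+02CA should come out as a modifier letter (Lm) with no decomposition.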
Chapter 7 of the standard describes the Unicode view on modifier letters:
http://unicode.org/versions/Unicode5.0.0/ch07.pdf
(page 28 in the PDF, page 250 in the standard).
The general idea seems to be that legacy characters like U+00B4 were
duplicated as modifier letters because ISO 8859 is ambiguous about their
role as spacing vs. nonspacing. However, it seems to me that ISO 8859 says,
somewhat obscurely but clearly, that all characters in it are spacing. On
the other hand, in implementations, U+00B4 has often been used as a
nonspacing diacritic mark. Moreover, it has often been used as a poor man’s
right single quotation mark, e.g. as in `foobar´, meant to represent ‘foobar’.
U+02CA is meant to be unambiguous as regards its general nature as a
letter (character used in words), though its specific meaning (e.g., as a
tone mark when writing a tone language in Latin letters) has not been fixed.
If you consider using U+02CA, note that not even fairly modern programs
should be expected to treat it as a letter (e.g., so that double-clicking on
a word containing it would select the entire word, instead of stopping at
U+02CA). Such treatment is suggested, but not required, by the standard.
Moreover, font support for U+02CA is rather limited; see
http://www.fileformat.info/info/unicode/char/2ca/fontsupport.htm
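On the point about treating U+02CA as a letter: whether a given program does so has to be tested case by case. As one small data point, the sketch below checks how Python's regular expression word class (\w) splits a string containing each character. The helper name words and the sample strings are made up for the example, and Python's behavior says nothing about what a particular editor or browser does on double-click.

```python
import re

def words(text):
    """Split text into maximal runs of characters matched by the regex word class."""
    return re.findall(r"\w+", text)

# Hypothetical sample strings: the same letters with each accent character in the middle.
with_modifier_letter = "ma\u02cata"  # contains U+02CA MODIFIER LETTER ACUTE ACCENT
with_legacy_accent = "ma\u00b4ta"    # contains U+00B4 ACUTE ACCENT

print(words(with_modifier_letter))  # one run: U+02CA counts as a word character
print(words(with_legacy_accent))    # two runs: U+00B4 acts as a separator
```

Here U+02CA is kept inside the word, while U+00B4 breaks it in two, which is the kind of difference the standard's suggestion is about.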
-- Yucca, http://www.cs.tut.fi/~jkorpela/