From: Jim Allan (jallan@smrtytrek.com)
Date: Mon Aug 18 2003 - 12:06:37 EDT
Jill Ramonsky posted:
> I would really like it if these, and
> every single other character which is "only there for reasons of round trip
> compatibility" with something else, were explicitly marked in the
> machine-readable charts with something meaning "Don't introduce this
> character, at all, ever. Don't try to interpret it. Just preserve it, in
> case it ever gets turned back to its original character set".
That would probably be too strong.
If characters are available then some people will use them. :-(
See section 2.3 at http://www.unicode.org/versions/Unicode4.0.0/ch02.pdf
Unicode 3.0 contained under section D21 on compatibility characters:
<< Their use is discouraged other than for legacy data. >>
I don't know whether this statement was intentionally removed or was
accidentally dropped in the changes in 4.0 which distinguish
"compatibility character" from "compatibility composite character".
In any case people can't be prevented from doing things that are
officially discouraged, especially as for some particular use it might
be wrong to discourage them. So if you are handling Roman numerals in an
application and wish your handling to be complete then unfortunately you
do have to take the compatibility Roman numerals into account.
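As a rough sketch of what taking them into account can look like,
assuming Python and its standard unicodedata module (the helper name
fold_roman_numerals is made up for illustration), compatibility
normalization replaces the Roman numeral characters with ordinary
Latin letters:

```python
import unicodedata

def fold_roman_numerals(text):
    """Hypothetical helper: apply NFKC so that compatibility characters
    such as U+2163 are replaced by their compatibility decompositions."""
    return unicodedata.normalize("NFKC", text)

# U+2163 ROMAN NUMERAL FOUR becomes the two letters "IV".
print(fold_roman_numerals("\u2163"))
# U+217B SMALL ROMAN NUMERAL TWELVE becomes the three letters "xii".
print(fold_roman_numerals("Chapter \u217B"))
```

Note that NFKC folds every other compatibility character in the text
as well, so an application that must preserve the original data for
round-tripping would apply this only to a copy used for searching or
comparison.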
> U+2212 (minus sign) - an obvious clone of U+002D (hyphen-minus). Who
> uses this?
People concerned with proper appearance of the symbol in proportional
fonts. Almost all proportional fonts use a narrow hyphen dash rather
than a minus-width dash for the hyphen-minus character. In some
older-style fonts it is even a slanting character.
See http://www.unicode.org/versions/Unicode4.0.0/ch06.pdf in 6.2 for a
detailed discussion of the various dash characters.
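The two are separate characters with no decomposition linking them, so
software that reads numbers has to accept both explicitly. A minimal
sketch, assuming Python and its standard unicodedata module
(parse_number is a hypothetical helper, not part of any library):

```python
import unicodedata

MINUS_SIGN = "\u2212"    # MINUS SIGN
HYPHEN_MINUS = "\u002D"  # HYPHEN-MINUS

def parse_number(text):
    """Hypothetical helper: treat either character as a minus sign."""
    return float(text.replace(MINUS_SIGN, HYPHEN_MINUS))

print(unicodedata.name(MINUS_SIGN), "/", unicodedata.name(HYPHEN_MINUS))

# Normalization does not unify the two: U+2212 has no decomposition.
print(unicodedata.normalize("NFKC", MINUS_SIGN) == MINUS_SIGN)

print(parse_number(MINUS_SIGN + "3.5"))
```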
> U+2217 (asterisk operator) - an equally obvious clone of U+002A
> (asterisk)
They look much the same in a typewriter style font. They don't do so in
proportional fonts where the regular asterisk tends to appear somewhat
like a superscript.
Unicode provides support both for good typographical usage and for
traditional data-processing typographical usage based on
typewriter technology.
> U+223C (tilde operator) - a clone of U+007E (tilde)
See http://www.unicode.org/versions/Unicode4.0.0/ch07.pdf and look for
"Spacing Clones of Diacritics".
The ASCII tilde was originally intended to be a non-spacing diacritic
tilde to be applied to other characters by backspace. In part because of
the low resolution of many early data-processing printers it was often
realized in a tilde operator form. That has now become its most normal
form in fonts.
But for good typography you do want to distinguish them and the
overloading of tilde as ASCII 7E means that a font may render a
mathematical full-character tilde when you want to show a diacritic or
render a spacing diacritic when you wanted a mathematical operator.
Unicode is intended for typesetting applications as well as entering
computer code in a traditional typewriter style character set with
typewriter limitations.
> and then there's
> U+2223 (divides) - hell, that looks to me remarkably like U+007C
> (vertical line)
They do look close. But U+007C usually extends below the baseline and
U+2223 usually doesn't.
> For example:
> U+2264 (less than or equal to) - compare with U+2A7D (less than or
> slanted equal to)
I have no idea. You will probably have to ask the MathML people about
that one. See http://www.w3.org/TR/2001/REC-MathML2-20010221.
Mathematicians seem to think they need to distinguish the two.
As a non-mathematician I find many of these distinctions bewildering and
seemingly only typographical. But if mathematicians in some field make
fine distinctions based on such differences then it is important that
Unicode allow such distinctions to be maintained in plain text.
> In defence of this argument, I point out that the
> complementary relation, NOT equal to, has codepoint U+2270, and this is
> represented in the code charts as having a slanted equal to, so it OUGHT to
> be the complement of U+2A7D. (Unless I've missed it, there appears to be no
> "not equal to with horizontal equals" character).
The chart at http://www.unicode.org/charts/PDF/U2200.pdf does not show a
slanted equals.
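The character data points the same way. As a small sketch, assuming
Python and its standard unicodedata module, U+2270 has a canonical
decomposition into U+2264 followed by U+0338 (combining long solidus
overlay), which ties it to the horizontal form rather than to U+2A7D:

```python
import unicodedata

# Print the name and decomposition mapping of each character discussed.
for ch in ("\u2264", "\u2A7D", "\u2270"):
    print("U+%04X" % ord(ch), unicodedata.name(ch),
          "| decomposition:", unicodedata.decomposition(ch) or "(none)")

# Canonical decomposition (NFD) of U+2270.
not_leq = unicodedata.normalize("NFD", "\u2270")
print(["U+%04X" % ord(c) for c in not_leq])
```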
For some discussion of the math symbols see also
http://www.unicode.org/unicode/reports/tr25/tr25-5.html.
Part of the problem is that differences that are in most environments
only typographical style differences may indicate semantic differences
in particular disciplines. It is impossible to establish a firm line as
to how important or common what would normally be a stylistic variation
must be before it should be encoded in Unicode for plain text distinctions.
For example open-loop _g_ is distinguished from closed-loop _g_ in the
International Phonetic Alphabet and so Unicode encodes it separately at
U+0261.
A normal Latin Letter font would probably not have U+0261 in it at all
and might display U+0067 with either closed or open loop. But a font for
phonetic use should always display U+0067 with a closed loop.
Fonts like Arial Unicode MS lose the distinction.
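Whatever a given font does, the two letters stay distinct in the
encoding. A quick check, assuming Python and its standard unicodedata
module, shows that no normalization form folds U+0261 into U+0067:

```python
import unicodedata

LATIN_G = "\u0067"   # LATIN SMALL LETTER G
SCRIPT_G = "\u0261"  # LATIN SMALL LETTER SCRIPT G

for ch in (LATIN_G, SCRIPT_G):
    print("U+%04X" % ord(ch), unicodedata.name(ch))

# True if every normalization form leaves U+0261 unchanged.
unchanged = all(
    unicodedata.normalize(form, SCRIPT_G) == SCRIPT_G
    for form in ("NFC", "NFD", "NFKC", "NFKD")
)
print(unchanged)
```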
For non-technical use people need not and mostly quite rightly will not
use the more technical symbols to make fine distinctions that don't
apply in their particular usage.
Jim Allan