From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Mon May 22 2006 - 15:44:06 CDT
On 5/21/2006 9:47 PM, Jukka K. Korpela wrote:
> On Sun, 21 May 2006, Kent Karlsson wrote:
>
>> Because it is in ISO/IEC 8859. Hadn't ISO/IEC 8859-1 been so
>> commonly supported, MICRO SIGN would have been canonically
>> equivalent with GREEK SMALL LETTER MU.
>
> Why? How did the common support to ISO/IEC 8859-1 dictate the decision
> that was made?
Without such common support, there would have been no need to clone
exactly the sequence of 256 code positions from U+0000 to U+00FF.
Without that need, there would have been no reason not to have a
single character for both mu and micro.
> It would have been possible to make these characters canonically
> equivalent or even the same, although it would have been somewhat odd
> to have a Greek letter in the Latin 1 Supplement block and a
> corresponding hole in the Greek block.
Precisely, and that's something that would not have been acceptable to
the Greek national member body of ISO/IEC JTC1/SC2. Had anyone attempted
that, the Greek vote would have been negative on ISO/IEC 10646's
alignment with Unicode. Incidentally, for those who forget history, the
standard needed every single vote at that crucial point. That's the
reason behind a number of design decisions that appear odd unless one
takes into account the political dimension of securing acceptance for
an unproven standard.
> As far as I can see, it was a practical decision. It looks like a
> natural decision to me, since the glyphs for these characters may well
> be different, and the MICRO SIGN can be seen as a special symbol
> historically based on GREEK SMALL LETTER MU rather than just its
> specialized usage.
I think the evidence is still in favor of viewing it as specialized
usage, but the disunification does indeed allow relatively
straightforward support for divergence in form.
>
>>> The present justification is that U+00B5 does not belong to
>>> any script, whereas U+03BC is in the Greek script.
>>
>> That's a mistake. It should be in the Greek script, of course,
>
> U+00B5 has the Script value of Common, which might perhaps more
> appropriately be characterized as belonging to _any_ script rather
> than not belonging to any script. What its script _should_ be is less
> obvious, but since it is only compatibility equivalent to a Greek
> letter, the current situation looks natural. Similarly, for example,
> ALEF SYMBOL is in the Common script, not in the Hebrew script.
And very appropriately so.
>
>> just like the OHM SIGN (which is canonically equivalent with
>> GREEK CAPITAL OMEGA;
>
> The OHM sign _is_ in the Greek script, and this is apparently based on
> its being _canonically_ equivalent to a Greek letter (which was a
> somewhat odd decision, but let's not go into that now).
Coding the OHM sign separately is a practice inherited from East Asian
character sets, some of which provide both a code point for the symbol
and a separate code point for the corresponding letter of the Greek
script. Carrying over that distinction (and the similar one between A
with ring and the Angstrom sign) was felt to be a mistake, and asserting
canonical equivalence was seen as the best way to discourage any future
differentiation between the two halves of each pair.
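The practical effect of canonical equivalence for the OHM and Angstrom
signs, versus the mere compatibility equivalence of MICRO SIGN, can be
observed through normalization. This is a minimal sketch using Python's
standard `unicodedata` module; the `PAIRS` table and the `equivalence`
helper are illustrative names, not part of any standard API.

```python
import unicodedata

# Illustrative pairs: (symbol character, ordinary letter).
PAIRS = {
    "OHM SIGN": ("\u2126", "\u03a9"),       # vs GREEK CAPITAL LETTER OMEGA
    "ANGSTROM SIGN": ("\u212b", "\u00c5"),  # vs LATIN CAPITAL LETTER A WITH RING ABOVE
    "MICRO SIGN": ("\u00b5", "\u03bc"),     # vs GREEK SMALL LETTER MU
}


def equivalence(sign: str, letter: str) -> str:
    """Report the strongest normalization under which two strings become equal."""
    if unicodedata.normalize("NFC", sign) == unicodedata.normalize("NFC", letter):
        return "canonical"
    if unicodedata.normalize("NFKC", sign) == unicodedata.normalize("NFKC", letter):
        return "compatibility"
    return "none"


for name, (sign, letter) in PAIRS.items():
    print(f"{name}: {equivalence(sign, letter)}")
# OHM SIGN: canonical
# ANGSTROM SIGN: canonical
# MICRO SIGN: compatibility
```

Under NFC the OHM and Angstrom signs are replaced by the letters
themselves, so the two halves of each pair cannot be told apart after
normalization; the micro sign merges with mu only under NFKC.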
>
>> the latter of course the preferred
>> character to denote the ohm unit symbol,
>
> Yes, there is an explicit statement about that in the Unicode standard.
>
>> just like GREEK SMALL
>> LETTER MU is the preferred character for denoting the
>> micro unit prefix symbol).
>
> Have you found such a statement, or even an implicit preference, in
> the Unicode standard, or some other standard? (Unfortunately,
> standards related to the SI, as many other standards, define the use
> of characters without identifying them by Unicode numbers or names.
> Historically, this is understandable, but it creates considerable
> vagueness in some cases.)
>
If your Unicode-to-8859-1 mapping table supports mapping Greek mu to
the micro sign, as well as the reverse, it would probably be preferable
to use the mu consistently. However, that would break any software that
was migrated to Unicode by straight 'widening' of code points (treating
each 8-bit value as the Unicode code point with the same number) and
that might contain a numerical constant identifying the micro sign
(0xB5).
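The mapping and the widening described above can be sketched as
follows, assuming a hypothetical encoder with a small fallback table for
characters outside Latin-1 and a decoder that simply widens bytes. The
names `FALLBACK_TO_LATIN1`, `encode_latin1_with_fallback` and
`decode_latin1_widening` are made up for illustration; Python's own
built-in `latin-1` codec rejects U+03BC instead of mapping it.

```python
# Hypothetical fallback table (illustrative only): characters outside
# Latin-1 that this sketch maps onto an existing Latin-1 position.
FALLBACK_TO_LATIN1 = {"\u03bc": 0xB5}  # GREEK SMALL LETTER MU -> MICRO SIGN

# A legacy constant of the kind mentioned above; after widening it still
# identifies U+00B5 MICRO SIGN, never U+03BC GREEK SMALL LETTER MU.
MICRO_SIGN = 0xB5


def encode_latin1_with_fallback(text: str) -> bytes:
    """Encode to ISO/IEC 8859-1, mapping Greek mu onto the micro sign."""
    out = bytearray()
    for i, ch in enumerate(text):
        cp = ord(ch)
        if cp <= 0xFF:
            # U+0000..U+00FF clone Latin-1, so the code point is the byte value.
            out.append(cp)
        elif ch in FALLBACK_TO_LATIN1:
            out.append(FALLBACK_TO_LATIN1[ch])
        else:
            raise UnicodeEncodeError("latin-1", text, i, i + 1, "no Latin-1 mapping")
    return bytes(out)


def decode_latin1_widening(data: bytes) -> str:
    """'Straight widening': each byte becomes the code point with the same value."""
    return "".join(chr(b) for b in data)


encoded = encode_latin1_with_fallback("5 \u03bcm")  # mu, e.g. from a Greek keyboard
decoded = decode_latin1_widening(encoded)
print(encoded)                        # b'5 \xb5m'
print(ord(decoded[2]) == MICRO_SIGN)  # True: the mu comes back as MICRO SIGN
```

The round trip does not preserve the mu: after decoding, code that
compares against the widened constant sees only U+00B5.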
Such software may be becoming less common now that Unicode has been
around for a while and more software is written with Unicode in mind.
If so, a gradual move toward the mu as the preferred character would be
possible (as long as provision is made to recognize the micro sign in
existing data).
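One way to make such a provision, sketched here as an assumption rather
than an established practice, is to fold only U+00B5 to U+03BC before
comparing or storing text. The helper names `prefer_mu` and `same_unit`
are hypothetical. Full NFKC would also perform this fold, but it
rewrites many other characters as well, as the last line shows.

```python
import unicodedata

MICRO_SIGN = "\u00b5"      # U+00B5, as produced by Latin-1 keyboards
GREEK_SMALL_MU = "\u03bc"  # U+03BC, assumed here to be the preferred character

# Translation table that touches only the micro sign.
_FOLD_MICRO = str.maketrans({MICRO_SIGN: GREEK_SMALL_MU})


def prefer_mu(text: str) -> str:
    """Replace every MICRO SIGN with GREEK SMALL LETTER MU, leaving all else intact."""
    return text.translate(_FOLD_MICRO)


def same_unit(a: str, b: str) -> bool:
    """Compare unit strings so that legacy micro signs match Greek mu."""
    return prefer_mu(a) == prefer_mu(b)


print(same_unit("10 \u00b5s", "10 \u03bcs"))           # True
print(prefer_mu("\u00b5m\u00b2"))                      # μm² (superscript two kept)
print(unicodedata.normalize("NFKC", "\u00b5m\u00b2"))  # μm2 (NFKC also flattens ²)
```

The targeted fold keeps existing data recognizable without the side
effects of compatibility normalization on unrelated characters.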
On the other hand, millions of existing Latin-1 keyboards will continue
to support the micro sign, while a smaller number of Greek keyboards
will not.
In a perfect world this issue would not exist, but in the real world
each transition has both benefits and costs. This small discrepancy is
part of the cost of adopting a universal character set that must remain
compatible with existing devices, data and software.
A./