From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Tue Jan 10 2006 - 03:38:48 CST
On Mon, 9 Jan 2006, Kenneth Whistler wrote:
>> (Maybe some official statement,
>> constituting an explicit exception to the principle of avoiding
>> compatibility decomposable characters, would be in order.)
>
> Actually, I don't think so. The bullet at the definition
> of compatibility decomposable characters already provides
> sufficient wiggle-room. They are there for:
>
> 1. use with legacy data (which includes ISO 8859-1, by the way)
>
> 2. when you need them (special circumstances)
Hardly anyone would consider using characters that he does not need, so
interpreting "special circumstances" as "when you need them" nullifies the
entire statement about avoiding compatibility decomposable characters.
Maybe it _should_ be nullified, but I'd rather see that happen as an
explicit decision in a revision of the standard, and that would also
require some tuning in other parts of the standard. (For example, saying
that compatibility characters were included only for compatibility with
existing standards would then be too strong, or at least misleading.)
> You wouldn't get very far trying to push a claim that the
> Unicode Standard has a principle of "avoiding compatibility
> decomposable characters",
Well, it has such a principle, both in the form of an explicit statement
that I quoted and "between the lines" in various wordings.
> given that technically, they include
> even such characters as U+00A0 NO-BREAK SPACE -- which is elsewhere
> explicitly recommended for use (in special circumstances) for
> preventing line breaks and for display of nonspacing marks
> in apparent isolation, and so on.
An explicit recommendation, or permission, in the standard surely
constitutes an exception to the general rule, but it does not mean
that the rule does not exist or could be taken lightly.
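For readers who want to check which characters fall under the rule, here is a minimal sketch using Python's standard unicodedata module. The helper name compatibility_mapping is made up for this illustration. It assumes that a decomposition field starting with a tag in angle brackets marks a compatibility (rather than canonical) decomposition, which is how the Unicode Character Database records it.

```python
import unicodedata
from typing import Optional


def compatibility_mapping(ch: str) -> Optional[str]:
    """Return the decomposition field of ch if it is a compatibility
    decomposition (tagged, e.g. "<compat> ..."), otherwise None.

    Hypothetical helper for illustration only.
    """
    decomp = unicodedata.decomposition(ch)
    # Canonical decompositions carry no "<tag>" prefix; characters with
    # no decomposition at all yield an empty string.
    return decomp if decomp.startswith("<") else None


# U+00A0 and U+0132 are the characters discussed in this thread;
# U+00E9 is included as a canonically decomposable contrast.
for ch in ("\u00a0", "\u0132", "\u00e9"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {compatibility_mapping(ch)}")
```

Both U+00A0 and U+0132 report a tagged mapping here, while U+00E9 does not, which matches the point that NO-BREAK SPACE is technically in the same class as the IJ ligature.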
> I'd say the IJ is a similar case.
Perhaps. Wouldn't it then be suitable to make a note of this?
> U+0132 is encoded for those
> circumstances where you might need it: interoperating with legacy
> data (particularly ISO 6937), and special circumstances where you
> might have requirements for letter spacing that couldn't be
> met simply in plain text otherwise.
It is a sufficient reason for U+0132 that it existed in other standards.
Interoperating with legacy data generally constitutes a reason to use
compatibility characters. But beyond that? If the ij ligature is
considered as a letter that is part of Dutch orthography, then its use
would be recommendable, without any "special circumstances". (We would
then rather say that replacing it by "i" and "j" would be appropriate
in some circumstances, including situations where U+0132 cannot be
used safely.) This might be a highly debatable issue, but if there is a
considerable community that regards the ij ligature as a letter, this
alone would be a reason to add some comment to U+0132.
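As an illustration of replacing the ligature by "i" and "j" where U+0132 cannot be used safely, the following is a rough sketch only. The names IJ_FALLBACK and replace_ij_ligatures are hypothetical. A targeted mapping is used instead of NFKC normalization because NFKC would also rewrite every other compatibility character in the text, including U+00A0.

```python
import unicodedata

# Hypothetical fallback table: only the two IJ ligature code points.
IJ_FALLBACK = {
    0x0132: "IJ",  # LATIN CAPITAL LIGATURE IJ
    0x0133: "ij",  # LATIN SMALL LIGATURE IJ
}


def replace_ij_ligatures(text: str) -> str:
    """Replace U+0132 and U+0133 by two-letter sequences, leaving all
    other characters (including other compatibility characters) alone."""
    return text.translate(IJ_FALLBACK)


sample = "\u0132sselmeer\u00a0(\u0133s)"
targeted = replace_ij_ligatures(sample)
blanket = unicodedata.normalize("NFKC", sample)

# The targeted version keeps the no-break space; NFKC turns it into
# an ordinary space as a side effect.
print("\u00a0" in targeted, "\u00a0" in blanket)
```

Which of the two approaches is appropriate depends on whether the other compatibility characters in the data are there on purpose.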
-- Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/