From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Jan 09 2006 - 18:25:01 CST
Jukka quoted the standard:
> >> Being a compatibility decomposable
> >> character, it is not recommended except in the representation
> >
> > No, it does not say that.
>
> "Compatibility decomposable characters are a subset of compatibility
> characters included in the Unicode Standard to represent distinctions in
> other base standards. They support transmission and processing of legacy
> data. Their use is discouraged other than for legacy data or other special
                                                            ^^^^^^^^^^^^^^^^
> circumstances."
  ^^^^^^^^^^^^^
There's your escape clause.
>
> Definition D21 in section 3,
> http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf#G748
>
> > There are exceptions to that interpretation
> > of compatibility characters (and compatibility decomposable characters),
> > the IJ LIGATURE and the LONG S are among them. I think it is perfectly
> > fine to recommend their use in situations like this
>
> I think so too; we seem to agree on the practical point. But I discussed
> what the standard says (in a somewhat odd place, but the same general idea
> can be seen elsewhere in the standard, too).
...
> (Maybe some official statement,
> constituting an explicit exception to the principle of avoiding
> compatibility decomposable characters, would be in order.)
Actually, I don't think so. The bullet at the definition
of compatibility decomposable characters already provides
sufficient wiggle-room. They are there for:
1. use with legacy data (which includes ISO 8859-1, by the way)
2. when you need them (special circumstances)
You wouldn't get very far trying to push a claim that the
Unicode Standard has a principle of "avoiding compatibility
decomposable characters", given that technically, they include
even such characters as U+00A0 NO-BREAK SPACE -- which is elsewhere
explicitly recommended for use (in special circumstances) for
preventing line breaks and for display of nonspacing marks
in apparent isolation, and so on.
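To see how broad the category is, here is a rough sketch using Python's
standard unicodedata module. It assumes that "compatibility decomposable"
can be approximated as "has a decomposition mapping that starts with a
<tag>"; the helper name is_compat_decomposable is made up for illustration
and is not anything from the standard.

```python
import unicodedata

def is_compat_decomposable(ch: str) -> bool:
    """Hypothetical helper: True if ch has a tagged (compatibility) decomposition.

    unicodedata.decomposition() returns the raw mapping from the Unicode
    Character Database. Compatibility mappings start with a tag such as
    <compat> or <noBreak>; canonical mappings have no tag.
    """
    return unicodedata.decomposition(ch).startswith("<")

# U+00A0 NO-BREAK SPACE, U+0132 IJ LIGATURE, U+017F LONG S,
# plus U+00E9 (canonical decomposition only) for contrast.
for ch in ("\u00a0", "\u0132", "\u017f", "\u00e9"):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch),
          repr(unicodedata.decomposition(ch)),
          is_compat_decomposable(ch))
```

The exact mappings printed depend on the Unicode version bundled with the
Python build, but the three characters discussed in this thread should all
come out as compatibility decomposable.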
I'd say the IJ is a similar case. Ordinarily you don't need it
for representing Dutch data, but U+0132 is encoded for those
circumstances where you might need it: interoperating with legacy
data (particularly ISO 6937), and special circumstances where you
might have requirements for letter spacing that couldn't be
met simply in plain text otherwise.
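As a small illustration of the interoperability point, the following
sketch (again using Python's unicodedata module; the sample word and
variable names are just my own choices) shows that canonical normalization
keeps U+0132 as is, while compatibility normalization folds it into the
ordinary two-letter sequence. So the distinction survives only as long as
nobody applies NFKC or NFKD to the data.

```python
import unicodedata

# Sample text using U+0132 LATIN CAPITAL LIGATURE IJ (illustrative only).
legacy = "\u0132sselmeer"

nfc = unicodedata.normalize("NFC", legacy)    # canonical: ligature is kept
nfkc = unicodedata.normalize("NFKC", legacy)  # compatibility: becomes "I" + "J"

print([f"U+{ord(c):04X}" for c in nfc[:2]])
print([f"U+{ord(c):04X}" for c in nfkc[:2]])
```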
--Ken