Re: *Why* are precomposed characters required for "backward compatibility"?

From: Doug Ewell (dewell@adelphia.net)
Date: Thu Jul 11 2002 - 22:14:53 EDT

Previous message: Tex Texin: "Re: [Fwd: [ ghostscript-Bugs-576651 ] ghostscript fails with someunicode char]"
In reply to: David Hopwood: "Re: *Why* are precomposed characters required for "backward compatibility"?"
Next in thread: David Starner: "Re: *Why* are precomposed characters required for "backward compatibility"?"
Next in thread: Kenneth Whistler: "Re: *Why* are precomposed characters required for "backward compatibility"?"
Reply: David Starner: "Re: *Why* are precomposed characters required for "backward compatibility"?"

David Hopwood <david dot hopwood at zetnet dot co dot uk> wrote:

> OTOH, there can be more than one way to represent composites that
> include two or more diacritics in different combining classes (e.g.
> <e with circumflex and dot below>). Technically, that would mean that
> strict byte-for-byte round-tripping of X -> NFD -> X would not be
> guaranteed in every case (unless X also requires that all data is
> normalised). This doesn't apply to T.61, but it does apply to other
> standards such as TIS620 (ISO-Latin-11 / Thai), which have combining
> marks in more than one class.

As you mentioned, this does not apply to T.61 or ISO 6937, because they
do not permit multiple diacritics to be applied to a single base
character.
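
To illustrate the round-tripping point quoted above, here is a minimal sketch using Python's standard unicodedata module. It assumes the two marks are encoded as U+0302 COMBINING CIRCUMFLEX ACCENT and U+0323 COMBINING DOT BELOW; the constant and function names are made up for this example and do not come from any standard.

```python
import unicodedata

# Two canonically equivalent spellings of <e with circumflex and dot below>.
# U+0302 has combining class 230 (above), U+0323 has class 220 (below).
E_CIRCUMFLEX_THEN_DOT = "e\u0302\u0323"
E_DOT_THEN_CIRCUMFLEX = "e\u0323\u0302"


def combining_classes(text):
    """Return the canonical combining class of each code point."""
    return [unicodedata.combining(ch) for ch in text]


def survives_nfd(text):
    """True if NFD leaves the code point sequence unchanged."""
    return unicodedata.normalize("NFD", text) == text


if __name__ == "__main__":
    for label, s in [("circumflex first", E_CIRCUMFLEX_THEN_DOT),
                     ("dot below first", E_DOT_THEN_CIRCUMFLEX)]:
        nfd = unicodedata.normalize("NFD", s)
        print(label,
              combining_classes(s),
              [f"U+{ord(c):04X}" for c in nfd],
              "unchanged" if survives_nfd(s) else "reordered")
```

Canonical ordering sorts marks by combining class, so only the "dot below first" spelling passes through NFD unchanged. A legacy encoding that stored the circumflex first would come back reordered, which is the loss of strict round-tripping described above. A standard that allows only one diacritic per base character never hits this case.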

> Users have basically ignored (if they are even aware of) any
> admonitions from standards institutions to treat U+005E, U+0060 or
> U+007E as spacing accents, and continued to use them for the purposes
> listed below:

Programming languages, notably C and its offspring, have appropriated
these characters for their own purposes. You can't really blame "users"
for that.

> So, there would have been no practical problem with disunifying
> spacing circumflex, grave, and tilde from the above US-ASCII
> characters, so that the preferred representation of all spacing
> diacritics would have been the combining diacritic applied to U+0020.

Except, of course, for any additional user confusion that might have
arisen from encoding three more lookalike "spoof buddies." Unicode is
already taking a lot of heat on the IDN list for not unifying all
"lookalike" pairs.
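
For comparison, the Latin-1 spacing diacritics already carry the "combining mark applied to U+0020" form as a compatibility decomposition, while the three US-ASCII characters have no decomposition at all. A small sketch with Python's unicodedata module shows the difference; the helper name and the two tuples are invented for this illustration.

```python
import unicodedata

# Spacing diacritics from Latin-1: DIAERESIS, MACRON, ACUTE ACCENT, CEDILLA.
LATIN1_SPACING = ("\u00a8", "\u00af", "\u00b4", "\u00b8")
# The three US-ASCII characters under discussion: ^ ` ~
ASCII_UNIFIED = ("\u005e", "\u0060", "\u007e")


def decomposition_of(ch):
    """Return the raw decomposition field from the Unicode Character
    Database, or an empty string if the character has none."""
    return unicodedata.decomposition(ch)


if __name__ == "__main__":
    for ch in LATIN1_SPACING + ASCII_UNIFIED:
        name = unicodedata.name(ch)
        print(f"U+{ord(ch):04X} {name}: {decomposition_of(ch)!r}")
```

The Latin-1 characters report decompositions such as "<compat> 0020 0308", whereas U+005E, U+0060 and U+007E report an empty string because they were unified with the ASCII characters instead.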

-Doug Ewell
Fullerton, California


