Re: An A is an A is an A

From: Keld Jørn Simonsen (keld@dkuug.dk)
Date: Wed Aug 28 1996 - 22:16:27 EDT


unicode@Unicode.ORG writes:

> This discussion may be a little misleading regarding what the characters "are"
> and what is required in processing.

Yes, but please be careful to distinguish between the ISO standard
ISO/IEC 10646 and Unicode.

> U+0041 is ALWAYS an A, forever and forever.
>
> The sequence U+0041 U+0301 is canonically equivalent to U+00C1 Á (LATIN
> CAPITAL LETTER A WITH ACUTE, if your mailer trashes that). A conformant
> process "shall not assume that the interpretation of two canonical-
> equivalent sequences are distinct." This means that I cannot claim
> that I had U+0041 U+0301, but you interpreted it as U+00C1, and you're
> wrong. It DOES NOT mean that all processing is much more complex. It
> depends entirely on what processing is going on.

This is only true in Unicode; ISO/IEC 10646 does not have this equivalence.
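
A minimal sketch of the Unicode-side equivalence discussed above. It assumes
Python's standard unicodedata module and uses NFC normalization as one way of
comparing canonically equivalent sequences; the variable names are
illustrative only.

```python
import unicodedata

# Two spellings of the same abstract character (per Unicode).
decomposed = "\u0041\u0301"   # LATIN CAPITAL LETTER A + COMBINING ACUTE ACCENT
precomposed = "\u00C1"        # LATIN CAPITAL LETTER A WITH ACUTE

# An exact binary comparison sees two different code point sequences.
binary_equal = decomposed == precomposed

# Comparing after normalizing both sides to NFC treats them as the same text.
canonical_equal = (
    unicodedata.normalize("NFC", decomposed)
    == unicodedata.normalize("NFC", precomposed)
)

print(binary_equal, canonical_equal)
```

The binary comparison is what a plain buffer copy or byte match sees; the
normalized comparison is what a process honouring canonical equivalence would
have to do in some form.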

> If I am doing string copies into buffers, there is no difference whatsoever.
>
> If I am doing text matches for other than exact binary matches, then some
> table lookup is involved, which may require lookahead even in Level 1
> implementations. Whether this table lookup is "much more complex" using
> combining characters depends on your implementation of the lookup.

Now you are seriously confusing things. In level 1 (of ISO/IEC 10646),
you cannot use combining characters, and there is thus no
equivalence of the kind you state.
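
A rough sketch of the level 1 restriction, under a stated assumption: it
approximates "combining character" by a nonzero canonical combining class in
Python's unicodedata module, which is not the same thing as the list of
combining characters that ISO/IEC 10646 itself defines. The helper name
has_combining is hypothetical.

```python
import unicodedata

def has_combining(text):
    """Return True if any character has a nonzero canonical combining class.

    Assumption: this only approximates the ISO/IEC 10646 notion of a
    combining character; the standard defines its own list.
    """
    return any(unicodedata.combining(ch) != 0 for ch in text)

# Precomposed form: no combining characters, so acceptable at level 1.
level1_ok = not has_combining("\u00C1")

# Base letter plus combining accent: not expressible at level 1.
uses_combining = has_combining("\u0041\u0301")

print(level1_ok, uses_combining)
```

In a level 1 setting only the first string can occur at all, so the question
of treating the two sequences as equivalent never arises.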

There is no level 1 in Unicode (as far as I know); it is all level 3
in the 10646 sense.

keld


