Re: Code point vs. scalar value from Philippe Verdy on 2013-09-18 (Unicode Mail List Archive)

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Wed, 18 Sep 2013 11:42:50 +0200

Yes, because surrogate "code units" are those used by UTF-16 for which a
standard behavior is formally defined.

But there are still many encodings other than standard UTF-16 that use
those code points (don't forget that not all abstract characters are
encoded in the UCS, and surrogates are treated as abstract characters in
some non-standard encodings, which assign them their OWN scalar values).
Don't think that these code points are unused: they are just not
interoperable when working *only* with plain text, and plain text is not
the only kind of data used in software or even within interoperable
protocols and documents.
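
One concrete case, as a minimal sketch in Python (an illustration chosen
here, not something the standards above prescribe): Python's
"surrogateescape" error handler maps each undecodable byte 0xNN to the lone
surrogate code point U+DCNN, so that arbitrary byte strings can round-trip
through a string. The result is usable internally but is rejected by a
strict UTF-8 encoder.

```python
# Bytes that are not valid UTF-8 (0xFF never appears in well-formed UTF-8).
raw = b"abc\xff"

# surrogateescape maps each undecodable byte 0xNN to the lone surrogate U+DCNN.
text = raw.decode("utf-8", errors="surrogateescape")
code_points = [ord(ch) for ch in text]

# A strict encoder refuses the string: U+DCFF is a code point,
# but it is not a Unicode scalar value.
try:
    text.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# The same handler restores the original bytes exactly.
round_trip = text.encode("utf-8", errors="surrogateescape")
```

The string here is not interoperable plain text, yet the surrogate code
point carries real information (the original byte) inside the application.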

All those other uses exist and frequently have their own standards, but
they are simply out of scope of the UCS (and its ISO/IEC/Unicode
standards). I'd say that code points are the necessary extension that
allows the Unicode standard to be integrated into other standards or
applications.

Also, please don't use the term "scalar value" alone. We are really
speaking about "Unicode scalar values" or the "scalar value character
property" (as defined in TUS for all the standard UTFs), or "UCS scalar
values" (thinking in terms of ISO 10646 and the equivalent standards and
RFCs published by ISO and IETF to define the same UTFs). The term "code
point" has also long been used in ISO 10646 (even before the ISO and
Unicode standards were aligned), and at that time there were many more
standard "code points" than today.
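
To make the distinction concrete, here is a minimal Python sketch (an
illustration only, not text from TUS or ISO 10646; the function names are
hypothetical). It treats a code point as any integer in the range
0..0x10FFFF, and a Unicode scalar value as any code point outside the
surrogate range 0xD800..0xDFFF.

```python
MAX_CODE_POINT = 0x10FFFF
SURROGATE_FIRST = 0xD800
SURROGATE_LAST = 0xDFFF


def is_code_point(n: int) -> bool:
    """True if n lies in the Unicode codespace 0..0x10FFFF."""
    return 0 <= n <= MAX_CODE_POINT


def is_surrogate_code_point(n: int) -> bool:
    """True if n lies in the surrogate range 0xD800..0xDFFF."""
    return SURROGATE_FIRST <= n <= SURROGATE_LAST


def is_unicode_scalar_value(n: int) -> bool:
    """True if n is a code point that is not a surrogate code point."""
    return is_code_point(n) and not is_surrogate_code_point(n)
```

Under these definitions there are 0x110000 = 1,114,112 code points, of
which 2,048 are surrogates, leaving 1,112,064 Unicode scalar values.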

There are "scalar values" in so many other unrelated domains (notably
in mathematics, where a scalar value is an identifiable object that remains
constant under some operations and independent of its context,
unlike functions, differential or aggregating operators... scalar values may
be scalar only in some bases of the numeric domain) that using the term
"code point" is certainly more specific, less ambiguous, and will avoid
more confusion with these application domains (including, for example,
algorithms used to format integer, real or complex numbers into encoded
text, as used in localisation libraries like ICU with additional CLDR
data...)

2013/9/18 Stephan Stiller <stephan.stiller_at_gmail.com>

> On 9/18/2013 12:02 AM, Stephan Stiller wrote:
>
> That still doesn't mean surrogates are "used by UTF-16"
>
> => 'That still doesn't mean surrogate* code point*s are "used by UTF-16"'
>
Received on Wed Sep 18 2013 - 04:44:51 CDT
