You haven't been following the thread, have you? When you "count code
points" you can do one of two things: count the original code "points",
which is the same as counting scalar values, /because that's what an
encoding form encodes/; or count the code points that correspond to code
units because, well, you can match them up. The latter interpretation
seemed at first to derive from terminological imprecision, but my concern
and suspicion turned out to match what Twitter did historically.
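
To make the two readings concrete, here is a minimal Python sketch. It
assumes well-formed input (no lone surrogates), and the helper names
scalar_values and utf16_code_units are made up for illustration; they are
not anyone's actual API.

```python
import struct


def scalar_values(text: str) -> list[int]:
    """Scalar values of a well-formed string (assumes no lone surrogates)."""
    return [ord(ch) for ch in text]


def utf16_code_units(text: str) -> list[int]:
    """UTF-16 code units of the same string, as 16-bit integers."""
    data = text.encode("utf-16-le")
    return list(struct.unpack(f"<{len(data) // 2}H", data))


sample = "a\U0001F600"  # U+0061 followed by U+1F600

# Reading 1: count what the encoding form encodes (scalar values).
print(len(scalar_values(sample)))     # 2

# Reading 2: count one "code point" per UTF-16 code unit.
print(len(utf16_code_units(sample)))  # 3

# The supplementary character shows up as two surrogate code units.
print([hex(u) for u in utf16_code_units(sample)])  # ['0x61', '0xd83d', '0xde00']
```

The two counts agree for text confined to the BMP and diverge as soon as a
supplementary character appears, which is where the ambiguity bites.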
On 9/16/2013 7:19 AM, Philippe Verdy wrote:
> 2013/9/16 Stephan Stiller <stephan.stiller_at_gmail.com>
> > That's exactly what happens when people confuse "code point" with
> > "scalar value" ;-) Hmm, whom might we blame? :-)
>
> Actually you never count scalar values. You are confusing them with
> code units. Twitter was originally counting UTF-16 code units, but now
> counts code points.
>
> Scalar values are unrelated, they are properties assigned to code
> points so that all code points have a scalar value but the reverse is
> true only with the valid range 0 to 0x1FFFFF. Scalar values are only
> used if you need to perform arithmetic to compute code points from
> others. This generally does not work well within the UCS except in a
> few very small ranges (like decimal digits). The scalar value is also
> needed to convert from one standard UTF to another.
Received on Mon Sep 16 2013 - 09:51:06 CDT