From: Ben Dougall (bend@freenet.co.uk)
Date: Wed May 14 2003 - 18:20:26 EDT
On Wednesday, May 14, 2003, at 10:48 pm, Rick Cameron wrote:
> You can find the new, improved definitions of code point and code unit in
> the online draft of Chapter 3 of TUS 4.0,
> http://www.unicode.org/book/preview/ch03.pdf
yeah, i'm really struggling with that at the moment. it just won't get
into my head. :/
> A code point is a number between 0 and 0x10ffff. It is independent of the
> encoding form.
>
> A code unit is the basic chunk of bits in one of the encoding forms of
> Unicode - an 8-bit chunk in UTF-8, a 16-bit chunk in UTF-16 and a 32-bit
> chunk in UTF-32.
right, so this..:
>> a 'code unit' could be the same as a 'code point', but there again it
>> might not be. it's possible that several 'code units' are required to
>> make up a 'code point'? (so code units can be the same size or smaller
>> than a code point, but not the other way round)?
..was a fair enough description by the looks of things. the right way
round at least. (as opposed to my doubting follow-up mail)
ok, thanks.
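
to convince myself, here is a rough sketch in python (not from the thread; the helper name code_units is made up for illustration). it counts how many code units one code point needs in each encoding form, assuming python's built-in codecs behave as documented:

```python
def code_units(ch: str) -> dict:
    """Count the code units one code point needs in each encoding form.

    Hypothetical helper for illustration; `ch` must be a single code point.
    """
    return {
        "UTF-8": len(ch.encode("utf-8")),            # 8-bit code units
        "UTF-16": len(ch.encode("utf-16-be")) // 2,  # 16-bit code units
        "UTF-32": len(ch.encode("utf-32-be")) // 4,  # 32-bit code units
    }


# One code point each, from different parts of the code space.
for ch in ("A", "\u00e9", "\u20ac", "\U0001d11e"):
    print(f"U+{ord(ch):04X}", code_units(ch))
```

so several code units can be needed for one code point (up to 4 in UTF-8, up to 2 in UTF-16), while in UTF-32 it is always exactly one.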
> (I'm sure this is an FAQ - but why are the code points 0xd800-0xdfff not
> considered noncharacters? Obviously no abstract character can be associated
> with them! Is there a different term that describes code points like this?)
<guess> that area is reserved for surrogates. in UTF-16 a high one
(0xd800-0xdbff) needs a low one (0xdc00-0xdfff) after it to make up a
single character. on their own 0xd800-0xdfff are 1/2 characters :) </guess>
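
a rough python sketch of the surrogate arithmetic, in case it helps (the function names to_surrogates and from_surrogates are made up for illustration; the arithmetic is the standard UTF-16 mapping for code points above 0xffff):

```python
def to_surrogates(cp: int) -> tuple:
    """Split a supplementary code point into a (high, low) surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points 0x10000-0x10ffff use surrogates")
    v = cp - 0x10000                 # 20 bits left to distribute
    high = 0xD800 + (v >> 10)        # top 10 bits -> 0xd800-0xdbff
    low = 0xDC00 + (v & 0x3FF)       # bottom 10 bits -> 0xdc00-0xdfff
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Recombine a (high, low) surrogate pair into one code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogates(0x1D11E)
print(hex(high), hex(low))                  # the two UTF-16 code units
print(hex(from_surrogates(high, low)))      # back to the original code point
```

each half only carries 10 of the 20 bits, so neither one means anything without the other.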