Re: 'code unit' and 'code point' meaning check

From: Kenneth Whistler (kenw@sybase.com)
Date: Wed May 14 2003 - 17:58:29 EDT

Next message: Rick Cameron: "RE: 'code unit' and 'code point' meaning check"

Previous message: Deborah Goldsmith: "Decimal separator with more than one character?"
Maybe in reply to: Ben Dougall: "'code unit' and 'code point' meaning check"
Next in thread: Rick Cameron: "RE: 'code unit' and 'code point' meaning check"

Ben,

> could someone confirm if i've got this correct, or not please?:
>
> a 'code unit' could be the same as a 'code point', but there again it
> might not be. it's possible that several 'code units' are required to
> make up a 'code point'? (so code units can be the same size or smaller
> than a code point, but not the other way round)?

Think of it this way.

The code *point* is a number in the codespace, used to encode
an abstract character. For Unicode, it is a number in the
range 0x0000..0x10FFFF (or think of it as 0..1,114,111 expressed
in decimal). These get expressed with the U+ notation in Unicode.
Thus U+0041 is the code point for LATIN CAPITAL LETTER A.
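As a small illustration of the point above, here is a sketch in Python (my choice of language, not anything the standard prescribes). The helper name u_plus is made up for this example; ord() is the built-in that returns the code point of a one-character string.

```python
def u_plus(ch):
    """Return the code point of a single character in U+ notation.

    u_plus is a hypothetical helper for this example only.
    """
    # ord() gives the code point as a plain integer; U+ notation is that
    # integer in uppercase hex, padded to at least four digits.
    return f"U+{ord(ch):04X}"


print(ord("A"))              # 65
print(u_plus("A"))           # U+0041
print(u_plus("\U00010400"))  # U+10400
print(0x10FFFF)              # 1114111, the top of the codespace
```

Nothing here depends on how the character is stored. The code point is just the number.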

The code *unit* is a fixed-width integral data type used in the
context of a particular encoding form. The encoded character is
represented in that encoding form by either a single code unit
or a sequence of code units.

In UTF-8, the code unit is always an 8-bit integer. (0x00..0xFF)
In UTF-16, the code unit is always a 16-bit integer. (0x0000..0xFFFF)
In UTF-32, the code unit is always a 32-bit integer. (0x00000000..0x0010FFFF)
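To make the three widths concrete, here is a rough Python sketch that splits the encoded bytes of a string into code units of the right size for each encoding form. The names WIDTH_BYTES and code_units are invented for this example. The big-endian codecs are used only so that Python does not prepend a byte order mark.

```python
# Width of one code unit, in bytes, for each encoding form.
WIDTH_BYTES = {"utf-8": 1, "utf-16": 2, "utf-32": 4}


def code_units(text, form):
    """Return the code units of text in the given encoding form as integers.

    code_units is a hypothetical helper for this example only.
    """
    width = WIDTH_BYTES[form]
    # The "-be" codecs fix the byte order and emit no byte order mark.
    codec = form if width == 1 else form + "-be"
    data = text.encode(codec)
    # Cut the byte string into fixed-width chunks, one chunk per code unit.
    return [int.from_bytes(data[i:i + width], "big")
            for i in range(0, len(data), width)]


# U+20AC EURO SIGN in each encoding form
for form in ("utf-8", "utf-16", "utf-32"):
    print(form, [hex(u) for u in code_units("\u20ac", form)])
# utf-8 ['0xe2', '0x82', '0xac']
# utf-16 ['0x20ac']
# utf-32 ['0x20ac']
```

The same character takes three 8-bit code units, one 16-bit code unit, or one 32-bit code unit, depending on the encoding form.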

Code units don't "make up a code point".

Rather, a sequence of one or more code units is used to
represent a Unicode encoded character in a particular encoding form.
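For the "one or more" case, UTF-16 is the handiest example: a code point at or above 0x10000 does not fit in one 16-bit code unit, so it is represented by a pair of surrogate code units. The following Python sketch works the pair out by hand and checks it against Python's own encoder. The function name utf16_units is made up for this example.

```python
def utf16_units(cp):
    """Return the UTF-16 code unit sequence for one code point.

    utf16_units is a hypothetical helper for this example only.
    """
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("outside the Unicode codespace")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogate code points are not encoded characters")
    if cp < 0x10000:
        # Fits in a single 16-bit code unit.
        return [cp]
    # Otherwise split the 20-bit offset into two 10-bit halves.
    offset = cp - 0x10000
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]


cp = 0x10400  # DESERET CAPITAL LETTER LONG I
print([hex(u) for u in utf16_units(cp)])  # ['0xd801', '0xdc00']

# Cross-check against the built-in encoder (big-endian, no byte order mark).
data = chr(cp).encode("utf-16-be")
library = [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]
print(utf16_units(cp) == library)  # True
```

Neither 0xD801 nor 0xDC00 is part of the code point U+10400. Together, in that order, they represent it in UTF-16.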

--Ken


This archive was generated by hypermail 2.1.5 : Wed May 14 2003 - 18:33:26 EDT