Hello, Doug!
I)
AT> http://www.unicode.org/unicode/uni2book/ch03.pdf
AT>
1.
AT> - A single abstract character may correspond to more than one code
AT> value - for example, U+00C5 ... LATIN CAPITAL LETTER A WITH RING and
AT> U+212B ... ANGSTROM SIGN
2.
AT> - Multiple code values may be required to represent a single abstract
AT> character.
DE> I don't see a discrepancy between these two statements, at least not one
DE> that the phrase "more than one code value sequence" would clarify.
Yes, _this_ is the fragment that looks confusing to me.
2. says that a single abstract character may need more than one
code value to be encoded.
Okay, this is about surrogate pairs.
1. speaks about a single abstract character mapping to two
_scalar values_.
But then it should have said "A single abstract character may
correspond to more than one SEQUENCE of 1 to 2 code values"!
Imagine an abstract character that corresponds to two scalar values
over 0xFFFF. Then it corresponds to two PAIRS OF CODE VALUES, not to
two CODE VALUES.
Doug?
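To make the distinction concrete, here is a minimal Python sketch. It assumes that "code value" means a 16-bit UTF-16 unit, and the helper name utf16_code_values is made up for this illustration. It lists the code value sequences for three spellings of A-with-ring, and for U+10400, which is just one arbitrary scalar value above 0xFFFF picked as an example.

```python
import unicodedata

def utf16_code_values(text):
    """Return the 16-bit UTF-16 code values for text (hypothetical helper)."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

# Three spellings of the same abstract character (A with ring above):
# U+00C5, U+212B, and the combining sequence U+0041 U+030A.
spellings = ["\u00C5", "\u212B", "A\u030A"]
for s in spellings:
    units = " ".join(f"{v:04X}" for v in utf16_code_values(s))
    # All three normalize to U+00C5 under NFC, i.e. they are canonically equivalent.
    same = unicodedata.normalize("NFC", s) == "\u00C5"
    print(f"{units:10} canonically equivalent to U+00C5: {same}")

# A scalar value above 0xFFFF takes a PAIR of code values (a surrogate pair).
pair = utf16_code_values("\U00010400")
print(" ".join(f"{v:04X}" for v in pair))
```

Each spelling is its own sequence of one or two code values, and the last line shows a single scalar value that already needs two code values by itself.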
---
II)
AT> For example, a byte is the code unit in SJIS:...
AT> ideographs require two code values
DE> I do think the text here is unclear about "code values" and "code
DE> units."
Doug, I did not mean to go that far :-)
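For the SJIS remark, a small sketch, assuming Python's built-in "shift_jis" codec is an acceptable stand-in for SJIS. The helper name sjis_units and the sample ideograph U+6F22 are chosen only for illustration.

```python
def sjis_units(ch):
    """Return the byte-sized code units of ch in Shift_JIS (hypothetical helper)."""
    return list(ch.encode("shift_jis"))

# A Latin letter, then an ideograph (U+6F22).
for ch in ["A", "\u6F22"]:
    units = sjis_units(ch)
    hex_units = " ".join(f"{b:02X}" for b in units)
    print(f"U+{ord(ch):04X} -> {len(units)} code unit(s): {hex_units}")
```

The Latin letter takes one byte and the ideograph takes two, which is the "two code values" case in the quoted text.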
DE> <http://www.unicode.org/unicode/reports/tr17/> between "code point" and
DE> "code unit."
Thanks for the link!
DE> A code point ... U+0410
DE> Code units are the two bytes 0xD0 0x90 required to express
DE> that code point in UTF-8, or the single 32-bit word 0x00000410 required
DE> to express it in UTF-32.
DE> Incorporating the concepts from UTR #17 into the main text is one place
DE> where the "language tightening" project for Unicode 4.0 should really
DE> pay off.
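Doug's U+0410 example can be checked with a few lines of Python. This is only a sketch: the variable names are mine, and the big-endian codec is used just so that no byte order mark gets in the way.

```python
code_point = 0x0410   # the code point from the example (CYRILLIC CAPITAL LETTER A)
ch = chr(code_point)

# UTF-8: the code units are bytes.
utf8_units = list(ch.encode("utf-8"))

# UTF-32: the code unit is one 32-bit word.
utf32_unit = int.from_bytes(ch.encode("utf-32-be"), "big")

print("UTF-8 :", " ".join(f"0x{b:02X}" for b in utf8_units))
print("UTF-32:", f"0x{utf32_unit:08X}")
```

One code point, but two code units in UTF-8 and a single code unit in UTF-32.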
It looks to me that both concepts are already in ch03.pdf:
- A code value is also referred to as a code unit in the information industry
- A Unicode scalar value is also referred to as a code position or a code point in the information industry
Sure, "language tightening" will be good, but this was not the part of ch03.pdf that got me confused. I personally am quite content with the
- code value, code unit
- code point, scalar value, code position
definitions :-)
- Anton
This archive was generated by hypermail 2.1.2 : Thu Apr 11 2002 - 03:00:56 EDT