Re[2]: Discrepancy in ch03.pdf?

From: Anton Tagunov (atagunov@online.ptt.ru)
Date: Thu Apr 11 2002 - 04:17:29 EDT


Hello, Doug!

I)

AT> http://www.unicode.org/unicode/uni2book/ch03.pdf
AT>
1.
AT> - A single abstract character may correspond to more than one code
AT> value -
      for example, U+00C5 ... LATIN CAPITAL LETTER A WITH RING ABOVE and
      U+212B ... ANGSTROM SIGN
2.
AT> - Multiple code values may be required to represent a single abstract
AT> character.
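To make statement 1 concrete, here is a small Python sketch (Python 3 and its standard unicodedata module are only an illustration choice, nothing from ch03.pdf). It checks that U+212B normalizes to U+00C5 under NFC and that both share the same NFD form, which is why two different code values can stand for one abstract character.

```python
import unicodedata

# Two distinct code values for what the text calls one abstract character.
A_WITH_RING = "\u00C5"   # LATIN CAPITAL LETTER A WITH RING ABOVE
ANGSTROM = "\u212B"      # ANGSTROM SIGN

# U+212B has a singleton canonical decomposition to U+00C5, so NFC folds it.
nfc_angstrom = unicodedata.normalize("NFC", ANGSTROM)
print(f"U+{ord(nfc_angstrom):04X}")  # U+00C5

# Under NFD both become "A" followed by COMBINING RING ABOVE (U+030A).
print([f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", A_WITH_RING)])
# ['U+0041', 'U+030A']
print(unicodedata.normalize("NFD", A_WITH_RING) == unicodedata.normalize("NFD", ANGSTROM))
# True
```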

DE> I don't see a discrepancy between these two statements, at least not one
DE> that the phrase "more than one code value sequence" would clarify.

Yes, _this_ is the fragment that looks confusing to me.

2. says that a single abstract character may need more than one
   code value to be encoded.
   Okay, this is about surrogate pairs.

1. speaks about a single abstract character mapping to two
   _scalar values_.

But then it should have said "A single abstract character may
correspond to more than one SEQUENCE of 1 to 2 code values"!!

Imagine that an abstract character corresponds to two scalar values
over 0xFFFF. Then it corresponds to two PAIRS OF CODE VALUES, not to
two CODE VALUES.
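A sketch of the arithmetic behind this point, assuming UTF-16 as the encoding form for code values. The helper utf16_code_values is hypothetical (written just for this message), and the two scalar values in `alternatives` are arbitrary supplementary code points picked only to show the shape of the result; they are NOT actually equivalent in Unicode. Each one comes out as a pair of code values, so "more than one code value" really means "more than one sequence of code values".

```python
def utf16_code_values(scalar: int) -> list[int]:
    """Return the UTF-16 code values (16-bit code units) for one scalar value."""
    if not (0 <= scalar <= 0x10FFFF) or 0xD800 <= scalar <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {scalar:#x}")
    if scalar <= 0xFFFF:
        return [scalar]                  # BMP: a single code value
    offset = scalar - 0x10000            # 20 bits left to split
    high = 0xD800 + (offset >> 10)       # leading (high) surrogate
    low = 0xDC00 + (offset & 0x3FF)      # trailing (low) surrogate
    return [high, low]

# Hypothetical: pretend one abstract character had these two alternative
# scalar values (illustration only; they are not equivalent in Unicode).
alternatives = [0x1D11E, 0x10400]
sequences = [utf16_code_values(s) for s in alternatives]
for s, seq in zip(alternatives, sequences):
    print(f"U+{s:04X} -> {' '.join(f'{cv:04X}' for cv in seq)}")
# U+1D11E -> D834 DD1E
# U+10400 -> D801 DC00
```

So the character maps to two alternative sequences, each of length 2, rather than to two single code values.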

Doug?

---

II)

AT> For example, a byte is the code unit in SJIS:...
AT> ideographs require two code values

DE> I do think the text here is unclear about "code values" and "code
DE> units."
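For the SJIS remark, a quick check with Python's built-in shift_jis codec (used here only as an illustration; the ideograph U+65E5 is an arbitrary pick) shows one byte for an ASCII letter and two bytes, i.e. two code units, for an ideograph.

```python
# Characters to inspect: an ASCII letter and one CJK ideograph (U+65E5, chosen arbitrarily).
samples = ["A", "\u65E5"]

# Encode each with Python's built-in Shift_JIS codec; each byte is one SJIS code unit.
encoded = {ch: ch.encode("shift_jis") for ch in samples}
for ch, data in encoded.items():
    print(f"U+{ord(ch):04X}: {len(data)} code unit(s): {data.hex(' ')}")
```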

Doug, I did not mean to go that far :-)

DE> <http://www.unicode.org/unicode/reports/tr17/> between "code point" and
DE> "code unit."

Thanks for the link!

DE> A code point ... U+0410
DE> Code units are the two bytes 0xD0 0x90 required to express
DE> that code point in UTF-8, or the single 32-bit word 0x00000410 required
DE> to express it in UTF-32.
DE> Incorporating the concepts from UTR #17 into the main text is one place
DE> where the "language tightening" project for Unicode 4.0 should really
DE> pay off.
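Doug's U+0410 example can be reproduced with Python's standard codecs. This sketch assumes the BOM-free codec names utf-8, utf-16-be and utf-32-be, and splits each encoded string into code units of the encoding form's width: two 8-bit units in UTF-8, one 16-bit unit in UTF-16, one 32-bit unit in UTF-32.

```python
CYRILLIC_A = "\u0410"  # CYRILLIC CAPITAL LETTER A, the code point in Doug's example

# (codec name, code unit width in bytes); the -be variants emit no byte order mark.
forms = [("utf-8", 1), ("utf-16-be", 2), ("utf-32-be", 4)]

units_by_encoding = {}
for encoding, width in forms:
    data = CYRILLIC_A.encode(encoding)
    # Slice the byte string into code units of the form's width.
    units_by_encoding[encoding] = [data[i:i + width].hex().upper()
                                   for i in range(0, len(data), width)]
    print(encoding, units_by_encoding[encoding])
# utf-8 ['D0', '90']
# utf-16-be ['0410']
# utf-32-be ['00000410']
```

One code point, but two, one, or one code units depending on the encoding form.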

It seems to me that both concepts are already in ch03.pdf:

   "A code value is also referred to as a code unit in the information
   industry."

   "A Unicode scalar value is also referred to as a code position or a
   code point in the information industry."
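To see why the two pairs of terms are not interchangeable, a small Python sketch (the sample string is an arbitrary choice) counts the same text once in code points (scalar values) and once in code units (code values) of each encoding form. Python's len() on a str counts code points, so only the encoded lengths reflect code units.

```python
text = "A\u00C5\U0001D11E"  # A, A WITH RING ABOVE, MUSICAL SYMBOL G CLEF

code_points = len(text)                            # scalar values / code points
utf8_units = len(text.encode("utf-8"))             # 8-bit code units
utf16_units = len(text.encode("utf-16-le")) // 2   # 16-bit code units
utf32_units = len(text.encode("utf-32-le")) // 4   # 32-bit code units
print(code_points, utf8_units, utf16_units, utf32_units)  # 3 7 4 3
```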

Sure, "language tightening" will be good, but this was not the part of
ch03.pdf that confused me. I personally am quite content with the
   - code value, code unit
   - code point, scalar value, code position
definitions :-)

- Anton



This archive was generated by hypermail 2.1.2 : Thu Apr 11 2002 - 03:00:56 EDT