From: Dean Snyder (dean.snyder@jhu.edu)
Date: Thu May 19 2005 - 20:15:19 CDT
Ken offers a qualified dissent, stating:
>Surrogate pairs are *not* a stateful mechanism in the sense
>that that term is generally applied to character encodings.
and then continues:
>In UTF-16, 0xD800 does not set a "state" which then alters the
>interpretation of a subsequent code unit. 0xDF02 has its own, unique
>status, regardless of what precedes or follows it.
Well, that of course depends on how you define state. I presume that
acknowledging this is behind both your qualified dissent and your use of
quotes around the word "state" here.
Let me make my case for the statefulness of surrogates more explicitly.
If <0xD800 0xDF02> is interpreted differently than <0xD801 0xDF02>, then
the high surrogate is altering the interpretation of 0xDF02, the low
surrogate. I assert that this is stateful in the context of discussing
fragment fragility. The issue is that a surrogate state is being
established, which by definition takes two code units rather than one to
establish any given supplementary code point; if either code unit is
missing, the remaining code unit is uninterpretable. This co-dependency
spans code units, and that fact, from a fragment fragility perspective,
makes the whole surrogate mechanism stateful.
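To make the two sequences above concrete, here is a minimal Python sketch
of the standard UTF-16 surrogate-pair arithmetic. The function names are
illustrative only and do not come from the standard. It shows both sides
of the disagreement: a lone 0xDF02 can be classified as a low surrogate by
a range check alone, but the code point it contributes to depends on the
high surrogate paired with it.

```python
def is_high_surrogate(unit: int) -> bool:
    # High (leading) surrogates occupy 0xD800..0xDBFF.
    return 0xD800 <= unit <= 0xDBFF


def is_low_surrogate(unit: int) -> bool:
    # Low (trailing) surrogates occupy 0xDC00..0xDFFF.
    return 0xDC00 <= unit <= 0xDFFF


def decode_pair(high: int, low: int) -> int:
    # Combine a well-formed surrogate pair into one supplementary code point.
    if not (is_high_surrogate(high) and is_low_surrogate(low)):
        raise ValueError("not a well-formed surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# The same low surrogate yields different code points
# depending on the high surrogate in front of it.
print(hex(decode_pair(0xD800, 0xDF02)))  # 0x10302
print(hex(decode_pair(0xD801, 0xDF02)))  # 0x10702

# A fragment holding only 0xDF02 can be classified, but not decoded.
print(is_low_surrogate(0xDF02))  # True
```

Note that the dependency never reaches further than the one adjacent code
unit, which is the sense in which the two readings of "state" differ.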
By the way, can you indeed tell us what the "unique status" of the code
unit 0xDF02 is? And if it has one, why is it not spelled out in the standard?
Dean A. Snyder
Assistant Research Scholar
Manager, Digital Hammurabi Project
Computer Science Department
Whiting School of Engineering
218C New Engineering Building
3400 North Charles Street
Johns Hopkins University
Baltimore, Maryland, USA 21218
office: 410 516-6850
cell: 717 817-4897
www.jhu.edu/digitalhammurabi/
http://users.adelphia.net/~deansnyder/