"Back up" here is the verb: it refers to decrementing the pointer in the string (moving it toward the start), not to making a backup copy of anything.
If you have a string consisting of the following UTF-16 code units, for example:
00C0  0020  20AC  D800  DC00  00C5
 0     1     2     3     4     5
If you set the pointer to code unit number 4 (counting from 0), it points at DC00, which is a trailing ("low") surrogate. The pointer needs to "back up" (decrement) by one, to position 3 (D800, the leading or "high" surrogate), to find the start of the character, which here is U+10000. Every code unit outside that surrogate pair encodes a complete code point by itself, so positions 0, 1, 2, and 5 need no backing up.
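In case it helps, here is a minimal Python sketch of that idea, using the same six code units. The function names char_start_utf16 and char_start_utf8 are just names I made up for illustration; they are not from the Standard or from any library. The second function shows the UTF-8 case from the passage you quoted, where up to three backups may be needed.

```python
def char_start_utf16(units, i):
    """Return the index where the character containing units[i] begins."""
    is_trail = 0xDC00 <= units[i] <= 0xDFFF
    # Back up by one only if a leading surrogate actually precedes the trail.
    if is_trail and i > 0 and 0xD800 <= units[i - 1] <= 0xDBFF:
        return i - 1
    return i


def char_start_utf8(data, i):
    """Same idea for UTF-8: step back over 10xxxxxx continuation bytes."""
    start = i
    # At most three backups, because a UTF-8 sequence is at most four bytes.
    while start > 0 and i - start < 3 and (data[start] & 0xC0) == 0x80:
        start -= 1
    return start


# The code units from the example above.
units = [0x00C0, 0x0020, 0x20AC, 0xD800, 0xDC00, 0x00C5]
print(char_start_utf16(units, 4))  # 3: backed up from DC00 to D800
print(char_start_utf16(units, 2))  # 2: already at a character boundary
```

Both functions assume the index is in range. On ill-formed input (an unpaired trailing surrogate, or stray continuation bytes) they simply stop rather than report an error, which a real implementation would probably want to handle.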
Addison Phillips
Globalization Architect (Amazon Lab126)
Chair (W3C I18N WG)
Internationalization is not a feature.
It is an architecture.
> -----Original Message-----
> From: unicode-bounce_at_unicode.org [mailto:unicode-bounce_at_unicode.org] On
> Behalf Of Xue Fuqiao
> Sent: Tuesday, August 27, 2013 6:37 PM
> To: unicode_at_unicode.org
> Subject: What to backup after corruption of code units?
>
> Hi list,
>
> I'm reading Unicode 6.2.0 and have a question. In Section 2.5, Encoding Forms:
>
> For example, when randomly accessing a string, a program can find the
> boundary of a character with limited backup. In UTF-16, if a pointer
> points to a leading surrogate, a single backup is required. In UTF-8,
> if a pointer points to a byte starting with 10xxxxxx (in binary), one
> to three backups are required to find the beginning of the character.
>
> What does the "backup" mean here? What does the program backup?
>
> I searched "backup" with unicode.org/search/ but didn't get anything that
> looked promising. Can anyone point me in the right direction?
>
> (English is not my native language; please excuse typing errors.)
>
> --
> Best regards, Xue Fuqiao.
Received on Tue Aug 27 2013 - 23:43:05 CDT