> - If the storage is UTF-16, then UTF-16 indices are direct. To compute
> UCS-4 indices you parse from the start of the text.
> - If the storage is UCS-4, then UCS-4 indices are direct. To compute
> UTF-16 indices you parse from the start of the text.
> - Supporting surrogate pairs does not require using UCS-4 indices.
>
> Here is a simple example of a routine that accesses surrogate pairs with
> UTF-16 indices, and returns them as UCS-4 characters (here called UTF-32):
OK. It works in a loop, but you can't provide random access to the string,
right? Suppose I have the following text, stored as 16-bit words and
accessible through a variable str:
s o m e <s1> <s2> t e x t <s1> <s2>
(each <s1> <s2> is a surrogate pair). I have 12 words of storage, but
only 10 characters. So when I say:
str.getAt(7)
and I mean the 8th character, not the 8th word of storage, I need to walk
the string from the start in order to get 'x' and not 'e'. The indices
don't seem direct then.
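
To make the difference concrete, here is a minimal sketch in Python of the two kinds of lookup on the string above. It is only an illustration under stated assumptions: code_units and get_at are hypothetical helper names (not from any library), the storage is modelled as a plain list of 16-bit values, and U+10400 and U+10401 stand in for the two surrogate pairs.

```python
def code_units(text):
    """Return text as a list of 16-bit UTF-16 code units (hypothetical helper)."""
    raw = text.encode("utf-16-be")
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]


def is_high_surrogate(u):
    return 0xD800 <= u <= 0xDBFF


def is_low_surrogate(u):
    return 0xDC00 <= u <= 0xDFFF


def get_at(units, char_index):
    """Return the code point at a character index by walking from the start.

    Hypothetical stand-in for str.getAt() with character (UCS-4) semantics.
    """
    i = 0
    remaining = char_index
    while i < len(units):
        u = units[i]
        paired = (
            is_high_surrogate(u)
            and i + 1 < len(units)
            and is_low_surrogate(units[i + 1])
        )
        if remaining == 0:
            if paired:
                # Combine the surrogate pair into one code point.
                return 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            return u
        remaining -= 1
        i += 2 if paired else 1  # a surrogate pair occupies two words
    raise IndexError("character index out of range")


# "some" + <s1><s2> + "text" + <s1><s2>, with assumed supplementary characters
units = code_units("some\U00010400text\U00010401")

print(len(units))             # 12 words of storage
print(chr(units[7]))          # direct word index 7 gives 'e'
print(chr(get_at(units, 7)))  # character index 7 gives 'x', after a walk
```

The word lookup units[7] is constant time, while get_at(units, 7) has to scan every word before the target, which is the cost I am asking about.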
Yves.
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:46 EDT