From: Marco Cimarosti (marco.cimarosti@essetre.it)
Date: Tue Oct 07 2003 - 10:15:23 CST
Elliotte Rusty Harold wrote:
> A W3C XML Schema Language validator needs a character based API to
> correctly implement the minLength and maxLength facets on xsd:string
As far as I understand, xsd:string is a list of "Character"s, and a
"Character" is an integer that can hold any valid Unicode code point.
In other words, xsd:string is necessarily in UTF-32 (or something close to
it): it cannot be in UTF-8 or UTF-16, where a single code point may take
more than one encoding unit.
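To make the UTF-16 point concrete: a 32-bit integer holds any code point up to U+10FFFF directly, but a 16-bit unit cannot, so UTF-16 has to split each supplementary character into a surrogate pair. Below is a minimal Python sketch of that split. The function name `to_utf16_units` is hypothetical, and the result is cross-checked against Python's built-in "utf-16-be" codec.

```python
from typing import List

def to_utf16_units(cp: int) -> List[int]:
    """Return the UTF-16 code units for one Unicode scalar value (illustrative sketch)."""
    # Reject values outside the code space and the surrogate range itself.
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp <= 0xFFFF:
        return [cp]                      # BMP: a single 16-bit unit
    v = cp - 0x10000                     # 20-bit offset into the supplementary planes
    return [0xD800 + (v >> 10),          # high surrogate: top 10 bits of the offset
            0xDC00 + (v & 0x3FF)]        # low surrogate: bottom 10 bits of the offset

# Cross-check against the codec: U+1D11E (MUSICAL SYMBOL G CLEF) is one
# code point, but two UTF-16 code units.
encoded = chr(0x1D11E).encode("utf-16-be")
codec_units = [int.from_bytes(encoded[i:i + 2], "big") for i in range(0, len(encoded), 2)]
print([hex(u) for u in to_utf16_units(0x1D11E)])   # ['0xd834', '0xdd1e']
print(to_utf16_units(0x1D11E) == codec_units)      # True
```

In the UTF-32 view the same character is simply the single integer 0x1D11E.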
The values set by the length, minLength and maxLength facets are the exact,
minimum and maximum number of *list elements* in the list, i.e., in the
case of xsd:string, the *size* of the string in *encoding units*.
The fact that, in UTF-32, the *size* of the string in encoding units
corresponds to the number of "characters" is coincidental.
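Under this reading, a length check just counts list elements. Here is a minimal Python sketch, assuming the value arrives as a Python `str` (already a sequence of code points) and taking UTF-32 code units as the "list elements". The names `utf32_length` and `check_length_facets` are hypothetical, not part of any validator's API.

```python
from typing import Optional

def utf32_length(s: str) -> int:
    # "utf-32-be" writes no byte-order mark, so every 4 octets is one code unit.
    return len(s.encode("utf-32-be")) // 4

def check_length_facets(s: str,
                        min_length: Optional[int] = None,
                        max_length: Optional[int] = None) -> bool:
    """Hypothetical minLength/maxLength check counting UTF-32 units."""
    n = utf32_length(s)
    if min_length is not None and n < min_length:
        return False
    if max_length is not None and n > max_length:
        return False
    return True

clef = "\U0001D11E"
print(utf32_length(clef), len(clef.encode("utf-16-be")) // 2)   # 1 2
print(check_length_facets(clef, max_length=1))                   # True
```

Counting UTF-16 units instead would give 2 for the same value, so a maxLength of 1 would reject it; the UTF-32 count and the code-point count agree, which is the coincidence noted above.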
In any case, the useful information is always the *size* of the string in
encoding units (octets for UTF-8, 16-bit units for UTF-16, etc.), not the
number of "characters" it contains.
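For comparison, the size of one and the same string in each encoding's own units can be computed as below. This is a Python sketch; `size_in_code_units` is a hypothetical helper, and the "-be" codec variants are used so that no byte-order mark inflates the counts.

```python
def size_in_code_units(s: str) -> dict:
    """Size of s in the code units of each UTF (no byte-order mark counted)."""
    return {
        "UTF-8": len(s.encode("utf-8")),              # octets
        "UTF-16": len(s.encode("utf-16-be")) // 2,    # 16-bit units
        "UTF-32": len(s.encode("utf-32-be")) // 4,    # 32-bit units
    }

sample = "a\u00e9\u20ac\U0001D11E"   # a, e-acute, euro sign, G clef
print(len(sample))                   # 4
print(size_in_code_units(sample))    # {'UTF-8': 10, 'UTF-16': 5, 'UTF-32': 4}
```

Only the UTF-32 figure matches the four code points; the UTF-8 and UTF-16 sizes depend on which characters the string contains.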
_ Marco
This archive was generated by hypermail 2.1.5 : Thu Jan 18 2007 - 15:54:24 CST