Re: HTML - i18n / NCR & charsets

From: Kenneth Whistler (kenw@sybase.com)
Date: Wed Nov 27 1996 - 15:46:57 EST


>
> >A subtlety: the i18n spec refers to UCS, which has a consequence
> >when going beyond BMP. There UCS has well defined numbers, while I
> >do not know whether Unicode has this.
>
> True. The numbers would correspond to UCS-4 past the BMP, as the i18n
> draft says. Unicode would represent these codes as UTF-16.
>
> This raised a question in my mind about the i18n draft and surrogates,
> and I discovered that it says nothing. Since numeric character references
> are to the UCS-4 form, it probably would have been better if the
> surrogate range had been excluded.
>
> David
>

I'm not sure I see the problem. The numeric character references
would be in UCS-4 form, but as long as they are within the intended
range for ISO 10646 extensions (0x00010000..0x0010FFFF) addressable
by UTF-16, a Unicode 2.0-compliant interpreter can pick them up
and convert them to surrogate pairs for processing by a simple algorithm.
See the Unicode Standard, Version 2.0, page C-3.
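
For illustration, the conversion can be sketched roughly as follows.
This is a minimal Python sketch of the standard UTF-16 surrogate
arithmetic, not text taken from the Unicode Standard; the function
names to_surrogate_pair and from_surrogate_pair are hypothetical and
chosen only for this example.

```python
from typing import Tuple

# Constants of the UTF-16 surrogate mechanism.
SUPPLEMENTARY_BASE = 0x10000   # first code value beyond the BMP
MAX_UTF16_VALUE = 0x10FFFF     # last code value addressable by UTF-16
HIGH_START = 0xD800            # first high (leading) surrogate
LOW_START = 0xDC00             # first low (trailing) surrogate


def to_surrogate_pair(code_value: int) -> Tuple[int, int]:
    """Split a UCS-4 value in 0x10000..0x10FFFF into (high, low) surrogates."""
    if not SUPPLEMENTARY_BASE <= code_value <= MAX_UTF16_VALUE:
        raise ValueError("value is not in the range addressable by surrogates")
    offset = code_value - SUPPLEMENTARY_BASE   # 20-bit offset
    high = HIGH_START + (offset >> 10)         # top 10 bits
    low = LOW_START + (offset & 0x3FF)         # bottom 10 bits
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a (high, low) surrogate pair into its UCS-4 value."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return SUPPLEMENTARY_BASE + ((high - HIGH_START) << 10) + (low - LOW_START)
```

Under these assumptions, a numeric character reference such as
&#65536; (0x00010000) would map to the pair 0xD800 0xDC00, and
from_surrogate_pair recovers the original value.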

--Ken


