Re: HTML - i18n / NCR & charsets

From: Kenneth Whistler (kenw@sybase.com)
Date: Wed Nov 27 1996 - 15:46:57 EST

Next message: David Goldsmith: "Draft does cover surrogates -- oops"
Previous message: David Goldsmith: "Re: HTML - i18n / NCR & charsets"
Maybe in reply to: Yung-Fong Tang: "Re: HTML - i18n / NCR & charsets"
Next in thread: Misha Wolf: "Re: HTML - i18n / NCR & charsets"

>
> >A subtlety: the i18n spec refers to UCS, which has a consequence
> >when going beyond the BMP. There UCS has well-defined numbers, while I
> >do not know whether Unicode has this.
>
> True. The numbers would correspond to UCS-4 past the BMP, as the i18n
> draft says. Unicode would represent these codes as UTF-16.
>
> This raised a question in my mind about the i18n draft and surrogates,
> and I discovered that it says nothing. Since numeric character references
> are to the UCS-4 form, it probably would have been better if the
> surrogate range had been excluded.
>
> David
>

I'm not sure I see the problem. The numeric character references
would be in UCS-4 form, but as long as they fall within the range
intended for ISO 10646 extensions addressable by UTF-16
(0x00010000..0x0010FFFF), a Unicode 2.0-compliant interpreter can
pick them up and convert them to surrogate pairs for processing by
a simple algorithm. See the Unicode Standard, Version 2.0, page C-3.
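
For illustration, the conversion could be sketched roughly as follows.
This is a minimal sketch of the usual UTF-16 surrogate arithmetic, not
text quoted from the Standard; the function names to_surrogate_pair and
from_surrogate_pair are made up for this example.

```python
from typing import Tuple

# Hypothetical helper names; the arithmetic is the usual UTF-16 mapping.
HIGH_BASE = 0xD800   # first high (leading) surrogate
LOW_BASE = 0xDC00    # first low (trailing) surrogate
SUPP_BASE = 0x10000  # first value beyond the BMP


def to_surrogate_pair(value: int) -> Tuple[int, int]:
    """Split a UCS-4 value in 0x10000..0x10FFFF into a surrogate pair."""
    if not SUPP_BASE <= value <= 0x10FFFF:
        raise ValueError("value is not addressable by a surrogate pair")
    offset = value - SUPP_BASE           # 20-bit offset past the BMP
    high = HIGH_BASE + (offset >> 10)    # top 10 bits
    low = LOW_BASE + (offset & 0x3FF)    # bottom 10 bits
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into the UCS-4 value."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return SUPP_BASE + ((high - HIGH_BASE) << 10) + (low - LOW_BASE)
```

Under these assumptions, a reference such as &#65536; (0x00010000)
would map to the pair 0xD800 0xDC00, and 0x0010FFFF to 0xDBFF 0xDFFF.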

--Ken


This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:33 EDT