Re: validity of lone surrogates

From: Marcin 'Qrczak' Kowalczyk (qrczak@knm.org.pl)
Date: Tue Jul 03 2001 - 18:00:02 EDT


Tue, 3 Jul 2001 11:19:05 +0100, Michael Everson <everson@indigo.ie> wrote:

>>I would be glad if the resolution allowed UTF-8 and UTF-32 encoders and
>>decoders to not worry about surrogates at all. Please leave surrogate
>>issues to UTF-16.
>
> But what if I want to put up a Web page in Etruscan?

UTF-8 and UTF-32 handle characters above U+FFFF with no problem.
What I mean is: forget about surrogates, i.e. about encoding those
characters as pairs of 16-bit words in the range 0xD800..0xDFFF, in
every encoding other than UTF-16. For those encodings U+D800..U+DFFF
are just code points like any others; they encode the whole contiguous
range U+0000..U+10FFFF (the maximum would be U+7FFFFFFF if the idea of
UTF-16 hadn't been pushed so hard).
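
To make the point concrete, here is a minimal sketch in Python of what
such encoders could look like. The function names (encode_utf8,
encode_utf32_be, utf16_units) are made up for this illustration and
come from no library, and the lack of any surrogate check in the UTF-8
and UTF-32 paths is the assumption being argued for here, not a
statement of what any standard requires. The sample character U+10300
is from the Old Italic block, which is where Etruscan would live.

```python
def encode_utf8(cp: int) -> bytes:
    """Encode one code point with the generic UTF-8 bit layout (1 to 4 bytes)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("code point out of range")
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # U+D800..U+DFFF land in this branch and get no special treatment
        # (assumption of this sketch).
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def encode_utf32_be(cp: int) -> bytes:
    """UTF-32 (big-endian): the code point itself as one 32-bit word."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("code point out of range")
    return cp.to_bytes(4, "big")


def utf16_units(cp: int) -> list[int]:
    """UTF-16: the only encoding here that has to know about surrogates."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("code point out of range")
    if cp < 0x10000:
        return [cp]
    v = cp - 0x10000
    # High surrogate carries the top 10 bits, low surrogate the bottom 10.
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]


old_italic_a = 0x10300  # OLD ITALIC LETTER A
print(encode_utf8(old_italic_a).hex())      # f0908c80
print(encode_utf32_be(old_italic_a).hex())  # 00010300
print([hex(u) for u in utf16_units(old_italic_a)])  # ['0xd800', '0xdf00']
print(encode_utf8(0xD800).hex())            # eda080
```

For comparison, Python's built-in UTF-8 codec refuses a lone surrogate
by default; it yields the same three bytes as the sketch only when the
"surrogatepass" error handler is requested.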

-- 
 __("<  Marcin Kowalczyk * qrczak@knm.org.pl http://qrczak.ids.net.pl/
 \__/
  ^^                      SUBSTITUTE SIGNATURE
QRCZAK


