Re: Thank you for all the good information, sUTF32ToUTF8 function

From: Peter_Constable@sil.org
Date: Fri Nov 09 2001 - 09:48:30 EST

Previous message: Tom Emerson: "Question on script-name assignment"
Maybe in reply to: Peter_Constable@sil.org: "Thank you for all the good information, sUTF32ToUTF8 function"
Next in thread: David Starner: "Re: Thank you for all the good information, sUTF32ToUTF8 function"
Reply: David Starner: "Re: Thank you for all the good information, sUTF32ToUTF8 function"
Reply: Markus Scherer: "Re: Thank you for all the good information, sUTF32ToUTF8 function"

Thanks, Doug, for the comments.

>And I don't think you're supposed to exclude the surrogate code space
>(0xD800 through 0xDFFF) from normal processing. (This is the "D29
>conundrum" -- all UTFs must support encoding of non-characters, including
>unpaired surrogates, even though UTF-16 cannot do this.) The code you
>provided encodes unpaired surrogates in four bytes -- by pushing them down
>to the final "else" -- which is wrong in any event and almost certainly not
>what the programmer intended.

Yes, this is a goof. (I wrote a pseudo-code algorithm for going from
Unicode scalar values to UTF-8 and assumed "surrogate" USVs are not valid.
I wasn't anticipating at the time what a programmer would do with it.)

Any suggestions on the right way to deal with "surrogate" code points
in this algorithm? They should not occur in the data, but what if they do?
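
To make the question concrete, here is a minimal sketch in Python of one
possible treatment. It is not the pseudo-code or the sUTF32ToUTF8 function
discussed in this thread. The function name utf32_to_utf8, the on_surrogate
parameter, and the 0x10FFFF upper limit are all assumptions made for
illustration. The sketch tests for the surrogate range explicitly, so such
values can never fall through to the four-byte branch, and then either
rejects them or substitutes U+FFFD. Whether rejecting, replacing, or encoding
them as three bytes is "right" is exactly the open question above.

```python
REPLACEMENT = 0xFFFD  # U+FFFD REPLACEMENT CHARACTER


def utf32_to_utf8(cp, on_surrogate="error"):
    """Encode one code point as UTF-8 bytes (hypothetical helper).

    on_surrogate: "error" raises ValueError for 0xD800..0xDFFF,
                  "replace" encodes U+FFFD instead.
    """
    # Assumption: only the Unicode range 0..0x10FFFF is accepted.
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError("code point out of range: %#x" % cp)

    # Handle surrogates before the length branches so they cannot
    # slip into a later "else" by accident.
    if 0xD800 <= cp <= 0xDFFF:
        if on_surrogate == "replace":
            cp = REPLACEMENT
        else:
            raise ValueError("surrogate code point: %#x" % cp)

    if cp < 0x80:        # 1 byte: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])
```

With the default setting, utf32_to_utf8(0xD800) raises ValueError; with
on_surrogate="replace" it returns the three bytes EF BF BD.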

- Peter

---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <peter_constable@sil.org>

This archive was generated by hypermail 2.1.2 : Fri Nov 09 2001 - 11:47:10 EST