From: Addison Phillips (addison@yahoo-inc.com)
Date: Thu Jun 22 2006 - 09:46:27 CDT
> There are cases where such JavaScript conversion is needed 
> and it's perfectly possible (and in fact easy) to convert 
> between the natural JavaScript encoding, as seen in 
> string.length, string.charCodeAt(), or string.indexOf(...), 
> and UTF-8. 
Okay, I concede that one can convert a JavaScript String's code points in an
attempt to un-mojibake it... except please note what I said:
> There is usually something (else) wrong when a developer is 
> trying to do this in JavaScript.
That is, yes, one can attempt to fix one's data that way (it does rely on
the content being interpreted as 8859-1--and if there are bytes in the 0x80
through 0x9F range a lot of user-agents are going to interpret that as
windows-1252, even if it is labelled as 8859-1).
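To make that dependency concrete, here is a minimal sketch of the repair, assuming the UTF-8 bytes really were decoded as 8859-1, so that every char code in the String is a byte value from 0x00 through 0xFF. The name unMojibakeLatin1 is made up for illustration, and the sketch assumes an environment that provides the standard TextDecoder interface:

```javascript
// Hypothetical helper: recover text whose UTF-8 bytes were mis-decoded as ISO 8859-1.
function unMojibakeLatin1(s) {
  const bytes = new Uint8Array(s.length);
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    // Under a true 8859-1 interpretation, each code unit is exactly one original byte.
    if (unit > 0xFF) {
      throw new RangeError("not an 8859-1 byte value at index " + i);
    }
    bytes[i] = unit;
  }
  // fatal: true makes the decoder throw on malformed UTF-8 instead of inserting U+FFFD.
  return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
}
```

If the user-agent used windows-1252 instead, a byte such as 0x82 arrives as U+201A, the range check throws, and the original bytes cannot be recovered by this approach.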
> Such conversion is useful when JavaScript will be 
> used to generate documents, or some responses to servers 
> handling only the UTF-8 encoding in some specific protocol 
> (for example when you need to compute binary signatures, or 
> the encoded length in some part of this protocol).
Uh... why not assemble a String containing the text and then set the
Content-Type of the document to the desired encoding (UTF-8 in this case)?
Assembling documents in UTF-8 via manual conversion is not necessary. And it
is prone to error.
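If the UTF-8 bytes themselves are needed in script, for example to measure an encoded length, a built-in encoder avoids the hand-rolled conversion. A minimal sketch, assuming an environment that provides the standard TextEncoder interface; the helper name utf8ByteLength is made up for illustration:

```javascript
// Hypothetical helper: number of bytes the string occupies when encoded as UTF-8.
function utf8ByteLength(text) {
  // TextEncoder always produces UTF-8; a surrogate pair becomes one 4-byte sequence.
  return new TextEncoder().encode(text).length;
}
```

A supplementary character stored as a surrogate pair counts as four bytes here, not as two separately encoded three-byte sequences, which is the mistake a manual per-code-unit conversion tends to make.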
> There's no guarantee that the JavaScript string 
> will be preserved on output when it is sent to a stream using 
> a charset not completely covering the UCS. 
> (Normally such conversion from the JavaScript internal 
> encoding of strings to another encoding is performed by the 
> stream object, according to its settings properties, for 
> example an HTTP or MIME message object where you can set the 
> charset used for encoding/decoding their stream).
Yes. That's exactly what I said. Hence: what are you writing an
encoder/decoder for?
Addison
Addison Phillips
Internationalization Architect - Yahoo! Inc.
Internationalization is an architecture.
It is not a feature.  
> -----Original Message-----
> From: Philippe Verdy [mailto:verdy_p@wanadoo.fr] 
> Sent: Thursday, June 22, 2006 06:16
> To: Addison Phillips; unicode@unicode.org
> Subject: Re: Surrogate pairs and UTF-8