Re: Surrogate pairs and UTF-8

From: Pavils Jurjans (passiday@gmail.com)
Date: Thu Jun 22 2006 - 16:20:40 CDT

Next message: Eric: "The best tabs f0r men health by l0wer prices!"

Previous message: Edward Trager: "Re: Surrogate pairs and UTF-8"
In reply to: Edward Trager: "Re: Surrogate pairs and UTF-8"
Next in thread: Otto Stolz: "Re: Surrogate pairs and UTF-8"

Mike, I updated the code so that Firefox displays the results in the
gray boxes. However, it behaves a bit oddly. Although it shows only one
character in the textbox, if I position the cursor at the end of the text
and press backspace, it removes only the low surrogate rather than both
units of the surrogate pair.
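
To illustrate what I think is going on at the string level, here is a minimal sketch. The character U+1D11E is only an example I picked, and dropLastCodeUnit is a made-up helper that imitates the backspace behaviour; it is not what Firefox actually runs.

```javascript
// U+1D11E (MUSICAL SYMBOL G CLEF), chosen only as an example of a
// character outside the BMP. JavaScript stores it as two UTF-16 code units.
var clef = "\uD834\uDD1E";

// Hypothetical helper: removes the last code unit rather than the last
// character, which is how the backspace appears to behave.
function dropLastCodeUnit(s) {
  return s.substring(0, s.length - 1);
}

// What remains is a lone high surrogate, which is not a valid character.
var rest = dropLastCodeUnit(clef);
```

So a string that displays as one character has length 2, and deleting one unit leaves an unpaired surrogate behind.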

On 6/22/06, Edward Trager <ehtrager@umich.edu> wrote:

> ... Correct me if I am missing something:
>
> AJAX frameworks presumably have no problem whatsoever transferring data
> directly in UTF-8 format. UTF-8 is the default encoding for XML. So, once
> the data get to the client, all one has to do is parse the UTF-8 strings
> directly out of the XML (assuming AJAX based on XMLHttpRequest) and wrap
> them inside of some XHTML tags for display. Where is the need to escape
> strings in XML? UTF-8 can encode all Unicode points.

The problem is that if you want to save string data in XML format, you
can't just do [textNode.value = stringData] and assume that all the odd
control characters will pass through when the XML file is transferred in
UTF-8. It's even worse with XML attribute values. So the string data needs
escaping. At this point one has to decide which escaping to use. Any scheme
will do, because the server end can simply reverse it. However, since we
are talking about client-side JavaScript here, it had better be a built-in
function; otherwise large strings will take considerable time to process.
It's also nice to stick to a standard. This is where the wonderful function
encodeURIComponent comes in. However, some older browsers don't support
that function, so we need to simulate it. Hence the need for JS-based UTF-8
encoding.
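
A rough sketch of such a fallback is below. This is not the code from my page. The names utf8PercentEncode, percentByte and UNRESERVED are made up for illustration, and I assume the goal is to match the output of encodeURIComponent for well-formed strings. One difference: on an unpaired surrogate it throws a plain Error, while the built-in throws a URIError.

```javascript
// Characters that encodeURIComponent leaves unescaped.
var UNRESERVED =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.!~*'()";

// Formats one byte (0..255) as %XX with uppercase hex digits.
function percentByte(b) {
  var hex = b.toString(16).toUpperCase();
  return "%" + (hex.length < 2 ? "0" + hex : hex);
}

// Hypothetical stand-in for encodeURIComponent: UTF-8 encodes the string
// and percent-escapes every byte that is not an unreserved ASCII character.
function utf8PercentEncode(str) {
  var out = [];
  for (var i = 0; i < str.length; i++) {
    var c = str.charCodeAt(i);
    if (c >= 0xD800 && c <= 0xDBFF) {
      // High surrogate: combine it with the following low surrogate.
      var lo = i + 1 < str.length ? str.charCodeAt(i + 1) : 0;
      if (lo < 0xDC00 || lo > 0xDFFF) {
        throw new Error("Unpaired high surrogate at index " + i);
      }
      c = 0x10000 + ((c - 0xD800) << 10) + (lo - 0xDC00);
      i++; // the low surrogate has been consumed
    } else if (c >= 0xDC00 && c <= 0xDFFF) {
      throw new Error("Unpaired low surrogate at index " + i);
    }
    if (c < 0x80) {
      // One byte; keep unreserved characters as they are.
      var ch = String.fromCharCode(c);
      out.push(UNRESERVED.indexOf(ch) >= 0 ? ch : percentByte(c));
    } else if (c < 0x800) {
      // Two bytes.
      out.push(percentByte(0xC0 | (c >> 6)));
      out.push(percentByte(0x80 | (c & 0x3F)));
    } else if (c < 0x10000) {
      // Three bytes.
      out.push(percentByte(0xE0 | (c >> 12)));
      out.push(percentByte(0x80 | ((c >> 6) & 0x3F)));
      out.push(percentByte(0x80 | (c & 0x3F)));
    } else {
      // Four bytes, only reachable through a surrogate pair.
      out.push(percentByte(0xF0 | (c >> 18)));
      out.push(percentByte(0x80 | ((c >> 12) & 0x3F)));
      out.push(percentByte(0x80 | ((c >> 6) & 0x3F)));
      out.push(percentByte(0x80 | (c & 0x3F)));
    }
  }
  return out.join("");
}
```

Where the built-in exists it should of course be preferred, e.g. by testing typeof encodeURIComponent == "function" first and using the sketch only as a fallback.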

Pavils

This archive was generated by hypermail 2.1.5 : Thu Jun 22 2006 - 17:12:28 CDT