Re: Some thoughts on character decomposition

From: Peter_Constable@sil.org
Date: Mon Jun 07 1999 - 12:55:12 EDT


That's what's *supposed* to be in a URL. What I'm saying is this: if someone types
a URL in a document, I may want to copy that URL and paste it into the address
window of my browser. But if the author added ZWSPs to the URL to deal with line
breaking, then I may encounter problems when I try to use the URL in my browser.

Peter.

From: cowan@locke.ccil.org AT internet on 06/07/99 11:10 AM

Received on: 06/07/99

To: Peter Constable/IntlAdmin/WCT, unicode@unicode.org AT internet@Ccmail
cc:
Subject: Re: Some thoughts on character decomposition

Peter_Constable@sil.org wrote:

> If, as John suggested, a ZWSP is inserted into a URL,

I don't know if this is me or someone else, but if it's me, then
it's a misunderstanding.

> and someone then copies
> and pastes that URL into the address window of their browser, what will be the
> result?

The characters in URLs are only US-ASCII, and any other character
wanted in an URL must be encoded according to the standard rules:
map each non-ASCII character to its UTF-8 representation as 2, 3,
or 4 bytes, and then encode those bytes as %xx sequences, where
xx is 2 hex digits. So a \u200B should appear as "%E2%80%8B".
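
As a rough illustration of the rule just described, here is a minimal
Python sketch. It assumes UTF-8 as the byte encoding, which is what the
"%E2%80%8B" result implies. The helper name percent_encode and the
sample string are made up for this example; urllib.parse.quote is the
standard-library routine doing the work.

```python
from urllib.parse import quote

ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE

def percent_encode(text: str) -> str:
    """Hypothetical helper: encode text as UTF-8, then write every byte
    that is not an unreserved ASCII character as a %XX sequence."""
    return quote(text, safe="", encoding="utf-8")

# U+200B occupies three bytes in UTF-8.
print(ZWSP.encode("utf-8"))            # b'\xe2\x80\x8b'

# Each of those bytes becomes one %XX escape.
print(percent_encode(ZWSP))            # %E2%80%8B

# A ZWSP hidden inside otherwise plain ASCII text becomes visible.
print(percent_encode("foo" + ZWSP + "bar"))  # foo%E2%80%8Bbar
```

Passing safe="" escapes "/" as well, so this helper suits a single path
segment or query value rather than a complete URL.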

--
John Cowan      http://www.ccil.org/~cowan              cowan@ccil.org
        You tollerday donsk?  N.  You tolkatiff scowegian?  Nn.
        You spigotty anglease?  Nnn.  You phonio saxo?  Nnnn.
                Clear all so!  'Tis a Jute.... (Finnegans Wake 16.5)



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:46 EDT