Re: Some thoughts on character decomposition

From: John Cowan (cowan@locke.ccil.org)
Date: Mon Jun 07 1999 - 09:56:13 EDT


Peter_Constable@sil.org wrote:

> If, as John suggested, a ZWSP is inserted into a URL,

I don't know if this is me or someone else, but if it's me, then
it's a misunderstanding.

> and someone then copies
> and pastes that URL into the address window of their browser, what will be the
> result?

The characters in URLs are only US-ASCII, and any other character
wanted in an URL must be encoded according to the standard rules:
map each non-ASCII character to its UTF-8 representation as 2, 3,
or 4 bytes, and then encode those bytes as %xx sequences, where
xx is 2 hex digits. So a \u200B should appear as "%E2%80%8B".
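
As a rough illustration of that rule, here is a minimal Python sketch.
The function name percent_encode is my own, purely for illustration;
it leaves every ASCII character untouched (a real URL encoder would
also escape reserved ASCII such as spaces) and handles only the
non-ASCII case described above. The standard library's
urllib.parse.quote is shown alongside for comparison.

```python
from urllib.parse import quote


def percent_encode(text: str) -> str:
    """Hypothetical helper: escape only the non-ASCII characters.

    Each non-ASCII character is converted to its UTF-8 bytes, and each
    byte is written as %xx with two uppercase hex digits. ASCII
    characters are copied through unchanged (an assumption made to keep
    the sketch short; reserved ASCII is not handled here).
    """
    pieces = []
    for ch in text:
        if ord(ch) < 0x80:
            pieces.append(ch)
        else:
            pieces.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(pieces)


# ZERO WIDTH SPACE (U+200B) is three bytes in UTF-8: E2 80 8B.
zwsp_escaped = percent_encode("\u200b")

# The standard library applies the same UTF-8-then-%xx rule.
zwsp_quoted = quote("\u200b", safe="")
```

Both zwsp_escaped and zwsp_quoted come out as "%E2%80%8B", which is
the form a ZWSP would have to take inside a URL.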
 

-- 
John Cowan	http://www.ccil.org/~cowan		cowan@ccil.org
	You tollerday donsk?  N.  You tolkatiff scowegian?  Nn.
	You spigotty anglease?  Nnn.  You phonio saxo?  Nnnn.
		Clear all so!  'Tis a Jute.... (Finnegans Wake 16.5)


