RE: Unicode in a URL

From: Paul Deuter (Paul.Deuter@plumtree.com)
Date: Thu Apr 26 2001 - 18:02:24 EDT

Next message: Peter_Constable@sil.org: "Re: On the possibility of guidance code points for the Private Use Area"
Previous message: Michael (michka) Kaplan: "Re: Tags and the Private Use Area"
Maybe in reply to: Paul Deuter: "Unicode in a URL"
Next in thread: Carl W. Brown: "RE: Unicode in a URL"
Reply: Carl W. Brown: "RE: Unicode in a URL"

Based on the responses, I guess my original question/problem was not
very well written.

UTF-7 won't work because it cannot be distinguished from ASCII without
something that identifies it as UTF-7.
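To make the UTF-7 point concrete, here is a minimal sketch using Python's
built-in "utf-7" codec. The sample string is my own illustration, not
something from the original thread. The encoded bytes are all in the ASCII
range, so a receiver that is not told "this is UTF-7" can legitimately read
them as plain ASCII text and get a different string.

```python
# Sample text chosen for illustration only.
text = "caf\u00e9"  # "café"

# Encode with Python's built-in UTF-7 codec.
encoded = text.encode("utf-7")

# Every byte is below 0x80, so the stream is also valid ASCII.
is_ascii = all(b < 0x80 for b in encoded)

# A receiver with no out-of-band label can decode it as ASCII...
as_ascii = encoded.decode("ascii")

# ...and only a receiver that knows it is UTF-7 recovers the original.
as_utf7 = encoded.decode("utf-7")

print(is_ascii, repr(as_ascii), repr(as_utf7))
```

Both decodes succeed without error, which is exactly the problem: nothing in
the data itself says which reading is intended.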

The %XX idea does not work because it is already in use by lots of
software to encode many different character sets. So again we need
something that identifies it as UTF-8.
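A small sketch of the %XX ambiguity, using Python's standard urllib.parse
module. The escaped string is a made-up example. The same escape sequence
unescapes to the same bytes, but those bytes turn into different text
depending on which character set the receiver assumes.

```python
from urllib.parse import unquote_to_bytes

# Hypothetical escaped path segment, for illustration only.
escaped = "caf%C3%A9"

# %XX only describes bytes; it says nothing about the character set.
raw = unquote_to_bytes(escaped)

# Receiver A assumes UTF-8.
as_utf8 = raw.decode("utf-8")

# Receiver B assumes ISO-8859-1 (Latin-1).
as_latin1 = raw.decode("latin-1")

print(repr(as_utf8), repr(as_latin1))
```

Receiver A gets a four-character string and receiver B gets a
five-character one, and neither can tell from the URL alone who is right.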

What is needed is an escape code that implicitly indicates the Unicode
character set.

I see this as somewhat analogous to the invention of the U+XXXX notation
in Unicode Consortium writings. They needed a completely unambiguous way
to tell their readers that the 16 bit value was not "any" 16 bit value
but rather a specific Unicode codepoint. They invented a new kind of escape
sequence that said two things: what follows is hex *and* Unicode.

I see the BOM as filling the same need for text files. It was not enough
to invent Unicode; a way to identify the encoding was needed as well.
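As a rough sketch of how the BOM plays that role, the following Python
function looks at the first bytes of a buffer and reports which encoding
the BOM announces. The function name sniff_bom is hypothetical, and the
sketch only covers UTF-8 and UTF-16; it uses the BOM constants from the
standard codecs module.

```python
import codecs

# BOM signatures this sketch knows about (UTF-8 and UTF-16 only).
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_bom(data: bytes):
    """Return the encoding named by a leading BOM, or None if there is none.

    sniff_bom is a hypothetical helper written for this illustration.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None

print(sniff_bom(codecs.BOM_UTF8 + b"hello"))   # BOM present
print(sniff_bom(b"hello"))                     # no BOM, encoding unknown
```

Without the BOM the function has nothing to go on, which is the situation a
URL with bare %XX escapes is in.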

Paul Deuter
Internationalization Manager
Plumtree Software
paul.deuter@plumtree.com

-----Original Message-----
From: Markus Scherer [mailto:markus.scherer@jtcsv.com]
Sent: Thursday, April 26, 2001 11:29 AM
To: unicode
Subject: Re: Unicode in a URL

Paul Deuter wrote:
> I am wondering if there isn't a need for the Unicode Spec to also
> dictate a way of encoding Unicode in an ASCII stream. Perhaps

How many more ways do we need?

To be 8-bit-friendly, we have UTF-8.
To get everything into ASCII characters, we have UTF-7.
W3C specifies to use %-encoded UTF-8 for URLs.
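
For reference, a minimal sketch of what %-encoded UTF-8 looks like in
practice, using Python's urllib.parse.quote, which encodes as UTF-8 by
default. The path below is an invented example.

```python
from urllib.parse import quote, unquote

# Invented path with one non-ASCII character, for illustration only.
path = "/docs/caf\u00e9"

# Convert to UTF-8 bytes, then %XX-escape every non-ASCII byte.
escaped = quote(path, encoding="utf-8")

# A receiver that also assumes UTF-8 gets the original text back.
restored = unquote(escaped, encoding="utf-8")

print(escaped, restored == path)
```

The round trip only works because both sides agree on UTF-8 in advance;
the escaped URL itself carries no label.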

> -----Original Message-----
> From: addison@inter-locale.com [mailto:addison@inter-locale.com]

> itself. The best way to handle it (from a reliability point of view) is to
> use UTF-8 for everything and to reinterpret the URL using code. The idea

This sounds good, too. Have your pages in UTF-8 and all servers will
interpret URLs as UTF-8.
Especially if browsers encode URLs differently, this is your best choice.

Of course, if this all does not work, the obvious choice for Unicode-broken
systems is to use only ASCII characters to begin with...

markus

This archive was generated by hypermail 2.1.2 : Fri Jul 06 2001 - 00:17:16 EDT