Re: Unicode on a website

From: David Starner (
Date: Sun Sep 24 2000 - 02:55:01 EDT

On Sat, Sep 23, 2000 at 09:28:49PM -0800, Carl W. Brown wrote:
> SCSU is a character stream compression technique not exactly a character
> encoding scheme.

It's a text encoding form that happens to be smaller than many
alternatives, and that's a little complex to decode. So?

> I don't think that it would do for get method forms to
> just add the scsu block to the end of the URL. "Yes but that is my
> charset". Certainly not when they are trying to standardize on UTF-8 at
> least the basic info.

I don't understand what you're trying to say here. I'm talking about
encoding an HTML document in SCSU. No one has any plans to standardize
the character set of HTML. Although most anticipate a general trend towards
UTF-8, there will be exceptions - iso8859-[1 2 15 ?] will stick around for
a long time, and many Asians will probably use UTF-16 instead of UTF-8.

> scsu makes sense for large blocks of data. Send the frame work in utf-8 but
> use HTTP to request the bulk data in scsu. If it is a small amount of data
> you don't want to pay the overhead of the compression.

Have you read anything about SCSU? Part of the point is that it can store a
6-character Russian message in 7 bytes. SCSU has no compression overhead versus
any other Unicode text encoding in the average case, for any size of message.
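
To make the 7-byte figure concrete, here is a rough Python sketch of a
deliberately tiny SCSU encoder. It is not a full implementation: it only
handles ASCII plus the Cyrillic block, and the function name
scsu_encode_cyrillic is made up for this illustration. It relies on SCSU's
single-byte mode, where the SC2 tag (0x12) selects dynamic window 2, whose
default position is U+0400, and each character in that window then costs
one byte.

```python
SC2 = 0x12                # SCSU tag byte: select dynamic window 2
CYRILLIC_BASE = 0x0400    # default start of dynamic window 2 (Cyrillic)
PASS_THROUGH_CONTROLS = {0x00, 0x09, 0x0A, 0x0D}  # NUL, TAB, LF, CR


def scsu_encode_cyrillic(text):
    """Encode ASCII + Cyrillic (U+0400..U+047F) text as SCSU bytes.

    Hypothetical helper for illustration only; anything outside those
    two ranges raises ValueError instead of being encoded.
    """
    out = bytearray()
    window_selected = False
    for ch in text:
        cp = ord(ch)
        if 0x20 <= cp <= 0x7F or cp in PASS_THROUGH_CONTROLS:
            # ASCII passes through unchanged in single-byte mode.
            out.append(cp)
        elif CYRILLIC_BASE <= cp < CYRILLIC_BASE + 0x80:
            if not window_selected:
                # One tag byte switches the active window, once.
                out.append(SC2)
                window_selected = True
            # Bytes 0x80..0xFF index into the active dynamic window.
            out.append(0x80 + (cp - CYRILLIC_BASE))
        else:
            raise ValueError("outside this sketch's range: U+%04X" % cp)
    return bytes(out)


message = "Москва"  # 6 characters
encoded = scsu_encode_cyrillic(message)
print(encoded.hex(), len(encoded))        # 129cbec1bab2b0 7
print(len(message.encode("utf-8")))       # 12
print(len(message.encode("utf-16-le")))   # 12
```

The one tag byte is the whole cost of switching to Cyrillic, so the message
is 7 bytes in SCSU against 12 in either UTF-8 or UTF-16.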

(And what's this about framework and HTTP? We weren't discussing HTTP, we were
discussing HTML.)

> You don't need a BOM with UTF-8.

And, in theory, you shouldn't need a BOM using SCSU in HTML. A BOM puts
non-ASCII characters before the Character-Encoding tag, which
is generally a bad idea. But it may be wise to use a BOM here, for
additional clarity.
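
As a small illustration of the "non-ASCII before the tag" point, the Python
sketch below lists the bytes of two signatures that fall outside ASCII. The
SCSU signature used here is the byte sequence 0E FE FF (the SQU tag followed
by U+FEFF); the helper name non_ascii_bytes is invented for this example.

```python
import codecs

# SCSU signature: SQU tag (0x0E) quoting U+FEFF as a big-endian pair.
SCSU_SIGNATURE = bytes([0x0E, 0xFE, 0xFF])


def non_ascii_bytes(prefix):
    """Return the bytes in prefix that are outside the 7-bit ASCII range."""
    return [b for b in prefix if b >= 0x80]


for name, sig in (("UTF-8", codecs.BOM_UTF8), ("SCSU", SCSU_SIGNATURE)):
    print(name, sig.hex(), [hex(b) for b in non_ascii_bytes(sig)])
```

Either signature puts bytes above 0x7F ahead of anything in the document
that declares its encoding.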

David Starner -
And crawling, on the planet's face, some insects called the human race.
Lost in space, lost in time, and meaning.
	-- RHPS
