RE: UTF-8 in web pages

From: Yoshifumi Inoue (yinoue@microsoft.com)
Date: Fri Feb 05 1999 - 15:51:45 EST


In the Japanese version of Netscape Navigator 4 on the Windows platform, the
default font for the UTF-8 encoding is "Times New Roman".

If you don't specify a font for the text (e.g. with a FONT tag or CSS), you
cannot see UTF-8 encoded Japanese characters, because Times New Roman
contains no Japanese glyphs.

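For East Asian countries (Japan, Korea, the PRC, Taiwan), UTF-8 text is up to
1.5 times bigger than in their local encodings: an ideographic or Hangul
character takes 3 bytes in UTF-8 but 2 bytes in the local double-byte
encodings.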
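A quick way to check the 1.5 figure is to encode the same sample text both ways. This is a sketch in Python; the sample strings, the choice of Shift_JIS, EUC-KR, GB2312 and Big 5 as the "local encodings", and the helper name size_ratio are all assumptions for illustration.

```python
# Compare how many bytes the same CJK text needs in UTF-8 and in a
# typical local double-byte encoding (Python standard codec names).
def size_ratio(text: str, local_codec: str) -> float:
    """Return the UTF-8 size of text divided by its size in local_codec."""
    return len(text.encode("utf-8")) / len(text.encode(local_codec))

samples = [
    ("Japanese", "日本語", "shift_jis"),
    ("Korean", "한국어", "euc_kr"),
    ("PRC", "中文", "gb2312"),
    ("Taiwan", "中文", "big5"),
]

for label, text, codec in samples:
    print(f"{label:8} {codec:9} ratio = {size_ratio(text, codec)}")

# ASCII markup costs one byte in either encoding, so a whole page grows by less.
print(size_ratio("<p>日本語</p>", "shift_jis"))
```

Pure ideographic or Hangul text comes out at exactly 1.5; real pages mix in ASCII markup, so their overall growth is usually smaller.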

Also, please remember that Unicode unifies Han characters. If a page contains
both Japanese Kanji and Chinese Hanzi, it may look strange: Japanese readers
expect Japanese glyphs for Kanji, but the text will sometimes be displayed in
a Chinese font unless you explicitly specify a font for each run of text.
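The following Python sketch illustrates why the font choice cannot be recovered from the text itself. It assumes the ideograph 直, a commonly cited example whose preferred glyph differs between Japanese and Chinese typefaces, stored once by a Japanese page (Shift_JIS) and once by a simplified-Chinese page (GB2312).

```python
# Hypothetical illustration: the same ideograph as a Japanese page (Shift_JIS)
# and a simplified-Chinese page (GB2312) would store it.
ja_bytes = "直".encode("shift_jis")
zh_bytes = "直".encode("gb2312")

# In the legacy encodings the byte sequences differ, so the encoding itself
# hints at which language, and therefore which font, was intended.
print(ja_bytes != zh_bytes)  # True

# After conversion to Unicode both become the same code point, and the UTF-8
# bytes are identical: nothing records whether the text was Japanese or Chinese.
ja_char = ja_bytes.decode("shift_jis")
zh_char = zh_bytes.decode("gb2312")
print(ja_char == zh_char)  # True
print(ja_char.encode("utf-8") == zh_char.encode("utf-8"))  # True
```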

UTF-8 only carries character information for exchange. It does not provide
script or language information, e.g. which fonts to render with.

- yosi

 -----Original Message-----
From: schererm@us.ibm.com [mailto:schererm@us.ibm.com]
Sent: Friday, February 05, 1999 10:47 AM
To: Unicode List
Subject: Re: UTF-8 in web pages

current versions of internet explorer, netscape, and lynx all support
unicode encodings.
unicode is _the_ html character set since version 4.0, i.e., all unicode
characters are supported by html. for example, decimal and hexadecimal
numeric character references are resolved as unicode code points.
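Numeric character references can be checked with Python's standard html.unescape. Note that this function follows current HTML rules rather than the behaviour of 1999 browsers, so treat it only as an illustration of the reference-to-code-point mapping; the sample word is arbitrary.

```python
import html

# Hexadecimal (&#x...;) and decimal (&#...;) references both name
# Unicode code points, independent of the page's charset.
text = html.unescape("&#x65E5;&#26412;&#x8A9E;")
print(text)  # 日本語

# The same character written both ways: 0x672C == 26412.
print(html.unescape("&#x672C;") == html.unescape("&#26412;"))  # True
```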
the default charset is still iso 8859-1, which is a subset of unicode
code-point-wise: its 256 characters are exactly U+0000..U+00FF.
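The subset relationship can be verified byte by byte. A minimal Python sketch; latin1_matches_unicode is a hypothetical helper name.

```python
def latin1_matches_unicode() -> bool:
    """True if every ISO 8859-1 byte decodes to the code point of equal value."""
    return all(
        ord(bytes([b]).decode("iso-8859-1")) == b
        for b in range(256)
    )

print(latin1_matches_unicode())  # True

# Consequence: "é" is byte 0xE9 in ISO 8859-1 and code point U+00E9.
print("é".encode("iso-8859-1"), hex(ord("é")))
```

One caveat: current browsers, following the WHATWG Encoding Standard, treat the iso-8859-1 label as windows-1252, which assigns printable characters to 0x80-0x9F, so the identity holds strictly only for the codec itself.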
i guess you already know the meta tag that declares the charset:
        <meta http-equiv="Content-Type" Content="text/html; charset=utf-8">
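A minimal sketch of producing a page that carries this declaration, assuming the pages are generated in Python; render_page is a hypothetical helper, not part of any library. The essential point is that the declared charset must match the bytes actually sent.

```python
import html

def render_page(title: str, body: str) -> bytes:
    """Return an HTML page, encoded as UTF-8, that declares its own charset."""
    page = (
        "<html><head>\n"
        # Put the declaration early in <head>, before any non-ASCII text.
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n'
        f"<title>{html.escape(title)}</title>\n"
        "</head><body>\n"
        f"<p>{html.escape(body)}</p>\n"
        "</body></html>\n"
    )
    # Encode with the same charset the meta tag announces.
    return page.encode("utf-8")

print(render_page("Test", "日本語").decode("utf-8"))
```

If the web server also sends a Content-Type header with a charset parameter, the header takes precedence over the meta element, so the two should agree.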

the xml standard requires every xml processor to accept utf-8 and utf-16.
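Python's standard xml.etree.ElementTree, which uses the expat parser, shows this requirement in practice: the same document parses whether it is encoded as UTF-8 or as UTF-16 with a byte order mark. The element name and text are arbitrary samples.

```python
import xml.etree.ElementTree as ET

doc = "<greeting>日本語</greeting>"

# UTF-8 needs no declaration; str.encode("utf-16") prepends a byte order
# mark, which lets the parser detect UTF-16 on its own.
for encoding in ("utf-8", "utf-16"):
    root = ET.fromstring(doc.encode(encoding))
    print(encoding, root.tag, root.text)
```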

best regards,

markus

Markus Scherer IBM RTP +1 919 486 1135 Dept. Fax +1 919 254 6430
schererm@us.ibm.com
                        Unicode is here! --> http://www.unicode.org/

"John O'Conner" <joconner@geocities.com> on 99-02-05 12:15:33

To: Unicode List <unicode@unicode.org>
Subject: UTF-8 in web pages

I have a client that has a requirement to support several
languages on their website and e-commerce store. I want to
help them manage the storage of information and dynamic web
pages by suggesting a common character set for all
languages...Unicode.

It seems like a no-brainer to select Unicode for my database
character set because of their multi-language needs.
However, I'm concerned about Unicode in web pages. I have
browsed several UTF-8 pages with success, but I notice that
the industry hasn't really picked up on UTF-8 as an HTML
content encoding. Do any of you have any success/failure
stories that you can share? How comfortable would you be
recommending UTF-8 for HTML content? Oh, here's one more
piece of information...the customer has traditionally used
Big 5 for all their encoding needs. Actually...they've used
an extension for their special chars in Hong Kong that don't
seem to be available in Big 5.
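One way to move existing Big 5 content into a Unicode database is to transcode it once, on import. This is a sketch under two assumptions: the source data is Big 5, and the Hong Kong extension mentioned above corresponds to the Hong Kong Supplementary Character Set that Python's big5hkscs codec implements. If it is a private vendor extension instead, a custom mapping table would be needed. legacy_to_utf8 is a hypothetical helper name.

```python
def legacy_to_utf8(data: bytes, source_codec: str = "big5hkscs") -> bytes:
    """Decode legacy-encoded bytes and re-encode them as UTF-8.

    The default errors="strict" makes unmappable bytes raise
    UnicodeDecodeError instead of silently becoming U+FFFD.
    """
    return data.decode(source_codec).encode("utf-8")

big5_bytes = "中文".encode("big5")
print(legacy_to_utf8(big5_bytes) == "中文".encode("utf-8"))  # True

# Bytes that are not valid in the source encoding are reported, not dropped.
try:
    legacy_to_utf8(b"\xff\xff", "big5")
except UnicodeDecodeError as err:
    print("not valid Big 5:", err.reason)
```

Failing loudly is deliberate: during a one-time migration it is better to find the records that use unmapped extension characters than to store replacement characters in the database.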

Regards,
John O'Conner


