From: Kenneth Whistler (kenw@sybase.com)
Date: Fri Jul 21 2006 - 14:19:19 CDT
Mick Hall asked:
> My first question is that while UTF-8 encoding seems to be working fine
> for all languages at the moment, am I heading for trouble with CJK
> languages in particular?
No.
> Is Unicode really viable for websites in CJK
> languages?
Yes.
> Also, we're interested in search engines picking up and indexing the
> text. Particularly Google and Baidu. Is UTF-8 a good choice for this?
Yes. Particularly if your pages are all clearly labelled by charset
and language.
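
[A minimal sketch of what "clearly labelled by charset and language" can look like in practice. The helper name build_page, the sample text, and the "zh-CN" language tag are illustrative assumptions, not anything from the original question. The charset is declared in a meta element and the language in the lang attribute; the same charset would normally also be sent in the HTTP header "Content-Type: text/html; charset=utf-8".]

```python
import html


def build_page(lang: str, body_text: str) -> bytes:
    """Hypothetical helper: return a minimal HTML page, labelled and encoded as UTF-8."""
    document = (
        "<!DOCTYPE html>\n"
        # Language label for the whole document.
        f'<html lang="{html.escape(lang)}">\n'
        "<head>\n"
        # Charset label; it must match the encoding used for the bytes below.
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n'
        "<title>Example</title>\n"
        "</head>\n"
        f"<body><p>{html.escape(body_text)}</p></body>\n"
        "</html>\n"
    )
    # Encode with the same charset that the page declares.
    return document.encode("utf-8")


page = build_page("zh-CN", "统一码")
```

[The point of the sketch is only that the declared charset and the actual byte encoding agree; a page labelled one way and encoded another is what tends to cause trouble.]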
>
> One final question if I may. Does anyone know whether search engines
> make any sense out of text encoded as character entities?
They had better, or else they are processing HTML nonconformantly.
Try it. I just did a Google search for "Tällöin kustannusten" and turned up
all kinds of Finnish pages -- some defaulting to 8859-1, some
explicitly labelled 8859-1, some explicitly labelled UTF-8.
Most of the 8859-1 pages simply use 8859-1 characters, but this one:
http://www.mandinka.org/Public/FI/
labelled 8859-1, uses numeric entities for all non-ASCII
characters.
It works fine, I think.
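
[A small sketch of why the numeric-entity page is equivalent to the others: a conformant HTML processor resolves numeric character references to the same characters that an 8859-1 or UTF-8 page carries directly. The variable names are illustrative; the calls are from the Python standard library.]

```python
import html

raw = "Tällöin kustannusten"

# Replace every non-ASCII character with a decimal numeric character
# reference, leaving a pure-ASCII string.
as_entities = raw.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(as_entities)  # T&#228;ll&#246;in kustannusten

# Resolving the references, as an HTML parser would, restores the text.
decoded = html.unescape(as_entities)
print(decoded == raw)  # True
```

[An indexer that resolves the references before tokenizing sees the same text in all three kinds of page, which would be consistent with the search result above.]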
--Ken