Re: Developing multilingual web sites

From: Suzanne Topping (stopping@rochester.rr.com)
Date: Tue Mar 21 2000 - 10:09:51 EST


Hi Aaron,

It doesn't look like you received very many responses on list, so I'm going
to take a stab at it... I'm far from a web or Unicode expert, and so I hope
that the learned folks out there will correct any misinformation, or add to
my response.

> I am currently working on a multilingual web site that includes content in
> five different languages (English, Traditional Chinese, Simplified Chinese,
> Korean, and Japanese). I am developing these pages with a text editor (e.g.,
> UltraEdit and Homesite) and graphics applications such as Adobe Photoshop.
> My operating system is Windows 2000.

From your general description, I am assuming that you are not storing
content in a database, is that correct?

As someone pointed out to you, you'll need to decide how you want to store
the data: as Unicode in a single file, or in multiple files using individual
encodings. My understanding is that if you want to keep all of the content
in one file or database, you'll really need to use Unicode. I don't believe
you can reliably store chunks of data within a single file/database in
multiple encodings. So that is issue number one: how are you going to
structure and store the data?
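
To make the single-file question concrete, here is a minimal Python sketch
(Python is just my choice for illustration). The sample strings and the
can_encode helper are made up for this example. It checks whether a Unicode
encoding (UTF-8) can hold all five languages at once, and whether one native
encoding (Shift JIS) can hold both Japanese and Korean.

```python
# Sample strings are illustrative only, not taken from the site in question.
SAMPLES = {
    "English": "Hello",
    "Traditional Chinese": "繁體中文",
    "Simplified Chinese": "简体中文",
    "Korean": "한국어",
    "Japanese": "日本語のテキスト",
}


def can_encode(text, encoding):
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# UTF-8 (a Unicode encoding) can hold all five languages in one file.
all_text = "\n".join(SAMPLES.values())
utf8_ok = can_encode(all_text, "utf-8")

# A native encoding such as Shift JIS covers Japanese but not Korean Hangul.
sjis_japanese_ok = can_encode(SAMPLES["Japanese"], "shift_jis")
sjis_korean_ok = can_encode(SAMPLES["Korean"], "shift_jis")

print(utf8_ok, sjis_japanese_ok, sjis_korean_ok)
```

If the mixed content has to live in one file, a check like this suggests why
Unicode is the practical choice; with one file per language, a native
encoding per file remains possible.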

> During the past week, I have received a batch of multilingual content that
> is mostly in Unicode format. I can easily view the content in Outlook
> Express, and I can also view it in Wordpad. However, when I export the files
> to text/HTML format, all of the unique character information is lost.

I am assuming that this problem has to do with the encoding settings in the
application that is doing the exporting, or the encoding settings in the
application in which you later view the HTML. If the content is in Unicode,
then all of these settings should also be set to Unicode.
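
As a rough illustration of both failure modes, here is a small Python sketch.
The sample text and the choice of cp1252 (a Western European Windows code
page) as the "wrong" setting are assumptions for the example; I don't know
which encoding your tools actually fall back to.

```python
text = "日本語"  # hypothetical sample content

# Exporting to an encoding that lacks these characters destroys them:
# each character that cannot be represented is replaced by "?".
lossy = text.encode("cp1252", errors="replace")

# Exporting as UTF-8 keeps all of the character information...
good = text.encode("utf-8")

# ...but the viewer must also decode as UTF-8, or it shows garbage.
misread = good.decode("cp1252", errors="replace")
roundtrip = good.decode("utf-8")

print(lossy)              # the export that lost everything
print(roundtrip == text)  # matching settings on both ends
print(misread == text)    # mismatched settings on the viewing end
```

The first case is unrecoverable because the data is gone from the file; in
the second the file is intact and only the viewer's setting needs to change.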

> I am searching for reliable tools that will export Unicode format to the
> appropriate native code format (e.g. Big 5, Shift JIS).

At what point do you plan to use these transforms? In the process that you
have described, there is no mention of transforming the data. Again, if you
have all the data in a single file, there is probably not a good way to get
only the Chinese content to be transformed to Big 5, etc. How is your data
segregated?

If you do indeed want to serve data in native encodings, then you could
consider using multiple URLs and serving each language as its own site.
(This would be the simplest solution.) Otherwise, you will have to find a
way to dish up only the language that you need, transform it to the proper
encoding, and declare that encoding to the browser. As far as I know, that
is done in the HTTP Content-Type header or in a META tag in the page, not in
a CSS.
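
A minimal sketch of that transform step, in Python, assuming the content is
already segregated by language and held as Unicode text. The language codes,
the NATIVE and CHARSET_LABEL tables, and the render_page function are all
hypothetical names for this example, and I am assuming the encoding is
declared with a META http-equiv tag in each page.

```python
# Hypothetical mapping from language code to a Python codec name.
NATIVE = {
    "zh-TW": "big5",
    "zh-CN": "gb2312",
    "ja": "shift_jis",
    "ko": "euc_kr",
}

# Charset labels to advertise to the browser for each codec.
CHARSET_LABEL = {
    "big5": "Big5",
    "gb2312": "GB2312",
    "shift_jis": "Shift_JIS",
    "euc_kr": "EUC-KR",
}


def render_page(lang, body):
    """Wrap Unicode body text in HTML and return it as native-encoded bytes."""
    codec = NATIVE[lang]
    label = CHARSET_LABEL[codec]
    html = (
        "<html><head>\n"
        f'<meta http-equiv="Content-Type" content="text/html; charset={label}">\n'
        "</head><body>\n"
        f"{body}\n"
        "</body></html>\n"
    )
    # The default error mode is strict: a character missing from the target
    # encoding raises UnicodeEncodeError instead of being silently dropped.
    return html.encode(codec)


page_ja = render_page("ja", "日本語")
page_ko = render_page("ko", "한국어")
```

Each result is a byte string ready to write to its own file or send from its
own URL. Letting the conversion fail loudly seems safer than shipping pages
with missing characters.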

> 2. Can anyone point me to reliable software that will make it possible to
> create multilingual graphics with applications such as Adobe Photoshop or
> Macromedia Fireworks?

Photoshop is useful because the designer can place the original text in a
layer, and the localization process can then create a new layer with that
text translated. You end up with a single Photoshop file containing all of
the languages, and you can simply turn layers on and off to get the versions
that you need.

> 3. Does anyone know of useful on-line articles that provide a detailed
> explanation of the differences between Unicode and the native character
> sets?

I don't know about on-line articles, but Andrea Vine has a great article
called "Demystifying Character Sets" in the August/September 1999 issue of
MultiLingual Computing and Technology magazine (#26, Volume 10, Issue 4.)
(www.multilingual.com)

> 4. What happens to native code text when it is copied and pasted to the
> clipboard under Windows 2000? Does it preserve the native code format?

Given Windows 2000 language support, one would think it would be preserved,
and that proper handling of it would depend more on the settings of the
application into which you later paste it. But once again I'll say that I'm
no expert.

The same issue of MultiLingual Computing and Technology contains an article
by Chris Pratley of Microsoft called "Taking Advantage of Office 2000". A
section in that article describes features for "Plain Text Open & Save". It
primarily discussed Word, and said that the encoding method of a file will
be detected when you try to open it, and you can then change settings to
save it to the desired encoding method using the Save As feature. It didn't
seem to mention pasted text.

Unfortunately, this doesn't really answer your question, but I'm hoping
perhaps Chris will see this response and provide a better answer.

Good luck with your work!

--++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Suzanne Topping
Localization Unlimited
(Globalization Process Improvement Consulting and Training)

In association with BizWonk (TM)

Phone: 716-473-0791
Fax: 716-231-2013
Email: stopping@rochester.rr.com

(Send me an email to join the North East Localization Special Interest
Group, an email distribution list which acts as a discussion forum for
localization issues.)


