Re: Developing multilingual web sites

From: Aaron Delwiche
Date: Tue Mar 21 2000 - 21:24:54 EST


I'd just like to thank Suzanne and everyone else on this list who took the
time to reply to my message.

Several people explained that it is difficult to preserve native code
formatting when storing multiple languages in a single database. We will
ultimately be using a content management system to drive the site, but my
question to the UNICODE list was motivated by the difficulties I faced in
producing a handful of static pages.

The real problems arose when I attempted to cut and paste text from one
application (Outlook Express or Microsoft Word) into another application
(Notepad, Homesite, or Adobe Photoshop). Korean and Japanese characters that
looked great in Unicode format immediately lost all of their formatting when
pasted into these other applications. I flirted with Windows 2000, Union Way
and RichWin, but nothing seemed to do the trick.
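
The symptom described above can be reproduced outside the clipboard. The
following is only a minimal sketch, assuming Python 3 and its standard codecs;
the sample strings and the helper name through_code_page are made up for
illustration, and this is not how Windows itself is implemented. It forces
Unicode text through a single legacy code page, the way an ANSI-only
application would, and shows the characters that code page cannot represent
turning into question marks.

```python
# Hypothetical sample strings; any Korean or Japanese text would behave the same way.
korean = "한국어"
japanese = "日本語"


def through_code_page(text, code_page):
    """Round-trip text through one legacy code page, as an ANSI-only app would."""
    # errors="replace" substitutes "?" for every character the code page lacks.
    raw = text.encode(code_page, errors="replace")
    return raw.decode(code_page)


if __name__ == "__main__":
    print(through_code_page(korean, "cp949"))     # Korean code page: text survives
    print(through_code_page(korean, "cp1252"))    # Western code page: characters lost
    print(through_code_page(korean, "cp932"))     # Japanese code page: hangul lost
    print(through_code_page(japanese, "cp932"))   # Japanese code page: text survives
```

The point of the sketch is that a single code page covers only its own
repertoire, so the result depends entirely on which code page is active.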

Nadine Kano's article "Multilingual Setup for Windows 2000 Professional"
contained the solution to my problem. In order
to cut and paste in native code format, it is necessary to change the
"default system locale" in the regional settings section of the control
panel. According to Ms. Kano:

"Windows 2000 can emulate this [local] environment, but it can only emulate
one such environment at a time. In the vernacular, we label this environment
the default ANSI code page or the default system locale. In either case, you
indicate which ANSI-based environment Windows can emulate during this part
of setup. The default system locale also determines which character tables
Windows uses to map strings between ANSI-based character encodings and
Unicode, for example strings that an application might send to Windows
through a wide-character application program interface."

In retrospect, this seems like such a simple solution. Once the default
system locale was set correctly, I was able to cut and paste from Unicode to
native code without any further difficulties. The only drawback is that I
need to reboot my computer each time in order to change the default system
locale. I think it is possible to circumvent this limitation by creating
multiple accounts with different language settings (a process described in
Kano's article).
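
For anyone who would rather script the conversion than depend on the system
locale, here is a rough sketch of the same Unicode-to-native-code mapping. It
assumes Python 3 and the codecs in its standard library; it is not taken from
Kano's article. The language tags, the NATIVE_ENCODINGS table, and the
to_native helper are hypothetical names chosen for this example, and a real
site might prefer different encodings (for instance EUC-JP instead of
Shift JIS).

```python
# Assumed mapping from a language tag to one common native encoding.
NATIVE_ENCODINGS = {
    "zh-TW": "big5",       # Traditional Chinese
    "zh-CN": "gb2312",     # Simplified Chinese
    "ja": "shift_jis",     # Japanese
    "ko": "euc_kr",        # Korean
}


def to_native(text, language):
    """Encode Unicode text into the native encoding assumed for a language.

    Encoding is strict, so text that the target encoding cannot represent
    raises UnicodeEncodeError instead of being silently replaced.
    """
    encoding = NATIVE_ENCODINGS[language]
    return text.encode(encoding)


if __name__ == "__main__":
    data = to_native("日本語", "ja")
    print(len(data), "bytes in Shift JIS")
    try:
        to_native("한국어", "ja")  # hangul has no Shift JIS mapping
    except UnicodeEncodeError as err:
        print("cannot convert:", err.reason)
```

Failing loudly on unmappable text is a deliberate choice in this sketch: it
makes mixed-language content visible before it reaches a page, rather than
letting it degrade into question marks.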

Thanks once again to everyone on this list for your help. The problems that
we are encountering in multilingual computing environments are an exciting
indicator of how globalization is transforming our world. I'm eager to learn
more about these issues, and am heartened to have discovered a community of
individuals who are helping to make the Internet a truly global
communications tool.

Best regards,

Aaron Delwiche
Director of Interface Development

12/F Tin On Sing Commercial Bldg
41-43 Graham Street, Central, Hong Kong
tel (852) 2537-2313 fax (852) 2537-5678

----- Original Message -----
From: "Suzanne Topping" <>
To: "Unicode List" <>
Sent: Tuesday, March 21, 2000 11:04 PM
Subject: Re: Developing multilingual web sites

> Hi Aaron,
> It doesn't look like you received very many responses on list, so I'm going
> to take a stab at it... I'm far from a web or Unicode expert, and so I hope
> that the learned folks out there will correct any misinformation, or add to
> my response.
> > I am currently working on a multilingual web site that includes content in
> > five different languages (English, Traditional Chinese, Simplified Chinese,
> > Korean, and Japanese). I am developing these pages with a text editor (e.g.,
> > UltraEdit and Homesite) and graphics applications such as Adobe Photoshop.
> > My operating system is Windows 2000.
> From your general description, I am assuming that you are not storing
> content in a database, is that correct?
> As someone pointed out to you, you'll need to make a decision about how you
> want to store the data: as Unicode in a single file, or in multiple files
> using individual encodings. My understanding is that if you want to keep all
> of the content in one file or database, you'll really need to use Unicode. I
> don't believe you can store chunks of data within a single file/database in
> multiple encodings. So that is issue number one: how are you going to
> structure and store data?
> > During the past week, I have received a batch of multilingual content that
> > is mostly in Unicode format. I can easily view the content in Outlook
> > Express, and I can also view it in Wordpad. However, when I export the
> > content to text/HTML format, all of the unique character information is lost.
> I am assuming that this problem may have to do with the encoding settings in
> the application that is doing the exporting, or the encoding settings in the
> application in which you later view the HTML. If the content is in Unicode,
> then all of these settings should also be set to Unicode.
> > I am searching for reliable tools that will export Unicode format to the
> > appropriate native code format (e.g. Big 5, Shift JIS).
> At what point do you plan to use these transforms? In the process that you
> have described, there is no mention of transforming the data. Again, if you
> have all the data in a single file, there is probably not a good way to
> only the Chinese content to be transformed to Big 5, etc. How is your data
> segregated?
> If you do indeed want to serve data in native encodings, then you could
> consider using multiple URLs and serving each language as its own site.
> (This would be the simplest solution.) Otherwise, you will have to find a
> way to dish up only the language that you need, transform it to the proper
> encoding, and specify the native encoding method in a CSS.
> > 2. Can anyone point me to reliable software that will make it possible to
> > create multilingual graphics with applications such as Adobe Photoshop or
> > Macromedia Fireworks?
> Photoshop is useful because the designer can place the original text in a
> layer, and the localization process can then create a new layer with that
> text translated. You end up with a single Photoshop file containing all
> the languages, and you can simply turn layers on and off to get the
> that you need.
> > 3. Does anyone know of useful on-line articles that provide a detailed
> > explanation of the differences between Unicode and the native character
> > sets?
> I don't know about on-line articles, but Andrea Vine has a great article
> called "Demystifying Character Sets" in the August/September 1999 issue of
> MultiLingual Computing and Technology magazine (#26, Volume 10, Issue 4.)
> > 4. What happens to native code text when it is copied and pasted to the
> > clipboard under Windows 2000? Does it preserve the native code format?
> Given Windows 2000 language support, one would think it would be
> and proper handling of it would be more dictated by the settings of the
> application into which you later paste it. But once again I'll say that I'm
> no expert.
> The same issue of MultiLingual Computing and Technology contains an article
> by Chris Pratley of Microsoft called "Taking Advantage of Office 2000". A
> section in that article describes features for "Plain Text Open & Save". It
> primarily discussed Word, and said that the encoding method of a file will
> be detected when you try to open it, and you can then change settings to
> save it to the desired encoding method using the Save As feature. It doesn't
> seem to mention pasted text.
> Unfortunately, this doesn't really answer your question, but I'm hoping
> perhaps Chris will see this response and provide a better answer.
> Good luck with your work!
> --++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Suzanne Topping
> Localization Unlimited
> (Globalization Process Improvement Consulting and Training)
> In association with BizWonk (TM)
> Phone: 716-473-0791
> Fax: 716-231-2013
> Email:
> (Send me an email to join the North East Localization Special Interest
> Group, an email distribution list which acts as a discussion forum for
> localization issues.)
