RE: Wordprocessors in Korean

From: Seuk Soo Sung (seuksoos@microsoft.com)
Date: Mon Jul 16 2001 - 09:03:32 EDT


First of all, sorry for the long mail. If you are not interested in
Korean word processors, please stop here.

I already sent my answer to Prof. Genenz, who asked the original
question in this mail thread (I did not copy the Unicode list at the
time because it did not seem related to the Unicode discussion). See
below.
SeukSoo> On Wednesday, July 11, 2001 11:57 AM, Seuk Soo Sung wrote:
SeukSoo> Dear Professor Genenz,
SeukSoo> I am a Microsoft employee working in Korea, where I develop the
SeukSoo> Korean version of Microsoft Office, including Microsoft Word. I
SeukSoo> would like to offer a couple of comments on your mail below.
SeukSoo> 1. First of all, the teacher said that MS Word has some
SeukSoo> shortcomings concerning Korean. The teacher may have a mistaken
SeukSoo> impression of MS Word2K. I guess that she/he was thinking of
SeukSoo> the supported characters and the table feature in MS Word2K
SeukSoo> compared with ARA97 (developed by H&C). But that is not true at
SeukSoo> all. MS Word2K supports the full modern Hangul character set
SeukSoo> (11,172 chars) and almost 1.6M Old Hangul characters. The table
SeukSoo> feature was much improved in Word2K with the drawing pen. In
SeukSoo> our Focus Group Interviews and surveys, most participants said
SeukSoo> that Word2K is now the more powerful WP (I agree that MS Word
SeukSoo> was not comparable with ARA about 4 years ago).
SeukSoo> 2. The teacher said that there is a WP that is much more
SeukSoo> frequently used than MS Word2K. She/He probably means ARA97.
SeukSoo> That is true in certain segments: schools and government. Both
SeukSoo> segments have a fairly strong tendency to favor a local
SeukSoo> company. But most business segments in Korea use MS Word 2000
SeukSoo> (more than 80%, I guess) because they have to exchange
SeukSoo> documents with other local or foreign companies, and Word2K
SeukSoo> supports multilingual text and has good enough features. Most
SeukSoo> students switch from ARA to MS Word after graduating, for
SeukSoo> business purposes.
SeukSoo> I hope this helps. If you have any questions about the Korean
SeukSoo> version of MS Office, feel free to contact me.
SeukSoo> Thank you very much in advance.

In addition, I'd like to add some history and technical information
about the characters supported in MS Word.
1. Supported characters
Three types of Korean character sets are supported in MS Word (2000 and
XP). The first is modern Korean: the 11,172 modern Hangul characters
(U+AC00 ~ U+D7A3), built from 19 leading consonants, 21 vowels and 27
trailing consonants, together with Hangul Jamo (U+1100 ~ U+11F9) and CJK
Hanja characters (U+3400 ~ U+9FA5). The second is Old Hangul, which is
no longer in general use but appears in documents written since 1446
(some Koreans still use Old Hangul for special purposes, even though it
is not a standard in Korea today). Old Hangul has many more Jamo than
modern Hangul. Based on our research with the National Language Research
Institute, we defined 125 leading consonants, 95 vowels, and 141
trailing consonants for Old Hangul (including the modern Hangul Jamo, of
course, because Old Hangul is a proper superset of modern Hangul).
Theoretically, these Jamo can be combined into 1,686,250 characters
(LV type = 125*95 = 11,875; LVT type = 125*95*141 = 1,674,375). MS Word
supports all of them. The third is GooGyeul, which was also used in old
written documents. GooGyeul glyphs look similar to Hanja characters, but
they have long been used in Korea in a very different way. MS Word also
implemented 255 GooGyeul characters in cooperation with the National
Language Research Institute and The Academy of Korean Studies.
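
As a quick check of the arithmetic above, here is a minimal Python
sketch. The constant and function names are illustrative only (they do
not come from Word or any Microsoft tool). The Old Hangul counts are the
ones quoted in this mail, and the syllable formula is the standard
Unicode arrangement of the U+AC00 ~ U+D7A3 block.

```python
# Modern Hangul: 19 leading consonants, 21 vowels, 27 trailing consonants.
# A syllable may also have no trailing consonant, hence 27 + 1 = 28 cases.
MODERN_L, MODERN_V, MODERN_T = 19, 21, 27
modern_total = MODERN_L * MODERN_V * (MODERN_T + 1)

# Old Hangul Jamo counts as stated in this mail (assumed, not re-derived).
OLD_L, OLD_V, OLD_T = 125, 95, 141
old_lv = OLD_L * OLD_V            # leading consonant + vowel
old_lvt = OLD_L * OLD_V * OLD_T   # leading consonant + vowel + trailing
old_total = old_lv + old_lvt


def compose_modern(l_index, v_index, t_index=0):
    """Return the precomposed modern syllable for zero-based Jamo indices.

    t_index 0 means "no trailing consonant"; 1..27 select a trailing one.
    """
    offset = (l_index * MODERN_V + v_index) * (MODERN_T + 1) + t_index
    return chr(0xAC00 + offset)


print(modern_total)
print(old_lv, old_lvt, old_total)
print(hex(ord(compose_modern(0, 0))), hex(ord(compose_modern(18, 20, 27))))
```

The first and last syllables produced by compose_modern correspond to
the two ends of the modern Hangul range given above.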

2. How they are implemented in MS Word
These characters were implemented differently in MS Word2000 and MS
Word2002, as Chris Pratley mentions below.
A. MS Word2000
MS Word2000 used the PUA area in Unicode for Old Hangul and GooGyeul
(that is, for the code assignments of these characters), and MS
developed input tools that were ActiveX add-ins for MS Word2000. Both
the font files and the input tools were part of the Korean MS Word2000
package, but they were not included in the English MS Word2000 package.
Unfortunately, the input tools are not multilingual: all user interface
elements, such as menus and dialogs, are written in Korean. Therefore,
if you have the Korean version of MS Word2000, you can install the full
package from the PlusPack CD. But if you are using English MS Word2000,
you need to install the Multilingual pack with the Korean UI first and
then install the related components (the Old Hangul input tool, the
GooGyeul input tool and some font files) from the Korean PlusPack CD.
Please contact me if you need these tools and font files (mail me
directly).
B. MS Word2002
MS Word2002 implemented Old Hangul using the Uniscribe engine, and it
also shipped a Cicero input tool for Old Hangul input. Word2002 provided
new OpenType font files for Old Hangul glyph composition and used the
Jamos in U+1100 to compose Old Hangul characters directly through the
Uniscribe engine. If you are using English Word2002 with the Cicero
input tool and want to use Old Hangul, you just need to select
"Microsoft Korean Old Hangul input" in the Cicero toolbar and install
some additional fonts from the Korean version of the MS Word2002
PlusPack CD. These fonts carry the OpenType data needed to handle proper
Jamo combination with the Uniscribe engine. Please contact me if you
want these fonts (mail me directly).
The GooGyeul input tool is still an ActiveX add-in and can be installed
from the Korean PlusPack CD.

3. Compatibility with IE
Yes, we tested it on MS Windows2000 and WindowsXP with IE6.0. IE6.0 can
display Old Hangul characters created with Word2000 and Word2002.

4. Consulting for Old Hangul and GooGyeul characters (Independent Korean
linguists)
While MS was developing the Old Hangul and GooGyeul features, we
received a lot of consulting and cooperation from government
organizations, such as the National Language Research Institute and The
Academy of Korean Studies, and from professors. They reviewed our
implementation and agreed with it.

Thanks
SeuksooS

-----Original Message-----
From: Chris Pratley [mailto:chrispr@microsoft.com]
Sent: Sunday, July 15, 2001 2:57 AM
To: Jungshik Shin
Cc: Unicode Mailing List
Subject: RE: Wordprocessors in Korean

For Word2000 or Word2002, if you have the Korean retail package, there
is a CD included that has all the software you need. If you have another
version, such as English, I just checked and unfortunately this seems to
be an exception - I do not see this tool in the Proofing Tools kit or
Multilanguage pack, so you may need to contact Microsoft Korea if you
are interested. You should also try the "Tools on the Web" site for MS
Korea. There may be special packages there not available on the English
site.

The support used in Word2002 is done using Uniscribe (USP10.dll), an
updated version of which also ships in WindowsXP. You need a font that
has the right OpenType data to properly handle combining Jamos (included
with Korean OfficeXP). I believe that IE6.0 also uses Uniscribe in the
way that Word2002 does, but this needs to be verified.

Note that combination of Jamos is only supported for combinations that
are not available as pre-composed Hangul in Unicode already. Word (and
Office) use the existing Hangul for that.
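
To illustrate the distinction, here is a small Python sketch using the
standard unicodedata module. It shows only Unicode canonical composition
(NFC): a modern Jamo sequence composes to a pre-composed Hangul
syllable, while an Old Hangul sequence has no pre-composed form and
stays as Jamo. This is an assumption about the kind of check involved,
not a description of how Word or Uniscribe actually decide, and the
helper name has_precomposed_form is hypothetical.

```python
import unicodedata

# Modern sequence: CHOSEONG KIYEOK (U+1100) + JUNGSEONG A (U+1161)
# + JONGSEONG KIYEOK (U+11A8).
modern_jamo = "\u1100\u1161\u11A8"

# Old Hangul sequence: CHOSEONG PANSIOS (U+1140) + JUNGSEONG A (U+1161).
old_jamo = "\u1140\u1161"


def has_precomposed_form(jamo_sequence):
    """Return True if NFC turns the sequence into one U+AC00..U+D7A3 syllable."""
    composed = unicodedata.normalize("NFC", jamo_sequence)
    return len(composed) == 1 and 0xAC00 <= ord(composed) <= 0xD7A3


for label, seq in (("modern", modern_jamo), ("old", old_jamo)):
    composed = unicodedata.normalize("NFC", seq)
    code_points = " ".join(f"U+{ord(c):04X}" for c in composed)
    print(label, has_precomposed_form(seq), code_points)
```

Under this sketch, only the sequences for which has_precomposed_form
returns False would need font-driven Jamo combination.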

As for the details of which linguists were consulted, how the
conclusions were reached, etc. you had best contact MS Korea. Not being
a Korean native speaker I do not really understand the complexities of
the arguments involved since they seem mainly philosophical and very
passionate.

BTW, do you have any details on what you described as limitations for
Word in Korean word processing?

Chris Pratley
Group Program Manager
Microsoft Word

Sent using OfficeXP

-----Original Message-----
From: Jungshik Shin [mailto:jshin@mailaps.org]
Sent: Friday, July 13, 2001 2:07 PM
To: Chris Pratley
Cc: Unicode Mailing List
Subject: RE: Wordprocessors in Korean

CP> : On Fri, 13 Jul 2001, Chris Pratley wrote:

JS> : On 2001-07-13, Jungshik Shin wrote:

JS> BTW, Microsoft (Korea) made a public announcement that it would
JS> support Middle Korean in the near future (in MS-Windows and MS-Word)
JS> and it would be great to get that support from one of major OS/word
JS> processor vendors.

CP> Actually, Word2000 and Word2002 support Old Hangul (which I think is
CP> what you refer to as Middle Korean - please correct me if I am
CP> wrong).

  That's great to hear. Thank you for the info, for the wonderful job,
and for updating me on the issue. The 'near future' in the above was
relative to that announcement made by Microsoft, and it's nice to know
that it's now 'present' :-)

  My assessment was based on my (not so extensive) experience with
using MS Word 2000. Because I couldn't find any way to enter Middle
Korean, I thought it was not yet implemented as of MS Word 2000.

  Can I install an input method for Middle Korean (included in OfficeXP
Proofing tool) in MS Windows 2000 (or MS Windows ME)? Not so likely,
but I'm asking just in case.

CP> Word2000 does it using an add-in that uses the Unicode PUA to
CP> support about 5000 Old Hangul pre-composed glyphs.

  Or, where can I get this add-in (supposedly an input method and
font(s))? I tried 'Office on the web' (office update and download
area), but couldn't find anything related to this.

CP> Word2002 uses the Jamos in U+1100 to compose Old Hangul character
CP> directly (over 1.3 million combinations theoretically, but due to
CP> independent Korean linguists' concerns, only valid combinations are
CP> allowed). You can now create any ancient text in Hangul in Word
CP> 2002. An input method is included in the Korean version, as well as
CP> in the OfficeXP Proofing Tools kit (or Multilanguage pack, which
CP> includes the Proofing Tools kit)

  Can you tell me who you meant by 'independent Korean linguists'?
How do you (or they) determine which are valid and which are not? By
whether they have been found in existing literature?

  BTW, 1.3 million seems to be off by a factor of 2. Well, I'm being
lazy here and just pulling the number off the top of my head (instead
of actually counting them), so I might be wrong.
Another BTW, are you aware of the 5 new medial vowels submitted by DPRK
to add to the U+1100 Jamo block? One of them is widely used by Korean
speakers in South Korea (and perhaps in North Korea as well).

CP> One thing you can not do easily is illustrate partially composed
CP> (invalid) characters for purposes of illustration in a textbook of
CP> how characters are constructed.

  I don't understand why this can't be done if you're supporting
syllable composition using U+1100 Jamos. Is it due to lack of fonts?
How about 'incomplete syllables' (such as 'ICF' + 'medial vowel' +
final consonant, where 'ICF' denotes the initial consonant filler) or
'stand-alone' medial vowels or final consonants?
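
  To make the question concrete, here is a minimal Python sketch of
such sequences built with the two filler characters in the U+1100 block
(U+115F HANGUL CHOSEONG FILLER and U+1160 HANGUL JUNGSEONG FILLER). The
variable and function names are illustrative only, and whether Word or
Uniscribe renders these sequences as expected is exactly the open
question, not something the sketch demonstrates.

```python
import unicodedata

CHOSEONG_FILLER = "\u115F"   # initial consonant filler ('ICF')
JUNGSEONG_FILLER = "\u1160"  # medial vowel filler

MEDIAL_A = "\u1161"          # HANGUL JUNGSEONG A
FINAL_KIYEOK = "\u11A8"      # HANGUL JONGSEONG KIYEOK

# 'ICF' + medial vowel + final consonant
incomplete_syllable = CHOSEONG_FILLER + MEDIAL_A + FINAL_KIYEOK
# stand-alone medial vowel
standalone_vowel = CHOSEONG_FILLER + MEDIAL_A
# stand-alone final consonant (both fillers, then the final)
standalone_final = CHOSEONG_FILLER + JUNGSEONG_FILLER + FINAL_KIYEOK


def describe(sequence):
    """List each code point in the sequence with its Unicode character name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in sequence]


for seq in (incomplete_syllable, standalone_vowel, standalone_final):
    print(describe(seq))
```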

  BTW, does MS IE 6.0 support Hangul syllable composition with Hangul
Jamos in the U+1100 block? (see
<http://jshin.net/~jungshik/i18n/middle.html>)
Maybe this has to be asked of somebody else at Microsoft?

  Thank you again for great news,

   Jungshik Shin



This archive was generated by hypermail 2.1.2 : Mon Jul 16 2001 - 10:20:39 EDT