Unicoders...
I am forwarding on behalf of Mary Ink, a post that got side-tracked
this morning.
        -- Sarasvati
From: "mary ink" <maryink@hotmail.com>
To: unicode@unicode.org
Subject: Korean language support and other Far Eastern Questions
Date: Tue, 25 Apr 2000 19:26:04 GMT
I am doing research, as you might infer, on how Unicode handles Korean in the
technical sense and how Unicode has handled the Far Eastern languages in a
political sense. Any facts or views are welcome.
Why are some 11,172 places allocated to Hangul Syllables when the language
system is made up of only 19 consonants (ja-um) and 21 single and combined
vowels (mo-um)? If the syllables could be made up from their constituent
parts they wouldn't require double bytes, no?
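The size of the block can be checked arithmetically. The sketch below assumes
the standard layout of the Hangul Syllables block: 19 initial consonants, 21
vowels, and 27 final consonants plus the "no final consonant" case. The
function and constant names are made up for illustration.

```python
# Arithmetic layout of the Hangul Syllables block (U+AC00..U+D7A3).
S_BASE = 0xAC00   # first precomposed syllable, decimal 44032
L_COUNT = 19      # initial consonants
V_COUNT = 21      # vowels
T_COUNT = 28      # 27 final consonants + "no final consonant"


def compose(l_index, v_index, t_index=0):
    """Return the precomposed syllable for the given jamo indices."""
    if not (0 <= l_index < L_COUNT
            and 0 <= v_index < V_COUNT
            and 0 <= t_index < T_COUNT):
        raise ValueError("jamo index out of range")
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


def decompose(syllable):
    """Return (initial, vowel, final) indices for a precomposed syllable."""
    offset = ord(syllable) - S_BASE
    if not 0 <= offset < L_COUNT * V_COUNT * T_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    l_index, rest = divmod(offset, V_COUNT * T_COUNT)
    v_index, t_index = divmod(rest, T_COUNT)
    return l_index, v_index, t_index


total = L_COUNT * V_COUNT * T_COUNT
print(total)                 # 11172
print(S_BASE + total - 1)    # 55203, the last code point of the block
print(compose(18, 0, 4))     # U+D55C
print(decompose("\uD55C"))   # (18, 0, 4)
```

Every slot in the block corresponds to exactly one (initial, vowel, final)
triple, so a syllable can be taken apart or rebuilt without a lookup table.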
Hangul letters are arranged in combinations of left to right and top to
bottom depending on the shape and orientation of the vowel. Each arrangement
of letters composes a syllable. So that syllables remain in proportion to
each other, the shape and size of the letters within the syllable are
modified. How do character display systems and coding standards such as
Unicode handle these non-linear letter combinations and relative changes in
letter shape?
How do the Hangul Compatibility Jamo characters 12592-12687, needed for
compatibility with the KS C 5601 encoding, relate to the Hangul Syllables
44032-55203?
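The decimal ranges above are easier to compare with the code charts in
hexadecimal, and Python's standard unicodedata module gives one view of how
the two blocks relate under normalization. This is only a sketch; the sample
characters and variable names are chosen for illustration.

```python
import unicodedata

# Decimal ranges from the question, shown as code points.
print(hex(12592), hex(12687))   # 0x3130 0x318f  (Hangul Compatibility Jamo)
print(hex(44032), hex(55203))   # 0xac00 0xd7a3  (Hangul Syllables)

compat_kiyeok = "\u3131"        # HANGUL LETTER KIYEOK (compatibility jamo)
syllable_ga = "\uAC00"          # HANGUL SYLLABLE GA

# A compatibility jamo maps to a conjoining jamo only under
# compatibility normalization (NFKD), not canonical normalization (NFD).
nfd_compat = unicodedata.normalize("NFD", compat_kiyeok)
nfkd_compat = unicodedata.normalize("NFKD", compat_kiyeok)

# A precomposed syllable decomposes canonically into conjoining jamo
# from the Hangul Jamo block (U+1100..U+11FF).
nfd_syllable = unicodedata.normalize("NFD", syllable_ga)

print([hex(ord(c)) for c in nfd_compat])    # ['0x3131']
print([hex(ord(c)) for c in nfkd_compat])   # ['0x1100']
print([hex(ord(c)) for c in nfd_syllable])  # ['0x1100', '0x1161']
```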
Olle Jarnefors explained in "A short overview of ISO/IEC 10646 and Unicode"
http://www.nada.kth.se/i18n/ucs/unicode-iso10646-oview.html (1996) that
"ISO/IEC 10646 and Unicode removes some assumptions of the made about plain
text, which simplifies implementations but are untenable in multilingual
text and monolingual text in some languages Characters cannot be identified
with glyphs. Different graphic forms to be used in different situations are
needed for some characters, e.g. Arabic letters". I find these statements
impenetrable. Does he mean that Unicode considers the character
independently of its appearance and therefore is capable of handling text
elements that change in appearance relative to position as they do in some
languages including Korean? Or does he mean the opposite?
I understand that Unicode supports multidirectional text and overlapping or
composite characters. Can it then handle the special multidirectional and
composite character of the Hangul writing system?
How international is the Unicode Consortium? Lists of member companies I've
seen are predominantly American. Has this had any bearing on how character
codes have been standardized?
How have national encoding standards been incorporated into Unicode and ISO
UCS standards?
Why and how was Unicode developed separately from ISO UCS standards and then
agreed upon later? How do the two standards differ?
There seems to be a privileging of ASCII characters in UTF-8 in that they
require fewer bits at the expense of "less common" characters. Has there
been any discussion about what seems to be an inequitable compromise?
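To put numbers on the trade-off, the sketch below measures encoded sizes with
Python's built-in codecs. The three sample characters are arbitrary choices
for illustration.

```python
samples = {
    "Latin A": "A",            # U+0041
    "Hangul han": "\uD55C",    # U+D55C
    "Han zhong": "\u4E2D",     # U+4E2D
}

sizes = {}
for label, ch in samples.items():
    utf8_len = len(ch.encode("utf-8"))
    utf16_len = len(ch.encode("utf-16-be"))   # big-endian, no byte order mark
    sizes[label] = (utf8_len, utf16_len)
    print(f"{label}: U+{ord(ch):04X} UTF-8={utf8_len} UTF-16={utf16_len}")
```

The ASCII letter takes one byte in UTF-8 and the Hangul and Han samples take
three each, while UTF-16 uses two bytes for all three.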
As a letter-based system with relatively few discrete symbols, Hangul is
very easy to input using a keyboard. I understand, however, that the Han
ideographs depend on clunky composition methods using the keyboard to make
sound approximations of the character in question, which in turn display a
choice of characters to select from (source: Elliotte Rusty Harold's XML
Bible). Are there alternative ways to compose these characters with a
keyboard based on root forms and strokes, the way they are listed in
ideograph dictionaries? I realise this is complicated but there must be a
way around the problem. Does the Unicode Consortium concern itself with
input technology?
I have read that Japanese programmers initially criticized UCS
standardization. I also recall a great deal of furor in Korea over their
homegrown word processing software, HWP, being buried by MS Word. What if
any resistance has there been to Unicode and ISO UCS standardization,
particularly among the language groups it is intended to better serve, and
how has this been resolved? Has there been national resistance to the
concept of "unified Han characters"? The character differences seem subtle
when considered scientifically but surely unifying the language codes of
such antagonistic groups as Japan, North Korea, South Korea, China and
Taiwan has been politically volatile.
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:02 EDT