From: mpsuzuki@hiroshima-u.ac.jp
Date: Mon May 21 2007 - 04:50:40 CDT
Dear Gerrit,
On Sat, 19 May 2007 13:54:30 +0200
Gerrit Sangel <z0idberg@gmx.de> wrote:
>I just read some papers about the han unification
>but am a bit confused if there is something like
>a modifier control character.
>
>As far as I know, even though some radicals/characters
>are different in Chinese and Japanese, they were unified.
>
>But using a separate font in case the character is now
>Chinese, Japanese or Korean is not always possible
>(think of file names, mp3 tags, plain text files and so on),
>so I wondered if there is something like a control
>character in proposal?
I think what you want was once proposed as
"language tagging".
http://unicode.org/faq/languagetagging.html
http://www.unicode.org/reports/tr7/tr7-4.html
It was made obsolete, because a language specification
embedded in plain Unicode text can conflict with higher-level
language specifications in XML, HTML, etc. An ISO-2022
encoding may be a better solution for such a requirement.
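For reference, the plain-text mechanism prefixed the text with
U+E0001 LANGUAGE TAG followed by tag characters, which mirror
ASCII at an offset of U+E0000. A minimal Python sketch of that
encoding follows; the function names are made up for
illustration, and this is not a recommendation to use such tags.

```python
LANGUAGE_TAG = "\U000E0001"   # U+E0001 LANGUAGE TAG
TAG_OFFSET = 0xE0000          # tag characters mirror ASCII at this offset


def make_language_tag(lang):
    """Encode an ASCII language code such as 'ja' as a tag sequence."""
    if not lang.isascii() or not lang.isprintable():
        raise ValueError("language code must be printable ASCII")
    return LANGUAGE_TAG + "".join(chr(TAG_OFFSET + ord(c)) for c in lang)


def strip_tags(text):
    """Remove all tag characters (U+E0000..U+E007F) from text."""
    return "".join(c for c in text if not 0xE0000 <= ord(c) <= 0xE007F)


# U+8336 preceded by an invisible "ja" language tag.
tagged = make_language_tag("ja") + "\u8336"
print([f"U+{ord(c):04X}" for c in tagged])
# → ['U+E0001', 'U+E006A', 'U+E0061', 'U+8336']
```

A markup-aware program would normally call something like
strip_tags() and rely on the markup's own language attribute.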
>For example, using U+8336 茶 (I think, this character has
>different variants in Chinese and Japanese) and then
>append a control character to let the display program
>decide whether it should use a Chinese or Japanese glyph.
It was a popular view that specifying the language is
sufficient to select the appropriate glyph shape, back when
the glyph collection we focused on was the five-column CJKV
listing in the ISO 10646 specification. But I think that view
was incorrect.
>It seems, there is also a Variation Database
>http://www.unicode.org/reports/tr37/
>accepted, but as far as I understood it, it is not really
>a clearly defined way, e.g. that variation 2 of character x
>has always the same specific appearance.
Some people expect UTS #37 to be a database of unique glyphs,
but it is not. The IVD is a registry of variation sequences
(VS) for ideographs, meant to avoid VS conflicts in interchange.
Before UTS #37, system A could use U+xxxx E0100 for glyph A
while system B used U+xxxx E0100 for glyph B. This is a conflict.
After UTS #37, once system A has registered U+xxxx E0100 for
glyph A, system B cannot use U+xxxx E0100 for glyph B.
Although the IVD does not define the shape of glyph A,
the conflict is prevented.
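The effect of a shared registry can be sketched with a toy
model in Python. This only illustrates the "first registration
wins" idea; it is not the actual IVD registration procedure,
and the class name, glyph names and the choice of U+8336 as the
base character are all hypothetical.

```python
class ToyRegistry:
    """Maps (base, selector) to the single glyph name that owns it."""

    def __init__(self):
        self._owner = {}

    def register(self, base, selector, glyph_name):
        key = (base, selector)
        owner = self._owner.get(key)
        if owner is not None and owner != glyph_name:
            # Same sequence, different glyph: refuse instead of colliding.
            raise ValueError(
                f"U+{base:04X} U+{selector:04X} already means {owner!r}"
            )
        self._owner[key] = glyph_name

    def lookup(self, base, selector):
        return self._owner.get((base, selector))


registry = ToyRegistry()
registry.register(0x8336, 0xE0100, "glyph A")      # system A

try:
    registry.register(0x8336, 0xE0100, "glyph B")  # system B, same sequence
except ValueError as err:
    conflict = str(err)

print(registry.lookup(0x8336, 0xE0100))
# → glyph A
```

Note that the registry stores only a name for glyph A, never its
shape, which matches the point that the sequence is reserved
without the outline being specified.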
>If there were a way to store the information about
>the variation of the character in the text itself,
>I think, it would be possible to create a font
>to include all CJK characters?
To include all CJK characters together with the glyphs for
each language, the number of glyphs would exceed 64k. (The
CJK Unified Ideographs, including all Extensions, already
number about 64k; if we also collect the non-unified variants
for CJKV, the total must be several times 64k.) They cannot
be packed into a single TrueType/OpenType font, which is
limited to 64k glyphs. You would have to implement a new font
format supporting a larger glyph collection, plus rasterizers,
text renderers, and so on. I guess that is not what you want.
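As a rough back-of-the-envelope check in Python: the figure of
64,000 ideographs is the approximate one from this message, and
four glyphs per character (one each for C, J, K and V) is a
purely hypothetical upper bound, since many characters look the
same in every region.

```python
import math

# The glyph count of a TrueType/OpenType font is an unsigned 16-bit value.
MAX_GLYPHS = 0xFFFF


def fonts_needed(glyph_count):
    """Minimum number of fonts if each holds at most MAX_GLYPHS glyphs."""
    return math.ceil(glyph_count / MAX_GLYPHS)


unified = 64_000        # rough figure taken from the message
variants_per_char = 4   # hypothetical: one glyph each for C, J, K, V
total = unified * variants_per_char

print(total, fonts_needed(total))
# → 256000 4
```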
Regards,
mpsuzuki