L2/01-193
From: Asmus Freytag [asmusf@ix.netcom.com]
Sent: Thursday, May 03, 2001 12:22 AM
Subject: Mapping Table issues (see L2/001-192)
T. Kubota recently submitted a problem report on East Asian Mappings, which
is now document L2/001-192. I have had a private exchange with him about
his submission, and he made a number of additional comments that I would
like to pass on. I've removed details already covered elsewhere
(L2/001-179 and L2/001-189) but left in some of my replies to him. Please
treat this simply as additional background for our discussion at the UTC.
A./
====================================================================
>At 11:54 AM 5/2/01 +0900, Tomohiro KUBOTA wrote:
>>
>>I thought that "fullwidth and halfwidth forms" should not be
>>used unless the normal version is already used for other code points.
>
>I think this was our starting point, but it caused problems
>with some vendor sets that have both the narrow forms AND
>the wide forms for POUND, CENT, NOT SIGN, etc. By mapping to the
>Fullwidth forms, all Japanese sets, whether 'pure' JIS or vendor
>supersets of JIS, can map the same character to the same Unicode character.
>
>We probably need to explain this more.
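As an illustration of the fullwidth policy, here is a minimal Python sketch. The labels stand in for real byte values in a hypothetical vendor superset that has both narrow and wide POUND, CENT, and NOT SIGN. With the wide forms mapped to the Fullwidth block, every legacy character gets its own code point, and its display width can be read from the code point alone via `unicodedata.east_asian_width` (which reflects the Unicode version bundled with Python, not the 2001 tables).

```python
import unicodedata

# Hypothetical vendor superset: each symbol exists as a narrow (single-byte)
# and a wide (double-byte) character. Labels replace the real byte values,
# which differ from vendor to vendor.
FULLWIDTH_POLICY = {
    "narrow CENT SIGN": "\u00a2",   # CENT SIGN
    "wide CENT SIGN": "\uffe0",     # FULLWIDTH CENT SIGN
    "narrow POUND SIGN": "\u00a3",  # POUND SIGN
    "wide POUND SIGN": "\uffe1",    # FULLWIDTH POUND SIGN
    "narrow NOT SIGN": "\u00ac",    # NOT SIGN
    "wide NOT SIGN": "\uffe2",      # FULLWIDTH NOT SIGN
}

# Every legacy character maps to a distinct code point, so the mapping is reversible.
assert len(set(FULLWIDTH_POLICY.values())) == len(FULLWIDTH_POLICY)


def width_class(label: str) -> str:
    """East Asian Width class of the code point a legacy character maps to."""
    return unicodedata.east_asian_width(FULLWIDTH_POLICY[label])


if __name__ == "__main__":
    for label, ch in FULLWIDTH_POLICY.items():
        print(f"{label:18} -> U+{ord(ch):04X} {width_class(label)}")
```

The narrow entries report class Na and the wide ones F, so a renderer needs no knowledge of the source encoding to size them.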
>
>>Anyway, I hope that the Unicode Consortium adopts a solution that
>>does not cause widespread confusion. (I am afraid that changing the
>>conversion table might confuse users.) However, if the Unicode
>>Consortium takes the initiative and major vendors (like
>>Microsoft, Apple, and Sun) follow it, it will be OK.
>
>Some vendors whose mappings I was able to check already agree with this.
>
>>In short, either way will be OK. I think it is important that the
>>Unicode Consortium takes the initiative and avoids confusion.
>>I guess there are some Japanese people in the Unicode Consortium who
>>know the needs of average Japanese Windows/Macintosh/Linux/... users.
>>I hope this problem will be discussed with them.
>>
On adding X0212 to the list of encodings on which EAW is based:
>>Though it is true that JIS X 0212 is not very popular,
>>I don't think there is any positive reason not to support
>>JIS X 0212. Mule and Emacs are examples of implementations that do.
>
>Adding X0212 to the EAW pool of legacy encodings adds a large
>number of characters to class "A" and makes it harder to get
>context information to decide whether to treat a character as
>wide or narrow. In particular, it's not so much a question of
>whether *some part* of X0212 is supported, but whether these
>European characters are used as wide characters by a large
>enough group of users to justify reflecting that in the EAW tables.
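The cost of a larger class "A" shows up in how a renderer has to treat it. This is a minimal Python sketch, loosely following the UAX #11 guidance that Ambiguous characters are wide only when an East Asian context can be established and narrow otherwise. The `east_asian_context` flag is a hypothetical stand-in for whatever locale, font, or source information a real terminal would consult, and the width data is the Unicode version bundled with Python.

```python
import unicodedata


def display_width(ch: str, east_asian_context: bool) -> int:
    """Columns occupied by one character.

    Simplified: combining marks, control characters, and emoji sequences
    are not handled.
    """
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("F", "W"):
        return 2
    if eaw == "A":
        # Ambiguous: the answer depends on context, not on the character.
        return 2 if east_asian_context else 1
    return 1  # N, Na, H


def text_width(text: str, east_asian_context: bool) -> int:
    """Total columns for a string under the chosen context."""
    return sum(display_width(ch, east_asian_context) for ch in text)


if __name__ == "__main__":
    sample = "25\u00b0C"  # DEGREE SIGN is class A
    print(text_width(sample, east_asian_context=True))
    print(text_width(sample, east_asian_context=False))
```

The same four-character string occupies five columns in one context and four in the other; every character moved into class A adds another case where that decision depends on context rather than on the character.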
>
>> > The next one is almost correct; it should be Na if it
>> > is used to map a non-wide character in an EA legacy encoding.
>> >
>> > FILE SHIFTJIS.TXT------
>> > 0x7E U+203E N # OVERLINE
>>
>>Yes, if U+203E is not used as a double-width character in any
>>other conversion table, it should be "Na".
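The rule stated here (Na unless some conversion table uses the code point for a double-width character) can be checked mechanically. This is a minimal Python sketch, assuming mapping lines in the whitespace-separated form quoted in this exchange (legacy code, `U+` code point, proposed width, `#` comment) and a deliberately crude, hypothetical heuristic: a legacy code below 0x100 is single-byte and therefore narrow, and anything larger is double-width. That heuristic misfires on exactly the cases under discussion, such as EUC-JP half-width katakana (two bytes, yet narrow) or Big5 0xA14E, so the output is a list of candidates for review, not a verdict.

```python
from __future__ import annotations

from collections import defaultdict


def parse_mapping(line: str) -> tuple[int, int]:
    """Parse '0x7E U+203E N # OVERLINE' into (legacy code, code point).

    The proposed-width column and the comment are ignored.
    """
    fields = line.split("#", 1)[0].split()
    cp_field = fields[1]
    if not cp_field.startswith("U+"):
        raise ValueError(f"expected U+XXXX, got {cp_field!r}")
    return int(fields[0], 16), int(cp_field[2:], 16)


def na_candidates(tables: list[list[str]]) -> set[int]:
    """Code points that no table maps from a double-width legacy character.

    Hypothetical heuristic: codes below 0x100 count as single-byte/narrow,
    everything else as double-width.
    """
    wide_use: defaultdict[int, bool] = defaultdict(bool)
    for table in tables:
        for line in table:
            code, cp = parse_mapping(line)
            wide_use[cp] |= code >= 0x100
    return {cp for cp, wide in wide_use.items() if not wide}


if __name__ == "__main__":
    shiftjis = ["0x7E U+203E N # OVERLINE"]
    big5 = ["0xA145 U+2022 N # BULLET",
            "0xA14E U+FF64 H # HALFWIDTH IDEOGRAPHIC COMMA"]
    for cp in sorted(na_candidates([shiftjis, big5])):
        print(f"U+{cp:04X}")
```

With the three mapping lines quoted in this exchange, only U+203E comes out as an Na candidate; U+FF64 is held back because the heuristic treats Big5 0xA14E as double-width, which is precisely the open question about that entry.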
>>
>> > FILE BIG5.TXT------
>> > 0xA145 U+2022 N # BULLET
>> >
>> > If A14E is not in fact a half-width character in
>> > Big5, then what is this supposed to map to?
>> >
>> > 0xA14E U+FF64 H # HALFWIDTH IDEOGRAPHIC COMMA
>>
>>Sorry, I have no idea. Please ask someone who reads
>>Traditional Chinese. I tested some Chinese-enabled
>>terminals (cxterm and rxvt) and found that the character
>>is displayed as double-width.
And he finishes:
>>I hope the Unicode Consortium takes the initiative to solve this problem.
>>If the Unicode Consortium can really do this work, please consider solving
>>the "Conversion tables differ between vendors" problem described on my page:
>>http://www.debian.or.jp/~kubota/unicode-symbols.html
>>Japanese people are unhappy that the same JIS X 0208
>>characters are mapped to different Unicode characters depending on
>>the vendor. However, I imagine this situation comes from political
>>horse-trading among major vendors, and Japanese people are in a
>>hopeless situation... (For example, I imagine Microsoft and Sun
>>will never agree to use a common conversion table.) Can the Unicode
>>Consortium take the initiative in establishing a common, consistent
>>conversion table?
>>
>>
>>And please consider the "EUC-JP round-trip compatibility" problem.
>>This problem would be solved automatically if
>>
>> > FILE JIS0208.TXT------
>> > 0x2140 U+005C Na # REVERSE SOLIDUS
>>
>>were regarded as a mapping table problem and changed to use the
>>corresponding fullwidth form, though I once received this reply
>>
>> >> However, such a table does not guarantee round-trip conversion.
>> >> This is because JIS0208.TXT converts JIS X 0208 0x2140 (0xa1 0xc0 in
>> >> EUC-JP) to U+005C, while 0x5c in EUC-JP must also be mapped to
>> >> U+005C. In short, U+005C corresponds to two characters
>> >> (0x5c and 0xa1 0xc0) in EUC-JP.
>> >
>> > This is a known problem, and is very unfortunate. We don't have an
>> > official way around this problem. I suggest that you ask some people
>> > on the Unicode mailing list and see if others have tables or code
>> > that help fix this problem. Please see:
>> > http://www.unicode.org/unicode/consortium/distlist.html
>>
>>from Rick McGowan <rick@unicode.org> when I pointed out this
>>round-trip compatibility problem to info@unicode.org.
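The round-trip failure described in this exchange can be reproduced with a two-entry decoding table. This is a minimal Python sketch: `jis_to_euc` relies on the fact that EUC-JP stores a JIS X 0208 code such as 0x2140 by setting the high bit of both bytes (giving 0xA1 0xC0). The two tables are hypothetical excerpts, one following JIS0208.TXT as published and one using FULLWIDTH REVERSE SOLIDUS (U+FF3C) instead, as Kubota proposes.

```python
from __future__ import annotations

from collections import defaultdict


def jis_to_euc(jis_code: int) -> bytes:
    """EUC-JP form of a JIS X 0208 code: set the high bit of each 7-bit byte."""
    return bytes([(jis_code >> 8) | 0x80, (jis_code & 0xFF) | 0x80])


def round_trip_failures(decode_table: dict[bytes, int]) -> dict[int, list[bytes]]:
    """Code points that more than one byte sequence decodes to.

    Encoding such a code point cannot recover which sequence it came from.
    """
    sources: defaultdict[int, list[bytes]] = defaultdict(list)
    for seq, cp in decode_table.items():
        sources[cp].append(seq)
    return {cp: seqs for cp, seqs in sources.items() if len(seqs) > 1}


# Excerpt of an EUC-JP decoder following JIS0208.TXT as published.
as_published = {
    b"\x5c": 0x005C,              # ASCII REVERSE SOLIDUS
    jis_to_euc(0x2140): 0x005C,   # JIS X 0208 0x2140
}

# Same excerpt with 0x2140 mapped to the fullwidth form instead.
fullwidth = {
    b"\x5c": 0x005C,
    jis_to_euc(0x2140): 0xFF3C,   # FULLWIDTH REVERSE SOLIDUS
}

if __name__ == "__main__":
    print(round_trip_failures(as_published))
    print(round_trip_failures(fullwidth))
```

The first check reports U+005C together with both byte sequences; the second reports nothing, so the fullwidth mapping makes this excerpt reversible.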
>
>This will all be addressed at the next UTC meeting. I won't promise you that
>you will like the final answer, since I don't know what it will be, but at
>a minimum we are going to look at the problem and do the best we can
>with the resources we have.
>
>A./