L2/01-193

 

From: Asmus Freytag [asmusf@ix.netcom.com]

Sent: Thursday, May 03, 2001 12:22 AM

 

Subject: Mapping Table issues (see L2/001-192)

 

T. Kubota recently submitted a problem report on East Asian Mappings which

now is document L2/001-192. I have had a private communication with him

surrounding his submission and he made a number of additional comments

which I would like to pass on. I've removed details already covered

elsewhere (L2/001-179 and L2/001-189) but left in some of my replies to him.

 

Please treat this simply as additional background for our discussion at the

UTC.

 

A./

 

====================================================================

 

>At 11:54 AM 5/2/01 +0900, Tomohiro KUBOTA wrote:

>>

>>I thought that "fullwidth and halfwidth forms" should not be

>>used unless normal version is already used for other codepoints.

>

>I think this was our starting point, but then, this caused some

>problems with some vendor sets that have both narrow forms AND

>the wide forms for POUND, CENT, NOT SIGN, etc. With mapping to

>Fullwidth forms, all Japanese sets, whether 'pure' JIS or vendor supersets

>of JIS, can map the same character to the same Unicode character.

>

>We probably need to explain this more.

>

>>Anyway, I hope that Unicode Consortium takes a solution which

>>does not bring large confusion.  (I am afraid that changing

>>conversion table might confuse users.)   However, if Unicode

>>Consortium can take an initiative and major vendors (like

>>Microsoft, Apple, and Sun) will follow it, it will be OK.

>

>Some vendors whose mappings I was able to check already agree with this.

>

>>In short, any way will be OK.  I think it is important that

>>Unicode Consortium takes an initiative and avoid confusion.

>>I guess there are some Japanese people who know needs of

>>average Japanese Windows/Macintosh/Linux/... users in Unicode

>>Consortium.  I hope this problem will be discussed with them.

>>

 

On adding X0212 to the list of encodings on which EAW is based:

 

>>Though it is true that JIS X 0212 is not very popular,

>>I don't think there is any positive reason not to support

>>JIS X 0212.  Mule and Emacs are samples of implementation.

>

>Adding X0212 into the EAW pool of legacy encodings adds a large

>number of characters to class "A" and makes it harder to get

>context information to decide whether to treat a character as

>wide or narrow. In particular, it's not so much a question of

>whether *some part* of X0212 is supported, but whether these

>European characters are used as wide characters by a large

>enough group of users to reflect it in the EAW tables.

>

>> > The next one is almost correct, it should be Na, if it

>> > is used to map a non-wide character in an EA legacy encoding.

>> >

>> > FILE SHIFTJIS.TXT------

>> > 0x7E  U+203E  N  # OVERLINE

>>

>>Yes, if U+203E is not used as a doublewidth character in any

>>other conversion tables, it should be "Na".

>>

>> > FILE BIG5.TXT------

>> > 0xA145  U+2022  N  # BULLET

>> >

>> > If A14E is not in fact a half-width character in

>> > big 5 then what is this supposed to map to?

>> >

>> > 0xA14E  U+FF64  H  # HALFWIDTH IDEOGRAPHIC COMMA

>>

>>Sorry I have no idea.  Please ask someone who speaks

>>traditional Chinese.  I tested some Chinese-enabled

>>terminals (cxterm and rxvt) and found the character

>>is displayed in doublewidth.
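
The width classes discussed above (N, Na, H, F, A) can be inspected in whatever Unicode data a given runtime ships with. The following is a minimal sketch using Python's standard unicodedata module. The SAMPLES list and the width_classes helper are illustrative names, not part of any mapping table, and the classes reported for characters such as U+203E or U+2022 depend on the Unicode version bundled with the interpreter, so they may differ from the table excerpts quoted above.

```python
import unicodedata

# Code points taken from the table excerpts quoted above, plus the
# fullwidth counterpart of U+005C for comparison.
SAMPLES = ["\u005c", "\uff3c", "\u203e", "\u2022", "\uff64"]


def width_classes(chars):
    """Map each character to its East Asian Width class (N, Na, H, W, F, A)."""
    return {f"U+{ord(c):04X}": unicodedata.east_asian_width(c) for c in chars}


# Print one "code point, class" pair per line.
for code_point, width in width_classes(SAMPLES).items():
    print(code_point, width)
```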

 

and he finishes:

 

>>I hope Unicode Consortium takes an initiative to solve this problem.

>>If Unicode Consortium can really do this work, please consider solving

>>"Conversion tables differ between vendors" problem written in my page.

>>http://www.debian.or.jp/~kubota/unicode-symbols.html .

>>Japanese people are unhappy with the situation that same JIS X 0208

>>characters are mapped into different Unicode characters depending on

>>vendors.  However, I imagine this situation comes from political

>>horse-trading of major vendors and Japanese people are located at

>>hopeless situation...  (For example, I imagine Microsoft and Sun

>>will never agree to use common conversion table.)  Can Unicode Consortium

>>take an initiative to use a common consistent conversion table?

>>

>>

>>And, please consider "EUC-JP roundtrip compatibility" problem.

>>This problem can automatically be solved if

>>

>> > FILE JIS0208.TXT------

>> > 0x2140  U+005C  Na  # REVERSE SOLIDUS

>>

>>is regarded as a mapping table problem and changed to use corresponding

>>fullwidth form, though I once received a mail like

>>

>> >> However, such a table does not guarantee round-trip conversion.

>> >> This is because JIS0208.TXT converts 0x2140 (0xa1 0xc0 in EUC-JP)

>> >> in JISX0208 into U+005C while 0x5c in EUC-JP must be mapped into

>> >> U+005C.  In short, U+005C corresponds to two characters

>> >> (0x5c and 0xa1 0xc0) in EUC-JP.

>> >

>> > This is a known problem, and is very unfortunate.  We don't have an

>> > official way around this problem.  I suggest that you might ask some people

>> > on the Unicode mail list and see if other people have tables or code that

>> > helps fix this problem.  Please see:

>> >       http://www.unicode.org/unicode/consortium/distlist.html

>>

>>from Rick McGowan <rick@unicode.org> when I pointed out this

>>round-trip compatibility problem to info@unicode.org .

>

>This will all be addressed at the next UTC meeting. I won't promise you that

>you will like the final answer, since I don't know what it will be, but at

>the minimum we are going to look at the problem and decide to do the best

>with the resources we have.

>

>A./
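
The EUC-JP round-trip problem described above can be illustrated with a small check for Unicode targets that are reached from more than one legacy byte sequence. This is only a sketch under stated assumptions: TABLE_A and TABLE_B are hypothetical two-entry excerpts, not real mapping files. TABLE_A follows the JIS0208.TXT entry quoted above (0x2140 to U+005C), and TABLE_B assumes the alternative of mapping 0x2140 to the fullwidth form U+FF3C.

```python
# Hypothetical excerpts of an EUC-JP -> Unicode mapping (not real table files).
TABLE_A = {
    b"\x5c": "\u005c",      # single byte 0x5C
    b"\xa1\xc0": "\u005c",  # JIS X 0208 0x2140, as in the quoted JIS0208.TXT entry
}
TABLE_B = {
    b"\x5c": "\u005c",      # single byte 0x5C
    b"\xa1\xc0": "\uff3c",  # assumed alternative: FULLWIDTH REVERSE SOLIDUS
}


def collisions(table):
    """Return Unicode targets that more than one byte sequence maps to."""
    sources = {}
    for src, dst in table.items():
        sources.setdefault(dst, []).append(src)
    return {dst: srcs for dst, srcs in sources.items() if len(srcs) > 1}


def round_trips(table):
    """A table can round-trip only if no two sources share a target."""
    return not collisions(table)


print(collisions(TABLE_A))   # U+005C has two EUC-JP sources
print(round_trips(TABLE_B))  # no shared targets in this excerpt
```

With TABLE_A, converting back from U+005C cannot tell which of the two EUC-JP sequences was the original. With TABLE_B each target has exactly one source, which is the sense in which the fullwidth mapping would resolve the problem for this pair.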