Re: Big5 box-drawing characters missing from Unicode?

From: Doug Ewell <doug_at_ewellic.org>
Date: Sat, 5 May 2012 16:02:32 -0600

Øistein E. Andersen wrote:

> The attached picture shows 16 box-drawing characters from Big5 (glyphs
> scanned from Lunde's CJKV Information Processing, 1st Edn).

The glyph differences between the top and bottom ("prime") rows in
Øistein's attachment aren't reflected in Microsoft's online charts:

http://msdn.microsoft.com/en-us/goglobal/gg663885
http://msdn.microsoft.com/en-us/goglobal/gg670206

> The characters in the first line are found in Row A2, which is part of
> the original Big5:
>
> a) A2-7E: U+256D (BOX DRAWINGS LIGHT ARC DOWN AND RIGHT)
> b) A2-A1: U+256E (BOX DRAWINGS LIGHT ARC DOWN AND LEFT)
> c) A2-A2: U+2570 (BOX DRAWINGS LIGHT ARC UP AND RIGHT)
> d) A2-A3: U+256F (BOX DRAWINGS LIGHT ARC UP AND LEFT)
>
> e') A2-A4
> f') A2-A5
> g') A2-A6
> h') A2-A7

Microsoft's charts and the CP950 mapping file on the Unicode site:

http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP950.TXT

and the Big5 file on the Unicode site (marked "obsolete" and with
several comments that Big5 mapping to Unicode is problematic):

http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/OTHER/BIG5.TXT

all show the following mappings:

0xA2A4 0x2550 #BOX DRAWINGS DOUBLE HORIZONTAL
0xA2A5 0x255E #BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
0xA2A6 0x256A #BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
0xA2A7 0x2561 #BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
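As a rough cross-check, the lines above can be parsed and each Unicode
value compared with the character name in Python's unicodedata module.
This is only a sketch: the helper names parse_mapping and
name_mismatches are hypothetical, and the parser assumes the simple
"0xBIG5 0xUNICODE #NAME" layout shown above rather than every detail of
the real mapping files.

```python
import unicodedata

# Mapping lines copied from the message above.
MAPPING_LINES = """\
0xA2A4 0x2550 #BOX DRAWINGS DOUBLE HORIZONTAL
0xA2A5 0x255E #BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
0xA2A6 0x256A #BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
0xA2A7 0x2561 #BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
"""


def parse_mapping(text):
    """Return a list of (big5_code, unicode_code, name) tuples.

    Hypothetical helper: assumes each data line holds two hex fields
    followed by a '#' comment carrying the character name.
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and whole-line comments
        codes, _, comment = line.partition("#")
        fields = codes.split()
        if len(fields) < 2:
            continue  # no Unicode value on this line
        big5 = int(fields[0], 16)
        ucs = int(fields[1], 16)
        entries.append((big5, ucs, comment.strip()))
    return entries


def name_mismatches(entries):
    """Return entries whose comment differs from the Unicode character name."""
    return [
        (big5, ucs, name)
        for big5, ucs, name in entries
        if unicodedata.name(chr(ucs), "") != name
    ]


entries = parse_mapping(MAPPING_LINES)
mismatches = name_mismatches(entries)
```

An empty mismatches list only confirms that the comments agree with the
Unicode names; it says nothing about what the glyphs look like in any
particular Big5 font.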

> The characters in the second line are found in Row F9, which is part
> of an E-Ten extension to Big5 (included in many Big5 implementations
> including Microsoft's Big5 variant (Code Page 950) and the official
> Hong Kong standard Big5-HKSCS):
>
> a') F9-FA
> b') F9-FB
> c') F9-FC
> d') F9-FD
>
> e) F9-F9: U+2550 (BOX DRAWINGS DOUBLE HORIZONTAL)
> f) F9-E9: U+255E (BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE)
> g) F9-EA: U+256A (BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE)
> h) F9-EB: U+2561 (BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE)

Microsoft's charts and the CP950 file both show:

0xF9FA 0x256D #BOX DRAWINGS LIGHT ARC DOWN AND RIGHT
0xF9FB 0x256E #BOX DRAWINGS LIGHT ARC DOWN AND LEFT
0xF9FC 0x2570 #BOX DRAWINGS LIGHT ARC UP AND RIGHT
0xF9FD 0x256F #BOX DRAWINGS LIGHT ARC UP AND LEFT

The Big5 file has no entries for these code points, which is consistent
with Øistein's comment that they belong to an extension to the original
Big5.

In other words, the characters in the top and bottom rows are unified in
Unicode, according to both the Microsoft-provided mappings for CP950 and
(for the four code points 0xA2A4 through 0xA2A7) the obsolete Unicode
mapping for Big5. One would probably need to provide a second source to
show that these glyphs really have the indicated differences in
appearance, plus evidence that they are used in contrast to one another,
to make the case for either disunification or variation sequences.

--
Doug Ewell | Thornton, Colorado, USA
http://www.ewellic.org | @DougEwell
Received on Sat May 05 2012 - 17:05:20 CDT
