Re: Fun with UDCs in Shift-JIS

From: David Hopwood (david.hopwood@zetnet.co.uk)
Date: Wed Jan 16 2002 - 16:31:28 EST


-----BEGIN PGP SIGNED MESSAGE-----

Lars Marius Garshol wrote:
> I've just discovered that it seems that Shift-JIS encodes a number of
> User-Defined Characters in the 0xF040 to 0xFCFC range, and that these
> characters are used in web pages. Does anyone know of a source of
> mappings for these characters, or even have information about what
> kinds of characters are found in this area?

The best source is presumably the "NT 4.0" mapping at
<http://www.autumn.org/etc/unidif.html> (the page is in Japanese, but the
table is readable by non-Japanese speakers).

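That mapping is a superset of CP932
(<ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT>),
with additional mappings from 0xF040..0xF9FC to U+E000..U+E757, from
0x80 to U+0080 (why?), and from the other four reserved single-byte codes
(0xA0, 0xFD, 0xFE and 0xFF) to U+F8F0..U+F8F3. Note that the user-defined
area proper ends at 0xF9FC: 0xFA40..0xFC4B is already assigned in CP932
itself, to the IBM extended characters.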
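The UDC part of that mapping is linear, so it can be computed rather than
looked up. Below is a minimal Python sketch, assuming 188 valid trail bytes
per lead byte (0x40..0x7E and 0x80..0xFC) laid out in order from U+E000,
which is what the U+E000..U+E757 range implies. The function names are
hypothetical, and the ascending-order assignment of the four reserved single
bytes to U+F8F0..U+F8F3 is an assumption of this sketch.

```python
# Sketch of the CP932-style user-defined-character (UDC) mapping.
# Assumption: lead bytes 0xF0..0xF9 each have 188 trail bytes
# (0x40..0x7E and 0x80..0xFC), mapped linearly starting at U+E000.

TRAILS_PER_LEAD = 188  # 63 trail bytes below 0x7F plus 125 above it

# Reserved single bytes; the ascending-order assignment to U+F8F0..U+F8F3
# is an assumption of this sketch.
SINGLE_BYTE_EXTRAS = {0x80: 0x0080, 0xA0: 0xF8F0, 0xFD: 0xF8F1,
                      0xFE: 0xF8F2, 0xFF: 0xF8F3}


def udc_to_pua(lead: int, trail: int) -> int:
    """Map a Shift-JIS UDC byte pair (lead 0xF0..0xF9) to a PUA code point."""
    if not 0xF0 <= lead <= 0xF9:
        raise ValueError(f"lead byte {lead:#04x} is not in the UDC range 0xF0..0xF9")
    if 0x40 <= trail <= 0x7E:
        offset = trail - 0x40
    elif 0x80 <= trail <= 0xFC:
        offset = trail - 0x41  # close the gap left by the invalid trail byte 0x7F
    else:
        raise ValueError(f"invalid trail byte {trail:#04x}")
    return 0xE000 + TRAILS_PER_LEAD * (lead - 0xF0) + offset


def pua_to_udc(code_point: int) -> bytes:
    """Inverse of udc_to_pua: map U+E000..U+E757 back to a two-byte UDC."""
    if not 0xE000 <= code_point <= 0xE757:
        raise ValueError(f"U+{code_point:04X} is outside U+E000..U+E757")
    lead_index, offset = divmod(code_point - 0xE000, TRAILS_PER_LEAD)
    trail = offset + 0x40 if offset < 0x3F else offset + 0x41
    return bytes([0xF0 + lead_index, trail])


print(hex(udc_to_pua(0xF0, 0x40)))  # 0xe000
print(hex(udc_to_pua(0xF9, 0xFC)))  # 0xe757
print(pua_to_udc(0xE03F))           # b'\xf0\x80'
```

The last line shows why the trail-byte gap matters: 0xF07E maps to U+E03E,
and the very next code point, U+E03F, belongs to 0xF080, not 0xF07F.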

> Google searches found a number of mentions of this, and even one
> mapping, but none of them seemed to be usable.
>
> Also, does anyone know of a Shift-JIS web page that uses one of these
> characters?

I wouldn't know, but private-use code points can't be assumed to mean
anything in particular, regardless of which charset they start out in.
Such pages are broken; they should use numeric character references
(NCRs) for the intended characters, or images, instead.
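As a rough illustration of the NCR alternative, here is a minimal Python
sketch using the standard `shift_jis` codec (JIS X 0208 repertoire) with the
`xmlcharrefreplace` error handler, plus a check that flags private-use
characters (Unicode category "Co") in decoded text. The function names are
hypothetical, and U+20BB7 is just an example of a character assumed to be
outside that codec's repertoire.

```python
import unicodedata


def encode_for_shift_jis_page(text: str) -> bytes:
    """Encode text as Shift-JIS, writing unencodable characters as decimal NCRs."""
    return text.encode("shift_jis", errors="xmlcharrefreplace")


def private_use_chars(text: str) -> list:
    """Return the private-use characters (category Co) found in text."""
    return [ch for ch in text if unicodedata.category(ch) == "Co"]


# U+20BB7 is not in JIS X 0208, so it is written as an NCR rather than a UDC.
print(encode_for_shift_jis_page("abc\U00020BB7"))  # b'abc&#134071;'
# A UDC decoded to U+E000 carries no agreed meaning, so flag it.
print(private_use_chars("x\ue000y"))  # ['\ue000']
```

The NCR names the intended character unambiguously, whatever charset the
page itself is served in; a UDC only means something to software that shares
the author's private table.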

- --
David Hopwood <david.hopwood@zetnet.co.uk>

Home page & PGP public key: http://www.users.zetnet.co.uk/hopwood/
RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5 0F 69 8C D4 FA 66 15 01
Nothing in this message is intended to be legally binding. If I revoke a
public key but refuse to specify why, it is because the private key has been
seized under the Regulation of Investigatory Powers Act; see www.fipr.org/rip

-----BEGIN PGP SIGNATURE-----
Version: 2.6.3i
Charset: noconv

iQEVAwUBPEXxCjkCAxeYt5gVAQGpowf/dHgm8YGW42N6NKvGtqkOqrfzI6Y3hX6C
U1bxpNbzChOM0UzstYgl4dW1akbb0G4F67iTKTGNIAvJujJeD/7cG4Ld0jbuNiQL
iNn9dfK3e2FeXyETpmIp8iAMb9V5cq8Po07WbUA6XoeV9ygyKHwcnNjEatxclJY/
TsFFlx9cVt9Y+lzNOzRSeZSzGA7MkjkrTsnj2D7myBdpx5uRLeiH43/FhY6iX1jM
FiYus30d5uwitG4QXwH2yhcgI3xIfAD/Snd40eUkRmhmex4YUkkQXNHSR3WjDpD6
XGo/TQFY1r1ZLIAEE79MKf4jKnioL8C3H9s78gAyN5hBWSrg7W8YKQ==
=8+4A
-----END PGP SIGNATURE-----



This archive was generated by hypermail 2.1.2 : Thu Jan 17 2002 - 15:27:57 EST