Re: Fun with UDCs in Shift-JIS

From: Thomas Chan (thomas@atlas.datexx.com)
Date: Thu Jan 17 2002 - 13:51:07 EST


On Thu, 17 Jan 2002, Markus Scherer wrote:

> Lars Marius Garshol wrote:
> > I've just discovered that it seems that Shift-JIS encodes a number of
> > User-Defined Characters in the 0xF040 to 0xF9FC range, and that these
>
> Yes, and every implementor may assign characters to them as they see fit.

Besides CP932's use of the Shift-JIS UDC code points, one might also want to
consider the following (used either separately or together with CP932's UDCs):
  - encodings of JIS X 0213 in Shift-JIS using UDC codepoints
  - NTT-DoCoMo pictographs[1] in webpages for cell phones

[1] http://www.nttdocomo.co.jp/tag/emoji/
    (Shift-JIS 0xF89F to 0xF971)
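A minimal sketch of how the CP932 UDC area is commonly mapped into Unicode,
assuming Microsoft's linear layout: lead bytes 0xF0-0xF9, trail bytes
0x40-0x7E and 0x80-0xFC (188 per lead byte), mapped in order onto
U+E000-U+E757. The function and constant names are hypothetical. It also
illustrates that the DoCoMo range above falls inside this area, so a page
decoded as CP932 turns pictographs into PUA code points that by themselves
do not say which vendor's character was meant.

```python
# Hypothetical helpers, assuming the linear CP932 UDC -> PUA layout:
# lead 0xF0-0xF9, trail 0x40-0x7E and 0x80-0xFC, onto U+E000..U+E757.

UDC_LEAD_MIN, UDC_LEAD_MAX = 0xF0, 0xF9
TRAILS_PER_LEAD = 188  # 63 trail bytes (0x40-0x7E) + 125 (0x80-0xFC)
PUA_BASE = 0xE000


def is_cp932_udc(code: int) -> bool:
    """Return True if a two-byte Shift-JIS code lies in the CP932 UDC area."""
    lead, trail = code >> 8, code & 0xFF
    valid_trail = 0x40 <= trail <= 0x7E or 0x80 <= trail <= 0xFC
    return UDC_LEAD_MIN <= lead <= UDC_LEAD_MAX and valid_trail


def cp932_udc_to_pua(code: int) -> int:
    """Map a CP932 UDC code such as 0xF040 to its PUA code point."""
    if not is_cp932_udc(code):
        raise ValueError(f"0x{code:04X} is not a CP932 UDC code")
    lead, trail = code >> 8, code & 0xFF
    # Trail byte 0x7F is unused, so bytes above it shift down by one.
    column = trail - 0x40 if trail <= 0x7E else trail - 0x41
    return PUA_BASE + (lead - UDC_LEAD_MIN) * TRAILS_PER_LEAD + column


if __name__ == "__main__":
    # First UDC code, first DoCoMo pictograph code, last UDC code.
    for code in (0xF040, 0xF89F, 0xF9FC):
        print(f"Shift-JIS 0x{code:04X} -> U+{cp932_udc_to_pua(code):04X}")
```

Run as a script, this prints U+E000, U+E63E and U+E757 for the three codes.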

> > characters are used in web pages. Does anyone know of a source of
>
> The problem being that most likely they are all tagged as
> charset="Shift_JIS", without distinguishing the variant of what's in
> the Shift-JIS encoding. Unreliable tagging is very common. That's one
> good reason why we all advocate Unicode...

But the problem of not knowing how the PUA is being used (the Unicode
analogue of UDCs in legacy encodings) still exists, although Unicode's much
larger repertoire should leave less need for private-use characters than
smaller character sets do.
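To make the tagging problem concrete, a small sketch that decodes the same
bytes with CPython's built-in "shift_jis" and "cp932" codecs. The sample
labels and the decode_as helper are hypothetical, and other decoders
(browsers, Java, ICU) may behave differently. With CPython, 0x8160 should
come out as U+301C under shift_jis but U+FF5E under cp932, and the UDC code
0xF040 should only decode under cp932.

```python
# Same bytes, two "Shift-JIS" decoders: the charset label alone does not
# say which mapping table the page author assumed.

SAMPLES = {
    "0x8160 (wave dash/tilde)": b"\x81\x60",
    "0xF040 (first UDC code)": b"\xf0\x40",
}


def decode_as(data: bytes, codec: str) -> str:
    """Decode data with codec and list code points; report failures."""
    try:
        text = data.decode(codec)
    except UnicodeDecodeError:
        return "<undecodable>"
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


if __name__ == "__main__":
    for label, data in SAMPLES.items():
        for codec in ("shift_jis", "cp932"):
            print(f"{label:28} {codec:10} {decode_as(data, codec)}")
```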
 
 
> The W3C has a page about the problems with Japanese charset
> identifiers and mapping tables.

URL?

Thomas Chan
tc31@cornell.edu



This archive was generated by hypermail 2.1.2 : Thu Jan 17 2002 - 13:29:18 EST