RE: Shift-JIS encoded text (was: RE: Tags and future new technologies [...]) from Doug Ewell on 2012-06-01 (Unicode Mail List Archive)

From: Doug Ewell <doug_at_ewellic.org>
Date: Fri, 01 Jun 2012 15:22:34 -0700

I hadn't thought that Peter was talking about "text encoded according to
the Shift-JIS model," without specifying the encoding. I'm not sure that
changes my question.

--
Doug Ewell | Thornton, Colorado, USA
http://www.ewellic.org | @DougEwell 
 
-------- Original Message --------
Subject: Re: Shift-JIS encoded text (was: RE: Tags and future new
technologies [...])
From: Ken Whistler <kenw_at_sybase.com>
Date: Fri, June 01, 2012 3:17 pm
To: unicode_at_unicode.org

On 6/1/2012 1:51 PM, Doug Ewell wrote:
> At what point does text
> encoded in a vendor's private-use extension to Shift-JIS become
> "Shift-JIS encoded text"?

A possibly less confusing way to put this is:

At what point does text encoded in a vendor's private-use extension
to *JIS X 0208* become "Shift-JIS encoded text"?

The reason for putting it that way is that JIS X 0208 is a character
encoding standard. It defines the repertoire of characters and
assigns numbers to them.

But ISO-2022-JP, EUC-JP, and Shift-JIS are then three different ways of
turning JIS X 0208 character codes (and possibly vendor or other
extensions) into streams of bytes. Think of them as character encoding
schemes (in the Unicode character encoding model sense).
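
As an illustration (not part of Ken's message), here is a minimal Python sketch of that distinction. It assumes Python's built-in `euc_jp`, `iso2022_jp`, and `shift_jis` codecs as stand-ins for the three schemes; `jis_to_sjis` is a hypothetical helper name that implements the commonly described byte-shifting arithmetic. One JIS X 0208 code goes in, three different byte streams come out.

```python
def jis_to_sjis(j1: int, j2: int) -> bytes:
    """Shift a JIS X 0208 code (two bytes, each 0x21-0x7E) into Shift-JIS bytes."""
    # Two JIS rows share one Shift-JIS lead byte.
    # Odd rows take the lower range of trail bytes, even rows the upper range.
    if j1 % 2:
        s2 = j2 + 0x1F
        if s2 >= 0x7F:      # skip 0x7F (DEL)
            s2 += 1
    else:
        s2 = j2 + 0x7E
    s1 = ((j1 + 1) >> 1) + 0x70
    if s1 >= 0xA0:          # skip 0xA0-0xDF (single-byte half-width katakana)
        s1 += 0x40
    return bytes([s1, s2])


ch = "\u3042"  # HIRAGANA LETTER A, JIS X 0208 code 0x2422

euc = ch.encode("euc_jp")
# EUC-JP sets the high bit on each JIS X 0208 byte, so clearing it
# recovers the underlying character code.
j1, j2 = euc[0] - 0x80, euc[1] - 0x80

iso = ch.encode("iso2022_jp")   # escape sequences around the raw JIS bytes
sjis = ch.encode("shift_jis")   # bytes produced by the codec

print(f"JIS X 0208 code: {j1:02X}{j2:02X}")
print("EUC-JP:         ", euc.hex(" "))
print("ISO-2022-JP:    ", iso.hex(" "))
print("Shift-JIS:      ", sjis.hex(" "))
print("Helper agrees with codec:", jis_to_sjis(j1, j2) == sjis)
```

The character code is the same in every case; only the packaging into bytes differs.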

One of the reasons why there are "many Shift-JIS's" is not that the
principle of how to shift JIS X 0208 code values into bytes changes,
but because there are many different private extensions, all making
use of the same general principle for how to move the byte values
into a particular scheme for processing.
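
To make the "many Shift-JIS's" point concrete (again an illustration, not part of the original message), the sketch below compares two of Python's codecs: `shift_jis`, which covers JIS X 0208 only, and `cp932`, Microsoft's variant with vendor extensions. The choice of these two codecs and of CIRCLED DIGIT ONE as the test character is an assumption made for the example; `try_encode` is a hypothetical helper.

```python
def try_encode(text: str, codec: str):
    """Return the encoded bytes, or None if the codec has no mapping."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None


extension_char = "\u2460"  # CIRCLED DIGIT ONE, a vendor extension character
standard_char = "\u3042"   # HIRAGANA LETTER A, in JIS X 0208

for codec in ("shift_jis", "cp932"):
    ext = try_encode(extension_char, codec)
    std = try_encode(standard_char, codec)
    print(codec,
          "extension:", ext.hex(" ") if ext else "unmappable",
          "standard:", std.hex(" ") if std else "unmappable")
```

Both codecs produce the same bytes for the JIS X 0208 character, because the shifting principle is shared; they differ only in which extension characters they accept.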

In summary, "Shift-JIS" is not a character encoding standard -- it is
a scheme for turning JIS (and various extensions) into a particular
format for processing.

--Ken

Received on Fri Jun 01 2012 - 17:24:12 CDT
