I hadn't thought that Peter was talking about "text encoded according to
the Shift-JIS model," without specifying the encoding. I'm not sure that
changes my question.
--
Doug Ewell | Thornton, Colorado, USA
http://www.ewellic.org | @DougEwell

-------- Original Message --------
Subject: Re: Shift-JIS encoded text (was: RE: Tags and future new technologies [...])
From: Ken Whistler <kenw_at_sybase.com>
Date: Fri, June 01, 2012 3:17 pm
To: unicode_at_unicode.org

On 6/1/2012 1:51 PM, Doug Ewell wrote:
> At what point does text
> encoded in a vendor's private-use extension to Shift-JIS become
> "Shift-JIS encoded text"?

A possibly less confusing way to put this is:

At what point does text encoded in a vendor's private-use extension to
*JIS X 0208* become "Shift-JIS encoded text"?

The reason for putting it that way is that JIS X 0208 is a character
encoding standard. It defines the repertoire of characters and assigns
numbers to them. But 2022-JP, EUC-JP, and Shift-JIS are then 3 different
ways of turning JIS X 0208 character codes (and possibly vendor or other
extensions) into streams of bytes. Think of them as character encoding
schemes (in the Unicode character encoding model sense).

One of the reasons why there are "many Shift-JIS's" is not that the
principle of how to shift JIS X 0208 code values into bytes changes, but
because there are many different private extensions, all making use of
the same general principle for how to move the byte values into a
particular scheme for processing.

In summary, "Shift-JIS" is not a character encoding standard -- it is a
scheme for turning JIS (and various extensions) into a particular format
for processing.

--Ken

Received on Fri Jun 01 2012 - 17:24:12 CDT
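
To illustrate Ken's distinction between the coded character set and the three byte-level schemes, here is a minimal Python sketch. It is not part of the original message. The helper names (jis_to_iso2022jp, jis_to_eucjp, jis_to_sjis) are hypothetical, and the arithmetic covers only plain JIS X 0208 two-byte codes in the range 0x21..0x7E, with no vendor extensions, no half-width katakana, and no error handling. The results are checked against Python's built-in codecs.

```python
# One JIS X 0208 code point, three byte serializations.
# Helper names are illustrative assumptions, not a standard API.

def jis_to_iso2022jp(j1: int, j2: int) -> bytes:
    # ISO-2022-JP: the 7-bit JIS bytes, bracketed by escape sequences
    # that switch into JIS X 0208 (ESC $ B) and back to ASCII (ESC ( B).
    return b"\x1b$B" + bytes([j1, j2]) + b"\x1b(B"

def jis_to_eucjp(j1: int, j2: int) -> bytes:
    # EUC-JP: set the high bit on both JIS bytes.
    return bytes([j1 | 0x80, j2 | 0x80])

def jis_to_sjis(j1: int, j2: int) -> bytes:
    # Shift-JIS: fold two JIS rows into one lead byte and shift the
    # trail byte so the result avoids the single-byte ranges.
    s1 = ((j1 + 1) >> 1) + (0x70 if j1 < 0x5F else 0xB0)
    if j1 & 1:
        # Odd row: trail bytes start at 0x40 and skip 0x7F.
        s2 = j2 + (0x1F if j2 < 0x60 else 0x20)
    else:
        # Even row: trail bytes start at 0x9F.
        s2 = j2 + 0x7E
    return bytes([s1, s2])

# HIRAGANA LETTER A (U+3042) has the JIS X 0208 code 0x2422.
j1, j2 = 0x24, 0x22
ch = "\u3042"

assert jis_to_iso2022jp(j1, j2) == ch.encode("iso2022_jp")
assert jis_to_eucjp(j1, j2) == ch.encode("euc_jp")
assert jis_to_sjis(j1, j2) == ch.encode("shift_jis")

print(jis_to_eucjp(j1, j2).hex(), jis_to_sjis(j1, j2).hex())
```

The character number (0x2422) is the same in all three cases; only the transformation into bytes differs. A vendor extension that assigns extra codes in unused rows would pass through the same arithmetic, which is one way to read Ken's point about there being "many Shift-JIS's".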