From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Thu Nov 25 2004 - 14:26:47 CST
----- Original Message -----
From: Addison Phillips [wM]
To: pragati ; unicode@unicode.org
Sent: Thursday, November 25, 2004 6:21 PM
Subject: RE: Shift-JIS conversion.
Dear Pragati,
You can write your own conversion, of course. The mapping tables for
Unicode->SJIS are readily available. You should note that there are several
vendor-specific variations in the mapping tables. Notably, Microsoft code
page 932, which is often called Shift-JIS, has more characters in its
character set than "standard" Shift-JIS (and it maps a few characters
differently too...)
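As a rough illustration of those vendor differences, here is a sketch using Python's standard `shift_jis` and `cp932` codecs. It assumes, as in CPython, that the former follows the JIS X 0208 mapping and the latter Microsoft's code page 932 table. `try_encode` is a hypothetical helper written only for this example, and other converters may map these characters differently again.

```python
# Compare Python's "shift_jis" codec (JIS X 0208 based) with "cp932"
# (Microsoft's variant) on two characters where they disagree.

def try_encode(text: str, codec: str):
    """Hypothetical helper: return the encoded bytes, or None if unsupported."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None

# U+2460 CIRCLED DIGIT ONE is an NEC extension found in cp932 but not in
# plain JIS X 0208, so only cp932 can encode it.
circled_one = "\u2460"
sjis_circled = try_encode(circled_one, "shift_jis")
cp932_circled = try_encode(circled_one, "cp932")

# The byte pair 0x81 0x60 exists in both, but decodes to different code points.
wave_bytes = b"\x81\x60"
sjis_wave = wave_bytes.decode("shift_jis")   # U+301C WAVE DASH
cp932_wave = wave_bytes.decode("cp932")      # U+FF5E FULLWIDTH TILDE

print("shift_jis:", sjis_circled, hex(ord(sjis_wave)))
print("cp932:    ", cp932_circled, hex(ord(cp932_wave)))
```

The second case is why text converted to Shift-JIS with one vendor's table and back to Unicode with another's can return a different code point than it started with.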
> The important fact that you should be aware of: Shift-JIS is an encoding
> of the JIS X 0208 character set.
> UTF-8 is an encoding of the Unicode character set.
More exactly, UTF-8 is an encoding of the ISO/IEC 10646 character set (the
character set here designates the set of characters, i.e. the repertoire
that describes characters with a name, a representative glyph and some
annotations, to which a numeric code, the code point, is then assigned).
char. set is
Unicode by itself is not a character set, only an implementation of the
ISO/IEC 10646 character set, in which the Unicode Standard assigns
additional properties and behavior to the characters allocated in ISO/IEC
10646. The link between Unicode and ISO/IEC 10646 is the assigned code point
and character name, which are now shared by the two standards.
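To illustrate the distinction between a character's code point and its byte encodings, here is a minimal Python sketch. `kuten_to_sjis` is a hypothetical helper written for this illustration. It applies the commonly used arithmetic that derives Shift_JIS bytes from a JIS X 0208 row/cell ("kuten") position, and it deliberately ignores single-byte codes and vendor extensions.

```python
# One character, one code point, different byte encodings:
# Shift_JIS bytes come from the JIS X 0208 row/cell position,
# UTF-8 bytes come from the ISO/IEC 10646 (Unicode) code point.

def kuten_to_sjis(row: int, cell: int) -> bytes:
    """Hypothetical helper: map a JIS X 0208 row/cell pair (1..94) to Shift_JIS."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must each be in 1..94")
    j1, j2 = row + 0x20, cell + 0x20          # the 7-bit JIS byte pair
    # Each lead byte covers two consecutive rows; rows 63 and up jump past
    # 0xA0..0xDF, which Shift_JIS reserves for half-width katakana.
    lead = ((j1 + 1) >> 1) + (0x70 if j1 <= 0x5E else 0xB0)
    if j1 % 2 == 1:
        # Odd row: trail bytes 0x40..0x9E, skipping 0x7F.
        trail = j2 + (0x1F if j2 <= 0x5F else 0x20)
    else:
        # Even row: trail bytes 0x9F..0xFC.
        trail = j2 + 0x7E
    return bytes([lead, trail])

hiragana_a = "\u3042"              # HIRAGANA LETTER A, JIS X 0208 row 4, cell 2
code_point = ord(hiragana_a)       # the value shared by Unicode and ISO/IEC 10646
sjis_bytes = kuten_to_sjis(4, 2)
utf8_bytes = hiragana_a.encode("utf-8")

print(f"U+{code_point:04X}")                # U+3042
print(sjis_bytes.hex(), utf8_bytes.hex())   # 82a0 e38182
# Cross-check the hand-written arithmetic against Python's own codec.
assert sjis_bytes == hiragana_a.encode("shift_jis")
```

The code point is the same value whichever standard you read it from, while the two encodings produce unrelated byte sequences for it.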
Of course, the Unicode Technical Committee may propose new assignments to
ISO/IEC, but it is still ISO/IEC 10646 that maintains the repertoire and
approves or rejects the proposals. A new character proposal may be rejected
by Unicode but accepted by ISO/IEC 10646, and in that case the ISO/IEC 10646
vote prevails (so Unicode will have to accept the ISO/IEC decision, even if
it voted against it earlier).
Conversely, ISO/IEC 10646 says nothing about character properties or
behavior. It can make suggestions, but the Unicode committee makes its own
decisions about the character properties and behavior it chooses to
standardize. If Unicode wants its decisions to be widely accepted by all
users of the ISO/IEC 10646 repertoire, it is in Unicode's interest to try to
make these decisions conform with other existing national or international
standards, to maximize the interoperability of national and international
applications based on the ISO/IEC 10646 character set.
This archive was generated by hypermail 2.1.5 : Thu Nov 25 2004 - 14:29:38 CST