Re: Is there a UTF that allows ISO 8859-1 (latin-1)?

From: Jungshik Shin (jungshik.shin@yale.edu)
Date: Fri Aug 21 1998 - 12:32:37 EDT


> Yung-Fong Tang wrote:
>
>
> >One of the reasons you request this is because in your head, there is only one
> >important charset to you - ISO-8859-1. However, for my company, we care about many
> >charsets: ISO-8859-1, ISO-8859-2, ISO-8859-5, ISO-8859-7, ISO-8859-9, KOI8-R,
> >Shift_JIS, Big5, GB2312, EUC-KR, etc. If your request is reasonable, then I would
> >like to ask someone to design a UTF compatible with Big5 and GB2312, and
> >Shift_JIS, and KOI8-R. (just joking.)
>
> No, the only important character set for me is UCS.

  I think Frank has a point. When I saw your request for a UCS-2
encoding compatible with ISO-8859-1, I couldn't help
regarding it as (Western) Euro-centric.

> And currently I use only the
> first 256 codes of UCS as they are all I need, for the moment. Those codes happen
> to be the same as ISO 8859-1.

  Well, currently all my files are in EUC-KR encoding (for US-ASCII and
KS C 5601/KS X 1001) and most of the programs I have can only handle
EUC-KR. Could that justify my requesting a UCS encoding compatible with
EUC-KR? Certainly not.

> To be able to allow other code values from UCS than the first 256, I need a way
> to add those without making all software I have today obsolete and the new
> software must be able to read all existing texts.
> UTF-8 will not work unless it can read and write files compatible with what
> I have today.
> You who use non-Latin characters will also need something to mix old and new,
> but your character sets
> are not true subsets of UCS and cannot be handled as easily as ISO 8859-1.

  What do you mean by a 'true' subset? KS C 5601/KS X 1001, JIS X 0208,
JIS X 0212, GB 2312, Plane 1 of CNS 11643 (used in various CJK
encodings such as EUC-JP, Shift_JIS, EUC-KR, EUC-CN, EUC-TW) are all
subsets of UCS-2.
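
  A minimal sketch of that claim, assuming Python 3 and its bundled
codecs. The sample strings are arbitrary, `max_code_point` is a
hypothetical helper, and EUC-TW is left out only because, as far as I
know, the standard library ships no codec for it:

```python
# Sample strings are arbitrary illustrations, not taken from the thread.
samples = {
    "euc_kr": "한글",        # KS X 1001 via EUC-KR
    "euc_jp": "日本語",      # JIS X 0208 via EUC-JP
    "shift_jis": "日本語",   # JIS X 0208 via Shift_JIS
    "gb2312": "中文",        # GB 2312 via EUC-CN
}

def max_code_point(text, codec):
    """Round-trip text through a legacy codec and return its highest code point."""
    raw = text.encode(codec)        # legacy multi-byte form
    decoded = raw.decode(codec)     # back to UCS code points
    assert decoded == text          # the legacy encoding round-trips losslessly
    return max(ord(ch) for ch in decoded)

results = {codec: max_code_point(text, codec) for codec, text in samples.items()}

for codec, top in results.items():
    print(f"{codec}: highest code point U+{top:04X}, fits in 16 bits: {top <= 0xFFFF}")
```

Every sample maps to code points that fit in 16 bits, which is all
that "subset of UCS-2" needs here. It says nothing about the byte values
matching, and that part is no different for ISO-8859-1 once UTF-8 is
involved.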

> >encode Japanese, Korean, Chinese, and even Eastern European languages. That is THE
> >REASON why people proposed to have UTF-8. UTF-8 may not be the BEST choice we
> >could have, but ISO 8859-1 definitely is worse than it.
> >
> I doubt UTF-8 is the right choice for Chinese, UCS-2 would be better.
> And for transport
> between places, UTF-8 would be fine.

  Well, please think about why UTF-8 was initially called FSS-UTF
(File System Safe UTF).
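
  A rough sketch of what "file system safe" buys, assuming Python 3.
The code point U+012F is picked only because its 16-bit form happens to
contain the byte 0x2F ('/'), and "utf-16-be" stands in for big-endian
UCS-2 here:

```python
ch = "\u012f"                      # LATIN SMALL LETTER I WITH OGONEK

ucs2 = ch.encode("utf-16-be")      # two bytes: 0x01 0x2F
utf8 = ch.encode("utf-8")          # two bytes: 0xC4 0xAF

# In the 16-bit form, a byte-oriented file system sees a path separator.
slash_in_ucs2 = b"/" in ucs2
# In UTF-8, every byte of a non-ASCII character has its high bit set,
# so it can never be mistaken for '/' or NUL.
utf8_bytes_all_high = all(b >= 0x80 for b in utf8)

# A plain ASCII letter in the 16-bit form carries a NUL byte.
nul_in_ucs2_ascii = b"\x00" in "A".encode("utf-16-be")
# UTF-8 leaves ASCII text byte-for-byte unchanged.
ascii_unchanged = "A".encode("utf-8") == b"A"

print(ucs2.hex(), utf8.hex())
print(slash_in_ucs2, utf8_bytes_all_high, nul_in_ucs2_ascii, ascii_unchanged)
```

That is why UTF-8 can pass through existing 8-bit tools and file
systems that treat '/' and NUL specially, while raw 16-bit UCS-2 cannot.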

> But most tools I have on my computer can only read 8-bit bytes and my files are in
> ISO 8859-1. As UTF-8 is not compatible with current usage on my system and I cannot
> expect software vendors to fix my software any time soon, and new software using
> UTF-8 cannot read my old files, UTF-8 has no use on my system.

  The same thing can be said of anyone who has been using ISO-2022
based encodings (the whole ISO-8859-x family, EUC-CN, EUC-TW, EUC-KR,
EUC-JP, ISO-2022-{JP,KR,CN}) and other locale-dependent
encodings (Shift_JIS, Big5, JOHAB, KOI8-R?). Some gaps and
incompatibility are inevitable when moving forward with UCS-{2,4}, and
making UTF-8 compatible with ISO-8859-1 (only one of the many character
sets covered by UCS-2) is not the way to go.
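
  To make the gap concrete, here is a small sketch, assuming Python 3.
The sample word is arbitrary. It shows that ISO-8859-1 bytes above 0x7F
are not valid UTF-8 as they stand, and that an explicit transcoding step
closes the gap:

```python
# An arbitrary legacy string as it would sit in an ISO-8859-1 file.
legacy = "caf\u00e9".encode("latin-1")      # b'caf\xe9'

# Reading those bytes as UTF-8 fails: 0xE9 starts a multi-byte sequence
# that is never completed.
try:
    legacy.decode("utf-8")
    readable_as_utf8 = True
except UnicodeDecodeError:
    readable_as_utf8 = False

# Transcoding is a decode with the old charset, then an encode with the new one.
converted = legacy.decode("latin-1").encode("utf-8")   # b'caf\xc3\xa9'

# The code points are identical before and after; only the bytes differ.
same_text = legacy.decode("latin-1") == converted.decode("utf-8")

print(readable_as_utf8, converted, same_text)
```

The same two-step conversion applies to EUC-KR or any other legacy
encoding by swapping the codec name, so ISO-8859-1 is not a special case.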

  If the only important character set to you is UCS, the best thing to
do is get, write, and encourage/ask others to write programs that work
with UCS and its encodings, IMHO.

   Jungshik Shin


