> Victor Tse wrote:
> 
>> On Windows, there are cp1252, cp1250, cp1251, etc. On UNIX, there are
>> 8859-1,9.
>> I know that cp1252 corresponds to 8859-1. Are they exactly the same,
>> code point by code point?
>No, CP1252 is a superset of 8859-1. In 1252, the "C1" range (0x80-9f)
>contains "graphic" characters, while 8859-1's C1 is only control
>characters. Other than the C1 range, 1252 and 8859-1 are the same (as
>far as I know).
CHARSET-NAME=ISO 8859-1 (Latin-1, Western Europe)
CHARSET-NAME-GERMAN=ISO 8859-1 (Lateinisch 1, Westeuropa)
CODEPAGE-NUMBER=819
EXPLANATION=Suited for (at least) Danish, Dutch, English, Faeroese,
EXPLANATION=Finnish, French, German, Icelandic, Irish, Italian,
EXPLANATION=Norwegian, Portuguese, Spanish and Swedish.
#
# Characters 20-7F are identical to ASCII (ISO 646)
# Characters 80-9F are unassigned
# Characters A0-FF are identical to the Unicode characters.
UNICODE-MAP=
#      0    1    2    3    4    5    6    7    8    9    A    B    C    D    E    F
#    ==============================================================================
A0:   A0   A1   A2   A3   A4   A5   A6   A7   A8   A9   AA   AB   AC   AD   AE   AF
B0:   B0   B1   B2   B3   B4   B5   B6   B7   B8   B9   BA   BB   BC   BD   BE   BF
C0:   C0   C1   C2   C3   C4   C5   C6   C7   C8   C9   CA   CB   CC   CD   CE   CF
D0:   D0   D1   D2   D3   D4   D5   D6   D7   D8   D9   DA   DB   DC   DD   DE   DF
E0:   E0   E1   E2   E3   E4   E5   E6   E7   E8   E9   EA   EB   EC   ED   EE   EF
F0:   F0   F1   F2   F3   F4   F5   F6   F7   F8   F9   FA   FB   FC   FD   FE   FF
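
For anyone who wants to process these tables mechanically, here is a minimal Python sketch that reads UNICODE-MAP rows into a byte-to-code-point dictionary. The function name `parse_unicode_map` and the sample lines are hypothetical, and I am assuming that a `*` cell (as used in the Windows tables in this message) means "no mapping at this position".

```python
import re

# A data row looks like "A0:   A0   A1 ...": two hex digits, a colon, then cells.
ROW = re.compile(r"^([0-9A-Fa-f]{2}):\s+(.*)$")


def parse_unicode_map(lines):
    """Return {byte_value: unicode_code_point} for the table rows in lines."""
    mapping = {}
    for line in lines:
        match = ROW.match(line.strip())
        if match is None:
            # Header lines (KEY=VALUE) and '#' comments are skipped.
            continue
        base = int(match.group(1), 16)
        for offset, cell in enumerate(match.group(2).split()):
            if cell != "*":  # assumption: '*' marks an unmapped position
                mapping[base + offset] = int(cell, 16)
    return mapping


sample = [
    "# Characters 80-9F are unassigned",
    "80:    *    * 201A  192",
    "A0:   A0   A1   A2   A3",
]
table = parse_unicode_map(sample)
print({hex(k): hex(v) for k, v in table.items()})
```

Positions that are absent from the resulting dictionary are the unmapped ones, which makes it easy to diff two tables afterwards.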
CHARSET-NAME=MS Windows Codepage 1252 (ANSI)
CHARSET-NAME-GERMAN=MS Windows Codeseite 1252 (ANSI)
CODEPAGE-NUMBER=1252
EXPLANATION=Same as ISO 8859-1, except that quotes and other characters
EXPLANATION=have been added in the range 80-9F, which is unused in ISO 8859-1
#
# Characters 20-7F are identical to ASCII (ISO 646)
# Characters 80-9F contain quotes and other special characters
# Characters A0-FF are identical to the Unicode or ISO 8859-1 characters.
UNICODE-MAP=
#      0    1    2    3    4    5    6    7    8    9    A    B    C    D    E    F
#    ==============================================================================
80:    *    * 201A  192 201E 2026 2020 2021  2C6 2030  160 2039  152    *    *    *
90:    * 2018 2019 201C 201D 2022 2013 2014  2DC 2122  161 203A  153    *    *  178
A0:   A0   A1   A2   A3   A4   A5   A6   A7   A8   A9   AA   AB   AC   AD   AE   AF
B0:   B0   B1   B2   B3   B4   B5   B6   B7   B8   B9   BA   BB   BC   BD   BE   BF
C0:   C0   C1   C2   C3   C4   C5   C6   C7   C8   C9   CA   CB   CC   CD   CE   CF
D0:   D0   D1   D2   D3   D4   D5   D6   D7   D8   D9   DA   DB   DC   DD   DE   DF
E0:   E0   E1   E2   E3   E4   E5   E6   E7   E8   E9   EA   EB   EC   ED   EE   EF
F0:   F0   F1   F2   F3   F4   F5   F6   F7   F8   F9   FA   FB   FC   FD   FE   FF
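
The claim that 1252 and 8859-1 differ only in 80-9F can be checked with a short Python sketch using the standard codecs. The helper name `compare_single_byte` is mine. Note that Python's `cp1252` codec follows a later revision of the code page than the table above (it has the euro sign at 80, for example), and its `latin-1` codec maps 80-9F to the C1 control characters.

```python
def compare_single_byte(codec_a, codec_b):
    """Return the byte values at which two single-byte codecs disagree."""
    differing = []
    for value in range(256):
        raw = bytes([value])
        decoded = []
        for codec in (codec_a, codec_b):
            try:
                decoded.append(raw.decode(codec))
            except UnicodeDecodeError:
                decoded.append(None)  # position is undefined in this codec
        if decoded[0] != decoded[1]:
            differing.append(value)
    return differing


diff = compare_single_byte("cp1252", "latin-1")
print([hex(v) for v in diff])
```

Every value it reports lies in the 80-9F range; everything else decodes identically in both codecs.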
>> What about the others? Can you tell me their relationship?
>1250 -- 8859-2
>1253 -- 8859-7
>1254 -- 8859-9
You must be kidding; there is a hell of a difference between 1250 and 8859-2!
Probably between the others as well.
CHARSET-NAME=ISO 8859-2 (Latin-2, Eastern Europe)
CHARSET-NAME-GERMAN=ISO 8859-2 (Lateinisch 2, Osteuropa)
CODEPAGE-NUMBER=912
EXPLANATION=Suited for (at least) Albanian, Czech, Hungarian, Polish,
EXPLANATION=Romanian, (Serbo-)Croatian, Slovak and Slovene.
#
# Characters 20-7F are identical to ASCII (ISO 646)
# Characters 80-9F are unassigned
UNICODE-MAP=
#      0    1    2    3    4    5    6    7    8    9    A    B    C    D    E    F
#    ==============================================================================
A0:   A0  104  2D8  141   A4  13D  15A   A7   A8  160  15E  164  179   AD  17D  17B
B0:   B0  105  2DB  142   B4  13E  15B  2C7   B8  161  15F  165  17A  2DD  17E  17C
C0:  154   C1   C2  102   C4  139  106   C7  10C   C9  118   CB  11A   CD   CE  10E
D0:  110  143  147   D3   D4  150   D6   D7  158  16E   DA  170   DC   DD  162   DF
E0:  155   E1   E2  103   E4  13A  107   E7  10D   E9  119   EB  11B   ED   EE  10F
F0:  111  144  148   F3   F4  151   F6   F7  159  16F   FA  171   FC   FD  163  2D9
CHARSET-NAME=MS Windows Codepage 1250 (Eastern Europe)
CHARSET-NAME-GERMAN=MS Windows Codeseite 1250 (Osteuropa)
CODEPAGE-NUMBER=1250
EXPLANATION=Looks like a modified version of ISO 8859-2
#
# Characters 20-7F are identical to ASCII (ISO 646)
UNICODE-MAP=
#      0    1    2    3    4    5    6    7    8    9    A    B    C    D    E    F
#    ==============================================================================
80:    *    * 201A    * 201E 2026 2020 2021    * 2030  160 2039  15A  164  17D  179
90:    * 2018 2019 201C 201D 2022 2013 2014    * 2122  161 203A  15B  165  17E  17A
A0:   A0  2C7  2D8  141   A4  104   A6   A7   A8   A9  15E   AB   AC   AD   AE  17B
B0:   B0   B1  2DB  142   B4   B5   B6   B7   B8  105  15F   BB  13D  2DD  13E  17C
C0:  154   C1   C2  102   C4  139  106   C7  10C   C9  118   CB  11A   CD   CE  10E
D0:  110  143  147   D3   D4  150   D6   D7  158  16E   DA  170   DC   DD  162   DF
E0:  155   E1   E2  103   E4  13A  107   E7  10D   E9  119   EB  11B   ED   EE  10F
F0:  111  144  148   F3   F4  151   F6   F7  159  16F   FA  171   FC   FD  163  2D9
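
To see where 1250 and 8859-2 part ways, the two tables can be compared position by position. The following is a rough Python sketch using the standard `cp1250` and `iso8859_2` codecs; the names `decode_table`, `moved` and `missing` are just for illustration.

```python
def decode_table(codec, start=0xA0, stop=0x100):
    """Decode each byte in [start, stop) with codec; undefined bytes give U+FFFD."""
    return {v: bytes([v]).decode(codec, errors="replace") for v in range(start, stop)}


iso = decode_table("iso8859_2")
win = decode_table("cp1250")

# Byte values in A0-FF that decode to different characters in the two sets.
moved = [v for v in iso if iso[v] != win[v]]
print("positions that differ:", [hex(v) for v in moved])

# Characters of 8859-2 (A0-FF) that cp1250 does not contain anywhere.
all_win_chars = set(decode_table("cp1250", 0x00, 0x100).values())
missing = set(iso.values()) - all_win_chars
print("8859-2 characters missing from cp1250:", sorted(missing))
```

The differing positions all fall in A0-BF, while C0-FF matches. The characters themselves are not lost in 1250, only relocated, so text can be converted from 8859-2 to 1250 but the bytes cannot simply be relabelled.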
>and so on. Look at the Unicode book, or Unicode Web site.
CHARSET-NAME=ISO 8859-5 (Latin/Cyrillic)
CHARSET-NAME-GERMAN=ISO 8859-5 (Lateinisch/Kyrillisch)
CODEPAGE-NUMBER=915
EXPLANATION=Suited for Bulgarian, Belarusian, English, Macedonian,
EXPLANATION=Russian, Serbo-Croatian and Ukrainian.
#
# Characters 20-7F are identical to ASCII (ISO 646)
# Characters 80-9F are unassigned
UNICODE-MAP=
#      0    1    2    3    4    5    6    7    8    9    A    B    C    D    E    F
#    ==============================================================================
A0:   A0  401  402  403  404  405  406  407  408  409  40A  40B  40C   AD  40E  40F
B0:  410  411  412  413  414  415  416  417  418  419  41A  41B  41C  41D  41E  41F
C0:  420  421  422  423  424  425  426  427  428  429  42A  42B  42C  42D  42E  42F
D0:  430  431  432  433  434  435  436  437  438  439  43A  43B  43C  43D  43E  43F
E0:  440  441  442  443  444  445  446  447  448  449  44A  44B  44C  44D  44E  44F
F0: 2116  451  452  453  454  455  456  457  458  459  45A  45B  45C   A7  45E  45F
CHARSET-NAME=MS Windows Codepage 1251 (Cyrillic)
CHARSET-NAME-GERMAN=MS Windows Codeseite 1251 (Kyrillisch)
CODEPAGE-NUMBER=1251
#
# Characters 20-7F are identical to ASCII (ISO 646)
UNICODE-MAP=
#      0    1    2    3    4    5    6    7    8    9    A    B    C    D    E    F
#    ==============================================================================
80:  402  403 201A  453 201E 2026 2020 2021    * 2030  409 2039  40A  40C  40B  40F
90:  452 2018 2019 201C 201D 2022 2013 2014    * 2122  459 203A  45A  45C  45B  45F
A0:   A0  40E  45E  408   A4  490   A6   A7  401   A9  404   AB   AC   AD   AE  407
B0:   B0   B1  406  456  491   B5   B6   B7  451 2116  454   BB  458  405  455  457
C0:  410  411  412  413  414  415  416  417  418  419  41A  41B  41C  41D  41E  41F
D0:  420  421  422  423  424  425  426  427  428  429  42A  42B  42C  42D  42E  42F
E0:  430  431  432  433  434  435  436  437  438  439  43A  43B  43C  43D  43E  43F
F0:  440  441  442  443  444  445  446  447  448  449  44A  44B  44C  44D  44E  44F
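
The Cyrillic pair shows the problem most clearly: the basic alphabet starts at B0 in 8859-5 but at C0 in 1251, so the same text gets different bytes. Below is a small Python sketch with the standard `cp1251` and `iso8859_5` codecs; the `transcode` helper is a hypothetical name, and the sample word is written with escapes so the snippet does not depend on the encoding of the file it is saved in.

```python
def transcode(data, source, target):
    """Convert bytes from one charset to another by going through Unicode."""
    return data.decode(source).encode(target)


# "Privet" in Cyrillic letters (U+041F U+0440 U+0438 U+0432 U+0435 U+0442).
text = "\u041f\u0440\u0438\u0432\u0435\u0442"

as_1251 = text.encode("cp1251")
as_8859_5 = text.encode("iso8859_5")
print(as_1251.hex(), as_8859_5.hex())

# Reading cp1251 bytes as if they were 8859-5 yields different letters.
misread = as_1251.decode("iso8859_5")

# Going through Unicode gives the right bytes for the target charset.
converted = transcode(as_1251, "cp1251", "iso8859_5")
```

Here `converted` equals `as_8859_5`, while `misread` is not the original word.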
>> Any insight into why Windows does not use the ISO charset standards
>> and invents its own charsets instead?
>Ha, you must be joking.
>Erik