Re: converting ISO 8859-1 character set text to ASCII (128) character set

From: Jim Melton (jim.melton@acm.org)
Date: Wed Jun 20 2001 - 21:03:34 EDT


Well, the good news is that ASCII is a proper subset of Latin-1. By that,
I mean that every ASCII character is also a Latin-1 character, with the
exact same bit encoding in an 8-bit byte (an "octet"). Of course, ASCII is
a 7-bit encoding (coded character set), but it is very frequently
represented in 8-bit units with the high-order bit set to 0.
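
A minimal sketch of that subset relationship, in Python. The helper name
is_pure_ascii and the sample strings are purely illustrative and are not
part of any standard tool. It checks that every octet has its high-order
bit clear, in which case reading the data as ASCII or as Latin-1 yields
the same text.

```python
def is_pure_ascii(data: bytes) -> bool:
    """Return True if every octet has its high-order bit set to 0."""
    return all(octet < 0x80 for octet in data)


# Illustrative samples (assumptions, not taken from the original message).
sample = b"Hello, world"
latin1_sample = "caf\u00e9".encode("latin-1")  # e-acute becomes octet 0xE9

# Pure ASCII octets decode to the same text under either character set.
same_text = sample.decode("ascii") == sample.decode("latin-1")

print(is_pure_ascii(sample), is_pure_ascii(latin1_sample), same_text)
```

Any octet for which the check fails is one of the additional Latin-1
characters and needs a decision about how to represent it.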

The bad news is that Latin-1 has approximately twice as many characters
as ASCII. If you encounter any of those additional characters, there is no
obvious single rule that you could use to "convert" such characters into
ASCII. They are simply different characters. There are a number of common
"fallback" representations for many of the characters that might (and I
emphasize: *might*) be acceptable in a single culture or language (e.g.,
American English), but I doubt that they would be universally acknowledged
as "right". (For example, the Latin-1 character that I would call
"e-acute", which I can display on my PC as "é", is not available in
ASCII. Some environments might tolerate converting that to a plain old "e"
--- this is common in USA newspapers, for example. Other environments,
trying to capture the proper semantics, might convert it into the
two-character sequence "e'". But those are, of course, no more correct than
representing the Japanese character for the English word "gate" by the
sequence of ASCII characters that crudely approximate the Japanese
pronunciation of that character.)
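
To make the "fallback" idea concrete, here is a hedged sketch in Python of
one possible policy, roughly the "plain old e" approach. Everything named
here (FALLBACKS, latin1_to_ascii, the "?" placeholder) is hypothetical,
and the particular substitutions are a judgment call of exactly the kind
described above, not a universally "right" mapping. It uses the standard
unicodedata module to split an accented letter into its base letter plus
a combining mark, and keeps only the ASCII part.

```python
import unicodedata

# Hypothetical hand-picked fallbacks for letters that have no decomposition.
# Other cultures or applications may well want different choices.
FALLBACKS = {
    "\u00df": "ss",  # sharp s
    "\u00e6": "ae",  # ae ligature
    "\u00c6": "AE",
    "\u00f8": "o",   # o with stroke
    "\u00d8": "O",
}


def latin1_to_ascii(data: bytes, unknown: str = "?") -> str:
    """Map Latin-1 octets to an ASCII-only string using lossy fallbacks."""
    out = []
    for ch in data.decode("latin-1"):
        if ord(ch) < 128:
            out.append(ch)                 # already ASCII: keep unchanged
        elif ch in FALLBACKS:
            out.append(FALLBACKS[ch])      # explicit table entry
        else:
            # NFKD turns e-acute into "e" + combining acute; drop the mark.
            parts = unicodedata.normalize("NFKD", ch)
            base = "".join(c for c in parts if ord(c) < 128)
            out.append(base if base else unknown)
    return "".join(out)


print(latin1_to_ascii("caf\u00e9".encode("latin-1")))
```

Characters with no sensible ASCII stand-in (the copyright sign, for
instance) come out as the placeholder, which at least makes the loss of
information visible instead of silent.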

Hope this helps,
    Jim

At 05:03 AM 6/21/2001 +0530 Thursday, cls raj wrote:
>We have a specific requirement to convert Latin-1 character set (ISO
>8859-1) text to the ASCII character set (a set of only 128 characters).
>Are there any utilities available, or service providers who can
>do that type of job?
>
>It is kind of critical for my current project, so I would appreciate
>some quick HELP with this.
>
>Thanks
>
>Lourdu Sagayaraj ( RAJ)
>
>Here is my destination ASCII character set:
>
>
>
> 0 NUL   32 blank   64 @    96 `
> 1 SOH   33 !       65 A    97 a
> 2 STX   34 "       66 B    98 b
> 3 ETX   35 #       67 C    99 c
> 4 EOT   36 $       68 D   100 d
> 5 ENQ   37 %       69 E   101 e
> 6 ACK   38 &       70 F   102 f
> 7 BEL   39 '       71 G   103 g
> 8 BS    40 (       72 H   104 h
> 9 HT    41 )       73 I   105 i
>10 LF    42 *       74 J   106 j
>11 VT    43 +       75 K   107 k
>12 FF    44 ,       76 L   108 l
>13 CR    45 -       77 M   109 m
>14 SO    46 .       78 N   110 n
>15 SI    47 /       79 O   111 o
>16 DLE   48 0       80 P   112 p
>17 DC1   49 1       81 Q   113 q
>18 DC2   50 2       82 R   114 r
>19 DC3   51 3       83 S   115 s
>20 DC4   52 4       84 T   116 t
>21 NAK   53 5       85 U   117 u
>22 SYN   54 6       86 V   118 v
>23 ETB   55 7       87 W   119 w
>24 CAN   56 8       88 X   120 x
>25 EM    57 9       89 Y   121 y
>26 SUB   58 :       90 Z   122 z
>27 ESC   59 ;       91 [   123 {
>28 FS    60 <       92 \   124 |
>29 GS    61 =       93 ]   125 }
>30 RS    62 >       94 ^   126 ~
>31 US    63 ?       95 _   127 del
>

========================================================================
Jim Melton --- Editor of ISO/IEC 9075-* (SQL) Phone: +1.801.942.0144
Oracle Corporation Oracle Email: mailto:jim.melton@oracle.com
1930 Viscounti Drive Standards email: mailto:jim.melton@acm.org
Sandy, UT 84093-1063 Personal email: mailto:jim.melton@acm.org
USA Fax : +1.801.942.3345
========================================================================
= Facts are facts. However, any opinions expressed are the opinions =
= only of myself and may or may not reflect the opinions of anybody =
= else with whom I may or may not have discussed the issues at hand. =
========================================================================


