From: Peter Zilahy Ingerman, PhD (pzi@ingerman.org)
Date: Tue Mar 31 2009 - 16:36:05 CST
Well, FWIW they aren't the codes used for these characters in
WordPerfect 6.0 running under DOS.
Peter Ingerman
Asmus Freytag wrote:
> It does look like most of your examples represent two-byte escapes
> with each byte associated with a unique character.
> 5e = é
> e5 = s
> 66 = m
> e7 = p
> 74 = í (i with accent)
> b2 = g
>
> I have no suggestion that would explain the values, but they seem to
> be consistent, so it should be possible to find a proper context for
> each byte, and deal with combinations as derived from combinations of
> byte values (i.e., as code sequences) rather than treating them as
> ligatures.
>
> A./
>
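A minimal sketch of the byte-by-byte reading suggested above, assuming each \x{XXXX} escape holds exactly two bytes and that the six byte values listed are the only ones known so far. The names BYTE_MAP, decode_escape and decode_line are made up for illustration; nothing here comes from the LDC tools or from WordPerfect documentation.

```python
import re

# Byte-to-character table built only from the six values listed above.
# This is an observed mapping, not a documented encoding.
BYTE_MAP = {
    0x5E: "é",
    0xE5: "s",
    0x66: "m",
    0xE7: "p",
    0x74: "í",
    0xB2: "g",
}

# Matches the LDC-style escape: backslash, x, then four hex digits in braces.
ESCAPE = re.compile(r"\\x\{([0-9a-fA-F]{4})\}")


def decode_escape(match):
    """Split the 16-bit value into two bytes and map each one separately."""
    value = int(match.group(1), 16)
    high, low = value >> 8, value & 0xFF
    if high in BYTE_MAP and low in BYTE_MAP:
        return BYTE_MAP[high] + BYTE_MAP[low]
    # At least one byte is unknown: leave the escape untouched.
    return match.group(0)


def decode_line(text):
    """Replace every escape whose two bytes are both in BYTE_MAP."""
    return ESCAPE.sub(decode_escape, text)


print(decode_line(r"highlighted by Mr. Rodr\x{74b2}uez"))
print(decode_line(r"transmitting an aide-m\x{5e66}oire issued"))
```

Escapes with a byte missing from the table, such as \x{e76f}, pass through unchanged, so the table can be extended one byte at a time as more values are worked out from context.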
> On 3/31/2009 12:58 PM, John Burger wrote:
>
>> Hi -
>>
>> I have some parallel Chinese-English UN proceedings scraped from the
>> UN website some years ago, and further processed by the Linguistic
>> Data Consortium. I think the data were originally in one of the GB
>> variants, in MS Word or WordPerfect.
>>
>> The data is littered with some odd escape sequences, in both
>> languages, like this:
>>
>> ... Permanent Representatives and Charg\x{5ee5} daffaires of
>> Kuwait, Burundi ...
>> -\x{e76f}现?常任?事国 ...
>>
>> According to the LDC README, the "\x{}" is their way of escaping
>> WordPerfect encodings that they could not convert.
>>
>> I can guess what some of these are - e76f seems to occur in
>> contexts that indicate it's some kind of spacing character, perhaps a
>> tab. Oddly, most of the rest seem to represent =two= characters.
>> For instance 5ee5 seems to be "és":
>>
>> misleading clich\x{5ee5} that
>> Mr. Andr\x{5ee5} Pastrana Arango
>>
>> Here's some others:
>>
>> highlighted by Mr. Rodr\x{74b2}uez
>> issued by the Espace r\x{5ee7}ublicain
>> transmitting an aide-m\x{5e66}oire issued
>>
>> These seem like odd choices for ligatures. I can correct some of
>> these, but there are hundreds of different ones. Sorry if I'm
>> providing insufficient information, but can anyone shed any light on
>> this?
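With hundreds of distinct escapes, one possible first step is to count how often each one occurs, so the most frequent can be worked out from context first. The sketch below assumes the text is available as plain lines containing the literal \x{...} escapes; tally_escapes is a hypothetical helper name, and the sample lines are just the examples quoted in this message.

```python
import re
from collections import Counter

# Matches the literal escape form seen in the data: \x{ hex digits }.
ESCAPE = re.compile(r"\\x\{([0-9a-fA-F]+)\}")


def tally_escapes(lines):
    """Count occurrences of each distinct escape code (lowercased hex)."""
    counts = Counter()
    for line in lines:
        for code in ESCAPE.findall(line):
            counts[code.lower()] += 1
    return counts


sample = [
    r"misleading clich\x{5ee5} that",
    r"Mr. Andr\x{5ee5} Pastrana Arango",
    r"highlighted by Mr. Rodr\x{74b2}uez",
    r"issued by the Espace r\x{5ee7}ublicain",
    r"transmitting an aide-m\x{5e66}oire issued",
]

# Most frequent codes first.
for code, n in tally_escapes(sample).most_common():
    print(code, n)
```

Running the same function over the whole corpus instead of `sample` would give a ranked list of codes to investigate.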
>>
>> Thanks!
>>
>> - John D. Burger
>> MITRE
>>
This archive was generated by hypermail 2.1.5 : Tue Mar 31 2009 - 16:38:35 CST