On 30/10/2013 16:13, "Jörg Knappen" wrote:
> Thanks again!
> My updated sed pattern generator now looks like:
> r = range(0xa0, 0x170)
> file = open("fixu8.sed", "w")
> for i in r:
>   pat1 = "s/"+unichr(i).encode("utf-8").decode("latin-1").encode("utf-8") + "/" + unichr(i).encode("utf-8") +"/g"
>   print >>file, pat1
>   try:
>     pat2 = "s/"+unichr(i).encode("utf-8").decode("windows-1252").encode("utf-8") + "/" + unichr(i).encode("utf-8") +"/g"
>   except:
>     pat2 = pat1
>   if (pat1 != pat2):
>     print >>file, pat2
> This handles double UTF-8 mangled through both latin-1 and windows-1252.
> It is probably enough for now: the error rate is low enough for practical
> purposes (i.e., lower than the natural error rate introduced by typing
> errors).
>
Why do you do both latin1 and windows-1252? Windows-1252 is supposed to 
be a superset of latin1, so it should be enough. Or is there a problem 
with the few undefined bytes of windows-1252 (81, 8D, 8F, 90, 9D)?
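
To make the question concrete, here is a minimal sketch that asks Python's own codecs which byte values the two encodings disagree on. It is written for Python 3 (the quoted generator is Python 2), and the function names are hypothetical helpers made up for this illustration. It only reports what Python's "latin-1" and "windows-1252" codecs do, which may differ from other implementations of windows-1252.

```python
# Python 3 sketch; helper names are illustrative, not from the quoted script.

def undefined_cp1252_bytes():
    """Byte values that Python's strict windows-1252 codec refuses to decode."""
    bad = []
    for b in range(0x100):
        try:
            bytes([b]).decode("windows-1252")
        except UnicodeDecodeError:
            bad.append(b)
    return bad


def differing_bytes():
    """Byte values where latin-1 and windows-1252 do not decode identically.

    A byte counts as differing when windows-1252 fails to decode it
    or decodes it to a different character than latin-1 does.
    """
    diff = []
    for b in range(0x100):
        raw = bytes([b])
        try:
            same = raw.decode("latin-1") == raw.decode("windows-1252")
        except UnicodeDecodeError:
            same = False
        if not same:
            diff.append(b)
    return diff


if __name__ == "__main__":
    print([hex(b) for b in undefined_cp1252_bytes()])
    print([hex(b) for b in differing_bytes()])
```

With Python's codecs, the first list should be exactly the five undefined bytes mentioned above, and the second should be the whole 80 to 9F range, since latin-1 decodes those bytes to C1 control characters. UTF-8 continuation bytes for U+00A0 to U+016F fall in 80 to BF, so the two decodings can produce different mangled sequences for the same character.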
     Frédéric
Received on Wed Oct 30 2013 - 11:00:06 CDT