From: Dominikus Scherkl (lyratelle@gmx.de)
Date: Wed Jun 08 2005 - 04:50:32 CDT
This message was intended to go to the whole list (my fault):
> > > consider the case of the non-breaking space (U+00A0) which may
> > > follow lots of uppercase ISO 8859-1 letters (U+00C0..U+00DF).
> > 
> > Remember that Lasse's idea is to check _all_ the text; so
> > while NBSP certainly can occur after a capital accented
> > letter (or an eszett)
> 
> But uppercase accented letters fortunately do not often
> occur at the end of words, do they? Only ß (eszett, U+00DF)
> is likely to occur before NBSP often, because it is a common
> word ending in German. But DF A0 to DF BF in UTF-8 means
> U+07E0 to U+07FF, code points that are so far unassigned (in
> the near future, N'Ko letters) and that are really unlikely
> to occur in the middle of German words.
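The byte arithmetic above can be checked with a short Python sketch. The function name `utf8_readings` is made up for this illustration; it assumes only Python's standard `latin-1` and `utf-8` codecs, and it simply decodes each two-byte sequence DF A0 .. DF BF both ways.

```python
def utf8_readings(lead: int = 0xDF):
    """Return (ISO 8859-1 reading, UTF-8 code point) for lead + A0..BF."""
    rows = []
    for trail in range(0xA0, 0xC0):
        raw = bytes([lead, trail])
        as_latin1 = raw.decode("latin-1")    # two characters, e.g. ß + NBSP
        as_utf8 = ord(raw.decode("utf-8"))   # one code point
        rows.append((as_latin1, as_utf8))
    return rows

rows = utf8_readings()
# First and last code point of the range the UTF-8 reading lands in.
print(f"U+{rows[0][1]:04X} .. U+{rows[-1][1]:04X}")  # U+07E0 .. U+07FF
```

So an ISO 8859-1 "ß" followed by NBSP would only pass as UTF-8 by turning into a single character from the U+07E0..U+07FF range.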
> 
> More of a threat is 'Â' (C2) followed by some punctuation
> like NBSP (A0), '«' (AB), '»' (BB), '¿' (BF) or '¡' (A1),
> which stand for themselves in UTF-8. So words ending in 'Â'
> may be misinterpreted by simply swallowing the letter. This
> may be really hard to detect. But as stated above, uppercase
> accented letters are very uncommon word endings, and text
> containing accented letters is very, very unlikely to
> contain them _only_ in such uncommon positions.
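A minimal Python sketch of the "swallowed letter" case, and of why checking the whole text helps. The helper names `misread_as_utf8` and `looks_like_utf8` are hypothetical, and the validity check is just a strict UTF-8 decode, not a statement about how any particular detector works.

```python
def misread_as_utf8(text: str) -> str:
    """Encode text as ISO 8859-1, then read the same bytes as UTF-8."""
    return text.encode("latin-1").decode("utf-8")

def looks_like_utf8(data: bytes) -> bool:
    """True if the entire byte string is well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# C2 followed by A0, AB, BB, BF or A1 is valid UTF-8 for the second
# character alone, so the 'Â' disappears without any error.
for tail in "\u00a0\u00ab\u00bb\u00bf\u00a1":
    assert misread_as_utf8("\u00c2" + tail) == tail

# An accented letter in a more usual position breaks the UTF-8 reading.
ambiguous = "\u00c2\u00ab".encode("latin-1")                # bytes C2 AB
typical = "\u00c2\u00ab Stra\u00dfe".encode("latin-1")      # DF then 'e'
print(looks_like_utf8(ambiguous), looks_like_utf8(typical))
```

The second sample fails the check because DF is a UTF-8 lead byte and the following "e" is not a continuation byte. That is the point made above: text whose accented letters appear only in the ambiguous positions should be rare.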
> 
> Best Regards.
> 
> -- 
> Dominikus Scherkl
> 