Re: Is there a UTF that allows ISO 8859-1 (latin-1)?

From: Gianni Mariani (gianni@corp.webtv.net)
Date: Wed Aug 26 1998 - 14:14:14 EDT


John Cowan wrote:
>
>
> In addition, in some applications those processing inefficiencies are
> not present, thanks to the self-segregating nature of UTF-8. For
> example, the Plan 9 "fgrep" program (which searches a stream of text
> for the presence of one or more of a list of strings) need never convert
> to UCS format at all; the strings are UTF-8 and so is the text, and
> in fact the program looks the same as the corresponding 8-bit program.
>

This is not completely true: to be Unicode compliant, fgrep must
deal correctly with combining characters. For example,

è (<latin small letter e with grave, U+00E8>) is canonically
equivalent to

<latin small letter e, U+0065> <combining grave accent, U+0300>

So, to be truly Unicode compliant, fgrep should match <U+00E8>
with <U+0065><U+0300>.
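A minimal sketch of the difference, in Python rather than anything Plan 9 actually ships. The function names utf8_fgrep and normalized_fgrep are hypothetical. The first does a plain byte-level substring search on UTF-8, which is what the quoted fgrep argument relies on. The second assumes one possible fix: normalize both pattern and text to the same normalization form (NFD here) with the standard library's unicodedata.normalize before doing the same byte search.

```python
import unicodedata


def utf8_fgrep(pattern: str, text: str) -> bool:
    """Hypothetical: byte-level substring search on UTF-8, no normalization."""
    return pattern.encode("utf-8") in text.encode("utf-8")


def normalized_fgrep(pattern: str, text: str, form: str = "NFD") -> bool:
    """Hypothetical: bring both sides to the same normalization form, then search bytes."""
    return utf8_fgrep(unicodedata.normalize(form, pattern),
                      unicodedata.normalize(form, text))


precomposed = "cr\u00e8me"   # 'è' as a single code point, U+00E8
decomposed = "cre\u0300me"   # 'e' followed by COMBINING GRAVE ACCENT, U+0065 U+0300

print(utf8_fgrep("\u00e8", decomposed))        # False: the UTF-8 bytes differ
print(normalized_fgrep("\u00e8", decomposed))  # True: both sides become e + U+0300
print(normalized_fgrep("\u00e8", precomposed)) # True
```

One caveat of this sketch: after NFD, a search for a plain "e" also matches the base letter inside "è", so a fuller implementation would additionally reject matches that end just before a combining mark.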

See section 2.5 of "The Unicode Standard 2.0".

That's not to say fgrep isn't a good start...



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:41 EDT