Re: Unicode and end users

From: David Starner (starner@okstate.edu)
Date: Sat Feb 16 2002 - 14:09:29 EST


On Fri, Feb 15, 2002 at 02:57:46PM +0000, David Hopwood wrote:
> Not having to add a few more lines of code to grep and sed is a good
> trade-off for a 50% penalty in encoding efficiency for Indic & Southeast
> Asian scripts, Katakana, Hiragana and a few others? I don't think so.

Not complicating every program that does searching on the system? Having
every program just work, instead of having bug reports show up every so
often for the next few years? I happen to agree with you, but I think
the consequences are more important than you imply.
 
> In another post, you wrote:
> > [...]
>
> If "foo" is a US-ASCII string, "grep foo file" will work fine with any
> US-ASCII-superset charset for which non-ASCII characters do not use
> bytes < 0x80, including the hypothetical one I described, with no
> possibility of a false match.

But this is a different context. It won't work if grep tries to stick a
BOM on the output.
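
A minimal sketch of both halves of that, in Python. It assumes UTF-8 as
the ASCII-superset charset, and grep_bytes is a hypothetical stand-in
for a byte-wise search, not how any real grep is implemented. The BOM
step is likewise an assumption about what "sticking a BOM on the output"
would look like.

```python
# Sketch only: byte-wise search for an ASCII pattern, in the spirit of
# "grep foo file". grep_bytes is a hypothetical helper, not a real grep.

BOM = b"\xef\xbb\xbf"  # UTF-8 encoding of U+FEFF


def grep_bytes(pattern, data):
    """Return the lines of data (bytes) that contain pattern (bytes)."""
    return [line for line in data.split(b"\n") if pattern in line]


text = "foo bar\nДавид\nカタカナ foo\n".encode("utf-8")

# In UTF-8 every byte of a non-ASCII character is >= 0x80, so an ASCII
# pattern can only match where the ASCII characters really are.
matches = grep_bytes(b"foo", text)

# Hypothetical grep that sticks a BOM on the front of its output.
output_with_bom = BOM + b"\n".join(matches) + b"\n"

# A later tool that expects the first line to start with "foo" now sees
# EF BB BF in front of it instead.
first_line = output_with_bom.split(b"\n")[0]
print(matches[0].startswith(b"foo"))
print(first_line.startswith(b"foo"))
```

The search itself needs no charset knowledge; the trouble only starts
once the three BOM bytes are glued onto output that another program then
reads as plain bytes.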

-- 
David Starner / Давид Старнэр - starner@okstate.edu
Pointless website: http://dvdeug.dhis.org
What we've got is a blue-light special on truth. It's the hottest thing 
with the youth. -- Information Society, "Peace and Love, Inc."


