Re: Getting A Newb Started

From: David Starner (prosfilaes@gmail.com)
Date: Mon Jul 07 2008 - 19:33:06 CDT

Next message: John H. Jenkins: "Re: Getting A Newb Started"

Previous message: William J Poser: "Re: Getting A Newb Started"
In reply to: William J Poser: "Re: Getting A Newb Started"
Next in thread: William J Poser: "Re: Getting A Newb Started"
Reply: William J Poser: "Re: Getting A Newb Started"

On Mon, Jul 7, 2008 at 7:17 PM, William J Poser <wjposer@ldc.upenn.edu> wrote:
> Of course you want to be prepared for any possible input, but
> in some cases you do know what the range of possible inputs is.
> The input may not be coming directly from the user. It may be user
> input that has already been cleaned or validated, or it may be
> data that you yourself have generated.

Most low-level string processing code shouldn't need to be rewritten
for each application. If you've got UCS-2-only code, you have to
reëvaluate it for each project, or risk introducing a subtle bug by
reusing it. If you don't reuse code, you're probably rewriting it,
which introduces bugs, especially in the parts that aren't
well-tested--which for most people will include non-BMP characters.
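
To make that concrete, here is a minimal sketch of the sort of subtle
bug I mean. The function names are hypothetical and Python is used only
for illustration; the point is the difference between counting UTF-16
code units (the UCS-2 assumption) and counting characters.

```python
def utf16_units(s):
    """Return the UTF-16 code units of s as a list of integers."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]

def ucs2_length(units):
    # UCS-2 assumption: every code unit is one character.
    return len(units)

def utf16_length(units):
    # A trail (low) surrogate, 0xDC00..0xDFFF, continues the previous
    # character, so it is not counted as a character of its own.
    return sum(1 for u in units if not 0xDC00 <= u <= 0xDFFF)

# U+10400 (a Deseret letter) lies outside the BMP and needs two units.
text = "a\U00010400b"
units = utf16_units(text)

print(ucs2_length(units))   # 4: the surrogate pair is counted twice
print(utf16_length(units))  # 3: one per character
```

The UCS-2 version gives the right answer on every BMP-only test string,
which is exactly why the bug tends to survive until the code is reused
somewhere else.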

And just because you can clean and validate user input doesn't mean
that you should arbitrarily forbid non-BMP characters. One of the
principles of Unicode is that you can pass through arbitrary scripts
and not worry about the difference.

> I don't get the point. Whether you're dealing with one character or
> many, life is simpler if they're all the same size.

If I have to look up a single character in an array, it makes a
difference. If I'm looking up multiple characters, the length of any
one of them no longer matters; I'm passing and returning strings.
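
A rough sketch of the distinction, using UTF-8 bytes in Python purely
as an illustration (the helper names are made up): indexing gives you
one code unit, which may be only a fragment of a character, while
string-level search and replace never need to know how wide any
character is.

```python
# U+1D54F (a double-struck capital X) is outside the BMP and takes
# four bytes in UTF-8.
text = "x = \U0001D54F + 1"
encoded = text.encode("utf-8")
needle = "\U0001D54F".encode("utf-8")

def unit_at(data, i):
    # Single-unit lookup: returns one byte, not one character.
    return data[i]

def find_string(data, sub):
    # String in, offset out: works on whole encoded sequences, so the
    # width of each character is irrelevant to the caller.
    return data.find(sub)

def replace_string(data, old, new):
    # String in, string out.
    return data.replace(old, new)

print(hex(unit_at(encoded, 4)))                    # 0xf0, a lead byte only
print(find_string(encoded, needle))                # 4
print(replace_string(encoded, needle, b"X").decode("utf-8"))  # x = X + 1
```

The same holds for UTF-16 code units: once the interface is strings
rather than single units, variable width stops being the caller's
problem.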

> But for some purposes, yes, you can assume that input is BMP-only.
> Not all input comes direct from the user.

Even when you can assume integer input is positive, you generally
need to guard that code carefully with run-time tests. I would regard
nothing less as acceptable for code that assumes the input is in the
BMP. If simplicity is your goal, why not use UTF-32?
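
As a sketch of both options (hypothetical helper names, Python only for
illustration): a run-time guard that rejects non-BMP input before it
reaches BMP-only code, and a UTF-32 view in which every character
really is one fixed-size unit.

```python
def require_bmp(s):
    """Return s unchanged, or raise if it contains a non-BMP character."""
    for i, ch in enumerate(s):
        if ord(ch) > 0xFFFF:
            raise ValueError(
                "non-BMP character U+%04X at index %d" % (ord(ch), i))
    return s

def utf32_units(s):
    """Return the UTF-32 code units of s; one unit per code point."""
    data = s.encode("utf-32-le")
    return [int.from_bytes(data[i:i + 4], "little")
            for i in range(0, len(data), 4)]

print(require_bmp("abc"))                            # abc
print([hex(u) for u in utf32_units("a\U00010400")])  # ['0x61', '0x10400']

try:
    require_bmp("a\U00010400")
except ValueError as err:
    print(err)   # non-BMP character U+10400 at index 1
```

With the guard, a violated assumption fails loudly at the boundary
instead of corrupting data further in; with UTF-32, the assumption is
not needed at all.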

This archive was generated by hypermail 2.1.5 : Mon Jul 07 2008 - 19:36:19 CDT