Rationale wanted for Unicode identifier rules

From: John Cowan (jcowan@reutershealth.com)
Date: Wed Mar 01 2000 - 13:53:17 EST


(Still waiting for my bookstore to get the 3.0 book.)

Section 5.14 of Unicode 2.0 says:

# The formal syntax provided here is intended to capture the general
# intent that an identifier consists of a string of characters that starts
# with a letter or an ideograph, and then follows with any number of letters,
# ideographs, digits, or underscores.
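
For concreteness, that rule amounts to something like the following C
sketch. This is my own illustration, not text from the standard: the
function names are mine, and it assumes a Unicode-aware locale in which
iswalpha() covers both letters and ideographs.

    #include <wctype.h>

    /* Sketch of the 5.14 rule: start with a letter or ideograph,
       continue with letters, ideographs, digits, or underscores. */
    int is_id_start(wint_t c)    { return iswalpha(c); }
    int is_id_continue(wint_t c) { return iswalpha(c) || iswdigit(c)
                                       || c == L'_'; }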

Can anyone give me a rationale for rejecting the following argument:

> There are some [syntax] characters we know we need to prohibit [in
> identifiers, such as +, -, etc.], as well as a couple of ranges of
> control characters, but other than that I'm not sure why it's worth
> bothering.
>
> [...] I don't see the need for prohibiting every possible
> punctuation character or characters such as a smiley or a snow man,
> even though I would probably not use them in an [identifier] myself. As
> long as they don't conflict with the [rest of the] syntax, it makes no
> difference [to the] parser.

In other words, programming languages have historically tended to allow
anything in an identifier that wasn't used for some syntactic purpose;
leading digits were forbidden to make lexers simpler. What specific
reason is there not to treat all hitherto-unknown Unicode characters
as legitimate in identifiers, in the manner of the Plan9 C compiler
(which extends C to treat everything from U+00A0 on up as valid)?
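
To make that alternative concrete, a Plan9-style identifier-character
test as described above would look roughly like this (a sketch of the
rule as I understand it, not the compiler's actual source; the function
name is mine):

    /* Plan9-style rule: ASCII letters, digits, and underscore, plus
       anything from U+00A0 upward.  The lexer would still reject a
       digit in the leading position, as noted above. */
    int plan9_id_char(unsigned long c) {
        return c >= 0xA0
            || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9') || c == '_';
    }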

I need this to help me write a draft standard, so I'm not asking idly.

-- 

Schlingt dreifach einen Kreis vom dies! || John Cowan <jcowan@reutershealth.com>
Schliesst euer Aug vor heiliger Schau,  || http://www.reutershealth.com
Denn er genoss vom Honig-Tau,           || http://www.ccil.org/~cowan
Und trank die Milch vom Paradies.          -- Coleridge (tr. Politzer)
