[thelist] What's the reasoning behind not starting IDs with a number?

jason.handby jason.handby at corestar.co.uk
Thu Jan 11 12:51:45 CST 2007


> I figure they're trying not to clash with programming 
> languages which might automatically create variables based on 
> ID or NAME. Most programming languages do not allow variables 
> to begin with a number.
> Now, why do programming languages not allow this? Don't know.


The first step in compiling or interpreting a programming language is
tokenization: turning the series of individual characters from the
file into a stream of tokens which can then be parsed. The tokenizer has
to decide whether a run of characters represents a string literal, a
numeric literal, an operator, a "word" (e.g. an identifier or a function
name)... and it has to do this without referring to the details of the
grammar or semantics of the language itself (as that comes later). Oh,
and because a tokenizer is usually a very simple piece of machinery
(typically a small state machine with little or no lookahead), it is
much easier if it can make that decision from looking at the first
character.

So the easy way to do it is to have numeric literals start with a digit
-- that way you can tell from the very first character that it's a
number you are dealing with. Operators start with *, -, + etc., string
literals start with ' or ", and identifiers start with a letter or an
underscore (which doesn't get used to begin anything else and so won't
confuse the tokenizer).
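
To make that concrete, here is a rough sketch in Python of a tokenizer
that picks the token type purely from the first character. It is a toy
under my own assumptions, not how any particular compiler does it: the
function name tokenize, the token kind names and the small operator set
are all made up for illustration, and it ignores things like escapes in
strings or multi-character operators.

```python
import string

DIGITS = set(string.digits)
IDENT_START = set(string.ascii_letters + "_")
IDENT_CHARS = IDENT_START | DIGITS
OPERATORS = set("+-*/=")  # deliberately tiny, illustrative operator set


def tokenize(source):
    """Split source into (kind, text) pairs, choosing kind from the first character."""
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            i += 1
            continue
        start = i
        if ch in DIGITS:
            # A digit can only begin a number; digits and dots continue it.
            while i < len(source) and (source[i] in DIGITS or source[i] == "."):
                i += 1
            kind = "number"
        elif ch in IDENT_START:
            # A letter or underscore can only begin an identifier.
            while i < len(source) and source[i] in IDENT_CHARS:
                i += 1
            kind = "identifier"
        elif ch in "'\"":
            # A quote begins a string literal that runs to the matching quote.
            i += 1
            while i < len(source) and source[i] != ch:
                i += 1
            i += 1  # step past the closing quote
            kind = "string"
        elif ch in OPERATORS:
            i += 1
            kind = "operator"
        else:
            raise ValueError(f"unexpected character {ch!r} at position {i}")
        tokens.append((kind, source[start:i]))
    return tokens


print(tokenize("total_1 = 3.14 * x"))
```

Note that each branch is chosen without looking past the first
character, and a digit is still fine *inside* an identifier such as
total_1 because by then the tokenizer already knows what it is reading.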

If you allow identifiers to start with digits, you don't know for sure
at the tokenizing stage whether something should be treated as a number
or an identifier -- and the tokenization rules are different for the
two. For example, _ is a valid character in an identifier but not in a
number, whereas . is valid in a number but not in an identifier.


I think that's why!




Jason


