classification
Title: On Python parsing numbers.
Type: enhancement Stage: resolved
Components: Interpreter Core Versions: Python 3.3
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: Nosy List: Jean-Michel.Fauth, benjamin.peterson, ezio.melotti, georg.brandl, gvanrossum, jcea, mark.dickinson, neologix, rhettinger, rutsky, terry.reedy
Priority: low Keywords:

Created on 2011-12-16 07:27 by Jean-Michel.Fauth, last changed 2016-05-02 20:29 by rutsky. This issue is now closed.

Messages (11)
msg149596 - (view) Author: Jean-Michel Fauth (Jean-Michel.Fauth) Date: 2011-12-16 07:27
Can this be fixed? As far as I can remember (ver. 1.5.6),
it has always existed. Python does not crash, but I find it
inelegant. Should it not be a SyntaxError?

Side effect: searching for keywords (e.g. with re and "\b")
practically always means handling this case separately.

All Python versions.

>>> 1and 0
0
>>> 1or 0
1
>>> 9if True else 22
9
>>> 0.1234if True else 22
0.1234
>>> [999for i in range(3)]
[999, 999, 999]
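The keyword-search side effect described above can be illustrated with a small sketch. It assumes a naive highlighter that looks for the keyword `if` with `\b` anchors; the helper name `find_keyword` is hypothetical and not taken from any editor. Because `9` and `i` are both word characters, there is no word boundary between them, so the pattern misses the keyword in `9if`.

```python
import re

# Naive keyword pattern: "if" surrounded by word boundaries.
NAIVE_IF = re.compile(r"\bif\b")


def find_keyword(source):
    """Return the start offsets where the naive pattern sees the keyword 'if'."""
    return [m.start() for m in NAIVE_IF.finditer(source)]


spaced = find_keyword("9 if True else 22")  # keyword follows a space
glued = find_keyword("9if True else 22")    # no \b between '9' and 'i'
print(spaced, glued)
```

A search tool that wants to find the keyword in both spellings would need a separate rule for a keyword glued to a preceding literal.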
msg149608 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-12-16 10:21
> Can this be fixed?

More or less.
The following patch does the trick, but is not really elegant:
"""
--- a/Parser/tokenizer.c        2011-06-01 02:39:38.000000000 +0000
+++ b/Parser/tokenizer.c        2011-12-16 08:48:45.000000000 +0000
@@ -1574,6 +1576,10 @@
             }
         }
         tok_backup(tok, c);
+        if (is_potential_identifier_start(c)) {
+            tok->done = E_TOKEN;
+            return ERRORTOKEN;
+        }
         *p_start = tok->start;
         *p_end = tok->cur;
         return NUMBER;
"""

"""
> python -c "1and 0"
  File "<string>", line 1
    1and 0
    ^
SyntaxError: invalid token
"""

Note that there are other, although less bothersome, limitations:
"""
> python -c "1 and@ 2"
  File "<string>", line 1
    1 and@ 2
       ^
SyntaxError: invalid syntax
"""

This should be caught by the lexer, not the parser (i.e. it should raise an "invalid token" error).
That's a limitation of the ad-hoc scanner.
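The idea behind the patch can be sketched in Python. This is a toy model, not CPython's tokenizer: it only handles plain decimal digits, and `scan_decimal` and its `strict` flag are hypothetical names. With `strict=True` it rejects a number that is immediately followed by a character that could start an identifier, which is the check the patch adds.

```python
DIGITS = "0123456789"


def scan_decimal(text, strict=False):
    """Split text into (digits, rest); text must start with a decimal digit."""
    end = 0
    while end < len(text) and text[end] in DIGITS:
        end += 1
    if end == 0:
        raise ValueError("text does not start with a digit")
    if strict and end < len(text):
        nxt = text[end]
        # Rough stand-in for is_potential_identifier_start() (an assumption).
        if nxt == "_" or nxt.isalpha():
            raise SyntaxError("invalid token")
    return text[:end], text[end:]


print(scan_decimal("1and 0"))                # lenient: ('1', 'and 0')
print(scan_decimal("1 and 0", strict=True))  # a space ends the number cleanly
```

In the lenient mode the number simply ends where the digits end, which is why `1and 0` is accepted today.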
msg149625 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2011-12-16 14:38
> Can this be fixed?

Not without breaking backwards compatibility, I would think.
msg149626 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2011-12-16 14:53
I think it's fairly harmless. Perhaps Python 4.
msg149651 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-12-17 01:19
The proposal is to change the definition of number literals from X to one that is context-sensitive: X followed by whitespace or a syntactic symbol, but not by anything else and, in particular, not by an identifier_start character. I am +-0 at the moment.

> 1 and@ 2
I presume this is parsed as 1 and @ 2, which is a syntax error.
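A quick way to check this reading, using only the built-in `compile`; the helper `is_syntax_error` is a hypothetical name. Both spellings are rejected, which is consistent with `and@` being split into `and` followed by `@`, although this sketch does not show which stage rejects them.

```python
def is_syntax_error(source):
    """Return True if source fails to compile as an expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return True
    return False


for expr in ("1 and@ 2", "1 and @ 2", "1 and 2"):
    print(repr(expr), is_syntax_error(expr))
```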
msg149672 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2011-12-17 15:49
I don't see a good reason to change this.
msg149677 - (view) Author: Jean-Michel Fauth (Jean-Michel.Fauth) Date: 2011-12-17 16:53
I have done a little bit of hd/files archaeology and
found some of my comments.

Pointing at number literals is probably wrong. The fact
is that this happens with practically any expression. And,
strangely, not all keywords (constructs?) are affected.

>>> 999if 1 else 888
999
>>> """"""if 1 else 888

>>> {1: 'a'}if 1 else 888
{1: 'a'}
>>> 999 if 'a' else 888
999
>>> 999if 'a' else 888
999
>>> 999if 'a'else 888
999
>>> 999if 888else 888
  File "<eta last command>", line 1
    999if 888else 888
             ^
SyntaxError: invalid token
>>> 999if """"""else 888
888

To summarize: The Python syntax does not require an "isolated"
keyword, something like \b<keyword>\b.
msg149680 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-12-17 17:21
>>> 999if 888else 888
  File "<eta last command>", line 1
    999if 888else 888
             ^
SyntaxError: invalid token

This might be because 888e5 is a valid expression, so the 'e' is parsed as part of the number rather than a separate token.

>>> 999 if 888.else 888
  File "<stdin>", line 1
    999 if 888.else 888
               ^
SyntaxError: invalid token
>>> 999 if 888jelse 888
999
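The behaviour above can be modelled with a simplified greedy scanner. This is a sketch under stated assumptions, not CPython's tokenizer: `scan_number_toy` is a hypothetical name, and it only covers decimal digits, an optional fraction, an optional exponent and an optional `j` suffix. Once it sees an `e` it commits to an exponent, so `888else` fails, while in `888jelse` the `j` ends the literal and `else` is left over as a separate token.

```python
DIGITS = "0123456789"


def scan_number_toy(text):
    """Split text into (number_literal, rest) using a simplified greedy scan."""
    i = 0
    while i < len(text) and text[i] in DIGITS:
        i += 1
    if i == 0:
        raise ValueError("text does not start with a digit")
    # Optional fraction: "888." and "888.5" are both floats.
    if i < len(text) and text[i] == ".":
        i += 1
        while i < len(text) and text[i] in DIGITS:
            i += 1
    # An 'e' commits the scanner to an exponent, so digits must follow.
    if i < len(text) and text[i] in "eE":
        j = i + 1
        if j < len(text) and text[j] in "+-":
            j += 1
        if j >= len(text) or text[j] not in DIGITS:
            raise SyntaxError("invalid token")
        while j < len(text) and text[j] in DIGITS:
            j += 1
        i = j
    # A trailing 'j' makes an imaginary literal and ends the number.
    if i < len(text) and text[i] in "jJ":
        i += 1
    return text[:i], text[i:]


print(scan_number_toy("888jelse 888"))  # the 'j' ends the literal
print(scan_number_toy("999if 'a'"))     # 'i' cannot extend the number
```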
msg149700 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-12-17 19:53
-1 I'm with Mark, Georg, and Benjamin on this one.
msg149701 - (view) Author: Jean-Michel Fauth (Jean-Michel.Fauth) Date: 2011-12-17 20:07
> Ezio Melotti
Good catch.

I'm not complaining. I just find it funny to see how many editors
do not "colorize" this kind of valid Python expression (IDLE included).

For me, the subject is closed.
msg149704 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-12-17 21:15
I'll close this then.
History
Date User Action Args
2016-05-02 20:29:37 rutsky set nosy: + rutsky
2016-05-02 17:12:25 berker.peksag link issue26908 superseder
2011-12-17 21:15:56 ezio.melotti set status: open -> closed; resolution: wont fix; messages: + msg149704; stage: resolved
2011-12-17 20:07:44 Jean-Michel.Fauth set messages: + msg149701
2011-12-17 19:53:43 rhettinger set nosy: + rhettinger; messages: + msg149700
2011-12-17 17:21:26 ezio.melotti set nosy: + ezio.melotti; messages: + msg149680
2011-12-17 16:55:15 pitrou set nosy: + gvanrossum; versions: + Python 3.3, - Python 3.4
2011-12-17 16:53:51 Jean-Michel.Fauth set messages: + msg149677
2011-12-17 15:49:19 georg.brandl set nosy: + georg.brandl; messages: + msg149672
2011-12-17 01:19:46 terry.reedy set versions: + Python 3.4, - Python 2.7; nosy: + terry.reedy; messages: + msg149651; type: enhancement
2011-12-16 17:45:07 jcea set nosy: + jcea
2011-12-16 14:53:44 benjamin.peterson set priority: normal -> low; messages: + msg149626
2011-12-16 14:38:43 mark.dickinson set nosy: + mark.dickinson; messages: + msg149625
2011-12-16 10:21:38 neologix set nosy: + neologix; messages: + msg149608
2011-12-16 07:31:52 pitrou set nosy: + benjamin.peterson
2011-12-16 07:27:20 Jean-Michel.Fauth create