tokenize: mishandles line joining #46433
tokenize does not handle line joining properly, as the following examples show.

Example 1:
>>> s = "if 1:\n \\\n #hey\n print 1"
>>> exec s
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 3
#hey
^
SyntaxError: invalid syntax
>>> tokenize.tokenize(StringIO(s).readline)
1,0-1,2: NAME 'if'
1,3-1,4: NUMBER '1'
1,4-1,5: OP ':'
1,5-1,6: NEWLINE '\n'
2,0-2,2: INDENT ' '
3,2-3,6: COMMENT '#hey'
3,6-3,7: NEWLINE '\n'
4,2-4,7: NAME 'print'
4,8-4,9: NUMBER '1'
5,0-5,0: DEDENT ''
5,0-5,0: ENDMARKER '' |
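For comparison, here is a minimal Python 3 sketch of the same experiment (the transcript above is Python 2, so `print 1` becomes `print(1)` here). The first case shows the baseline behavior this report relies on: a well-formed explicit line joining is consumed by the tokenizer and produces no token at all. The exact token stream for the buggy string varies by Python version, so it is only printed, not asserted:

```python
import io
import tokenize

def token_strings(src):
    """Return the string of each token produced for src."""
    return [t.string for t in tokenize.generate_tokens(io.StringIO(src).readline)]

# Baseline: the backslash-newline pair is swallowed by the tokenizer
# and yields no token of its own.
print(token_strings("x = 1 + \\\n    2\n"))

# The problematic string from this report (Python 3 syntax); the output
# here differs between Python versions, so we only print it.
buggy = "if 1:\n \\\n #hey\n print(1)\n"
try:
    for tok in tokenize.generate_tokens(io.StringIO(buggy).readline):
        print(tokenize.tok_name[tok.type], repr(tok.string))
except tokenize.TokenError as err:
    print("TokenError:", err)
```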
CPython allows \ at EOF, but tokenize does not.

>>> s = 'print 1\\\n'
>>> exec s
1
>>> tokenize.tokenize(StringIO(s).readline)
1,0-1,5: NAME 'print'
1,6-1,7: NUMBER '1'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File
"/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/tokenize.py",
line 153, in tokenize
tokenize_loop(readline, tokeneater)
File
"/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/tokenize.py",
line 159, in tokenize_loop
for token_info in generate_tokens(readline):
File
"/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/tokenize.py",
line 283, in generate_tokens
raise TokenError, ("EOF in multi-line statement", (lnum, 0))
tokenize.TokenError: ('EOF in multi-line statement', (2, 0)) |
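The backslash-at-EOF case can be probed in Python 3 with a short sketch. Whether `tokenize` raises `TokenError` here depends on the Python version, so the helper below (a hypothetical name, not part of the `tokenize` API) returns either the token list or the error rather than assuming one outcome:

```python
import io
import tokenize

def tokenize_or_error(src):
    """Tokenize src, returning either the token list or the TokenError raised."""
    try:
        return list(tokenize.generate_tokens(io.StringIO(src).readline))
    except tokenize.TokenError as err:
        return err

# A trailing backslash right before EOF: CPython's compiler accepts this,
# but some versions of the tokenize module raise
# "EOF in multi-line statement", as shown in the traceback above.
result = tokenize_or_error("print(1)\\\n")
print(type(result).__name__)
```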
Nobody appears to be interested, so I'll close this in a couple of weeks unless someone objects or a patch is provided. |
Mark, please stop closing these based on age. |
That syntax error is coming from the CPython parser and *not* the tokenizer. Both CPython and the 'tokenize' module produce the same tokenization:

[meadori@motherbrain cpython]$ cat repro.py
pass

SyntaxError: invalid syntax

The NEWLINE INDENT NEWLINE tokenization causes the parser to choke because 'suite' nonterminals:

suite: simple_stmt | NEWLINE INDENT stmt+ DEDENT

are defined to start with NEWLINE INDENT. It seems appropriate that the NEWLINE after INDENT should be dropped by both tokenizers. In other words, I think:

pass

should produce the same tokenization as:

"""
pass

This seems consistent with how explicit line joining is defined [2].

[1] http://hg.python.org/cpython/file/92842e347d98/Grammar/Grammar |
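The grammar point above can be checked with a short Python 3 sketch. On a comment-only line the tokenizer normally emits COMMENT followed by NL (not NEWLINE) and defers the INDENT token until the first line carrying real code, which is why a stray NEWLINE before INDENT trips the 'suite' rule:

```python
import io
import tokenize

# Tokenize a block where the first indented line is comment-only.
# Normally this yields COMMENT then NL (not NEWLINE), and INDENT is
# delayed until the first line of actual code.
src = "if 1:\n #hey\n pass\n"
names = [tokenize.tok_name[t.type]
         for t in tokenize.generate_tokens(io.StringIO(src).readline)]
print(names)
```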
Here's an example in the wild which still reproduces with python3.8a3. This was reported as a bug on flake8: https://gitlab.com/pycqa/flake8/issues/532

Here's the reproduction with python3.8:

$ python3.8 --version --version
Python 3.8.0a3 (default, Mar 27 2019, 03:46:44)
[GCC 7.3.0]
$ python3.8 impacket/examples/remcomsvc.py
$ python3.8 -mtokenize impacket/examples/remcomsvc.py
impacket/examples/remcomsvc.py:1670:0: error: EOF in multi-line statement |
Thanks for figuring this one out Anthony! :) |
Migrated from bugs.python.org issue #13401.