Grammar Incongruence #77947
Comments
$ echo 'print("a");print("b")' > test.py
This program is grammatically incorrect according to the specification (https://docs.python.org/3.8/reference/grammar.html), yet Python 3 runs it without issue. The relevant production is:
simple_stmt: small_stmt (';' small_stmt)* [';'] NEWLINE
It says that a simple_stmt must be terminated by a newline. However, the program I wrote doesn't contain any newlines. I think the grammar specification is missing some information, but I'm not sure what. Does anyone have an idea?
NEWLINE is not a newline character. It is the NEWLINE token, and it is generated at the end of the file.
$ echo 'print("a");print("b")' | ./python -m tokenize
1,0-1,5: NAME 'print'
1,5-1,6: OP '('
1,6-1,9: STRING '"a"'
1,9-1,10: OP ')'
1,10-1,11: OP ';'
1,11-1,16: NAME 'print'
1,16-1,17: OP '('
1,17-1,20: STRING '"b"'
1,20-1,21: OP ')'
1,21-1,22: NEWLINE '\n'
2,0-2,0: ENDMARKER ''
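For reference, the same token stream can be produced from Python code with the standard tokenize module. This is a minimal sketch: the helper name token_summary is hypothetical, and the input string includes the trailing newline that echo appends, so it matches the contents of test.py.

```python
import io
import tokenize

def token_summary(source):
    """Return (token type name, token string) pairs for a source string."""
    # generate_tokens() takes a readline callable that returns str lines.
    readline = io.StringIO(source).readline
    return [(tokenize.tok_name[tok.type], tok.string)
            for tok in tokenize.generate_tokens(readline)]

# echo appends "\n", so this string matches what ends up in test.py.
for name, text in token_summary('print("a");print("b")\n'):
    print(name, repr(text))
```

Here the NEWLINE token's string is the real '\n' character that echo wrote.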
Thanks for the clarification. Is there a reference to this in the documentation?
I went through that document before I created this issue. I can't find anything that describes this behavior. Could you be more specific, please?
Actually, echo implicitly appends a newline. If you run it with echo -n, this is the output:
$ echo -n 'print("a");print("b")' | python3 -m tokenize
1,0-1,5: NAME 'print'
1,5-1,6: OP '('
1,6-1,9: STRING '"a"'
1,9-1,10: OP ')'
1,10-1,11: OP ';'
1,11-1,16: NAME 'print'
1,16-1,17: OP '('
1,17-1,20: STRING '"b"'
1,20-1,21: OP ')'
2,0-2,0: ENDMARKER ''
No NEWLINE token is present.
Good point, Ammar. It seems there is also a missing corner case in the definition of a physical line (https://docs.python.org/3.8/reference/lexical_analysis.html#physical-lines): it does not cover the case where a physical line is terminated by the end of the file.
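To illustrate the corner case, here is a sketch that splits source text into physical lines and reports each line's terminator. The function name physical_lines is hypothetical. It recognizes the three end-of-line sequences the reference manual lists (LF, CR LF, CR) and reports an empty terminator for a final line that ends at end of file.

```python
import re

# End-of-line sequences from the reference manual: CR LF, LF, CR.
# "\r\n" is listed first so it is not split into a lone "\r" plus "\n".
_EOL = re.compile(r"\r\n|\n|\r")

def physical_lines(source):
    """Yield (text, terminator) pairs; terminator is '' when a line ends at EOF."""
    pos = 0
    for match in _EOL.finditer(source):
        yield source[pos:match.start()], match.group()
        pos = match.end()
    if pos < len(source):
        # The missing corner case: the last physical line is terminated by EOF.
        yield source[pos:], ""

print(list(physical_lines('print("a");print("b")')))
print(list(physical_lines('print("a");print("b")\n')))
```

The first call shows the `echo -n` situation: one physical line whose terminator is the end of the file rather than an end-of-line sequence.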
Here is the relevant part of the C tokenizer, which emits a fake newline at the end of the file if one is not present: https://github.com/python/cpython/blob/master/Parser/tokenizer.c#L1059-L1069
Cool, thanks for the help. Should I submit a PR with the updated documentation?
Sorry, I was already working on the patch by the time you posted your comment. As the output above shows, the tokenize module doesn't seem to correctly mirror the behavior of the C tokenizer. Do you want to try fixing that as a bug? That would involve opening a new bpo ticket and submitting a PR there.
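A check along these lines could serve as a regression test for that bug. It is a sketch, not CPython's actual test: it assumes a Python version in which tokenize mirrors the C tokenizer by emitting an implicit NEWLINE token (with an empty string) when the input lacks a trailing newline, which is not what the `echo -n` output above shows. The helper name token_types is hypothetical.

```python
import io
import tokenize

def token_types(source):
    """Return the sequence of token type names that tokenize produces."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [tokenize.tok_name[tok.type] for tok in tokens]

with_newline = token_types('print("a");print("b")\n')
without_newline = token_types('print("a");print("b")')

# If tokenize treats end of input as the end of a line, both inputs
# yield the same token types, ending in NEWLINE, ENDMARKER.
print(with_newline == without_newline)
print(without_newline[-2:])
```

Comparing only token types, not token strings, is deliberate: the implicit NEWLINE has no source text, so its string differs from the '\n' of a real newline.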
I am fine with adding this to the docs. But the irony here is that the echo command adds a newline, so the original premise (that test.py contains an invalid program) is incorrect. ;-)
A few years ago, there was a particular case in which compile failed without a trailing newline. We fixed it so that it would work anyway.
Unless we are willing to let a conforming Python interpreter fail on
>>> exec('print("hello")')
hello
the Reference Manual should be clear that EOF and EOS (end-of-string) are treated as NEWLINE.
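The claim that end of file and end of string behave like NEWLINE can be checked directly. The sketch below parses the same source with and without a trailing newline and compares the resulting ASTs. It also compiles and executes a string that has no trailing newline. The helper name same_ast is hypothetical.

```python
import ast

def same_ast(source):
    """Compare the AST of source with and without a trailing newline."""
    stripped = source.rstrip("\n")
    # ast.dump() omits line/column attributes by default, so only structure is compared.
    return ast.dump(ast.parse(stripped)) == ast.dump(ast.parse(stripped + "\n"))

print(same_ast('print("a");print("b")'))

# compile() in "exec" mode accepts source that ends without a newline.
code = compile('print("hello")', "<string>", "exec")
exec(code)
```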