classification
Title: Grammar Incongruence
Type: behavior Stage: resolved
Components: Documentation Versions: Python 3.8, Python 3.7, Python 3.6, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: docs@python Nosy List: Isaac Elliott, ammar2, docs@python, gvanrossum, miss-islington, serhiy.storchaka, terry.reedy
Priority: normal Keywords: patch

Created on 2018-06-04 04:05 by Isaac Elliott, last changed 2018-06-10 00:28 by terry.reedy. This issue is now closed.

Pull Requests
URL       Status  Linked
PR 7383   merged  ammar2, 2018-06-04 05:30
PR 7569   merged  miss-islington, 2018-06-09 23:50
PR 7571   merged  miss-islington, 2018-06-09 23:52
PR 7570   merged  miss-islington, 2018-06-09 23:55
Messages (14)
msg318617 - Author: Isaac Elliott (Isaac Elliott) Date: 2018-06-04 04:05
echo 'print("a");print("b")' > test.py

This program is grammatically incorrect according to the specification (https://docs.python.org/3.8/reference/grammar.html). But Python 3 runs it without issue.


It's this production here

simple_stmt: small_stmt (';' small_stmt)* [';'] NEWLINE

which says 'simple_stmt's must be terminated by a newline. However, the program I wrote doesn't contain any newlines.

I think the grammar spec is missing some information, but I'm not quite sure what. Does anyone have an idea?
msg318620 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-06-04 04:31
NEWLINE is not a newline. It is the NEWLINE token. And it is generated at the end of file.

$ echo 'print("a");print("b")' | ./python -m tokenize
1,0-1,5:            NAME           'print'        
1,5-1,6:            OP             '('            
1,6-1,9:            STRING         '"a"'          
1,9-1,10:           OP             ')'            
1,10-1,11:          OP             ';'            
1,11-1,16:          NAME           'print'        
1,16-1,17:          OP             '('            
1,17-1,20:          STRING         '"b"'          
1,20-1,21:          OP             ')'            
1,21-1,22:          NEWLINE        '\n'           
2,0-2,0:            ENDMARKER      ''
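The same check can be made from inside Python with the standard `tokenize` module. This is an illustrative sketch; `token_names` is a hypothetical helper name. It tokenizes the one-liner with and without a trailing newline. On current Python 3 releases both lists should end in NEWLINE, ENDMARKER, because the pure-Python tokenizer was later changed to emit an implicit NEWLINE at end of input; older releases, including those current when this issue was filed, omitted that token in the second case.

```python
import io
import tokenize


def token_names(source):
    """Return the token type names produced for `source` (hypothetical helper)."""
    readline = io.StringIO(source).readline
    return [tokenize.tok_name[tok.type] for tok in tokenize.generate_tokens(readline)]


# Same program as in the report, with and without the newline that echo adds.
with_newline = token_names('print("a");print("b")\n')
without_newline = token_names('print("a");print("b")')
print(with_newline)
print(without_newline)
```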
msg318622 - Author: Isaac Elliott (Isaac Elliott) Date: 2018-06-04 04:37
Thanks for the clarification. Is there a reference to this in the documentation?
msg318624 - Author: Ammar Askar (ammar2) * (Python triager) Date: 2018-06-04 04:42
https://docs.python.org/3.8/reference/lexical_analysis.html
msg318625 - Author: Isaac Elliott (Isaac Elliott) Date: 2018-06-04 04:47
I went through that document before I created this issue. I can't find anything which describes this behavior - could you be more specific please?
msg318627 - Author: Ammar Askar (ammar2) * (Python triager) Date: 2018-06-04 04:51
Actually, echo implicitly puts a newline at the end. If you run with echo -n, this is the output:

$ echo -n 'print("a");print("b")' | python3 -m tokenize
1,0-1,5:            NAME           'print'
1,5-1,6:            OP             '('
1,6-1,9:            STRING         '"a"'
1,9-1,10:           OP             ')'
1,10-1,11:          OP             ';'
1,11-1,16:          NAME           'print'
1,16-1,17:          OP             '('
1,17-1,20:          STRING         '"b"'
1,20-1,21:          OP             ')'
2,0-2,0:            ENDMARKER      ''

No newline token present.
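Whatever the pure-Python tokenize module reports, the interpreter itself accepts such a file. A minimal sketch using only the standard library: it writes the program to a temporary test.py with no end-of-line sequence (as `echo -n` would), confirms the last byte, and runs it in-process with `runpy.run_path`, capturing stdout. `run_without_trailing_newline` is a hypothetical helper name.

```python
import io
import os
import runpy
import tempfile
from contextlib import redirect_stdout

SOURCE = b'print("a");print("b")'  # no trailing newline, like `echo -n`


def run_without_trailing_newline():
    """Write SOURCE to a temporary test.py, run it, and return its stdout."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.py")
        with open(path, "wb") as f:
            f.write(SOURCE)
        # Confirm the file really ends without an end-of-line sequence.
        with open(path, "rb") as f:
            assert not f.read().endswith((b"\n", b"\r"))
        buf = io.StringIO()
        with redirect_stdout(buf):
            runpy.run_path(path)
        return buf.getvalue()


print(run_without_trailing_newline(), end="")
```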
msg318629 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-06-04 04:55
Good point Ammar.

Seems there is also a missing corner case in the definition of a physical line:

https://docs.python.org/3.8/reference/lexical_analysis.html#physical-lines
"""
A physical line is a sequence of characters terminated by an end-of-line sequence. In source files, any of the standard platform line termination sequences can be used - the Unix form using ASCII LF (linefeed), the Windows form using the ASCII sequence CR LF (return followed by linefeed), or the old Macintosh form using the ASCII CR (return) character. All of these forms can be used equally, regardless of platform.
"""

It misses a case when a physical line is terminated by the end of file.
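To make the missing case concrete, here is a small illustrative sketch of the quoted physical-line definition, extended so that end of file also terminates a line. `physical_lines` is a hypothetical helper, not part of CPython. CR LF is tried before a lone CR so that a Windows line ending counts as one terminator rather than two.

```python
import re
from typing import List

# Alternation order matters: try CR LF before a lone CR so that a Windows
# line ending is treated as a single terminator.
_TERMINATOR = re.compile(r"\r\n|\r|\n")


def physical_lines(source: str) -> List[str]:
    """Split source into physical lines (hypothetical helper).

    A line ends at LF, CR LF or CR, or at end of file if the final line
    has no terminator of its own.
    """
    lines = []
    pos = 0
    for match in _TERMINATOR.finditer(source):
        lines.append(source[pos:match.start()])
        pos = match.end()
    if pos < len(source):
        # End of file terminates the last, unterminated line.
        lines.append(source[pos:])
    return lines


print(physical_lines("a = 1\r\nb = 2\rc = 3\nprint(a + b + c)"))
```

With this rule, `"x\n"` and `"x"` both yield exactly one physical line.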
msg318631 - Author: Ammar Askar (ammar2) * (Python triager) Date: 2018-06-04 05:01
Relevant bit of the parser that emits a fake newline at the end of the file if not present: https://github.com/python/cpython/blob/master/Parser/tokenizer.c#L1059-L1069
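At the source-text level, the effect of that code can be approximated as "append a newline if the last line lacks one". The sketch below illustrates that idea under this simplifying assumption; it is not a translation of tokenizer.c, and `ensure_trailing_newline` is a hypothetical name. Both the raw and the padded source compile.

```python
def ensure_trailing_newline(source: str) -> str:
    """Append a newline unless source is empty or already ends a line.

    Hypothetical helper: a text-level approximation of the fake NEWLINE
    that the C tokenizer supplies at end of input.
    """
    if source and not source.endswith(("\n", "\r")):
        return source + "\n"
    return source


program = 'print("a");print("b")'
padded = ensure_trailing_newline(program)

# Both compile: the padding is not required for the interpreter to
# accept the program.
code_raw = compile(program, "<no-newline>", "exec")
code_padded = compile(padded, "<newline>", "exec")
print(repr(padded))
```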
msg318633 - Author: Isaac Elliott (Isaac Elliott) Date: 2018-06-04 05:29
Cool, thanks for the help. Should I submit a PR with the updated documentation?
msg318634 - Author: Ammar Askar (ammar2) * (Python triager) Date: 2018-06-04 05:37
Sorry, I was already working on the patch by the time you posted the comment. As we can see above, it seems that the tokenize module doesn't correctly mirror the behavior of the C tokenizer. Do you want to try fixing that as a bug? That would involve making a new bpo ticket and submitting a PR there.
msg318668 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2018-06-04 16:18
I am fine with adding this to the docs. But the irony of the case is that the echo command adds a newline, so the original premise (that test.py contains an invalid program) is incorrect. ;-)
msg319091 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2018-06-08 18:39
A few years ago, there was a particular case in which compile failed without a trailing newline. We fixed it so that it would work anyway.
Unless we are willing to let a conforming Python interpreter fail on the following, the Reference Manual should be clear that EOF and EOS (end-of-string) are treated as NEWLINE:
>>> exec('print("hello")')
hello
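A minimal sketch of the behavior described above, using only `exec` and `contextlib.redirect_stdout` from the standard library; `run_source` is a hypothetical helper name. Neither string ends with a newline, yet both execute, because the end of the string is treated as NEWLINE.

```python
import io
from contextlib import redirect_stdout


def run_source(source):
    """Execute source with exec() and return whatever it printed (hypothetical helper)."""
    buf = io.StringIO()
    with redirect_stdout(buf):
        exec(source, {})  # fresh globals; exec adds __builtins__ itself
    return buf.getvalue()


# Neither string ends in a newline; end of string acts as NEWLINE.
print(run_source('print("hello")'), end="")
print(run_source('print("a");print("b")'), end="")
```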
msg319186 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2018-06-09 23:49
New changeset 0aa17ee6a76df0946d42e7657a501f1862065a22 by Terry Jan Reedy (Ammar Askar) in branch 'master':
bpo-33766: Document that end of file or string is a newline (GH-7383)
https://github.com/python/cpython/commit/0aa17ee6a76df0946d42e7657a501f1862065a22
msg319187 - Author: miss-islington (miss-islington) Date: 2018-06-09 23:55
New changeset f01b951a0e70f36ca2a3caa043f89a5277bb0bb0 by Miss Islington (bot) in branch '2.7':
bpo-33766: Document that end of file or string is a newline (GH-7383)
https://github.com/python/cpython/commit/f01b951a0e70f36ca2a3caa043f89a5277bb0bb0
History
Date                 User              Action  Args
2018-06-10 00:28:18  terry.reedy       set     status: open -> closed
                                               resolution: fixed
                                               stage: patch review -> resolved
2018-06-09 23:55:29  miss-islington    set     nosy: + miss-islington
                                               messages: + msg319187
2018-06-09 23:55:28  miss-islington    set     pull_requests: + pull_request7201
2018-06-09 23:52:50  miss-islington    set     pull_requests: + pull_request7200
2018-06-09 23:50:58  miss-islington    set     pull_requests: + pull_request7199
2018-06-09 23:49:42  terry.reedy       set     messages: + msg319186
2018-06-08 18:39:33  terry.reedy       set     nosy: + terry.reedy
                                               messages: + msg319091
2018-06-04 16:18:03  gvanrossum        set     messages: + msg318668
2018-06-04 05:37:06  ammar2            set     messages: + msg318634
2018-06-04 05:30:37  ammar2            set     keywords: + patch
                                               stage: patch review
                                               pull_requests: + pull_request7009
2018-06-04 05:29:33  Isaac Elliott     set     messages: + msg318633
2018-06-04 05:01:35  ammar2            set     messages: + msg318631
2018-06-04 04:55:42  serhiy.storchaka  set     versions: + Python 2.7, - Python 3.5
                                               nosy: + gvanrossum, docs@python
                                               messages: + msg318629
                                               assignee: docs@python
                                               components: + Documentation, - Interpreter Core
2018-06-04 04:51:27  ammar2            set     messages: + msg318627
2018-06-04 04:47:10  Isaac Elliott     set     messages: + msg318625
2018-06-04 04:42:08  ammar2            set     nosy: + ammar2
                                               messages: + msg318624
2018-06-04 04:37:17  Isaac Elliott     set     messages: + msg318622
2018-06-04 04:31:27  serhiy.storchaka  set     nosy: + serhiy.storchaka
                                               messages: + msg318620
2018-06-04 04:05:20  Isaac Elliott     create