Title: Unclosed bracket bug in code.interact prevents identifying syntax errors
Versions: Python 3.10
Status: open
Nosy List: aroberge, gvanrossum, lys.nikolaou, pablogsal, terry.reedy
Priority: normal

Created on 2021-03-02 11:10 by aroberge, last changed 2021-03-06 03:54 by pablogsal.

Messages (12)
msg387911 - (view) Author: Andre Roberge (aroberge) * Date: 2021-03-02 11:10
When using code.interact() with version 3.10.0a6, an unclosed bracket [, paren (, or brace { prevents syntax errors from being identified.

First, the correct behaviour using the regular Python REPL:

Python 3.10.0a6 ....
>>> [
... def test():
  File "<stdin>", line 2
    def test():
    ^
SyntaxError: invalid syntax


Demonstration of bug

>>> import code
>>> code.interact()
Python 3.10.0a6 ...
(InteractiveConsole)
>>> [
... def test():
...
...
KeyboardInterrupt
>>>
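
The same session can be driven without a terminal, which may help when comparing interpreter versions. This is a minimal sketch that assumes only the documented code.InteractiveConsole.push() method; the helper name feed_lines is made up for illustration. push() returns True while the console is still waiting for more input, so a True for the "def test():" line would correspond to the endless "..." prompt shown above.

```python
import code


def feed_lines(lines):
    """Push each line into a fresh InteractiveConsole.

    Returns one bool per line: True means the console wants more input,
    False means the buffered source was either run or rejected.
    (feed_lines is a hypothetical helper, not part of the code module.)
    """
    console = code.InteractiveConsole()
    return [console.push(line) for line in lines]


if __name__ == "__main__":
    # The second value depends on the Python version: True matches the
    # behaviour reported here, False means a SyntaxError was reported.
    print(feed_lines(["[", "def test():"]))
```

Any SyntaxError the console reports is written to stderr, as in an interactive session.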
msg387912 - (view) Author: Andre Roberge (aroberge) * Date: 2021-03-02 11:12
I suspect that this is caused by a change associated with https://bugs.python.org/issue43163
msg387915 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-03-02 11:26
Hummmm, I see what we can do, but I am not sure this is a bug or a regression. There is nothing that guarantees this behaviour. For instance, this is how the PyPy REPL works:

𓋹 pypy
Python 2.7.18 (a29ef73f9b32953753d0dd6d2a56255fa2892e24, Dec 30 2020, 16:15:15)
[PyPy 7.3.3 with GCC Apple LLVM 12.0.0 (clang-1200.0.32.28)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>> [
.... def test():
....
....
....
....

KeyboardInterrupt
msg387916 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-03-02 11:28
The fact that CPython's parser eagerly fails is certainly something that we pursue but is not a guarantee or a contract.

Nevertheless, I will try to investigate if this can be fixed.
msg387942 - (view) Author: Andre Roberge (aroberge) * Date: 2021-03-02 17:11
I understand the challenge of reproducing the behaviour of the Python interpreter for this case. If it cannot be reproduced, then the documentation for https://docs.python.org/3/library/code.html#code.InteractiveConsole.interact and others on that page should probably be modified as it states that it "Closely emulate the interactive Python console."
msg388174 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2021-03-06 00:43
#43163 was about the opposite problem of raising SyntaxError too soon, when a valid continuation to incomplete code is possible.  As with that issue, IDLE has the same problem, which is not in code.interact() itself but in codeop._maybe_compile.

Calling compile with    gives message    resulting in user seeing
'[def'                  unclosed [
'[def\n'                bad syntax       SyntaxError
'[\ndef'                unclosed [
'[\ndef\n'              unclosed [       prompt for more input

In the last line, the added \n after [ changes the compile SyntaxError message and, after PR 24483 for #43163, results in fruitlessly waiting for non-existent correct input until the user enters garbage or hits ^C. This is at best a codeop-specific regression from better behavior in 3.9.

Changing 'def' to 'def f():' only changes the first message to 'invalid syntax' but not the user-visible result.
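
To check the messages on a given interpreter, the four inputs can be fed to compile() directly. This is a small sketch; compile_message is a hypothetical helper name, and 'single' mode is an assumption. The exact wording of each message depends on the Python version, so no output is shown.

```python
def compile_message(source, mode="single"):
    """Return the SyntaxError message compile() gives for source, or None.

    compile_message is a hypothetical helper used only for illustration.
    """
    try:
        compile(source, "<input>", mode)
    except SyntaxError as exc:
        return exc.msg
    return None


if __name__ == "__main__":
    # The four inputs from the table above; messages vary by version.
    for source in ("[def", "[def\n", "[\ndef", "[\ndef\n"):
        print(repr(source).ljust(12), compile_message(source))
```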

As near as I can tell, compile('[\n1,', ...) and compile('[\ndef', ...) give the same unclosed message pointing to the opening [.  How does the regular REPL know to prompt for input for the first and raise SyntaxError for the second?  Some unobvious flag combination?  (It is not 'exec' versus 'single'.)

>>> [
... 1,
... def
  File "<stdin>", line 3
    def
    ^
SyntaxError: invalid syntax

Guido, do you have any idea how python decides this or where the code is or who might know better?
msg388175 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-03-06 00:48
This is a change I implemented and is explicitly deactivated in the REPL:

https://github.com/python/cpython/blob/master/Parser/pegen.c#L1191

The problem here is that when you call compile() there is no way to know if you are doing it to simulate a REPL or not.
msg388176 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2021-03-06 00:50
Maybe we need to add an API (e.g. a flag to compile()) so that we can ask the actual parser the question we're interested in, rather than having to use hacks and heuristics?
msg388178 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-03-06 00:53
> Maybe we need to add an API (e.g. a flag to compile()) so that we can ask the actual parser the question we're interested in, rather than having to use hacks and heuristics?

I was thinking about that, but we need to fix some inconsistencies along the way first. I am currently already doing that work (starting with https://github.com/python/cpython/pull/24763) and once that is fixed we can pass that down with some flags to the compiler.
msg388181 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-03-06 01:24
In any case, the underlying problem here is the fact that codeop is deciding whether to continue asking for input or not depending on the repr() of the syntax error exception itself, which is the major hack here.

Previously, it worked because for unclosed parens the parser was throwing different line numbers for the end of file as codeop keeps adding new lines, but now all of these point to the same unclosed paren.
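
The kind of heuristic described here can be sketched as follows. This is a simplified illustration, not the codeop source: the function name classify and the three return strings are invented for the example. It compiles the source with one and then two trailing newlines and compares the repr() of the resulting errors. If the errors are identical, the source is treated as invalid; if they differ (for instance because the reported line number moves with the end of file), it is treated as incomplete.

```python
def classify(source, mode="single"):
    """Classify source as 'complete', 'incomplete' or 'error'.

    Simplified illustration of an error-comparison heuristic; classify is
    a hypothetical name and this is not the codeop implementation.
    """
    try:
        compile(source, "<input>", mode)
        return "complete"
    except SyntaxError:
        pass

    errors = []
    for suffix in ("\n", "\n\n"):
        try:
            compile(source + suffix, "<input>", mode)
            errors.append(None)
        except SyntaxError as exc:
            # repr() includes the message and the reported position.
            errors.append(repr(exc))

    # Same error no matter how many newlines are appended: treat the
    # source as invalid. Otherwise assume more input could still fix it.
    if errors[0] is not None and errors[0] == errors[1]:
        return "error"
    return "incomplete"


if __name__ == "__main__":
    for source in ("1 + 1", "[", "[\ndef"):
        print(repr(source).ljust(10), classify(source))
```

Because the decision rests entirely on how the parser words and locates its errors, the result for inputs such as '[\ndef' can change between Python versions.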
msg388188 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2021-03-06 02:08
Add compile(..., mode='repl')?
"If mode is 'repl', compile returns None to indicate that the code is incomplete as is but might become valid if more lines (or maybe just more code) were added"

Deprecate _maybe_compile (and stop trying to patch it).
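
For comparison, codeop.compile_command() already documents a similar contract: it returns a code object for complete input, None for input that is incomplete, and raises SyntaxError (or ValueError / OverflowError) for invalid input. The sketch below only wraps that existing call; status is a hypothetical helper name. The proposal above would move the same decision into compile() itself.

```python
import codeop


def status(source):
    """Return 'complete', 'incomplete' or 'error' for a piece of source.

    status is a hypothetical wrapper around codeop.compile_command().
    """
    try:
        code_obj = codeop.compile_command(source, "<input>", "single")
    except (SyntaxError, ValueError, OverflowError):
        return "error"
    return "incomplete" if code_obj is None else "complete"


if __name__ == "__main__":
    # The result for the last input is the one discussed in this issue
    # and may differ between Python versions.
    for source in ("x = 1", "if True:", "[\ndef test():"):
        print(repr(source).ljust(20), status(source))
```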
msg388194 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-03-06 03:50
> "If mode is 'repl', compile returns None to indicate that the code is incomplete as is but might become valid if more lines (or maybe just more code) were added"

That would be ideal, but my guess is that it is not trivial because even if you can intercept the tokenizer when it fetches new lines from the source, you need to correctly propagate that information through several tokenizer layers and through the parser as it tries to backtrack.

If someone manages to do that, that would be the cleanest solution.
History
Date                 User         Action  Args
2021-03-06 03:54:54  pablogsal    set     nosy: + lys.nikolaou
2021-03-06 03:50:20  pablogsal    set     messages: + msg388194
2021-03-06 02:08:20  terry.reedy  set     messages: + msg388188
2021-03-06 01:24:08  pablogsal    set     messages: + msg388181
2021-03-06 00:53:38  pablogsal    set     messages: + msg388178
2021-03-06 00:50:58  gvanrossum   set     messages: + msg388176
2021-03-06 00:48:07  pablogsal    set     messages: + msg388175
2021-03-06 00:43:53  terry.reedy  set     nosy: + gvanrossum, terry.reedy; messages: + msg388174
2021-03-02 17:11:49  aroberge     set     messages: + msg387942
2021-03-02 11:28:11  pablogsal    set     messages: + msg387916
2021-03-02 11:26:54  pablogsal    set     nosy: + pablogsal; messages: + msg387915
2021-03-02 11:12:18  aroberge     set     messages: + msg387912
2021-03-02 11:10:19  aroberge     create