Issue 46521: codeop._maybe_compile passes code with error + triple quotes


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/90679

classification

Title:	codeop._maybe_compile passes code with error + triple quotes
Type:	behavior	Stage:	resolved
Components:	Parser	Versions:	Python 3.11

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	BTaskaya, lys.nikolaou, pablogsal, terry.reedy, tusharsadhwani
Priority:	normal	Keywords:	patch

Created on 2022-01-25 15:09 by tusharsadhwani, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL	Status	Linked	Edit
PR 31010	merged	pablogsal, 2022-01-29 17:50
PR 31213	merged	pablogsal, 2022-02-08 11:57

Messages (17)
msg411608 - (view)	Author: Tushar Sadhwani (tusharsadhwani) *	Date: 2022-01-25 15:09
compile_command used to raise error for this until Python 3.9:

```
>>> import code
>>> code.compile_command("abc def '''")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.9/codeop.py", line 132, in compile_command
    return _maybe_compile(_compile, source, filename, symbol)
  File "/usr/lib/python3.9/codeop.py", line 106, in _maybe_compile
    raise err1
  File "/usr/lib/python3.9/codeop.py", line 93, in _maybe_compile
    code1 = compiler(source + "\n", filename, symbol)
  File "/usr/lib/python3.9/codeop.py", line 111, in _compile
    return compile(source, filename, symbol, PyCF_DONT_IMPLY_DEDENT)
  File "<input>", line 1
    abc def '''
        ^
SyntaxError: invalid syntax
```

But in Python 3.10.0 it no longer is an error:

```
>>> import code
>>> code.compile_command("abc def '''")
>>>
```
msg411626 - (view)	Author: Pablo Galindo Salgado (pablogsal) *	Date: 2022-01-25 16:05
This happens because the new parser only detects the syntax error in "abc def" after it has parsed the full text, but before that happens the tokenizer has already detected a problem (the ''' is not closed), and this is treated as incomplete input (the same happens if you do code.compile_command('"""')). Unfortunately, this is not easy to fix once we have the tokenizer error in place, so I am afraid we probably need to close this as "won't fix", unless someone has a good idea on how to accommodate this case.
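The split between tokenizer errors and parser errors described in this message can be illustrated loosely with the standard library's `tokenize` module. This is only a stand-in: it is an assumption that it behaves like the interpreter's own tokenizer for these inputs, and the helper name `lexes_cleanly` is hypothetical. The sketch shows that "abc def" on its own splits into tokens without complaint (so only the parser can reject it), while the unclosed ''' already fails at the token level.

```python
import io
import tokenize


def lexes_cleanly(source):
    """Return True if `source` can be split into tokens without a lexer error."""
    reader = io.StringIO(source).readline
    try:
        # Tokens are produced lazily, so consume the whole stream.
        for _ in tokenize.generate_tokens(reader):
            pass
    except (tokenize.TokenError, SyntaxError):
        return False
    return True


# "abc def" is lexically fine; the unclosed triple quote is not.
print(lexes_cleanly("abc def\n"))
print(lexes_cleanly("abc def '''\n"))
```

Because the token-level failure is reached first, a caller that only sees the first error cannot tell that the text before the quote was already invalid.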
msg411628 - (view)	Author: Tushar Sadhwani (tusharsadhwani) *	Date: 2022-01-25 16:32
wontfix would really suck, because that would mean every REPL written with the `code` module will be broken, even IPython:

```
$ ipython
Python 3.10.0 (default, Oct 11 2021, 05:33:59) [GCC 11.2.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.0.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: abc def '''
   ...:
   ...:
```
msg411629 - (view)	Author: Pablo Galindo Salgado (pablogsal) *	Date: 2022-01-25 16:45
> wontfix would really suck, because that would mean every REPL written with the `code` module will be broken, even IPython:

I understand, but I don't see a way to fix this without reverting the change to detect unclosed triple quotes or without committing a lot of code to handle special cases. Also notice that in your example with python, as soon as you close the quotes you get an error:

In [1]: asdf dsfsd """
   ...:
   ...:
   ...: """
  Input In [1]
    asdf dsfsd """
         ^
SyntaxError: invalid syntax

So it is not as dramatic as you suggest when you say "every REPL written with the `code` module will be broken". Although I understand it depends on the optics.
msg411633 - (view)	Author: Tushar Sadhwani (tusharsadhwani) *	Date: 2022-01-25 17:03
You're right. There was another bug in my code that was causing the SyntaxError to not show up at all, sorry about that.

Can you help me figure out why this bug doesn't show up in the normal Python REPL?

```
>>> abc def '''
  File "<stdin>", line 1
    abc def '''
        ^^^
SyntaxError: invalid syntax
```

I could then use whatever logic the REPL itself uses instead of relying on `code.compile_command()`, because my requirement is to detect if a code can be incomplete Python code, without ever compiling it.
msg411634 - (view)	Author: Pablo Galindo Salgado (pablogsal) *	Date: 2022-01-25 17:12
> Can you help me figure out why this bug doesn't show up in the normal Python REPL?

That's because the normal Python REPL works very differently when in interactive mode: the tokenizer in interactive mode is coupled with reading from standard input. We would need to somehow decouple this behaviour from interactive mode, add a new parser option that activates this mode, and then hook it into the codeop module (and maybe other places).
msg411635 - (view)	Author: Pablo Galindo Salgado (pablogsal) *	Date: 2022-01-25 17:14
> because my requirement is to detect if a code can be incomplete Python code, without ever compiling it.

As I mentioned in other issues, unfortunately the new parser doesn't allow doing this the way the old one did, because of how it works. The codeop module hacks around this by comparing the error messages you get when you add new lines, which is known to be fragile and quite bug-friendly. So I am afraid there is not going to be a reliable and supported way to do this with the new parser. You may need to use a 3rd party library that allows parsing incomplete code.
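The error-comparison hack mentioned here can be sketched roughly as follows. This is a simplified illustration modelled on the calls visible in the Python 3.9 traceback earlier in this thread (`compiler(source + "\n", ...)`, `raise err1`, and the `PyCF_DONT_IMPLY_DEDENT` flag); it is not the actual codeop source, the name `maybe_compile_sketch` is hypothetical, and details such as warning handling are left out. How well it classifies incomplete input depends on the interpreter version, which is exactly the fragility being described.

```python
import codeop


def maybe_compile_sketch(source, filename="<input>", symbol="single"):
    """Simplified sketch of the error-comparison heuristic (not the real codeop code)."""
    flags = codeop.PyCF_DONT_IMPLY_DEDENT

    def try_compile(text):
        # Return (code, error); exactly one of the two is None.
        try:
            return compile(text, filename, symbol, flags), None
        except SyntaxError as exc:
            return None, exc

    code, _ = try_compile(source)
    code1, err1 = try_compile(source + "\n")
    _, err2 = try_compile(source + "\n\n")

    if code is not None:
        return code
    if code1 is None and repr(err1) == repr(err2):
        # Extra newlines did not change the error: treat it as a real error.
        raise err1
    # The error changed (or vanished) with more newlines: guess "incomplete".
    return None
```

The decision rests entirely on whether two error reprs happen to match, so any change in how the parser words or positions its errors can silently flip the result.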
msg412053 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2022-01-29 05:52
Tushar, I am the principal IDLE maintainer and have dealt with similar interactive compile problems on and off for a few years. Code module uses codeop._maybe_compile, which in turn uses batch-mode compile().

To review: Code can be in 3 states: legal, illegal (there is a positive error that must be fixed), and incomplete (might become legal with more input). Batch mode compile has all the code it is going to get, so it raises an exception for both bad and incomplete code.

In command-line languages, '\n' definitely signals end-of-command. Even in interactive mode, incomplete commands get an error message. The user must retype or recall and add more.

Python is a multiline and compound statement language. Newline may instead signal the end of a compound statement header or the end of a nested statement, or even just be present for visual formatting. Being able to continue incomplete statements after newline is essential.

In interactive mode, the interpreter looks at code after each newline and differentiates between unrecoverable and merely incomplete errors. In the latter case, it sends a prompt to enter more and then reads more.

codeop._maybe_compile attempts to simulate interactive mode and make a trinary decision (returning None for 'incomplete') using batch-mode binary compile(). A hack using repeated compiles, a warning filter, and a helper function classifies code that failed the initial compile. I suspect that a) it has never been perfect and b) it cannot be (see experiment below).

The issue here is that _maybe_compile returns None instead of passing on the compile syntax error. Some debug prints would reveal exactly why.

An alternate approach might be to compile just once and use the error message and marked error range to split. But that would require different message-range pairs for the different cases. Compile does not seem to give this to us. For the current case, 3.11.a04 compile (and ast.parse) give the same response to both bad and incomplete lines.

>>> compile("a b '''", '', 'single')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "", line 1
    a b '''
        ^
SyntaxError: unterminated triple-quoted string literal (detected at line 1)
>>> compile("s='''", '', 'single')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "", line 1
    s='''
      ^
SyntaxError: unterminated triple-quoted string literal (detected at line 1)

But the REPL somehow treats the two lines differently when directly entered.

>>> s = '''
...
... '''
>>> a b '''
  File "<stdin>", line 1
    a b '''
        ^
SyntaxError: unterminated triple-quoted string literal (detected at line 1)

Pablo, is there any possibility that the internal REPL parser could be wrapped, exposed to Python, and called with fake stdin/out/err objects?

Or if one executes 'python -i -c "start code"', what is required of the standard streams for '-i' to be effective and actually shift into interactive mode, reading from and writing to those streams? Just claim to be a tty? Is it any different than for reading responses to "input('prompt')"?
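The three states described in this message (legal, illegal, incomplete) map onto the documented behaviour of `codeop.compile_command`: it returns a code object, raises an exception, or returns None. A minimal sketch, where the helper name `classify` and the sample snippets are only illustrative:

```python
import codeop


def classify(source):
    """Classify `source` as 'complete', 'incomplete', or 'invalid'."""
    try:
        code = codeop.compile_command(source, "<input>", "single")
    except (SyntaxError, ValueError, OverflowError):
        # compile_command raises for code that can never become valid.
        return "invalid"
    # None means "might become valid with more input".
    return "incomplete" if code is None else "complete"


for snippet in ("x = 1", "s = '''", "if x:", "a b"):
    print(repr(snippet), "->", classify(snippet))
```

The input from this report, "abc def '''", is deliberately not in the list: which state it lands in depends on whether the interpreter includes the fix discussed in this issue.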
msg412087 - (view)	Author: Pablo Galindo Salgado (pablogsal) *	Date: 2022-01-29 17:41
>> Pablo, is there any possibility that the internal REPL parser could be wrapped, exposed to Python, and called with fake stdin/out/err objects?

I would really advise against this. Unfortunately, the state of affairs is that the REPL is super entangled with the parser, to the point that it is the tokenizer asking for more characters that triggers a read from stdin. Exposing this through some API would be super dangerous, because we would be elevating to the "supported" level anything that works just because it happens to be close to what the interactive mode needs.

On the other side, codeop is fundamentally flawed in the way it is built, because it relies on comparing error messages from the parser, which is a very poor way of handling semantic information.

What we need here is a mode of the parser that raises a very specific error on incomplete input, but not on incorrect input. And that may not be immediate to implement, given how many places can raise lexer and parser errors. I will give it a go, in any case.
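The kind of parser mode proposed here can be probed from Python on interpreters that have it. The sketch below assumes that the running CPython's codeop module defines the `PyCF_ALLOW_INCOMPLETE_INPUT` compile flag (as recent releases do) and falls back to no extra flag otherwise; the helper name `syntax_error_message` is hypothetical, and no particular error wording is assumed.

```python
import codeop

# Assumption: recent CPython releases define this flag in codeop; fall back to
# no extra flag on interpreters that predate it.
ALLOW_INCOMPLETE = getattr(codeop, "PyCF_ALLOW_INCOMPLETE_INPUT", 0)
FLAGS = codeop.PyCF_DONT_IMPLY_DEDENT | ALLOW_INCOMPLETE


def syntax_error_message(source):
    """Return the SyntaxError message from compiling `source`, or None on success."""
    try:
        compile(source + "\n", "<input>", "single", FLAGS)
    except SyntaxError as exc:
        return exc.msg
    return None


for snippet in ("x = 1", "s = '''", "a b"):
    print(repr(snippet), "->", syntax_error_message(snippet))
```

Where the mode is available, the incomplete snippet is expected to report a different message from the invalid one, which gives callers something more dependable to branch on than a comparison of two error reprs. The exact wording is version-dependent.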
msg412090 - (view)	Author: Pablo Galindo Salgado (pablogsal) *	Date: 2022-01-29 18:12
Ugh, the approach to do that breaks super heavily test_idle :(
msg412091 - (view)	Author: Pablo Galindo Salgado (pablogsal) *	Date: 2022-01-29 18:16
If we want to go with this approach, I am going to need help to fix test_idle, as I have no idea why it is failing if test_codeop passes.
msg412113 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2022-01-30 00:10
Thank you for the PR. As I wrote in my preliminary review, I see just 1 likely failure, which may be in the test fixture. Will test and debug later.
msg412123 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2022-01-30 03:19
With my fix to the PR:

>>> a b '''
SyntaxError: unterminated triple-quoted string literal (detected at line 1)
>>> a '''
...

The message is off, and can be left for another issue (or not), but the behavior is correct.

With the hack removed, all the tests in test_codeop that check that the hack works can be removed and replaced by a simple test that a code object, an error, or None is returned in 3 different cases.
msg412139 - (view)	Author: Pablo Galindo Salgado (pablogsal) *	Date: 2022-01-30 11:37
> The message is off

That's because the tokenizer sees its error before the parser even has time to see the other one. Not sure if there is technically anything to fix here other than the order of reporting two different errors, which may be a bit tricky to fix.
msg412831 - (view)	Author: Pablo Galindo Salgado (pablogsal) *	Date: 2022-02-08 11:54
New changeset 69e10976b2e7682c6d57f4272932ebc19f8e8859 by Pablo Galindo Salgado in branch 'main': bpo-46521: Fix codeop to use a new partial-input mode of the parser (GH-31010) https://github.com/python/cpython/commit/69e10976b2e7682c6d57f4272932ebc19f8e8859
msg412832 - (view)	Author: Pablo Galindo Salgado (pablogsal) *	Date: 2022-02-08 12:25
New changeset 5b58db75291cfbb9b6785c9845824b3e2da01c1c by Pablo Galindo Salgado in branch '3.10': [3.10] bpo-46521: Fix codeop to use a new partial-input mode of the parser (GH-31010). (GH-31213) https://github.com/python/cpython/commit/5b58db75291cfbb9b6785c9845824b3e2da01c1c
msg412833 - (view)	Author: Pablo Galindo Salgado (pablogsal) *	Date: 2022-02-08 12:26
I am not backporting to 3.9 because the parser is different enough that introducing this would also introduce some unintended side effects.

History
Date	User	Action	Args
2022-04-11 14:59:55	admin	set	github: 90679
2022-02-08 12:26:02	pablogsal	set	status: open -> closed resolution: fixed messages: + msg412833 stage: patch review -> resolved
2022-02-08 12:25:24	pablogsal	set	messages: + msg412832
2022-02-08 11:57:26	pablogsal	set	pull_requests: + pull_request29383
2022-02-08 11:54:42	pablogsal	set	messages: + msg412831
2022-01-30 11:37:06	pablogsal	set	messages: + msg412139
2022-01-30 03:19:16	terry.reedy	set	messages: + msg412123
2022-01-30 00:10:41	terry.reedy	set	messages: + msg412113
2022-01-29 18:16:16	pablogsal	set	messages: + msg412091
2022-01-29 18:12:47	pablogsal	set	messages: + msg412090
2022-01-29 17:50:14	pablogsal	set	keywords: + patch stage: patch review pull_requests: + pull_request29189
2022-01-29 17:41:05	pablogsal	set	messages: + msg412087
2022-01-29 05:52:02	terry.reedy	set	nosy: + terry.reedy title: compile_command not raising syntax error when command ends with triple quotes -> codeop._maybe_compile passes code with error + triple quotes messages: + msg412053 versions: + Python 3.11, - Python 3.10
2022-01-25 17:14:27	pablogsal	set	messages: + msg411635
2022-01-25 17:12:32	pablogsal	set	messages: + msg411634
2022-01-25 17:03:51	tusharsadhwani	set	messages: + msg411633
2022-01-25 16:45:46	pablogsal	set	messages: + msg411629
2022-01-25 16:32:43	tusharsadhwani	set	messages: + msg411628
2022-01-25 16:05:04	pablogsal	set	nosy: + BTaskaya, pablogsal messages: + msg411626
2022-01-25 15:16:50	pablogsal	set	nosy: - pablogsal
2022-01-25 15:09:36	tusharsadhwani	create