Issue 42889: Incorrect behavior after ast node visits

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/87055

classification

Title:	Incorrect behavior after ast node visits
Type:	crash	Stage:	resolved
Components:	Interpreter Core	Versions:	Python 3.10, Python 3.9, Python 3.8

process

Status:	closed	Resolution:	out of date
Dependencies:		Superseder:
Assigned To:		Nosy List:	gvanrossum, lys.nikolaou, serhiy.storchaka, terry.reedy, xxm
Priority:	normal	Keywords:

Created on 2021-01-11 07:45 by xxm, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (10)
msg384799 - (view)	Author: Xinmeng Xia (xxm)	Date: 2021-01-11 07:45
The following program will lead to a incorrect behavior of Python parser. We change all variable to integer( forcely transformed to string) via ast.NodeTransformer. Then we compile the new code and execute it. It's surprising that code like "1=2; print(1)" can be compiled without error message. And more surprisingly, the execution result of "print(1)" is 2 ! ! ==================================== import ast class RewriteName(ast.NodeTransformer): def visit_Name(self, node): if node.id != "print": node.id = str(node.lineno) return node code = "a = 2;print(a)" myast = ast.parse(code) transformer = RewriteName() newast = transformer.visit(myast) # print(ast.dump(newast)) print("new code:","------------") print(ast.unparse(newast)) print("------------") c = compile(newast,'','exec') exec(c) ================================= output result: new code: ------------ 1 = 2 print(1) ------------ 2 Expected result: (1). Syntax error during compilation. "1" should not be the correct identifier of program, even if it's forcely transformed to string. (2). The output of execution should be "1" , since the parameter of "print()" in the new code is string "1". This incorrect behaviors exists on all version of Python(from Python2 to Python 3).
msg384870 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2021-01-11 23:46
I don't think this is a bug, unless you can show an example where the bytecode compiler or the interpreter actually crashes. Basically you can change AST nodes in lots of ways that don't correspond to valid programs, and as long as you can't make CPython crash we don't care.
msg384879 - (view)	Author: Xinmeng Xia (xxm)	Date: 2021-01-12 03:18
Sorry, my description is a little confusing. My points lie on function 'compile' and 'exec'. Yes, I agree. AST can be modified and don't correspond to valid programs. But I don't think this invaild program can be compiled and exec without any check. It's dangerous. See the following program: For "compile" and "exec", no error is reported on Python 3.5-3.8 while error messages are reported on Python 3.9 and 3.10 ====================================== import ast class RewriteName(ast.NodeTransformer): def visit_Name(self, node): if node.id != "print": node.id = str(False) print(type(node.id)) return node code = "a = 2;print(a)" myast = ast.parse(code) transformer = RewriteName() newast = transformer.visit(myast) c = compile(newast,'','exec') exec(c) ===================================== Error message on Python 3.9 and 3.10. ------------------------------------- <class 'str'> <class 'str'> Traceback (most recent call last): File "/home/xxm/Desktop/nameChanging/report/test1.py", line 574, in <module> c = compile(newast,'','exec') ValueError: Name node can't be used with 'False' constant ------------------------------------- In fact, in class RewriteName, when "node.id" is assigned, the parser will check whether the identifier is a "str". If not,"TypeError: AST identifier must be of type str" will be reported. However, it's not enough. In Python, identifier names have their own naming rules. "str" could be "+","1","False", but these are not legally id. So the above error could be reported.
msg384881 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2021-01-12 03:41
Hm, if your argument is just that you can make invalid identifiers this way, I still don't see why that's a big deal. You can do `x.__dict__["1"] = 1` and now you've given x an attribute that's not a valid identifier. You could also call the code object constructor directly with some valid identifiers. You have to do better than saying "it's dangerous". Can you demonstrate a segfault?
msg384899 - (view)	Author: Xinmeng Xia (xxm)	Date: 2021-01-12 08:00
Nice suggestion! I change the argument and I can' find segfault program in transforming ast.Name. But I do find a segfault program in transforming ast.BinOp! Seeing the following example, this program will cause a segmentation fault on Python 3.10. No error will be reported during tranforming of node, but Python crashes during compiling the modified AST. ==================================== import ast class RewriteName(ast.NodeTransformer): def visit_BinOp(self, node): if node.left.value == 1: node.left = node return node code = """ mystr = 1 + (2+3) """ myast = ast.parse(code) transformer = RewriteName() newast = transformer.visit(myast) c = compile(newast,'<test>','exec') exec(c) =================================== I really think we should add a checker before compiling modified ast node or cancel the function of compiling AST object. An illegal AST of a program should not throw into "compile" function directly.
msg384957 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2021-01-12 15:54
Yeah, there's supposed to be a checker.
msg385129 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2021-01-16 00:38
The parser is not involved here. The transformed code is not equivalent to "1=2; print(1)" because you replace 'a' with the string '1', not the int 1. The result is to transform "globals()['a']=2; print(globals()['a'])" to "globals()['1']=2; print(globals()['1'])" This works because globals() is a regular dict. If one does try to make the code equivalent to "1=2; print(1)" by replacing with 1, the result is an error. Traceback (most recent call last): File "F:\Python\a\tem4.py", line 19, in <module> print(ast.unparse(newast)) File "C:\Programs\Python310\lib\ast.py", line 1567, in unparse return unparser.visit(ast_obj) File "C:\Programs\Python310\lib\ast.py", line 805, in visit return "".join(self._source) TypeError: sequence item 1: expected str instance, int found With unparse removed, I get a compile error when the identifier type is checked. Traceback (most recent call last): File "F:\Python\a\tem4.py", line 19, in <module> c = compile(newast,'','exec') TypeError: AST identifier must be of type str In the binary example, unparsing gives an infinite recursion error with this repeated sequence. File "C:\Programs\Python310\lib\ast.py", line 1372, in visit_BinOp self.traverse(node.left) File "C:\Programs\Python310\lib\ast.py", line 798, in traverse super().visit(node) File "C:\Programs\Python310\lib\ast.py", line 410, in visit return visitor(node) Without unparse, the compile call crashes. (In IDLE, this means there is an unrequested restart of the Shell subprocess that executes user code, without IDLE crashing.) I suspect that there is a related loop in the compile C code that crashes the process before any checking is done. If so, the situation would be similar to your #42887 and we may not be able to do anything.
msg385133 - (view)	Author: Xinmeng Xia (xxm)	Date: 2021-01-16 03:31
Thank you for your kindly explanations! The output of the first program in msg384799 behaves as expected from the view of AST compiling. Yes,I see now. But for the second example in msg384879, the behaviors are inconsistent between old Python version(3.6,3.7,3.8) and new Python version(3.9,3.10). It is probably something wrong in "compile" parsing bool string, "False"，"True" in new version of Python(3.9,3.10). I think a checker in function "compile" will not be complicated. Like you said, the simplest way I can think is to re-perform lexical analysis and syntax analysis. e.g. unparse ast, then parse ast before compiling AST object. As for #42887, only part of attributes will lead to that bug. I think it's attribute-related. If that bug is triggered by c loop, all attributes should be involved.
msg396969 - (view)	Author: Xinmeng Xia (xxm)	Date: 2021-07-05 04:40
This crash cannot be reproduced again on the master branch of CPython. It seems that this bug has been fixed. Should we close this issue and mark it as "fixed"?
msg396985 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2021-07-05 10:35
We usually used 'fixed' for 'fixed in this issue'. In any case, 'Resolution' is secdondary to 'Status'.

History
Date	User	Action	Args
2022-04-11 14:59:40	admin	set	github: 87055
2021-07-05 10:35:10	terry.reedy	set	status: open -> closed resolution: out of date messages: + msg396985 stage: resolved
2021-07-05 04:40:09	xxm	set	messages: + msg396969
2021-01-16 03:31:05	xxm	set	messages: + msg385133
2021-01-16 00:38:24	terry.reedy	set	nosy: + terry.reedy title: Incorrect behavior of Python parser after ast node of test program being modified -> Incorrect behavior after ast node visits messages: + msg385129 versions: - Python 3.6, Python 3.7
2021-01-12 15:54:01	gvanrossum	set	nosy: + serhiy.storchaka, lys.nikolaou messages: + msg384957
2021-01-12 08:00:33	xxm	set	type: behavior -> crash messages: + msg384899
2021-01-12 03:41:55	gvanrossum	set	messages: + msg384881
2021-01-12 03:18:07	xxm	set	messages: + msg384879
2021-01-11 23:46:30	gvanrossum	set	nosy: + gvanrossum messages: + msg384870
2021-01-11 07:45:39	xxm	create