classification
Title: Incorrect behavior after ast node visits
Type: crash Stage:
Components: Interpreter Core Versions: Python 3.10, Python 3.9, Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: gvanrossum, lys.nikolaou, serhiy.storchaka, terry.reedy, xxm
Priority: normal Keywords:

Created on 2021-01-11 07:45 by xxm, last changed 2021-01-16 03:31 by xxm.

Messages (8)
msg384799 - (view) Author: Xinmeng Xia (xxm) Date: 2021-01-11 07:45
The following program will lead to a incorrect behavior of Python parser.  We change all variable to integer( forcely transformed to string) via ast.NodeTransformer. Then we compile the new code and execute it. It's surprising that code like "1=2; print(1)" can be compiled without error message. And more surprisingly, the execution result of "print(1)" is 2 ! !

====================================
import ast
class RewriteName(ast.NodeTransformer):
    def visit_Name(self, node):
        if node.id != "print":
            node.id = str(node.lineno)
        return node


code = "a = 2;print(a)"


myast = ast.parse(code)
transformer = RewriteName()
newast = transformer.visit(myast)

# print(ast.dump(newast))

print("new code:","------------")
print(ast.unparse(newast))
print("------------")

c = compile(newast,'','exec')
exec(c)
=================================

output result:
new code: ------------
1 = 2
print(1)
------------
2

Expected result: (1). Syntax error during compilation. "1" should not be the correct identifier of program, even if it's forcely transformed to string. (2). The output of execution should be "1"  , since the parameter of "print()" in the new code is string "1".  

This incorrect behaviors exists on all version of Python(from Python2 to Python 3).
msg384870 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2021-01-11 23:46
I don't think this is a bug, unless you can show an example where the bytecode compiler or the interpreter actually crashes. Basically you can change AST nodes in lots of ways that don't correspond to valid programs, and as long as you can't make CPython crash we don't care.
msg384879 - (view) Author: Xinmeng Xia (xxm) Date: 2021-01-12 03:18
Sorry, my description is a little confusing. My points lie on function 'compile' and 'exec'. Yes, I agree. AST can be modified and don't correspond to valid programs.  But I don't think this invaild program can be compiled and exec without any check. It's dangerous.  

See the following program: For "compile" and "exec", no error is reported on Python 3.5-3.8 while error messages are reported on Python 3.9 and 3.10

======================================
import ast
class RewriteName(ast.NodeTransformer):
    def visit_Name(self, node):
        if node.id != "print":
            node.id = str(False)
            print(type(node.id))
        return node

code = "a = 2;print(a)"

myast = ast.parse(code)
transformer = RewriteName()
newast = transformer.visit(myast)

c = compile(newast,'','exec')
exec(c)
=====================================
Error message on Python 3.9  and 3.10.
-------------------------------------
<class 'str'>
<class 'str'>
Traceback (most recent call last):
  File "/home/xxm/Desktop/nameChanging/report/test1.py", line 574, in <module>
    c = compile(newast,'','exec')
ValueError: Name node can't be used with 'False' constant
-------------------------------------

In fact, in class RewriteName, when "node.id" is assigned, the parser will check whether the identifier is a "str". If not,"TypeError: AST identifier must be of type str" will be reported. However, it's not enough. In Python, identifier names have their own naming rules.  "str" could be "+","1","False", but these are not legally id. So the above error could be reported.
msg384881 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2021-01-12 03:41
Hm, if your argument is just that you can make invalid identifiers this way, I still don't see why that's a big deal. You can do `x.__dict__["1"] = 1` and now you've given x an attribute that's not a valid identifier. You could also call the code object constructor directly with some valid identifiers. You have to do better than saying "it's dangerous". Can you demonstrate a segfault?
msg384899 - (view) Author: Xinmeng Xia (xxm) Date: 2021-01-12 08:00
Nice suggestion! I change the argument and I can' find segfault program in transforming ast.Name.  But I do find a segfault program in transforming ast.BinOp!

Seeing the following example, this program will cause a segmentation fault on Python 3.10. No error will be reported during tranforming of node, but Python crashes during compiling the modified AST.

====================================
import ast
class RewriteName(ast.NodeTransformer):
    def visit_BinOp(self, node):
        if node.left.value == 1:
            node.left = node
        return node

code = """
mystr  = 1 + (2+3)
"""

myast = ast.parse(code)

transformer = RewriteName()
newast = transformer.visit(myast)

c = compile(newast,'<test>','exec')
exec(c)
===================================

I really think we should add a checker before compiling modified ast node or cancel the function of compiling AST object. An illegal AST of a program should not throw into "compile" function directly.
msg384957 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2021-01-12 15:54
Yeah, there's supposed to be a checker.
msg385129 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2021-01-16 00:38
The parser is not involved here.  The transformed code is *not* equivalent to "1=2; print(1)" because you replace 'a' with the string '1', not the int 1.  The result is to transform
 "globals()['a']=2; print(globals()['a'])" to 
 "globals()['1']=2; print(globals()['1'])"
This works because globals() is a regular dict.

If one does try to make the code equivalent to "1=2; print(1)" by replacing with 1, the result is an error.

Traceback (most recent call last):
  File "F:\Python\a\tem4.py", line 19, in <module>
    print(ast.unparse(newast))
  File "C:\Programs\Python310\lib\ast.py", line 1567, in unparse
    return unparser.visit(ast_obj)
  File "C:\Programs\Python310\lib\ast.py", line 805, in visit
    return "".join(self._source)
TypeError: sequence item 1: expected str instance, int found

With unparse removed, I get a compile error when the identifier type is checked.

Traceback (most recent call last):
  File "F:\Python\a\tem4.py", line 19, in <module>
    c = compile(newast,'','exec')
TypeError: AST identifier must be of type str

In the binary example, unparsing gives an infinite recursion error with this repeated sequence.

  File "C:\Programs\Python310\lib\ast.py", line 1372, in visit_BinOp
    self.traverse(node.left)
  File "C:\Programs\Python310\lib\ast.py", line 798, in traverse
    super().visit(node)
  File "C:\Programs\Python310\lib\ast.py", line 410, in visit
    return visitor(node)

Without unparse, the compile call crashes.  (In IDLE, this means there is an unrequested restart of the Shell subprocess that executes user code, without IDLE crashing.)  I suspect that there is a related loop in the compile C code that crashes the process before any checking is done.  If so, the situation would be similar to your #42887 and we may not be able to do anything.
msg385133 - (view) Author: Xinmeng Xia (xxm) Date: 2021-01-16 03:31
Thank you for your kindly explanations! The output of the first program in msg384799 behaves as expected from the view of AST compiling.  Yes,I see now. But for the second example in msg384879, the behaviors are inconsistent between old Python version(3.6,3.7,3.8) and new Python version(3.9,3.10). It is probably something wrong in "compile" parsing bool string, "False","True" in new version of Python(3.9,3.10). 

I think a checker in function "compile" will not be complicated. Like you said, the simplest way I can think is to re-perform lexical analysis and syntax analysis. e.g. unparse ast, then parse ast before compiling AST object. 

As for #42887, only part of attributes will lead to that bug. I think it's attribute-related. If that bug is triggered by c loop, all attributes should be involved.
History
Date User Action Args
2021-01-16 03:31:05xxmsetmessages: + msg385133
2021-01-16 00:38:24terry.reedysetnosy: + terry.reedy
title: Incorrect behavior of Python parser after ast node of test program being modified -> Incorrect behavior after ast node visits
messages: + msg385129

versions: - Python 3.6, Python 3.7
2021-01-12 15:54:01gvanrossumsetnosy: + serhiy.storchaka, lys.nikolaou
messages: + msg384957
2021-01-12 08:00:33xxmsettype: behavior -> crash
messages: + msg384899
2021-01-12 03:41:55gvanrossumsetmessages: + msg384881
2021-01-12 03:18:07xxmsetmessages: + msg384879
2021-01-11 23:46:30gvanrossumsetnosy: + gvanrossum
messages: + msg384870
2021-01-11 07:45:39xxmcreate