New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Regression in function builtins #87394
Comments
The bpo-42990 introduced a regression with the following commit: commit d6c33fb
The commit adds a new PyFunctionObject.func_builtins member which is not constructed properly in the following code. The reproduce the issue, use attached cloudpickle_bug.py reproducer: ./python -m venv env Current output: 3
Traceback (most recent call last):
File "/home/vstinner/python/master/cloudpickle_bug.py", line 10, in <module>
print(func2(text))
File "/home/vstinner/python/master/cloudpickle_bug.py", line -1, in func
NameError: name 'len' is not defined Output before this commit: pickle.loads() calls PyFunction_NewWithQualName() with globals={'__package__': None, '__name__': '__main__', '__file__': '/home/vstinner/python/master/cloudpickle_bug.py'}. In that case, _PyEval_BuiltinsFromGlobals() creates the {'None': None} builtin dictionary and stores it in PyFunctionObject.func_builtins. => func2 builtins is {'None': None} Before the commit, calling func2() got the builtins dictionary from frame_get_builtins(PyFrameObject *back, PyObject *globals). Now it's get from the PyFunctionObject.func_builtins member which is not initialized properly. If you want to see where PyFunction_NewWithQualName() is called, you can put a break point in _PyEval_BuiltinsFromGlobals(), at the code path creating the {'None': None} dictionary. vstinner@apu$ gdb -args ./env/bin/python -i cloudpickle_bug.py (gdb) run Breakpoint 1, _PyEval_BuiltinsFromGlobals (globals={'__package__': None, '__name__': '__main__', '__file__': '/home/vstinner/python/master/cloudpickle_bug.py'}) (gdb) where (gdb) py-bt
Traceback (most recent call first):
<built-in method loads of module object at remote 0x7fffea378d10>
File "/home/vstinner/python/master/cloudpickle_bug.py", line 9, in <module>
func2 = pickle.loads(cloudpickle.dumps(func)) |
cloudpickle issue: cloudpipe/cloudpickle#410 |
I created a new "3.10regression" for this issue ;-) |
Pablo asked me to put the priority to release blocker. |
Do you have a reproducer that does not use cloudpickle? Pickling functions seems to work correctly. >>> import pickle
>>> def func():
... return len([])
...
>>> func2 = pickle.loads(pickle.dumps(func))
>>>
>>> func2()
0 How is cloudpickle supposed to work? It looks like the globals dict passed to FunctionType(...) lacks a __builtins__. |
Right. But it worked in Python 3.9 :-) |
It creates a CodeType object and then create a function with this code object. It serializes the bytecode. pickle is not affected since it doesn't create function objects, but retrieve them from imported modules. |
func_builtins.py: reproducer which has no import. func2 builtins is {'None': None} on Python 3.10. Oh. This script also fails on Python 3.9. I'm not sure why cloudpickle_bug.py works on Python 3.9. |
You need to define __builtins__ in the globals dictionary. def func(s):
return len(s)
text = "abc"
print(func(text))
FuncType = type(func)
func_globals = {"__builtins__":__builtins__.__dict__}
code = func.__code__
func2 = FuncType(code, func_globals)
print(func2(text)) works for both 3.9 and 3.10. Cloudpickle needs to initialize the globals dict *before* creating the function. |
I'm not an expert nor an author but this might help: Cloudpickle offers extended possibilities for pickling but uses the standard pickle module for unpickling: >>> import pickle, cloudpickle
>>> cloudpickle.load is pickle.load
True
>>> cloudpickle.loads is pickle.loads
True So, the question here is why the new Python version cannot handle cloudpickle output anymore. >>> def f():
... return len("aaaa")
...
>>> pickle.dumps(f)
b'\x80\x04\x95\x12\x00\x00\x00\x00\x00\x00\x00\x8c\x08__main__\x94\x8c\x01f\x94\x93\x94.'
>>> cloudpickle.dumps(f)
b'\x80\x05\x95\xa9\x01\x00\x00\x00\x00\x00\x00\x8c\x17cloudpickle.cloudpickle\x94\x8c\r_builtin_type\x94\x93\x94\x8c\nLambdaType\x94\x85\x94R\x94(h\x02\x8c\x08CodeType\x94\x85\x94R\x94(K\x00K\x00K\x00K\x00K\x02KCC\x08t\x00d\x01\x83\x01S\x00\x94N\x8c\x04aaaa\x94\x86\x94\x8c\x03len\x94\x85\x94)\x8c\x07<stdin>\x94\x8c\x01f\x94K\x01C\x02\x00\x01\x94))t\x94R\x94}\x94(\x8c\x0b__package__\x94N\x8c\x08__name__\x94\x8c\x08__main__\x94uNNNt\x94R\x94\x8c\x1ccloudpickle.cloudpickle_fast\x94\x8c\x12_function_setstate\x94\x93\x94h\x18}\x94}\x94(h\x15h\x0f\x8c\x0c__qualname__\x94h\x0f\x8c\x0f__annotations__\x94}\x94\x8c\x0e__kwdefaults__\x94N\x8c\x0c__defaults__\x94N\x8c\n__module__\x94h\x16\x8c\x07__doc__\x94N\x8c\x0b__closure__\x94N\x8c\x17_cloudpickle_submodules\x94]\x94\x8c\x0b__globals__\x94}\x94u\x86\x94\x86R0.' It seems to me that cloudpickle adds also __globals__ to the final output and pickle is no longer able to restore it. |
In short, cloudpickle recreates a function in two steps:
def _function_setstate(obj, state):
...
state, slotstate = state
...
obj_globals = slotstate.pop("__globals__")
...
obj.__globals__.update(obj_globals)
obj.__globals__["__builtins__"] = __builtins__
... The internal PyFunctionObject.func_builtins can only by set when a function is created, it cannot be updated later. Why not exposing it in funcobject.c (func_memberlist) as we do for closure, doc, globals and module? So cloudpickle can hack it for its usage. Documentation of these function attributes: In Python 3.9, _PyFrame_New_NoTrack() starts by looking at builtins in the parent frame ("back"): use builtins of the parent frame if the parent frame uses exactly the same dictionary of globals. Then it looks for "__builtins__" keys in the globals dictionary. In the current Python 3.10, the builtins are retrieved from the internal PyFunctionObject.func_builtins member. Here is the raw pickle bytecode created by cloudpickle: $ ./python -m pickletools bug.pickle
0: \x80 PROTO 5
2: \x95 FRAME 479
11: \x8c SHORT_BINUNICODE 'cloudpickle.cloudpickle'
36: \x94 MEMOIZE (as 0)
37: \x8c SHORT_BINUNICODE '_builtin_type'
52: \x94 MEMOIZE (as 1)
53: \x93 STACK_GLOBAL
54: \x94 MEMOIZE (as 2)
55: \x8c SHORT_BINUNICODE 'LambdaType'
67: \x94 MEMOIZE (as 3)
68: \x85 TUPLE1
69: \x94 MEMOIZE (as 4)
70: R REDUCE
71: \x94 MEMOIZE (as 5)
72: ( MARK
73: h BINGET 2
75: \x8c SHORT_BINUNICODE 'CodeType'
85: \x94 MEMOIZE (as 6)
86: \x85 TUPLE1
87: \x94 MEMOIZE (as 7)
88: R REDUCE
89: \x94 MEMOIZE (as 8)
90: ( MARK
91: K BININT1 1
93: K BININT1 0
95: K BININT1 0
97: K BININT1 1
99: K BININT1 2
101: K BININT1 67
103: C SHORT_BINBYTES b't\x00|\x00\x83\x01S\x00'
113: \x94 MEMOIZE (as 9)
114: N NONE
115: \x85 TUPLE1
116: \x94 MEMOIZE (as 10)
117: \x8c SHORT_BINUNICODE 'len'
122: \x94 MEMOIZE (as 11)
123: \x85 TUPLE1
124: \x94 MEMOIZE (as 12)
125: \x8c SHORT_BINUNICODE 's'
128: \x94 MEMOIZE (as 13)
129: \x85 TUPLE1
130: \x94 MEMOIZE (as 14)
131: \x8c SHORT_BINUNICODE '/home/vstinner/python/master/cloudpickle_bug.py'
180: \x94 MEMOIZE (as 15)
181: \x8c SHORT_BINUNICODE 'func'
187: \x94 MEMOIZE (as 16)
188: K BININT1 4
190: C SHORT_BINBYTES b'\x00\x01'
194: \x94 MEMOIZE (as 17)
195: ) EMPTY_TUPLE
196: ) EMPTY_TUPLE
197: t TUPLE (MARK at 90)
198: \x94 MEMOIZE (as 18)
199: R REDUCE
200: \x94 MEMOIZE (as 19)
201: } EMPTY_DICT
202: \x94 MEMOIZE (as 20)
203: ( MARK
204: \x8c SHORT_BINUNICODE '__package__'
217: \x94 MEMOIZE (as 21)
218: N NONE
219: \x8c SHORT_BINUNICODE '__name__'
229: \x94 MEMOIZE (as 22)
230: \x8c SHORT_BINUNICODE '__main__'
240: \x94 MEMOIZE (as 23)
241: \x8c SHORT_BINUNICODE '__file__'
251: \x94 MEMOIZE (as 24)
252: h BINGET 15
254: u SETITEMS (MARK at 203)
255: N NONE
256: N NONE
257: N NONE
258: t TUPLE (MARK at 72)
259: \x94 MEMOIZE (as 25)
260: R REDUCE
261: \x94 MEMOIZE (as 26)
262: \x8c SHORT_BINUNICODE 'cloudpickle.cloudpickle_fast'
292: \x94 MEMOIZE (as 27)
293: \x8c SHORT_BINUNICODE '_function_setstate'
313: \x94 MEMOIZE (as 28)
314: \x93 STACK_GLOBAL
315: \x94 MEMOIZE (as 29)
316: h BINGET 26
318: } EMPTY_DICT
319: \x94 MEMOIZE (as 30)
320: } EMPTY_DICT
321: \x94 MEMOIZE (as 31)
322: ( MARK
323: h BINGET 22
325: h BINGET 16
327: \x8c SHORT_BINUNICODE '__qualname__'
341: \x94 MEMOIZE (as 32)
342: h BINGET 16
344: \x8c SHORT_BINUNICODE '__annotations__'
361: \x94 MEMOIZE (as 33)
362: } EMPTY_DICT
363: \x94 MEMOIZE (as 34)
364: \x8c SHORT_BINUNICODE '__kwdefaults__'
380: \x94 MEMOIZE (as 35)
381: N NONE
382: \x8c SHORT_BINUNICODE '__defaults__'
396: \x94 MEMOIZE (as 36)
397: N NONE
398: \x8c SHORT_BINUNICODE '__module__'
410: \x94 MEMOIZE (as 37)
411: h BINGET 23
413: \x8c SHORT_BINUNICODE '__doc__'
422: \x94 MEMOIZE (as 38)
423: N NONE
424: \x8c SHORT_BINUNICODE '__closure__'
437: \x94 MEMOIZE (as 39)
438: N NONE
439: \x8c SHORT_BINUNICODE '_cloudpickle_submodules'
464: \x94 MEMOIZE (as 40)
465: ] EMPTY_LIST
466: \x94 MEMOIZE (as 41)
467: \x8c SHORT_BINUNICODE '__globals__'
480: \x94 MEMOIZE (as 42)
481: } EMPTY_DICT
482: \x94 MEMOIZE (as 43)
483: u SETITEMS (MARK at 322)
484: \x86 TUPLE2
485: \x94 MEMOIZE (as 44)
486: \x86 TUPLE2
487: R REDUCE
488: 0 POP
489: . STOP
highest protocol among opcodes = 4 |
Attached func_builtins2.py mimicks the cloudpicke bug: def func(s): return len(s)
code = func.__code__
FuncType = type(func)
func2_globals = {}
func2 = FuncType(code, func2_globals)
# func2.func_builtins = {'None': None}
func2.__globals__['__builtins__'] = __builtins__
# frame created with {'None': None} builtins: "len" key is missing
func2("abc") In Python 3.10, setting func2.__globals__['__builtins__'] has no longer an impact on the frame created to call func2(). |
I wrote _testinernalcapi.get_builtins() function to help me debuging this issue: diff --git a/Modules/_testinternalcapi.c b/Modules/_testinternalcapi.c
index ab6c5965d1..250ecc61ab 100644
--- a/Modules/_testinternalcapi.c
+++ b/Modules/_testinternalcapi.c
@@ -279,6 +279,14 @@ test_atomic_funcs(PyObject *self, PyObject *Py_UNUSED(args))
}
+static PyObject*
+get_builtins(PyObject *self, PyObject *obj)
+{
+ PyFunctionObject *func = (PyFunctionObject *)obj;
+ return Py_NewRef(func->func_builtins);
+}
+
+
static PyMethodDef TestMethods[] = {
{"get_configs", get_configs, METH_NOARGS},
{"get_recursion_depth", get_recursion_depth, METH_NOARGS},
@@ -289,6 +297,7 @@ static PyMethodDef TestMethods[] = {
{"get_config", test_get_config, METH_NOARGS},
{"set_config", test_set_config, METH_O},
{"test_atomic_funcs", test_atomic_funcs, METH_NOARGS},
+ {"get_builtins", get_builtins, METH_O},
{NULL, NULL} /* sentinel */
}; |
I wrote PR 24559 to expose the functions builtins in Python as a new __builtins__ attributes on functions, but also to document the subtle behavior change. |
I don't think PR 24559 will be sufficient to fix this. Pickled functions need to share the builtins dict, not just have a copy of it. Adding a "builtins" parameter to types.FunctionType() should be be enough. So: function(code, globals, name=None, argdefs=None, closure=None)
would become:
function(code, globals, name=None, argdefs=None, closure=None, builtins=None) The change to cloudpickle is a bit complex as the pickle code would need to change the arguments to types.FunctionType() from (code, globals, name, argdefs) So cloudpickle will need to emit pickle bytecode to create the tuple, using the unpickler's __builtins__.__dict__. |
I don't see why cloudpickle calls LambdaType() directly. It could use a helper in their cloudpickle.cloudpickle_fast which calls LambdaType() with the proper arguments, especially to get __builtins__. I don't think that cloudpickle should serialize __builtins__! By the way, I'm not sure why the function constructor builds {'None': None} namespace for builtins rather than inheriting the current builtins namespace. IMO it's unconvenient. If someone wants to override __builtins__, setting __builtins__ key in the globals namespace should be enough.
Oh right, we should give the ability to control this important parameter! |
Issue fixed in bpo-42990. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: