msg229723 - (view) |
Author: kbengine (kbengine) |
Date: 2014-10-20 11:38 |
I have an application, the use of Python3.2.3 development.
When I upgrade to Python3.4.2, found a problem.
I have an extended "xxx.c (c-python)" module, I call pickle to serialize and deserialize, _pickle.c calls the whichmodule to look for this module, The final will be to find the module from sys.modules.
But probably there are 200 elements in sys.modules, Use the "obj = getattribute (module, global_name, allow_qualname)" to try to get the object:
static PyObject *
getattribute(PyObject *obj, PyObject *name, int allow_qualname) {
...
...
...
tmp = PyObject_GetAttr(obj, subpath);
Py_DECREF(obj);
// There will be hundreds of times to return to NULL
// There will be hundreds of times calls "PyErr_Format
// (PyExc_AttributeError," Can't get attribute%R on%R ", name, obj);"
// This process will lead to hundreds of calls to "moduleobject.c-
// module_repr(PyModuleObject *m)".
// So I frequently call pickle.dumps CPU consumption sharply.
if (tmp == NULL) {
if (PyErr_ExceptionMatches(PyExc_AttributeError)) {
PyErr_Clear();
PyErr_Format(PyExc_AttributeError,
"Can't get attribute %R on %R", name, obj);
}
Py_DECREF(dotted_path);
return NULL;
}
...
...
}
------------------------------------------------------
ncalls tottime percall cumtime percall filename:lineno(function)
315 0.001 0.000 0.004 0.000 <frozen importlib._bootstrap>:690(_module_repr)
I went wrong?
Look forward to answer, thanks!
|
msg229727 - (view) |
Author: Brett Cannon (brett.cannon) *  |
Date: 2014-10-20 15:02 |
In Python 3.3 the import machinery changed to use importlib. This means the code to create the representation of a module now calls into Python code (the `<frozen importlib._bootstrap>:690(_module_repr)` you're seeing).
But my question is why are you not calling PyObject_HasAttr() before calling PyObject_GetAttr()? Exceptions may be relatively cheap but they are not free.
|
msg229746 - (view) |
Author: kbengine (kbengine) |
Date: 2014-10-21 01:22 |
This is a misunderstanding, "getattribute (PyObject *obj, PyObject *name, int allow_qualname)" is the Python code, in the Python/Modules/_pickle.c (1538).
Do efficiency is decreased a lot.
|
msg229748 - (view) |
Author: Georg Brandl (georg.brandl) *  |
Date: 2014-10-21 07:14 |
HasAttr would just call GetAttr and discard the exception.
@ OP: what objects are you pickling? You can give them a __module__ attribute to save the lookup in sys.modules.
|
msg229749 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2014-10-21 07:20 |
There's no doubt that whichmodule() has grown more complex since 3.2. We probably cannot eliminate the O(<number of modules>) component, but could at least make it faster.
|
msg229751 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2014-10-21 07:29 |
Note the problem is easily reproduce. For example we can take numpy where some ufuncs (i.e. function-like objects implemented in C) don't have a __module__.
$ PYTHONHASHSEED=0 python3.4 -m timeit -s "import pickle, numpy; d=numpy.add" "pickle.dumps(d)"
1000 loops, best of 3: 280 usec per loop
$ PYTHONHASHSEED=0 python3.4 -m timeit -s "import pickle, numpy; d=numpy.diff" "pickle.dumps(d)"
100000 loops, best of 3: 2.74 usec per loop
We see that pickling numpy.add (which doesn't have a __module__) is 100x slower than numpy.diff (which has a __module__)! Note I'm forcing PYTHONHASHSEED for consistent results, since whichmodule() uses dict iteration.
For comparison, 2.7 is fast enough:
$ PYTHONHASHSEED=0 python2.7 -m timeit -s "import cPickle as pickle, numpy; d=numpy.add" "pickle.dumps(d)"
100000 loops, best of 3: 6.12 usec per loop
$ PYTHONHASHSEED=0 python2.7 -m timeit -s "import cPickle as pickle, numpy; d=numpy.diff" "pickle.dumps(d)"
100000 loops, best of 3: 2.35 usec per loop
(varying PYTHONHASHSEED didn't produce any slower results)
|
msg229752 - (view) |
Author: Georg Brandl (georg.brandl) *  |
Date: 2014-10-21 07:44 |
Attached patch addresses the repeated __repr__ calling and gives a 4x speedup here.
|
msg229753 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2014-10-21 08:05 |
Actually, numpy.add takes a different path. Sorry.
|
msg229754 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2014-10-21 08:08 |
So, a proper way to reproduce it is through Ellipsis (which does go through whichmodule()):
$ PYTHONHASHSEED=0 python3.4 -m timeit -s "import pickle; d=Ellipsis" "pickle.dumps(d)"
1000 loops, best of 3: 201 usec per loop
|
msg229755 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2014-10-21 08:10 |
> For example we can take numpy where some ufuncs (i.e. function-like objects implemented in C) don't have a __module__.
Oh. I was not aware of that. Is there a way to fix numpy to set the __module__ attribute? Maybe we should warn users that serialiaing objects without __module__ is much slower?
|
msg229756 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2014-10-21 08:13 |
whichmodule_speedup.diff: I proposed once a more generic solution which is much more complex to implement. Lazy formatting of Exception message: in most cases, the message is not used. Replace AttributeError(message) with AttributeError(attr=name), only format when str(exc) is called.
|
msg229757 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2014-10-21 08:17 |
Victor, see https://github.com/numpy/numpy/issues/4952
|
msg229758 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2014-10-21 08:17 |
Attached patch produces a 8x speedup on Ellipsis.
|
msg229759 - (view) |
Author: Georg Brandl (georg.brandl) *  |
Date: 2014-10-21 08:22 |
I didn't have numpy anyway, I tested on a function with __module__ set to None manually. But I see the same speedup for Ellipsis.
|
msg229900 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2014-10-23 20:52 |
New changeset e5ad1f27fb54 by Antoine Pitrou in branch 'default':
Issue #22676: Make the pickling of global objects which don't have a __module__ attribute less slow.
https://hg.python.org/cpython/rev/e5ad1f27fb54
|
msg229904 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2014-10-23 21:00 |
Now applied. As Georg said, though, the definitive fix is to add a __module__ attribute to your global objects.
|
msg230545 - (view) |
Author: Steve Dower (steve.dower) *  |
Date: 2014-11-03 16:43 |
The fix for this now uses module without initializing it, which could lead to a crash if get_dotted_path() tries to raise an exception (it puts module into the error string with %R). See Modules/_pickle.c#l1656 and Modules/_pickle.c#l1538.
I think the fix is to initialize module with Py_None and currently there's no need to bump the refcount (though I'm always wary of doing that in case things change in the future). If that sounds right I'm happy to put the fix in, else Antoine can do it :)
|
msg230546 - (view) |
Author: Steve Dower (steve.dower) *  |
Date: 2014-11-03 16:43 |
(Looks like I was outsmarted by the tracker when trying to put line numbers in my links... is there a way to do that?)
|
msg231632 - (view) |
Author: Zachary Ware (zach.ware) *  |
Date: 2014-11-25 04:35 |
This is still causing a somewhat serious warning on Windows, see [1] for example. The condition warned about affects every platform.
It took me some time to make sense of that function, but I think I finally understand what's going on. I think Steve's suggestion of initializing module to Py_None would be fine, or just pass in a known good object like global instead since the 'obj' parameter is only used for error message formatting. Either way, I think the error check at Modules/_pickle.c:1657 [2] should either use reformat_attribute_error() or do the reformatting itself (to provide a different message) because I don't think the AttributeError message from get_dotted_path would be anywhere close to correct (especially if 'obj' is given as Py_None).
[1] http://buildbot.python.org/all/builders/x86%20Windows7%203.x/builds/8969/steps/compile/logs/stdio
[2] Steve: the 'Comment:' label in the form is a handy link to the devguide section about how magic links work. In this case, it's filepath.ext:number
|
msg231829 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2014-11-28 21:08 |
On Windows with Visual Studio, I got a compiler warning. In whichmodule(), get_dotted_path() is called with module whereas module is not initialiazed:
dotted_path = get_dotted_path(module, global_name, allow_qualname);
|
msg231836 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2014-11-28 22:50 |
Here is a patch.
|
msg231876 - (view) |
Author: Steve Dower (steve.dower) *  |
Date: 2014-11-30 01:32 |
That patch looks good to me (better than my fix would have been :) )
|
msg231929 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2014-12-01 10:59 |
This issue may have introduce a regression, please see:
Issue #22971: test_pickle: "Fatal Python error: Cannot recover from stack overflow." on FreeBSD buildbot
|
msg231966 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2014-12-01 23:20 |
New changeset 3e3bec66409c by Antoine Pitrou in branch 'default':
Fix uninitialized variable after #22676.
https://hg.python.org/cpython/rev/3e3bec66409c
|
|
Date |
User |
Action |
Args |
2022-04-11 14:58:09 | admin | set | github: 66866 |
2014-12-01 23:20:33 | pitrou | set | status: open -> closed resolution: fixed stage: commit review -> resolved |
2014-12-01 23:20:15 | python-dev | set | messages:
+ msg231966 |
2014-12-01 10:59:16 | vstinner | set | messages:
+ msg231929 |
2014-11-30 01:32:50 | steve.dower | set | messages:
+ msg231876 |
2014-11-28 22:50:04 | pitrou | set | files:
+ whichmodule_error.patch
messages:
+ msg231836 |
2014-11-28 21:08:32 | vstinner | set | messages:
+ msg231829 |
2014-11-25 04:35:15 | zach.ware | set | status: closed -> open
nosy:
+ zach.ware messages:
+ msg231632
resolution: fixed -> (no value) stage: resolved -> commit review |
2014-11-03 16:43:57 | steve.dower | set | messages:
+ msg230546 |
2014-11-03 16:43:16 | steve.dower | set | nosy:
+ steve.dower messages:
+ msg230545
|
2014-10-23 21:00:24 | pitrou | set | status: open -> closed resolution: fixed messages:
+ msg229904
stage: needs patch -> resolved |
2014-10-23 20:52:52 | python-dev | set | nosy:
+ python-dev messages:
+ msg229900
|
2014-10-21 08:22:11 | georg.brandl | set | messages:
+ msg229759 |
2014-10-21 08:17:49 | pitrou | set | files:
+ whichmodule.patch
messages:
+ msg229758 |
2014-10-21 08:17:18 | pitrou | set | messages:
+ msg229757 |
2014-10-21 08:13:37 | vstinner | set | messages:
+ msg229756 |
2014-10-21 08:10:32 | vstinner | set | messages:
+ msg229755 |
2014-10-21 08:08:56 | pitrou | set | messages:
+ msg229754 |
2014-10-21 08:05:32 | pitrou | set | messages:
+ msg229753 |
2014-10-21 07:44:02 | georg.brandl | set | files:
+ whichmodule_speedup.diff keywords:
+ patch messages:
+ msg229752
|
2014-10-21 07:29:44 | pitrou | set | messages:
+ msg229751 |
2014-10-21 07:20:56 | pitrou | set | title: Creating the string representation of a module is slower -> _pickle's whichmodule() is slow stage: needs patch messages:
+ msg229749 versions:
+ Python 3.5, - Python 3.4 |
2014-10-21 07:14:08 | georg.brandl | set | nosy:
+ georg.brandl messages:
+ msg229748
|
2014-10-21 01:22:19 | kbengine | set | status: pending -> open
messages:
+ msg229746 |
2014-10-20 15:02:26 | brett.cannon | set | status: open -> pending title: _pickle.c -> Creating the string representation of a module is slower nosy:
+ brett.cannon
messages:
+ msg229727
|
2014-10-20 12:21:35 | vstinner | set | nosy:
+ vstinner
|
2014-10-20 12:16:31 | serhiy.storchaka | set | nosy:
+ pitrou, alexandre.vassalotti
|
2014-10-20 11:38:22 | kbengine | create | |