Segfault with deep recursion into object().__dir__ #78111
Comments
The following small snippet of code will crash 3.6.5 with a segfault. The crash occurs at cleanup, so it won't happen in the interactive interpreter (until you exit).

o = object()
for x in range(1000000):
    o = o.__dir__
    print(x, id(o.__dir__))
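For anyone who wants to poke at this without killing their shell, here is a small harness (a sketch, not from the thread) that runs the reproducer in a child interpreter, so a shutdown-time segfault can't take down the parent session. The assumption is that the chain is deep enough to overflow the C stack on affected builds; on interpreters with the fix the child exits cleanly.

```python
# Hypothetical harness: run the reproducer in a subprocess so a segfault
# at interpreter shutdown only kills the child, not the parent session.
import subprocess
import sys

REPRODUCER = """
o = object()
for _ in range(200000):
    o = o.__dir__
"""

proc = subprocess.run([sys.executable, "-c", REPRODUCER])
print("child exit code:", proc.returncode)
```

On a fixed interpreter the exit code is 0; on affected builds the child dies with SIGSEGV (a negative return code on POSIX).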
Serhiy, is this one of the cases we decided to leave as won't fix?
The crash is likely caused by recursion during the clean-up of the object().__dir__.__dir__.... chain. The trashcan API (see Py_TRASHCAN_SAFE_BEGIN/Py_TRASHCAN_SAFE_END in Include/object.h) can help here. Those macros do some real work and have a cost; I can't predict whether that cost is substantial enough to affect benchmarks (method objects are destroyed a lot).
The trashcan mechanism is used for slot wrappers. The following code doesn't crash:

o = object()
for x in range(1000000):
    o = o.__hash__
Alistair, how did you happen upon this case?
C traceback in gdb:

Program received signal SIGSEGV, Segmentation fault.
I found this while chatting on IRC about the implementation details of objects - specifically, since objects have a dict of attributes, and dict subclasses object, it seems like every dict will have another dict inside it. (Eventually someone explained that a base object() doesn't have any state at all, so a dict doesn't have a dict of attributes, and there is no cycle.) Anyway, as I was trying to explain my question to the other users, I had intended to post this snippet and ask "why doesn't this crash Python?" Ironically, it turned out that it does crash Python.
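As an aside, the chain can be torn down safely even on affected interpreters by releasing it one bound method at a time through __self__, so no single Py_DECREF ever triggers a deep dealloc cascade. This is a sketch of that idea, not something proposed in the thread:

```python
# Sketch: build the same chain, then release it link by link via __self__.
# Each builtin method's __self__ is the object it was looked up on, so
# stepping inward drops exactly one outer method per iteration, and its
# dealloc only decrefs a reference we still hold -- no recursion.
o = object()
for _ in range(100000):
    o = o.__dir__          # each step wraps the previous object

while hasattr(o, "__self__"):
    o = o.__self__         # peel off the outermost method

print(type(o))  # <class 'object'> -- back to the original instance
```

By the time the loop ends, every method object in the chain has already been freed shallowly, so interpreter shutdown has nothing deep left to deallocate.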
@victor: thanks for confirming my analysis. The trashcan API does fix this issue, although I did have to change "_PyObject_GC_UNTRACK(m);" to "PyObject_GC_UnTrack(m);" as well. I haven't checked yet why that's necessary.
ronaldoussoren, I have tested (Debian 9) substituting "_PyObject_GC_UNTRACK(m);" with "PyObject_GC_UnTrack(m);", and although the tests pass, I can still see the segfault error.

diff --git a/Objects/methodobject.c b/Objects/methodobject.c
index cfea8cf410..cb8e279c7d 100644
--- a/Objects/methodobject.c
+++ b/Objects/methodobject.c
@@ -84,7 +84,7 @@ PyCFunction_GetFlags(PyObject *op)
 static void
 meth_dealloc(PyCFunctionObject *m)
 {
-    _PyObject_GC_UNTRACK(m);
+    PyObject_GC_UnTrack(m);
     if (m->m_weakreflist != NULL) {
         PyObject_ClearWeakRefs((PyObject*) m);
     }
Sorry, I forgot to include the trashcan API. I'll double check and update the issue.
After a couple of rounds, this is the benchmark result I got for the attached patch. It seems to be a simple, non-performance-degrading fix.

python -m perf compare_to origin.json patch.json -G --min-speed=2

Faster (9):
Benchmark hidden because not significant (47): 2to3, chaos, crypto_pyaes, deltablue, dulwich_log, go, html5lib, logging_silent, logging_simple, mako, meteor_contest, nbody, nqueens, pathlib, pickle, pickle_dict, pickle_list, pickle_pure_python, pidigits, python_startup, python_startup_no_site, regex_compile, regex_dna, regex_v8, richards, scimark_fft, scimark_lu, scimark_monte_carlo, scimark_sor, scimark_sparse_mat_mult, spectral_norm, sqlalchemy_imperative, sqlite_synth, sympy_expand, sympy_integrate, sympy_sum, sympy_str, telco, tornado_http, unpack_sequence, unpickle, unpickle_list, unpickle_pure_python, xml_etree_parse, xml_etree_iterparse, xml_etree_generate, xml_etree_process
Can someone please explain to me why this change has any impact? Does it mean that the _PyObject_GC_UNTRACK() macro has a bug? The function is quite straightforward :-)

void
PyObject_GC_Track(void *op)
{
    _PyObject_GC_TRACK(op);
}
I don’t understand this either. For some reason the patch works when using the function and not when using the macro. I haven’t looked into this yet, and am currently traveling. I should have time to look into this starting December 5th when I’m back home.
It is hard to get a noticeable difference in macrobenchmarks. I want to see results of microbenchmarks. For tuples, the trashcan mechanism is the main culprit of the performance hit when using short-lived tuples for passing arguments. For bound method objects the relative effect should be smaller, because looking up the method is more costly than building a tuple, and fewer bound method objects are created after implementing the method call optimization. But in any case we should know the maximal cost of this change before merging it.
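A rough pure-stdlib approximation of the kind of microbenchmark being asked for here (using timeit rather than pyperf; the expression and numbers are illustrative, not from the thread):

```python
# Time bound-method creation/destruction, the path meth_dealloc sits on.
# Evaluating 'o.encode' creates a builtin method object and the statement
# immediately discards it, so each loop iteration exercises meth_dealloc.
import timeit

n = 1_000_000
seconds = timeit.timeit("o.encode", setup="o = 'foo'", number=n)
print(f"{seconds / n * 1e9:.1f} ns per create/destroy")
```

Running this on both an unpatched and a patched build and comparing the per-iteration times gives a direct upper bound on the trashcan overhead, without macrobenchmark noise.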
Interestingly, this lonely and perplexing comment is left in PyObject_GC_UnTrack (the function):

void
PyObject_GC_UnTrack(void *op)
{
    /* Obscure: the Py_TRASHCAN mechanism requires that we be able to
     * call PyObject_GC_UnTrack twice on an object.
     */
    if (_PyObject_GC_IS_TRACKED(op)) {
        _PyObject_GC_UNTRACK(op);
    }
}
microbench:

$ ./python.patched -m perf timeit --compare-to ./python.master -s 'o="foo"' 'o.encode'
python.master: ..................... 29.3 ns +- 0.6 ns
python.patched: ..................... 30.9 ns +- 1.3 ns

Mean +- std dev: [python.master] 29.3 ns +- 0.6 ns -> [python.patched] 30.9 ns +- 1.3 ns: 1.05x slower (+5%)

And this is the perf report of python.patched:

33.17% _PyEval_EvalFrameDefault
Thank you, Inada. There is a large overhead for iterating. Try adding --duplicate=1000.
$ ./python.patched -m perf timeit --compare-to ./python.master -s 'o="foo"' 'o.encode' --duplicate=10000
python.master: ..................... 23.1 ns +- 0.5 ns
python.patched: ..................... 24.6 ns +- 0.5 ns

Mean +- std dev: [python.master] 23.1 ns +- 0.5 ns -> [python.patched] 24.6 ns +- 0.5 ns: 1.06x slower (+6%)

$ perf record ./python.patched -m perf timeit -s 'o="foo"' 'o.encode' --duplicate=10000
Couldn't record kernel reference relocation symbol
Symbol resolution may be skewed if relocation was used (e.g. kexec).
Check /proc/kallsyms permission or run as root.
.....................
Mean +- std dev: 24.9 ns +- 0.7 ns
[ perf record: Woken up 9 times to write data ]
[kernel.kallsyms] with build id debb7f6ce8aad7263cc6564650a88db5c8003389 not found, continuing without symbols
[ perf record: Captured and wrote 2.491 MB perf.data (61787 samples) ]
$ perf report | cat
(snip)
# Total Lost Samples: 0
#
# Samples: 61K of event 'cycles:ppp'
# Event count (approx.): 52880893198
#
# Overhead Command Shared Object Symbol
# ........ .............. ................................................ ..........................................
#
18.78% python.patched python.patched [.] _PyEval_EvalFrameDefault
14.85% python.patched python.patched [.] _PyObject_GenericGetAttrWithDict
9.93% python.patched python.patched [.] PyCFunction_NewEx
8.56% python.patched python.patched [.] _PyType_Lookup
6.61% python.patched python.patched [.] meth_dealloc
4.12% python.patched python.patched [.] PyParser_AddToken
3.36% python.patched python.patched [.] PyObject_GetAttr
3.33% python.patched python.patched [.] PyObject_GC_UnTrack
3.25% python.patched python.patched [.] method_get
2.92% python.patched python.patched [.] _Py_bytes_contains
2.13% python.patched python.patched [.] freechildren
1.41% python.patched python.patched [.] pymalloc_alloc.isra.6.part.7
I've reproduced the segfault on 3.11 on a Mac.
Correctness matters more than 1 nanosecond of performance. IMO a fix like the attached bpo-33930.patch looks like the right approach to fix the crash.
I'm thinking that the explicit call to PyObject_GC_UnTrack should be made unnecessary by integrating it into the trashcan code. That way, we avoid someone else running into this kind of bug in the future. See bpo-44881.
Thanks for the fix, Augusto Goulart and Pablo! I'm closing the issue. Pablo asked in PR 27678 whether the Python 3.8 Release Manager (Lukasz) wants a backport to 3.8.
Apparently we've got ourselves refleaks with the patches. It doesn't always reproduce, but put test_exceptions -R: in a loop on 3.10 and soon enough you'll get:

0:00:00 load avg: 4.47 Run tests sequentially
== Tests result: FAILURE ==
1 test failed:

It's much harder to reproduce on 3.9 and 3.8 because test_exceptions has fewer tests and, at least on my MBP, finishes twice as fast. I ran 3.9 in a loop for an hour and got no refleak.
Interestingly, I can't reproduce the refleaks on
I gotta run for the day. The shortest way for me to reproduce is to focus on the tests that were added in 3.10 and to shorten the repetitions a little:

$ time ./python.exe -E -Wd -m test test_exceptions -R3:3 -m 'test_attribute*' -m 'test_getattr*' -m 'test_lineno*' -m 'test_name_error*' -m 'test_recursion*' -m 'test_no_hang*'
0:00:00 load avg: 5.92 Run tests sequentially
== Tests result: FAILURE ==
1 test failed:
1 re-run test:
Total duration: 11.8 sec
(BTW, with this command, it is indeed possible to trigger the ref leak on
It seems to have been introduced by… I don't see how the code change could have introduced a leak. Maybe the new tests exposed a pre-existing leak?
Entirely possible, Irit. We had just this sort of case last week! See: https://bugs.python.org/issue44524#msg399157
I've narrowed it down a bit more and created bpo-44895.
Since the refleak investigation moved to its own issue, I'm re-closing this one for clarity. |