classification
Title: Segfault with deep recursion into object().__dir__
Type: crash Stage:
Components: Interpreter Core Versions: Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: a-j-buxton, augustogoulart, methane, pablogsal, rhettinger, ronaldoussoren, serhiy.storchaka, taleinat, terry.reedy, vstinner
Priority: normal Keywords: patch

Created on 2018-06-21 11:27 by a-j-buxton, last changed 2018-11-30 12:54 by methane.

Files
File name Uploaded Description Edit
bpo-33930.patch augustogoulart, 2018-11-27 12:19
Messages (18)
msg320167 - (view) Author: Alistair Buxton (a-j-buxton) Date: 2018-06-21 11:27
The following small snippet of code will crash 3.6.5 with a segfault.

The crash occurs at cleanup, so it won't happen in the interactive interpreter (until you exit).


# --- code ---

o = object()
for x in range(1000000):
    o = o.__dir__
    print(x, id(o.__dir__))

# --- end code ---
msg320263 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2018-06-22 20:28
Serhiy, is this one of the cases we decided to leave as won't fix?
msg320413 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2018-06-25 10:58
The crash is likely caused by recursion during the clean-up of the object().__dir__.__dir__.... chain. 

The trashcan API (see Py_TRASHCAN_SAFE_BEGIN/Py_TRASHCAN_SAFE_END in Include/object.h) can help here. 

Those do some real work and have a cost, I can't predict if that cost is substational enough to affect benchmarks (method objects are destroyed a lot).
msg320418 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-06-25 12:15
The trashcan mechanism is used for slot wrappers. The following code doesn't crash:

o = object()
for x in range(1000000):
    o = o.__hash__
msg320458 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2018-06-26 00:13
Alistair, how did you happen upon this case?
msg320478 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-06-26 08:14
C traceback in gdb:

Program received signal SIGSEGV, Segmentation fault.
0x00000000004812a8 in meth_dealloc (
    m=<error reading variable: Cannot access memory at address 0x7fffff7fefd8>)
    at Objects/methodobject.c:86
86	{
(gdb) where
#0  0x00000000004812a8 in meth_dealloc (
    m=<error reading variable: Cannot access memory at address 0x7fffff7fefd8>)
    at Objects/methodobject.c:86
#1  0x0000000000488d9f in _Py_Dealloc (
    op=<built-in method __dir__ of builtin_function_or_method object at remote 0x7fffe3973490>) at Objects/object.c:1931
#2  0x00000000004813c4 in meth_dealloc (m=0x7fffe3973580)
    at Objects/methodobject.c:91
#3  0x0000000000488d9f in _Py_Dealloc (
    op=<built-in method __dir__ of builtin_function_or_method object at remote 0x7fffe3973508>) at Objects/object.c:1931
#4  0x00000000004813c4 in meth_dealloc (m=0x7fffe39735f8)
    at Objects/methodobject.c:91
#5  0x0000000000488d9f in _Py_Dealloc (
    op=<built-in method __dir__ of builtin_function_or_method object at remote 0x7fffe3973580>) at Objects/object.c:1931
#6  0x00000000004813c4 in meth_dealloc (m=0x7fffe3973670)
    at Objects/methodobject.c:91
#7  0x0000000000488d9f in _Py_Dealloc (
    op=<built-in method __dir__ of builtin_function_or_method object at remote 0x7fffe39735f8>) at Objects/object.c:1931
#8  0x00000000004813c4 in meth_dealloc (m=0x7fffe39736e8)
    at Objects/methodobject.c:91
(...)
msg320504 - (view) Author: Alistair Buxton (a-j-buxton) Date: 2018-06-26 16:55
I found this while chatting on IRC about the implementation details of objects - specifically, since objects have a dict of attributes, and dict subclasses object, it seems like every dict will have another dict inside it. (Eventually someone explained that base object() doesn't have any state at all, as a result dict doesn't have a dict of attributes, and there is no cycle.)

Anyway, as I was trying to explain my question to the other users, I had intended to post this snippet and ask "why doesn't this crash python?" Ironically it turned out that it does crash python.
msg320509 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2018-06-26 19:10
@victor: thanks for confirming my analysis.  

The trashcan API does fix this issue, although I did have to change "_PyObject_GC_UNTRACK(m);" to "PyObject_GC_UnTrack(m);" as well. I haven't checked yet why that's necessary.
msg329839 - (view) Author: Augusto Goulart (augustogoulart) * Date: 2018-11-13 13:13
ronaldoussoren, I have tested (Debian 9) substituting "_PyObject_GC_UNTRACK(m);" to "PyObject_GC_UnTrack(m);", and although the tests pass, I'm can still see the segfault error.

diff --git a/Objects/methodobject.c b/Objects/methodobject.c
index cfea8cf410..cb8e279c7d 100644
--- a/Objects/methodobject.c
+++ b/Objects/methodobject.c
@@ -84,7 +84,7 @@ PyCFunction_GetFlags(PyObject *op)
 static void
 meth_dealloc(PyCFunctionObject *m)
 {
-    _PyObject_GC_UNTRACK(m);
+    PyObject_GC_UnTrack(m);
     if (m->m_weakreflist != NULL) {
         PyObject_ClearWeakRefs((PyObject*) m);
     }
msg329846 - (view) Author: Augusto Goulart (augustogoulart) * Date: 2018-11-13 14:30
Sorry, I forgot to include the trashcan API. I'll double check and update the issue.
msg330516 - (view) Author: Augusto Goulart (augustogoulart) * Date: 2018-11-27 12:19
After a couple of rounds, this is the benchmark result I got for the attached patch. It seems to be a simple, non-performance degrading fix.

 python -m perf compare_to origin.json patch.json -G --min-speed=2
Slower (1):
- sqlalchemy_declarative: 213 ms +- 4 ms -> 219 ms +- 9 ms: 1.03x slower (+3%)

Faster (9):
- logging_format: 15.1 us +- 0.3 us -> 14.5 us +- 0.3 us: 1.05x faster (-5%)
- hexiom: 15.0 ms +- 0.5 ms -> 14.5 ms +- 0.2 ms: 1.03x faster (-3%)
- json_loads: 39.6 us +- 0.7 us -> 38.5 us +- 0.5 us: 1.03x faster (-3%)
- json_dumps: 17.2 ms +- 0.4 ms -> 16.7 ms +- 0.2 ms: 1.03x faster (-3%)
- fannkuch: 634 ms +- 4 ms -> 619 ms +- 6 ms: 1.02x faster (-2%)
- django_template: 183 ms +- 3 ms -> 178 ms +- 5 ms: 1.02x faster (-2%)
- float: 153 ms +- 2 ms -> 150 ms +- 1 ms: 1.02x faster (-2%)
- raytrace: 761 ms +- 19 ms -> 745 ms +- 8 ms: 1.02x faster (-2%)
- regex_effbot: 3.69 ms +- 0.06 ms -> 3.61 ms +- 0.05 ms: 1.02x faster (-2%)

Benchmark hidden because not significant (47): 2to3, chaos, crypto_pyaes, deltablue, dulwich_log, go, html5lib, logging_silent, logging_simple, mako, meteor_contest, nbody, nqueens, pathlib, pickle, pickle_dict, pickle_list, pickle_pure_python, pidigits, python_startup, python_startup_no_site, regex_compile, regex_dna, regex_v8, richards, scimark_fft, scimark_lu, scimark_monte_carlo, scimark_sor, scimark_sparse_mat_mult, spectral_norm, sqlalchemy_imperative, sqlite_synth, sympy_expand, sympy_integrate, sympy_sum, sympy_str, telco, tornado_http, unpack_sequence, unpickle, unpickle_list, unpickle_pure_python, xml_etree_parse, xml_etree_iterparse, xml_etree_generate, xml_etree_process
msg330520 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-11-27 13:59
-    _PyObject_GC_UNTRACK(m);
+    PyObject_GC_UnTrack(m);

Can someone please explain me why this change has any impact? Does it mean that _PyObject_GC_UNTRACK() macro has a bug?

The function is quite straighforward :-)

void
PyObject_GC_Track(void *op)
{
    _PyObject_GC_TRACK(op);
}
msg330563 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2018-11-28 00:07
I don’t understand this either.  For some reason the patch works when using the function and not when using the macro. 

I haven’t looked into this yet, and am currently traveling. I should have time to look into this starting December 5th when I’m back home.
msg330579 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-11-28 08:27
It is hard to get a noticeable difference in macrobenchmarks. I want to see results of microbenchmarks.

For tuples, the trashcan mechanism is the main culprit of the performance hit when use short-living tuples for passing arguments. For bound method objects the relative effect should be smaller, because looking up the method is more costly than building a tuple, and less bound method objects are created after implementing the method call optimization. But in any case we should know the maximal cost of this change before merging it.
msg330635 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2018-11-28 22:53
> Can someone please explain me why this change has any impact? Does it mean that _PyObject_GC_UNTRACK() macro has a bug?

Interestingly, this lonely and perplexing comment is left in  PyObject_GC_UnTrack (the function):

void
PyObject_GC_UnTrack(void *op)
{
     /* Obscure:  the Py_TRASHCAN mechanism requires that we be able to
      * call PyObject_GC_UnTrack twice on an object.
      */
  if (_PyObject_GC_IS_TRACKED(op)) {
      _PyObject_GC_UNTRACK(op);
  }
}
msg330776 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2018-11-30 11:57
microbench:


$ ./python.patched -m perf timeit --compare-to ./python.master -s 'o="foo"' 'o.encode'
python.master: ..................... 29.3 ns +- 0.6 ns
python.patched: ..................... 30.9 ns +- 1.3 ns

Mean +- std dev: [python.master] 29.3 ns +- 0.6 ns -> [python.patched] 30.9 ns +- 1.3 ns: 1.05x slower (+5%)



And this is perf report of python.patched:


  33.17%   _PyEval_EvalFrameDefault
  15.25%   _PyObject_GenericGetAttrWithDict
   9.63%   PyCFunction_NewEx
   8.25%   _PyType_Lookup
   6.42%   meth_dealloc
   3.50%   _Py_bytes_contains
   3.22%   PyObject_GC_UnTrack
   3.20%   method_get
   2.73%   PyObject_GetAttr
   2.55%   repeat_next
   0.88%   _Py_Dealloc
   0.81%   PyObject_GenericGetAttr
   0.78%   sre_ucs1_match
   0.50%   lookdict_unicode_nodummy
msg330780 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-11-30 12:17
Thank you Inada. There is a large overhead for iterating. Try to add --duplicate=1000.
msg330785 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2018-11-30 12:54
$ ./python.patched -m perf timeit --compare-to ./python.master -s 'o="foo"' 'o.encode' --duplicate=10000
python.master: ..................... 23.1 ns +- 0.5 ns
python.patched: ..................... 24.6 ns +- 0.5 ns

Mean +- std dev: [python.master] 23.1 ns +- 0.5 ns -> [python.patched] 24.6 ns +- 0.5 ns: 1.06x slower (+6%)

$ perf record ./python.patched -m perf timeit -s 'o="foo"' 'o.encode' --duplicate=10000
Couldn't record kernel reference relocation symbol
Symbol resolution may be skewed if relocation was used (e.g. kexec).
Check /proc/kallsyms permission or run as root.
.....................
Mean +- std dev: 24.9 ns +- 0.7 ns
[ perf record: Woken up 9 times to write data ]
[kernel.kallsyms] with build id debb7f6ce8aad7263cc6564650a88db5c8003389 not found, continuing without symbols
[ perf record: Captured and wrote 2.491 MB perf.data (61787 samples) ]

$ perf report | cat
(snip)
# Total Lost Samples: 0
#
# Samples: 61K of event 'cycles:ppp'
# Event count (approx.): 52880893198
#
# Overhead  Command         Shared Object                                     Symbol
# ........  ..............  ................................................  ..........................................
#
    18.78%  python.patched  python.patched                                    [.] _PyEval_EvalFrameDefault
    14.85%  python.patched  python.patched                                    [.] _PyObject_GenericGetAttrWithDict
     9.93%  python.patched  python.patched                                    [.] PyCFunction_NewEx
     8.56%  python.patched  python.patched                                    [.] _PyType_Lookup
     6.61%  python.patched  python.patched                                    [.] meth_dealloc
     4.12%  python.patched  python.patched                                    [.] PyParser_AddToken
     3.36%  python.patched  python.patched                                    [.] PyObject_GetAttr
     3.33%  python.patched  python.patched                                    [.] PyObject_GC_UnTrack
     3.25%  python.patched  python.patched                                    [.] method_get
     2.92%  python.patched  python.patched                                    [.] _Py_bytes_contains
     2.13%  python.patched  python.patched                                    [.] freechildren
     1.41%  python.patched  python.patched                                    [.] pymalloc_alloc.isra.6.part.7
History
Date User Action Args
2018-11-30 12:54:04methanesetmessages: + msg330785
2018-11-30 12:17:21serhiy.storchakasetmessages: + msg330780
2018-11-30 11:57:25methanesetnosy: + methane
messages: + msg330776
2018-11-28 22:53:22pablogsalsetmessages: + msg330635
2018-11-28 08:27:46serhiy.storchakasetmessages: + msg330579
2018-11-28 00:07:59ronaldoussorensetmessages: + msg330563
2018-11-27 13:59:26vstinnersetmessages: + msg330520
2018-11-27 12:19:41augustogoulartsetfiles: + bpo-33930.patch
keywords: + patch
messages: + msg330516
2018-11-13 14:30:38augustogoulartsetmessages: + msg329846
2018-11-13 13:14:57augustogoulartsetnosy: + taleinat
2018-11-13 13:13:34augustogoulartsetnosy: + augustogoulart
messages: + msg329839
2018-08-14 22:02:17pablogsalsetnosy: + pablogsal
2018-06-26 19:10:54ronaldoussorensetmessages: + msg320509
2018-06-26 16:55:00a-j-buxtonsetmessages: + msg320504
2018-06-26 08:14:23vstinnersetmessages: + msg320478
2018-06-26 00:13:31rhettingersetnosy: + rhettinger
messages: + msg320458
2018-06-25 12:58:00vstinnersetnosy: + vstinner
2018-06-25 12:15:00serhiy.storchakasetmessages: + msg320418
2018-06-25 10:58:56ronaldoussorensetnosy: + ronaldoussoren
messages: + msg320413
2018-06-22 20:28:45terry.reedysetnosy: + serhiy.storchaka, terry.reedy
messages: + msg320263
2018-06-21 11:27:57a-j-buxtoncreate