classification
Title: 100000 assignments of .__sizeof__ cause a segfault on del
Type: crash Stage:
Components: Interpreter Core Versions: Python 3.10
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Mark.Shannon, WildCard65, christian.heimes, ronaldoussoren, serhiy.storchaka, terry.reedy, vstinner, xxm
Priority: normal Keywords:

Created on 2021-01-11 07:38 by xxm, last changed 2021-01-19 17:07 by vstinner.

Messages (10)
msg384797 - Author: Xinmeng Xia (xxm) Date: 2021-01-11 07:38
In the following program 1, the method "__sizeof__()" is called and its result is assigned multiple times. The program works well on Python 3.10. However, if I change "__sizeof__()" to "__sizeof__", a segmentation fault is reported. I think something is wrong in the parser when dealing with built-in attribute assignment.

program 1: 
=========================
mystr = "hello123"
for x in range(1000000):
    mystr = mystr.__sizeof__()
    print(mystr)
=========================
56
28
28
.......
28
28

Output: works as expected.


program 2: 
==========================
mystr = "hello123"
for x in range(1000000):
    mystr = mystr.__sizeof__
    print(mystr)
==========================
<built-in method __sizeof__ of builtin_function_or_method object at 0x7f04d3e0c220>
......
<built-in method __sizeof__ of builtin_function_or_method object at 0x7f04d3e0c4f0>
<built-in method __sizeof__ of builtin_function_or_method object at 0x7f04d3e0c540>
Segmentation fault (core dumped)

Expected output: no segfault.
msg384801 - Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2021-01-11 08:06
I can reproduce the issue. The stack trace is several hundred thousand (!) levels deep.

#0  _Py_DECREF (op=<built-in method __sizeof__ of builtin_function_or_method object at remote 0x7fffe60703b0>, lineno=514, filename=0x6570af "./Include/object.h")
    at ./Include/object.h:448
#1  _Py_XDECREF (op=<built-in method __sizeof__ of builtin_function_or_method object at remote 0x7fffe60703b0>) at ./Include/object.h:514
#2  meth_dealloc (m=0x7fffe6070470) at Objects/methodobject.c:170
#3  0x0000000000466a99 in _Py_Dealloc (op=<optimized out>) at Objects/object.c:2209
#4  0x00000000005da2fa in _Py_DECREF (op=<optimized out>, lineno=514, filename=0x6570af "./Include/object.h") at ./Include/object.h:448
#5  _Py_XDECREF (op=<optimized out>) at ./Include/object.h:514
#6  meth_dealloc (m=0x7fffe60704d0) at Objects/methodobject.c:170
#7  0x0000000000466a99 in _Py_Dealloc (op=<optimized out>) at Objects/object.c:2209
#8  0x00000000005da2fa in _Py_DECREF (op=<optimized out>, lineno=514, filename=0x6570af "./Include/object.h") at ./Include/object.h:448
#9  _Py_XDECREF (op=<optimized out>) at ./Include/object.h:514
#10 meth_dealloc (m=0x7fffe6070530) at Objects/methodobject.c:170
#11 0x0000000000466a99 in _Py_Dealloc (op=<optimized out>) at Objects/object.c:2209
#12 0x00000000005da2fa in _Py_DECREF (op=<optimized out>, lineno=514, filename=0x6570af "./Include/object.h") at ./Include/object.h:448
#13 _Py_XDECREF (op=<optimized out>) at ./Include/object.h:514
#14 meth_dealloc (m=0x7fffe6070590) at Objects/methodobject.c:170
#15 0x0000000000466a99 in _Py_Dealloc (op=<optimized out>) at Objects/object.c:2209
#16 0x00000000005da2fa in _Py_DECREF (op=<optimized out>, lineno=514, filename=0x6570af "./Include/object.h") at ./Include/object.h:448
#17 _Py_XDECREF (op=<optimized out>) at ./Include/object.h:514
#18 meth_dealloc (m=0x7fffe60705f0) at Objects/methodobject.c:170
#19 0x0000000000466a99 in _Py_Dealloc (op=<optimized out>) at Objects/object.c:2209
#20 0x00000000005da2fa in _Py_DECREF (op=<optimized out>, lineno=514, filename=0x6570af "./Include/object.h") at ./Include/object.h:448
#21 _Py_XDECREF (op=<optimized out>) at ./Include/object.h:514
#22 meth_dealloc (m=0x7fffe6070650) at Objects/methodobject.c:170
...
#509737 _Py_XDECREF (op=<optimized out>) at ./Include/object.h:514
#509738 meth_dealloc (m=0x7fffe54ca6b0) at Objects/methodobject.c:170
#509739 0x0000000000466a99 in _Py_Dealloc (op=<optimized out>) at Objects/object.c:2209
#509740 0x00000000005da2fa in _Py_DECREF (op=<optimized out>, lineno=514, filename=0x6570af "./Include/object.h") at ./Include/object.h:448
#509741 _Py_XDECREF (op=<optimized out>) at ./Include/object.h:514
#509742 meth_dealloc (m=0x7fffe54ca710) at Objects/methodobject.c:170
#509743 0x0000000000466a99 in _Py_Dealloc (op=<optimized out>) at Objects/object.c:2209
#509744 0x00000000005da2fa in _Py_DECREF (op=<optimized out>, lineno=514, filename=0x6570af "./Include/object.h") at ./Include/object.h:448
#509745 _Py_XDECREF (op=<optimized out>) at ./Include/object.h:514
#509746 meth_dealloc (m=0x7fffe54ca770) at Objects/methodobject.c:170

...
msg384802 - Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2021-01-11 08:17
This is a recursion problem: "mystr" will be equivalent to 'hello123'.__sizeof__.__sizeof__ ...(100K repetitions)... .__sizeof__. Deallocating "mystr" causes recursive calls to tp_dealloc along the entire chain, and that can exhaust the C stack.
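A small-scale sketch of the chain described above, using only 5 links so it is safe to run: each access to `__sizeof__` returns a new bound built-in method whose `__self__` is the object it was fetched from, so following `__self__` leads back to the original string. Walking `__self__` like this is just an illustration of the references that deallocation later has to release one level at a time.

```python
# Build a short version of the chain from program 2 (5 links instead of 1,000,000).
mystr = "hello123"
for _ in range(5):
    # Each new bound method keeps a reference to the previous object.
    mystr = mystr.__sizeof__

# Follow the references back: every link points at the object it was fetched from.
depth = 0
obj = mystr
while not isinstance(obj, str):
    obj = obj.__self__
    depth += 1

print(depth, obj)  # 5 hello123
```

When the last reference to the outermost method goes away, freeing it releases `__self__`, which frees the next method, and so on. That is the nesting seen in the stack trace above.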
msg385128 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2021-01-15 23:38
Xinmeng, to verify Ronald's explanation, run this instead

mystr = "hello123"
for x in range(1000000):
    mystr = mystr.__sizeof__
input('>')  # Hit Enter to continue.
del mystr   # Expect crash here.
input('<')  # And never get here.
msg385132 - Author: Xinmeng Xia (xxm) Date: 2021-01-16 03:02
Thank you. But I am not sure this is a recursion problem. Please see the following example, in which I replace "__sizeof__" with "__class__": there is no segmentation fault and everything works.

========================
mystr = "hello123"
print(dir(mystr))
for x in range(1000000):
    mystr = mystr.__class__
    print(mystr)
=========================
and

=========================
mystr = "hello123"
for x in range(1000000):
    mystr = mystr.__class__
input('>')  # Hit Enter to continue.
del mystr   # Expect crash here.
input('<')  # And never get here.
=========================
No segmentation fault.
msg385134 - Author: William Pickard (WildCard65) * Date: 2021-01-16 04:25
Jumping in here to explain why '__class__' doesn't crash when '__sizeof__' does:

When '__class__' is fetched, it returns a new reference to the object's type.

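When '__sizeof__' is fetched, on the other hand, a new object is allocated on the heap (a bound built-in method, 'builtin_function_or_method', i.e. 'types.BuiltinMethodType') that holds a reference to the object it was fetched from, and it is returned to the caller.

This object also has a '__sizeof__' that does the same (as it's implemented on 'object'), so every iteration adds one more link to the chain.

So yes, you are exhausting the C runtime stack by recursively de-allocating a chain of up to a MILLION objects, one per loop iteration.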

You can see this happen by watching the memory usage of Python steadily climb.
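A quick check of the difference described above, assuming a current CPython: `__class__` hands back an already existing type object (and following `__class__` again settles on `type`, whose type is itself), while every `__sizeof__` access creates a fresh `builtin_function_or_method` (exposed in the standard library as `types.BuiltinMethodType`, the type shown in the repr in the original report) that keeps its `__self__` alive.

```python
import types

s = "hello123"

# '__class__' returns an existing object (the type); nothing new is allocated.
assert s.__class__ is s.__class__ is str
# Following __class__ further settles on 'type', whose type is itself,
# so the __class__ loop never builds a chain.
assert str.__class__ is type and type.__class__ is type

# '__sizeof__' creates a fresh bound built-in method on every access.
m1 = s.__sizeof__
m2 = s.__sizeof__
print(m1 is m2)                                 # False: two distinct objects
print(isinstance(m1, types.BuiltinMethodType))  # True
print(m1.__self__ is s)                         # True: the method keeps 's' alive
```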
msg385177 - Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2021-01-18 08:33
Note that there is a way to avoid this crash: the trashcan API (see the uses of Py_TRASHCAN_BEGIN in various implementations). This API is generally only used for recursive data structures, because it has a performance cost (based on what I've read in other issues).
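To illustrate the idea behind the trashcan, here is a pure-Python model. It is not CPython's C implementation. `Node`, `dealloc_naive`, `Trashcan` and `TRASHCAN_LIMIT` are hypothetical names, `Node.ref` stands in for a method's `m_self`, and Python's `RecursionError` stands in for overflowing the C stack. The model rests on one assumption about how the trashcan works: past a fixed nesting depth, an object is put on a deferred list instead of being freed recursively, and the outermost call frees the deferred objects afterwards, which keeps the stack depth bounded.

```python
# Illustrative nesting limit (hypothetical value for this model).
TRASHCAN_LIMIT = 50


class Node:
    """Stands in for a method object; 'ref' plays the role of m_self."""
    def __init__(self, ref=None):
        self.ref = ref


def make_chain(n):
    head = None
    for _ in range(n):
        head = Node(head)
    return head


def dealloc_naive(node):
    # Models a dealloc that releases m_self directly:
    # one stack frame per link in the chain.
    child, node.ref = node.ref, None
    if child is not None:
        dealloc_naive(child)


class Trashcan:
    """Past a nesting limit, defer objects and free them from the outermost call."""

    def __init__(self):
        self.depth = 0
        self.max_depth = 0
        self.freed = 0
        self.deferred = []
        self.draining = False

    def dealloc(self, node):
        if self.depth >= TRASHCAN_LIMIT:
            self.deferred.append(node)  # too deep: set it aside instead of recursing
            return
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)
        child, node.ref = node.ref, None
        if child is not None:
            self.dealloc(child)
        self.freed += 1
        self.depth -= 1
        # Only the outermost call drains the deferred list, so nesting stays bounded.
        if self.depth == 0 and not self.draining:
            self.draining = True
            while self.deferred:
                self.dealloc(self.deferred.pop())
            self.draining = False


n = 10_000
naive_failed = False
try:
    dealloc_naive(make_chain(n))
except RecursionError:
    naive_failed = True  # the model's analogue of exhausting the C stack
print("naive failed:", naive_failed)  # naive failed: True

can = Trashcan()
can.dealloc(make_chain(n))
print(can.freed, can.max_depth)  # 10000 50
```

The extra bookkeeping on every dealloc (depth counter, limit check) is the kind of per-object overhead discussed in this thread. It is cheap for long-lived containers but matters for objects that are created and destroyed very frequently.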
msg385187 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-01-18 10:54
Yes, there is an overhead to using the trashcan mechanism. This is why it is only used in data collections: it is expected that your data can contain arbitrarily long chains of links. There are many ways to create arbitrarily long chains with other objects, but it does not happen in common code. For methods the cost would be especially high, because method objects are usually short-lived and the performance of creating/destroying them is critical.

AFAIK the same issue (maybe not with __sizeof__, but with another method of the base object class, like __reduce__) has already been reported. I propose to close this issue as "won't fix".
msg385265 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2021-01-19 15:01
Mark, would your proposal in PEP-651 fix this case?
msg385267 - Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2021-01-19 15:11
It won't solve the problem.
Maybe it would make it easier to avoid the segfault, but some sort of recursion/overflow check is still needed.

It might make the use of the trashcan cheaper, as it would only need to be used when stack space is running low.

Ultimately, the cycle GC needs to be iterative, rather than recursive. That will take a *lot* of work though.
History
Date                 User              Action  Args
2021-01-19 17:07:10  vstinner          set     nosy: + vstinner
2021-01-19 15:11:28  Mark.Shannon      set     messages: + msg385267
2021-01-19 15:01:53  terry.reedy       set     nosy: + Mark.Shannon
                                               messages: + msg385265
2021-01-18 10:54:03  serhiy.storchaka  set     nosy: + serhiy.storchaka
                                               messages: + msg385187
2021-01-18 08:33:47  ronaldoussoren    set     messages: + msg385177
2021-01-16 04:25:41  WildCard65        set     nosy: + WildCard65
                                               messages: + msg385134
2021-01-16 03:02:46  xxm               set     messages: + msg385132
2021-01-15 23:38:39  terry.reedy       set     nosy: + terry.reedy
                                               messages: + msg385128
                                               title: Multiple assignments of attribute "__sizeof__" will cause a segfault -> 100000 assignments of .__sizeof__ cause a segfault on del
2021-01-11 08:17:41  ronaldoussoren    set     nosy: + ronaldoussoren
                                               messages: + msg384802
2021-01-11 08:06:17  christian.heimes  set     nosy: + christian.heimes
                                               messages: + msg384801
2021-01-11 07:38:04  xxm               create