msg157576 - (view) |
Author: Per Myren (progrper) |
Date: 2012-04-05 12:28 |
The following code crashes with a segfault on Python 2.7.2:
from operator import add
from itertools import izip, starmap
a = b = [1]
for i in xrange(100000):
a = starmap(add, izip(a, b))
list(a)
It also crashes with Python 3.2.2:
from operator import add
from itertools import starmap
a = b = [1]
for i in range(100000):
a = starmap(add, zip(a, b))
list(a)
|
msg157578 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2012-04-05 12:53 |
Apparently it's a stack overflow between zip_next and starmap_next.
|
msg157699 - (view) |
Author: Terry J. Reedy (terry.reedy) * |
Date: 2012-04-06 22:00 |
Win7, 64 bit, IDLE shell and editor, 3.2.3c2, 24 gb machine, added print():
I got no response for about 10 secs until a process restart happens, which I believe means that the background pythonw process stopped.
If I hit ^C before that, I get the Windows segfault equivalent, a process stopped window. Then a restart, without the normal red 'keyboard interrupt' message.
|
msg157700 - (view) |
Author: Terry J. Reedy (terry.reedy) * |
Date: 2012-04-06 22:03 |
3.3.0a2, command prompt window:
'python has stopped working' in about 2 secs.
|
msg158317 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) * |
Date: 2012-04-15 10:42 |
You are creating a 100000 level nested structure of iterators. It is no wonder that you exhaust the stack space of the interpreter. You would get the same with any iterator combination, nothing special with zip and starmap here.
I can't see that anything can be done about this, unless we can create some sort of non-recursive iternext slot, which returns an evaluation context, rather than an item, similar to what stackless does.
|
msg158321 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2012-04-15 11:58 |
Kristján, we already have provisions to avoid stack overflows, instead bailing out with a RuntimeError. So this is a bug.
|
msg158326 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2012-04-15 13:14 |
The "RuntimeError: maximum recursion depth exceeded" message is normally only triggered by pure Python recursion, so I would not have expected it here, but there should be some sort of graceful MemoryError or somesuch rather than a segfault.
The following code narrows it down to some issue in starmap():
def gstarmap(func, iterable):
for tup in iterable:
yield func(*tup)
def mylist(iterable):
return [x for x in iterable]
a = b = [1]
for i in xrange(100000):
# Things that trigger a segfault:
#a = starmap(add, izip(a, b))
#a = starmap(add, iter( (a, b) ))
a = starmap(add, (a, b) )
# Things that exit cleanly with a RuntimeError
#a = gstarmap(add, iter( (a, b) ))
#a = (x+y for x, y in iter( (a, b) ))
mylist(a)
One possibility may be that starmap.next needs to clear StopIteration exceptions in the same way as PyIter_Next():
if (result == NULL &&
PyErr_Occurred() &&
PyErr_ExceptionMatches(PyExc_StopIteration))
PyErr_Clear();
|
msg158331 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2012-04-15 14:14 |
Hmm, substituting PyIter_Next() didn't help.
There isn't much else being done in starmap.next, just a call to:
result = PyObject_Call(lz->func, args, NULL);
I'm now wondering if starmap() is tickling a bug in PyObject_Call, perhaps memory being allocated but not checked for NULL or somesuch.
|
msg158335 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2012-04-15 14:38 |
> I'm now wondering if starmap() is tickling a bug in PyObject_Call,
> perhaps memory being allocated but not checked for NULL or somesuch.
The issue is that the code paths involved here circumvent recursion checking, so the stack blows up.
|
msg158337 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) * |
Date: 2012-04-15 14:43 |
a = map(add, a, b) also crashes this.
a = chain(a, b) also.
If, by "provisions" you speak of sys.max_recursion_depth, that is only invoked when executing "python" code. What's happening here is just simple c recursion trough function pointers, ending in stack overflow, at least on Windows:
> python33_d.dll!chain_next(chainobject * lz) Line 1811 + 0x6 bytes C
python33_d.dll!PyIter_Next(_object * iter) Line 2741 + 0xf bytes C
python33_d.dll!chain_next(chainobject * lz) Line 1823 + 0xc bytes C
python33_d.dll!PyIter_Next(_object * iter) Line 2741 + 0xf bytes C
python33_d.dll!chain_next(chainobject * lz) Line 1823 + 0xc bytes C
python33_d.dll!PyIter_Next(_object * iter) Line 2741 + 0xf bytes C
|
msg158338 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2012-04-15 14:53 |
> a = map(add, a, b) also crashes this.
> a = chain(a, b) also.
> If, by "provisions" you speak of sys.max_recursion_depth, that is only
> invoked when executing "python" code.
There's nothing that prevents it from protecting C code.
|
msg158342 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2012-04-15 16:47 |
[Kristján]
> a = map(add, a, b) also crashes this.
> ... What's happening here is just simple c recursion
> trough function pointers, ending in stack overflow, ...
Thanks for the analysis. ISTM, this bug report is getting less and less interesting (or at least, less actionable without heavy-handed interventions in multiple tools).
One other thought, the OPs isn't really recursive in the sense of a function calling itself repeatedly. Instead, the OPs explicitly creates a heavily nested pile of distinct iterator objects and then runs the entire chain. This isn't much different from someone writing: os.system('cat somefile | ' + ' | '.join(['sort']*100000)).
The existing sys.max_recursion_depth was put in as a defense against the relatively common mistake of users writing a recursive function and getting the termination code wrong. I don't think that logic would apply to intentionally deeply nested data structures or iterators.
Stackoverflows in C are hard to protect against. We could take every iterator and set some limits on it, but that would be heavy handed and likely do more harm than good (C iterators have been around almost a decade and haven't done fine in the wild. The itertools in particular were designed to gain speed through by-passing the eval-loop. Slowing them down would be counter to their primary use case.)
|
msg158343 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2012-04-15 17:07 |
> The existing sys.max_recursion_depth was put in as a defense against
> the relatively common mistake of users writing a recursive function
> and getting the termination code wrong. I don't think that logic
> would apply to intentionally deeply nested data structures or
> iterators.
Well, we have a history of trying to fix crashers, even when they only
occur in weird cases.
> Stackoverflows in C are hard to protect against.
We could simply re-use the existing infrastructure.
|
msg158350 - (view) |
Author: Terry J. Reedy (terry.reedy) * |
Date: 2012-04-15 18:32 |
[Raymond: I presume you meant that C iterators have not been a problem in the wild and have done fine.]
The RuntimeError message "maximum recursion depth exceeded" is not exactly correct. As Kristján implied in his first message, what has been reached is the maximum call stack or nested call depth. It happens to be that recursive calls are the easiest way to do that, but Python makes it somewhat easy to dynamically generate thousands of different callables making thousands of non-recursive nested calls. (A static expression with even 100 nested calls fails compilation with a MemoryError (3.2.3).)
|
msg158352 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2012-04-15 18:48 |
> It happens to be that recursive calls are the easiest way to do that,
> but Python makes it somewhat easy to dynamically generate thousands of
> different callables making thousands of non-recursive nested calls.
That's a rather pointless discussion of terminology, IMHO.
|
msg158357 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) * |
Date: 2012-04-15 19:25 |
On the other hand, Antoine is correct in that we _could_ use the existing infrastructure and count PyIter_Next() as a recursion in the same way as entering the ceval loop is a recursion. Extra checking in there would hardly slow down the execution much. (but it would have to do its job only when invoking a "c" implemented iternext routine...)
I tried to come up with other ways that we could create deeply nested C function calls...
Here's one:
...
a = (a, a)
b = (b, b)
a < b
This however gets caught by this code:
if (Py_EnterRecursiveCall(" in comparison"))
return NULL;
res = do_richcompare(v, w, op);
Py_LeaveRecursiveCall();
So obviously someone thought this could be an issue.
However:
...
a = {1: a}
repr(a)
will generate the same crash. (tuple repr and list repr have guards, dict repr not)
So, there are various ways to get c recursion to overflow the C stack. Some of them have been patched throughout the years, others not.
We could try to identify all the different ways. We could for example add guards in PyObject_Repr() rather than individual types. But I'm sure we will leave holes in place like the OP discovered with deeply nested iterators.
|
msg158359 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2012-04-15 19:37 |
ISTM this would do more harm than good. An introduce a new requirement for all iterators, introducing new arbitrary limits and slowing down all iterators (this is currently a simple, fast, light-weight protocol).
Also this seems to be just a CPython issue (the JVM manages its own stack). Please don't muck-up the iterator protocol over this non-issue. It isn't worth it. If someone wants a stackless version of Python, they should use a stackless version of Python.
There are other crashers we choose to ignore (involving gc.getreferrers, bytecode hacks, ctypes, etc). I think this should go in that category and I would be happy to add a note to that effect in the docs for itertools.
|
msg158360 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2012-04-15 19:45 |
> (A static expression with even 100 nested calls fails compilation with a > MemoryError (3.2.3).)
I don't think that's at all related to this issue: that has to do with the fixed-size parser stack used when parsing Python code (see MAXSTACK in Parser/parser.h). Nothing to do with the C stack. :-)
|
msg158361 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) * |
Date: 2012-04-15 19:47 |
> There are other crashers we choose to ignore (involving gc.getreferrers, > bytecode hacks, ctypes, etc). I think this should go in that category
> and I would be happy to add a note to that effect in the docs for tertools.
Yes, including my previous example with repr()
a = None
for i in range(100000):
a = {1: a}
repr(a)
This is a case where care has been taken for lists, tuples, but not dicts. If we want to fix repr, the recursion checking shoudl probably go into PyObject_repr(). I'm not advocating for a fix, Just pointing out yet another way you can construct objects so that accessnig them will cause a crash.
|
msg192065 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-06-30 08:22 |
This looks as a duplicate of issue14010.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:57:28 | admin | set | github: 58712 |
2013-10-13 17:55:16 | georg.brandl | set | status: pending -> closed superseder: deeply nested itertools objects segfault resolution: later -> duplicate |
2013-08-06 12:18:46 | serhiy.storchaka | set | status: open -> pending |
2013-06-30 08:22:57 | serhiy.storchaka | set | nosy:
+ serhiy.storchaka messages:
+ msg192065
|
2013-06-30 06:03:48 | terry.reedy | set | versions:
+ Python 3.4, - Python 3.2 |
2012-04-15 19:47:22 | kristjan.jonsson | set | messages:
+ msg158361 |
2012-04-15 19:45:34 | mark.dickinson | set | nosy:
+ mark.dickinson messages:
+ msg158360
|
2012-04-15 19:41:04 | pitrou | set | nosy:
+ vstinner
|
2012-04-15 19:37:20 | rhettinger | set | messages:
+ msg158359 |
2012-04-15 19:25:28 | kristjan.jonsson | set | messages:
+ msg158357 |
2012-04-15 18:48:20 | pitrou | set | messages:
+ msg158352 |
2012-04-15 18:32:55 | terry.reedy | set | messages:
+ msg158350 |
2012-04-15 17:07:50 | pitrou | set | messages:
+ msg158343 |
2012-04-15 16:47:42 | rhettinger | set | resolution: later messages:
+ msg158342 |
2012-04-15 15:57:28 | meador.inge | set | nosy:
+ meador.inge
|
2012-04-15 14:53:17 | pitrou | set | messages:
+ msg158338 |
2012-04-15 14:43:29 | kristjan.jonsson | set | messages:
+ msg158337 |
2012-04-15 14:38:02 | pitrou | set | messages:
+ msg158335 |
2012-04-15 14:14:45 | rhettinger | set | messages:
+ msg158331 |
2012-04-15 13:14:19 | rhettinger | set | priority: normal -> low
messages:
+ msg158326 title: Segfault with starmap and izip combo -> Segfault with deeply nested starmap calls |
2012-04-15 11:58:18 | pitrou | set | messages:
+ msg158321 |
2012-04-15 10:42:46 | kristjan.jonsson | set | nosy:
+ kristjan.jonsson messages:
+ msg158317
|
2012-04-14 04:18:20 | ezio.melotti | set | nosy:
+ ezio.melotti
stage: needs patch |
2012-04-06 22:03:50 | terry.reedy | set | messages:
+ msg157700 |
2012-04-06 22:00:29 | terry.reedy | set | nosy:
+ terry.reedy messages:
+ msg157699
|
2012-04-06 02:03:59 | rhettinger | set | assignee: rhettinger |
2012-04-05 12:53:38 | pitrou | set | nosy:
+ rhettinger, pitrou
messages:
+ msg157578 versions:
+ Python 3.3 |
2012-04-05 12:28:37 | progrper | create | |