classification
Title: PyObject_Call() is called with an exception set in subprocess
Type: crash Stage: resolved
Components: Extension Modules Versions: Python 3.6, Python 3.4, Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: martin.panter, python-dev, serhiy.storchaka, vstinner
Priority: normal Keywords: patch

Created on 2015-11-29 19:09 by serhiy.storchaka, last changed 2015-12-11 06:04 by python-dev. This issue is now closed.

Files
File name Uploaded Description Edit
preexec-fork.patch martin.panter, 2015-11-30 02:20 review
preexec-fork.2.patch martin.panter, 2015-12-01 02:43 review
Messages (15)
msg255586 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-11-29 19:09
http://buildbot.python.org/all/builders/x86%20OpenIndiana%203.x/builds/10847/steps/test/logs/stdio

[117/400] test_subprocess -- running: test_tokenize (65 sec), test_concurrent_futures (38 sec)
Assertion failed: !PyErr_Occurred(), file Objects/abstract.c, line 2158
Fatal Python error: Aborted

Current thread 0x00000001 (most recent call first):
  File "/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Lib/subprocess.py", line 1483 in _execute_child
  File "/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Lib/subprocess.py", line 950 in __init__
  File "/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Lib/test/test_subprocess.py", line 2295 in test_close_fds_after_preexec
  File "/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Lib/unittest/case.py", line 600 in run
  File "/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Lib/unittest/case.py", line 648 in __call__
  File "/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Lib/unittest/suite.py", line 122 in run
  File "/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Lib/unittest/suite.py", line 84 in __call__
  File "/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Lib/unittest/suite.py", line 122 in run
  File "/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Lib/unittest/suite.py", line 84 in __call__
  File "/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Lib/unittest/runner.py", line 176 in run
  File "/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Lib/test/support/__init__.py", line 1780 in _run_suite
  File "/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Lib/test/support/__init__.py", line 1814 in run_unittest
  File "/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Lib/test/test_subprocess.py", line 2657 in test_main
  File "/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Lib/test/libregrtest/runtest.py", line 162 in runtest_inner
  File "/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Lib/test/libregrtest/runtest.py", line 115 in runtest
  File "/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Lib/test/libregrtest/runtest_mp.py", line 69 in run_tests_slave
  File "/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Lib/test/libregrtest/main.py", line 379 in main
  File "/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Lib/test/libregrtest/main.py", line 433 in main
  File "/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Lib/test/libregrtest/main.py", line 455 in main_in_temp_cwd
  File "/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Lib/test/regrtest.py", line 39 in <module>
  File "/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Lib/runpy.py", line 85 in _run_code
  File "/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Lib/runpy.py", line 170 in _run_module_as_main
Traceback (most recent call last):
  File "/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Lib/runpy.py", line 170, in _run_module_as_main
    "__main__", mod_spec)
  File "/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Lib/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Lib/test/__main__.py", line 3, in <module>
    regrtest.main_in_temp_cwd()
  File "/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Lib/test/libregrtest/main.py", line 455, in main_in_temp_cwd
    main()
  File "/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Lib/test/libregrtest/main.py", line 433, in main
    Regrtest().main(tests=tests, **kwargs)
  File "/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Lib/test/libregrtest/main.py", line 392, in main
    self.run_tests()
  File "/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Lib/test/libregrtest/main.py", line 354, in run_tests
    run_tests_multiprocess(self)
  File "/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Lib/test/libregrtest/runtest_mp.py", line 212, in run_tests_multiprocess
    raise Exception(msg)
Exception: Child error on test_subprocess: Exit code -6
make: *** [buildbottest] Error 1
program finished with exit code 2
msg255604 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-11-30 02:20
This looks like an intermittent (1 in ~5) failure in at least the Open Indiana 3.5 and 3.x buildbots.

My guess is that fork() has failed, and because preexec_fn is in use, the code tries to call gc.enable() with the fork() exception already set. My patch adds a test and fixes that.

Victor already fixed a similar error for the cleanup branch of subprocess_fork_exec() in Issue 22290.
msg255613 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-11-30 08:18
> Assertion failed: !PyErr_Occurred(), file Objects/abstract.c, line 2158

Quick reminder: it's not a major bug if PyObject_Call() is called with an exception set. The exception will be kept in some cases. But I added the assertion because the exception is lost in some other cases. Example: calling hasattr(obj, name) replaces the current exception with an AttributeError and then clears the AttributeError exception if the object doesn't have the attribute.

> My guess is that fork() has failed,

Oh good job, it's exactly that: I'm able to reproduce the bug with your unit test (from your attached patch).

If I recall correctly, there is no overallocation on Solaris (OpenIndiana), and this buildbot is running low memory (MemoryError is very common). fork() can likely fail under low memory.

> and because preexec_fn is in use, the code tries to call gc.enable() with the fork() exception already set. My patch adds a test and fixes that.

Hum, I think that your patch can be simplified. Replace:

    if (pid == -1) {
        PyErr_SetFromErrno(PyExc_OSError);
    ...
    if (need_to_reenable_gc && _enable_gc(gc_module)) ...

with:

    if (need_to_reenable_gc && _enable_gc(gc_module)) ...
    ...
    if (pid == -1) {
        PyErr_SetFromErrno(PyExc_OSError);

What do you think?

> Victor already fixed a similar error for the cleanup branch of subprocess_fork_exec() in Issue 22290.

I reproduced the issue on Linux with pyfailmalloc.

You can play with pyfailmalloc to try to find similar bugs. The Bitbucket repository contains the latest debug_cpython.patch which adds some more checks to make the analysis faster and more effective, and debug_cpython.gdb contains commands to run it.

https://bitbucket.org/haypo/pyfailmalloc
msg255614 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-11-30 10:17
Couldn’t re-enabling garbage collection cause errno to be overwritten? Even if I saved errno until later, there is still the RuntimeError from releasing the import lock to worry about.

Now that I look closer, I see the lock release code path sets this exception, but does not return NULL. Another bug.
msg255637 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-12-01 02:43
This patch fixes the exception when releasing the import lock. I also moved the Py_XDECREF(gc_module) back out to where it was to keep the patch to a minimum, but I don’t really see a way to simplify it much more.

Apparently Solaris supports posix_spawn(), which doesn’t require double the memory like fork(). But implementing that would be a significant project, and was rejected in Issue 20104.
msg255915 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-12-05 02:35
New changeset 4f4e2cbd2138 by Martin Panter in branch '3.4':
Issue #25764: Preserve subprocess fork exception when preexec_fn used
https://hg.python.org/cpython/rev/4f4e2cbd2138

New changeset ae27ad306dbf by Martin Panter in branch '3.5':
Issue #25764: Merge subprocess fix from 3.4 into 3.5
https://hg.python.org/cpython/rev/ae27ad306dbf

New changeset b10c58a740b9 by Martin Panter in branch 'default':
Issue #25764: Merge subprocess fix from 3.5
https://hg.python.org/cpython/rev/b10c58a740b9
msg255917 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-12-05 03:53
OS X buildbots don’t like my resource change.

http://buildbot.python.org/all/builders/AMD64%20Snow%20Leop%203.4/builds/1377/steps/test/logs/stdio

It seems that the setrlimit() cleanup line is failing, all the subsequent tests fail to fork(). I don’t understand why it would fail to restore the original soft limit. The exception is “ValueError: not allowed to raise maximum limit” (corresponding to EPERM).

All I can think of is the soft limit is reported as RLIM_INFINITY, but maybe setrlimit() does not accept that if the hard limit is finite. Failing that, I will have to skip the test before reducing the soft limit if setting the original limit fails.
msg255921 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-12-05 04:18
New changeset f53958873fae by Martin Panter in branch 'default':
Issue #25764: Attempt to debug and skip OS X setrlimit() failure
https://hg.python.org/cpython/rev/f53958873fae
msg255925 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-12-05 05:48
New changeset ccf42cdffc6d by Martin Panter in branch 'default':
Issue #25764: OS X now failing on the second setrlimit() call
https://hg.python.org/cpython/rev/ccf42cdffc6d
msg255930 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-12-05 07:59
This may be related to issue9917.
msg255934 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-12-05 08:16
I am inclined to just skip the test on OS X, unless some OS X angel wants to help me investigate this :)

@skipIf(sys.platform == "darwin", "setrlimit() seems to fail on OS X")
msg255941 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-12-05 09:30
Thanks for the pointer Serhiy (I missed it before my last comment). On OS X, it looks like rlim_t is 64-bit unsigned, and RLIM_INFINITY is 2**63 - 1 (0x7FF . . .). Looking over the /Modules/resource.c code, I can see potential problems with it, but cannot imagine how they would cause the behaviour I was getting.
msg255950 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-12-05 12:43
New changeset 6f831de45f43 by Martin Panter in branch '3.4':
Issue #25764: Skip the test on OS X
https://hg.python.org/cpython/rev/6f831de45f43

New changeset 6211c41106cc by Martin Panter in branch '3.5':
Issue #25764: Merge OS X test skipping from 3.4 into 3.5
https://hg.python.org/cpython/rev/6211c41106cc

New changeset 9a847520c40d by Martin Panter in branch 'default':
Issue #25764: Merge OS X test skipping from 3.5
https://hg.python.org/cpython/rev/9a847520c40d

New changeset 0c9095566f21 by Martin Panter in branch 'default':
Issue #25764: Remove test debugging
https://hg.python.org/cpython/rev/0c9095566f21
msg256005 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-12-06 06:21
Here is a better failure in test_close_fds_after_preexec(), raising ENOMEM, so I am fairly confident that my fix is working and the exception from fork() is passing through properly.

http://buildbot.python.org/all/builders/AMD64%20OpenIndiana%203.5/builds/137/steps/test/logs/stdio

======================================================================
ERROR: test_close_fds_after_preexec (test.test_subprocess.POSIXProcessTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/export/home/buildbot/64bits/3.5.cea-indiana-amd64/build/Lib/test/test_subprocess.py", line 2313, in test_close_fds_after_preexec
    preexec_fn=lambda: os.dup2(1, fd))
  File "/export/home/buildbot/64bits/3.5.cea-indiana-amd64/build/Lib/subprocess.py", line 950, in __init__
    restore_signals, start_new_session)
  File "/export/home/buildbot/64bits/3.5.cea-indiana-amd64/build/Lib/subprocess.py", line 1483, in _execute_child
    restore_signals, start_new_session, preexec_fn)
OSError: [Errno 12] Not enough space

Closing this as fixed, but I would love to find out what the problem with my setrlimit() test code is on OS X.
msg256199 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-12-11 06:04
New changeset fe844253cd44 by Martin Panter in branch '3.5':
Issue #25764: Skip fork failure test when run as root
https://hg.python.org/cpython/rev/fe844253cd44

New changeset 2f9541cab936 by Martin Panter in branch 'default':
Issue #25764: Merge fix for root user from 3.5
https://hg.python.org/cpython/rev/2f9541cab936
History
Date User Action Args
2015-12-11 06:04:45python-devsetmessages: + msg256199
2015-12-06 06:21:07martin.pantersetstatus: open -> closed
resolution: fixed
messages: + msg256005

stage: patch review -> resolved
2015-12-05 12:43:09python-devsetmessages: + msg255950
2015-12-05 09:30:07martin.pantersetmessages: + msg255941
2015-12-05 08:16:40martin.pantersetmessages: + msg255934
2015-12-05 07:59:23serhiy.storchakasetmessages: + msg255930
2015-12-05 05:48:06python-devsetmessages: + msg255925
2015-12-05 04:18:13python-devsetmessages: + msg255921
2015-12-05 03:53:49martin.pantersetmessages: + msg255917
2015-12-05 02:35:38python-devsetnosy: + python-dev
messages: + msg255915
2015-12-01 02:43:44martin.pantersetfiles: + preexec-fork.2.patch

messages: + msg255637
2015-11-30 10:17:13martin.pantersetmessages: + msg255614
2015-11-30 08:18:40vstinnersetmessages: + msg255613
2015-11-30 02:20:33martin.pantersetfiles: + preexec-fork.patch

nosy: + martin.panter
messages: + msg255604

keywords: + patch
stage: patch review
2015-11-29 19:09:11serhiy.storchakacreate