classification
Title: test_multiprocessing_fork hangs test_subprocess
Type: Stage: resolved
Components: Tests Versions: Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: pablogsal Nosy List: benjamin.peterson, hroncok, pablogsal, pitrou, vstinner
Priority: normal Keywords:

Created on 2017-09-14 06:36 by benjamin.peterson, last changed 2018-06-27 20:39 by vstinner. This issue is now closed.

Messages (15)
msg302143 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2017-09-14 06:36
This hangs reliably for me on master:

$ ./python -m test.regrtest test_multiprocessing_fork test_subprocess

It looks like test_multiprocessing_fork doesn't clean up some of its subprocesses and then test_subprocess hangs on waitpid(0) forever.
msg315486 - (view) Author: Miro Hrončok (hroncok) * Date: 2018-04-19 11:17
This started to bother us in Fedora for various Python versions, so chances are something changed on the system level.

However for us it seams test_multiprocessing_fork hangs by itself, so this might or might not be relevant to 


# python3.7 -m test.regrtest test_multiprocessing_fork test_subprocess
Run tests sequentially
0:00:00 load avg: 1.16 [1/2] test_multiprocessing_fork
/usr/lib64/python3.7/multiprocessing/semaphore_tracker.py:55: UserWarning: semaphore_tracker: process died unexpectedly, relaunching.  Some semaphores might leak.
  warnings.warn('semaphore_tracker: process died unexpectedly, '
Exception in thread Thread-26:
Traceback (most recent call last):
  File "/usr/lib64/python3.7/threading.py", line 917, in _bootstrap_inner
    self.run()
  File "/usr/lib64/python3.7/threading.py", line 865, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib64/python3.7/multiprocessing/resource_sharer.py", line 139, in _serve
    signal.pthread_sigmask(signal.SIG_BLOCK, range(1, signal.NSIG))
  File "/usr/lib64/python3.7/signal.py", line 60, in pthread_sigmask
    sigs_set = _signal.pthread_sigmask(how, mask)
ValueError: signal number 32 out of range

(hangs)

^CProcess Process-184:
Traceback (most recent call last):
  File "/usr/lib64/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/lib64/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib64/python3.7/test/_test_multiprocessing.py", line 3328, in child_access
    w = conn.recv()
  File "/usr/lib64/python3.7/multiprocessing/connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
  File "/usr/lib64/python3.7/multiprocessing/connection.py", line 960, in rebuild_connection
    fd = df.detach()
  File "/usr/lib64/python3.7/multiprocessing/resource_sharer.py", line 57, in detach
    with _resource_sharer.get_connection(self._id) as conn:
  File "/usr/lib64/python3.7/multiprocessing/resource_sharer.py", line 87, in get_connection
    c = Client(address, authkey=process.current_process().authkey)
  File "/usr/lib64/python3.7/multiprocessing/connection.py", line 498, in Client
    answer_challenge(c, authkey)
  File "/usr/lib64/python3.7/multiprocessing/connection.py", line 741, in answer_challenge
    message = connection.recv_bytes(256)         # reject large message
  File "/usr/lib64/python3.7/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/usr/lib64/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/usr/lib64/python3.7/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
KeyboardInterrupt
Warning -- multiprocessing.process._dangling was modified by test_multiprocessing_fork
  Before: <_weakrefset.WeakSet object at 0x7f4b63782dd8>
  After:  <_weakrefset.WeakSet object at 0x7f4b637821d0> 
Warning -- threading._dangling was modified by test_multiprocessing_fork
  Before: <_weakrefset.WeakSet object at 0x7f4b63782780>
  After:  <_weakrefset.WeakSet object at 0x7f4b63782da0> 

Test suite interrupted by signal SIGINT.
2 tests omitted:
    test_multiprocessing_fork test_subprocess

Total duration: 1 min 35 sec
Tests result: INTERRUPTED

# python3.7 --version
Python 3.7.0b3


# python3.6 -m test.regrtest test_multiprocessing_fork test_subprocess
Run tests sequentially
0:00:00 load avg: 1.84 [1/2] test_multiprocessing_fork
/usr/lib64/python3.6/multiprocessing/semaphore_tracker.py:55: UserWarning: semaphore_tracker: process died unexpectedly, relaunching.  Some semaphores might leak.
  warnings.warn('semaphore_tracker: process died unexpectedly, '
Exception in thread Thread-20:
Traceback (most recent call last):
  File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib64/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib64/python3.6/multiprocessing/resource_sharer.py", line 139, in _serve
    signal.pthread_sigmask(signal.SIG_BLOCK, range(1, signal.NSIG))
  File "/usr/lib64/python3.6/signal.py", line 60, in pthread_sigmask
    sigs_set = _signal.pthread_sigmask(how, mask)
ValueError: signal number 32 out of range

(hangs)

^CProcess Process-178:
Traceback (most recent call last):
...
KeyboardInterrupt
Warning -- multiprocessing.process._dangling was modified by test_multiprocessing_fork
  Before: <_weakrefset.WeakSet object at 0x7f1873917c88>
  After:  <_weakrefset.WeakSet object at 0x7f188516d978> 
Warning -- threading._dangling was modified by test_multiprocessing_fork
  Before: <_weakrefset.WeakSet object at 0x7f1873917d68>
  After:  <_weakrefset.WeakSet object at 0x7f1873917e48> 

Test suite interrupted by signal SIGINT.
2 tests omitted:
    test_multiprocessing_fork test_subprocess

Total duration: 51 sec
Tests result: INTERRUPTED

# python3.6 --version
Python 3.6.5


# python3.5 -m test.regrtest test_multiprocessing_fork test_subprocess
0:00:00 load avg: 1.83 [1/2] test_multiprocessing_fork
Exception in thread Thread-20:
Traceback (most recent call last):
  File "/usr/lib64/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/lib64/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib64/python3.5/multiprocessing/resource_sharer.py", line 139, in _serve
    signal.pthread_sigmask(signal.SIG_BLOCK, range(1, signal.NSIG))
  File "/usr/lib64/python3.5/signal.py", line 60, in pthread_sigmask
    sigs_set = _signal.pthread_sigmask(how, mask)
ValueError: signal number 32 out of range

(hangs)

^CProcess Process-178:
Traceback (most recent call last):
...
KeyboardInterrupt
Warning -- multiprocessing.process._dangling was modified by test_multiprocessing_fork
  Before: <_weakrefset.WeakSet object at 0x7f2bb3923ba8>
  After:  <_weakrefset.WeakSet object at 0x7f2bc4d88b38> 
Warning -- threading._dangling was modified by test_multiprocessing_fork
  Before: <_weakrefset.WeakSet object at 0x7f2bb3923be0>
  After:  <_weakrefset.WeakSet object at 0x7f2bb3923f28> 

Test suite interrupted by signal SIGINT.
2 tests omitted:
    test_multiprocessing_fork test_subprocess
Tests result: INTERRUPTED

# python3.5 --version
Python 3.5.4

# python3.4 -m test.regrtest test_multiprocessing_fork test_subprocess
[1/2] test_multiprocessing_fork
Exception in thread Thread-21:
Traceback (most recent call last):
  File "/usr/lib64/python3.4/threading.py", line 911, in _bootstrap_inner
    self.run()
  File "/usr/lib64/python3.4/threading.py", line 859, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib64/python3.4/multiprocessing/resource_sharer.py", line 139, in _serve
    signal.pthread_sigmask(signal.SIG_BLOCK, range(1, signal.NSIG))
ValueError: signal number 32 out of range

(hangs)

^CTraceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib64/python3.4/multiprocessing/forkserver.py", line 170, in main
Warning -- multiprocessing.process._dangling was modified by test_multiprocessing_fork
Warning -- threading._dangling was modified by test_multiprocessing_fork
Process Process-180:
Traceback (most recent call last):
...
KeyboardInterrupt

Test suite interrupted by signal SIGINT.
2 tests omitted:
    test_multiprocessing_fork test_subprocess

# python3.4 --version
Python 3.4.8
msg315487 - (view) Author: Miro Hrončok (hroncok) * Date: 2018-04-19 11:19
Sorry, I've pressed the button before finishing the thought.

...so this might or might not be relevant to what you observe with master and originally reported.
msg315580 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2018-04-21 20:17
@Benjamin

I can't reproduce here.  It's true that test_multiprocessing_fork leaves two processes being itself, but that doesn't seem to disturb test_subprocess.

@Miro

Could you report the signal number issue separately?  Thank you!
msg315583 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2018-04-21 20:34
It doesn't hang for me anymore, but does print this warning:

0:00:00 load avg: 1.59 [1/2] test_multiprocessing_fork
Lib/multiprocessing/semaphore_tracker.py:55: UserWarning: semaphore_tracker: process died unexpectedly, relaunching.  Some semaphores might leak.
  warnings.warn('semaphore_tracker: process died unexpectedly, '
msg315584 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2018-04-21 20:41
There's a test that deliberately kills the semaphore tracker, so that warning should be expected (I don't see it myself, but that might be because of differing warning settings).
msg315591 - (view) Author: Miro Hrončok (hroncok) * Date: 2018-04-22 08:08
> Could you report the signal number issue separately?  Thank you!

https://bugs.python.org/issue33329

Resetting the Versions back.
msg317382 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-05-23 10:37
> It looks like test_multiprocessing_fork doesn't clean up some of its subprocesses and then test_subprocess hangs on waitpid(0) forever.

Oh strange. This issue was supposed to be fixed. I spent a lot of time to fix such bug, but also make sure that not test leak any child process.

"test_subprocess hangs on waitpid(0) forever" I recall a similar bug that I fixed recently.

I spent time on test.support.reap_children() and test.support.threading_cleanup() to make sure that tests don't leak threads no child processes. It seems like I missed some tests.

In Lib/test/libregrtest/runtest.py, I added:

def post_test_cleanup():
    support.reap_children()

which is run after each test.

But currently, support.reap_children() calls os.waitpid(-1, os.WNOHANG) to check for child processes which already completed. It doesn't warn if there are child processes which are still running.

During my tests, I modified reap_children() to add a sleep(1).

Maybe on Linux we could use a cgroup to really check for all child processes?
msg317399 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2018-05-23 12:52
Let's take another look at the issue: why does test_subprocess need to know about all child processes, rather than those that were launched during test_subprocess?
msg320064 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2018-06-20 10:46
I would like to work on this issue
msg320065 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-06-20 10:50
Last time I was bitten by waitpid():

commit 11045c9d8a21dd9bd182a3939189db02815f9783
Author: Victor Stinner <victor.stinner@gmail.com>
Date:   Thu Oct 5 06:32:53 2017 -0700

    bpo-31178: Mock os.waitpid() in test_subprocess (#3896)

    Fix test_exception_errpipe_bad_data() and
    test_exception_errpipe_normal() of test_subprocess: mock os.waitpid()
    to avoid calling the real os.waitpid(0, 0) which is an unexpected
    side effect of the test.
msg320066 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-06-20 10:51
See also the now hopefully fixed bpo-31509 "test_subprocess hangs randomly on AMD64 Windows10 3.x".
msg320067 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-06-20 10:54
Antoine: "Let's take another look at the issue: why does test_subprocess need to know about all child processes, rather than those that were launched during test_subprocess?"

It seems like this bug has been fixed by bpo-31178, commit 11045c9d8a21dd9bd182a3939189db02815f9783, no?
msg320068 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-06-20 10:54
Another old (now fixed) issue of test_subprocess: bpo-31173 "test_subprocess: test_child_terminated_in_stopped_state() leaks a zombie process".
msg320617 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-06-27 20:39
> This hangs reliably for me on master:
> $ ./python -m test.regrtest test_multiprocessing_fork test_subprocess

FYI right now, this command doesn't hang anymore.

> ValueError: signal number 32 out of range

This bug has been fixed by bpo-33332.

> Let's take another look at the issue: why does test_subprocess need to know about all child processes, rather than those that were launched during test_subprocess?

This issue should be fixed by my commit 11045c9d8a21dd9bd182a3939189db02815f9783.

It seems like all issues described here have been fixed and so I close the issue. If an issue is not properly fixed, maybe it's better to open a new issue since this one has long history and described multiple unrelated bugs.
History
Date User Action Args
2018-06-27 20:39:46vstinnersetstatus: open -> closed
resolution: fixed
messages: + msg320617

stage: resolved
2018-06-20 10:54:39vstinnersetmessages: + msg320068
2018-06-20 10:54:08vstinnersetmessages: + msg320067
2018-06-20 10:51:04vstinnersetmessages: + msg320066
2018-06-20 10:50:09vstinnersetmessages: + msg320065
2018-06-20 10:46:53pablogsalsetmessages: + msg320064
2018-06-20 10:17:31pablogsalsetassignee: pablogsal

nosy: + pablogsal
2018-05-23 12:52:53pitrousetmessages: + msg317399
2018-05-23 10:37:18vstinnersetmessages: + msg317382
2018-05-03 12:19:43vstinnersetnosy: + vstinner
2018-04-22 08:08:49hroncoksetmessages: + msg315591
versions: - Python 3.4, Python 3.5, Python 3.6
2018-04-21 20:41:59pitrousetmessages: + msg315584
2018-04-21 20:34:29benjamin.petersonsetmessages: + msg315583
2018-04-21 20:17:41pitrousetnosy: + pitrou
messages: + msg315580
2018-04-19 11:19:16hroncoksetmessages: + msg315487
2018-04-19 11:17:22hroncoksetnosy: + hroncok

messages: + msg315486
versions: + Python 3.4, Python 3.5, Python 3.6
2017-09-14 06:36:00benjamin.petersoncreate