classification
Title: test_sendall_interrupted hangs on FreeBSD with a zombie multiprocessing thread
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: neologix, pitrou, python-dev, sbt, vstinner
Priority: normal Keywords: patch

Created on 2012-04-24 23:01 by vstinner, last changed 2012-04-28 20:33 by vstinner. This issue is now closed.

Files
File name Uploaded Description
mp_resource_sharer_stop.patch sbt, 2012-04-25 00:18
mp_resource_sharer_stop.patch sbt, 2012-04-25 11:37
mp_resource_sharer_stop.patch sbt, 2012-04-25 13:03
mp_resource_sharer_stop.patch sbt, 2012-04-26 15:27
Messages (17)
msg159230 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-04-24 23:01
[233/364] test_multiprocessing
...
[265/364] test_typechecks
[266/364] test_socket
Timeout (1:00:00)!
Thread 0x0000000807235000:
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/socket.py", line 135 in accept
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/multiprocessing/connection.py", line 595 in accept
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/multiprocessing/connection.py", line 469 in accept
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/multiprocessing/reduction.py", line 256 in _serve
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/threading.py", line 592 in run
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/threading.py", line 635 in _bootstrap_inner
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/threading.py", line 612 in _bootstrap

Thread 0x0000000801407400:
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/test/test_socket.py", line 1208 in check_sendall_interrupted
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/test/test_socket.py", line 1219 in test_sendall_interrupted
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/unittest/case.py", line 385 in _executeTestPart
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/unittest/case.py", line 440 in run
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/unittest/case.py", line 492 in __call__
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/unittest/suite.py", line 105 in run
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/unittest/suite.py", line 67 in __call__
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/unittest/suite.py", line 105 in run
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/unittest/suite.py", line 67 in __call__
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/unittest/runner.py", line 168 in run
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/test/support.py", line 1333 in _run_suite
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/test/support.py", line 1367 in run_unittest
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/test/test_socket.py", line 4813 in test_main
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/test/regrtest.py", line 1237 in runtest_inner
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/test/regrtest.py", line 907 in runtest
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/test/regrtest.py", line 710 in main
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/test/__main__.py", line 13 in <module>
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/runpy.py", line 73 in _run_code
  File "/usr/home/buildbot/buildarea/3.x.krah-freebsd/build/Lib/runpy.py", line 160 in _run_module_as_main
*** Error code 1

http://www.python.org/dev/buildbot/all/builders/AMD64%20FreeBSD%209.0%203.x/builds/2339/steps/test/logs/stdio
msg159231 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-04-24 23:26
There was a similar issue: #11753, but it was a bug in the faulthandler module. Here it looks like a bug in TestSocketSharing of test_socket which uses multiprocessing.
msg159232 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-04-24 23:34
Ah, this is because of the new daemon thread in ResourceSharer. That thread is never stopped and could receive signals while tests expect them to be delivered to the main thread.

Either we add a (private?) facility to stop that thread, or we block signal delivery in that thread using the signal module's pthread_sigmask. What do you think?
msg159233 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-04-24 23:36
The pthread_sigmask() solution would allow the use of multiprocessing while still keeping signal delivery deterministic.
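A minimal sketch of the pthread_sigmask() approach, assuming a POSIX platform: a helper thread blocks every signal for itself only, so the main thread's mask is unchanged. The helper name `blocked_helper` is hypothetical, and `signal.valid_signals()` requires Python 3.8+ (the 2012 patch used `range(1, signal.NSIG)`, which newer Pythons warn about on some platforms).

```python
import signal
import threading


def blocked_helper(result):
    """Hypothetical helper-thread body that opts out of signal delivery."""
    # Block every valid signal in *this* thread only; other threads keep their masks.
    signal.pthread_sigmask(signal.SIG_BLOCK, signal.valid_signals())
    # SIG_BLOCK with an empty set changes nothing and returns the current mask.
    result["mask"] = signal.pthread_sigmask(signal.SIG_BLOCK, [])
    # A real helper would now loop on accept(), like the thread in the traceback.


result = {}
t = threading.Thread(target=blocked_helper, args=(result,), daemon=True)
t.start()
t.join()

main_mask = signal.pthread_sigmask(signal.SIG_BLOCK, [])
print("SIGALRM blocked in helper:", signal.SIGALRM in result["mask"])
print("SIGALRM blocked in main thread:", signal.SIGALRM in main_mask)
```

The helper should report SIGALRM as blocked, while the main thread normally does not, which is what lets the kernel route process-directed signals to the main thread.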
msg159241 - Author: Richard Oudkerk (sbt) * (Python committer) Date: 2012-04-25 00:18
This patch adds a ResourceSharer.stop() method.  This is called from tearDownClass() in the unittest.
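A simplified sketch of the "stop the helper thread in tearDownClass()" idea. `BackgroundServer`, `AUTHKEY` and the echo "service" are hypothetical stand-ins, not multiprocessing's actual ResourceSharer; the sketch only assumes that the helper thread sits blocked in `Listener.accept()`, as the traceback shows, so `stop()` wakes it with a real connection carrying a `None` sentinel.

```python
import threading
import unittest
from multiprocessing.connection import Client, Listener

AUTHKEY = b"example-key"  # hypothetical key for this sketch


class BackgroundServer:
    """Hypothetical, simplified stand-in for a ResourceSharer-like helper."""

    def __init__(self):
        self._listener = Listener(authkey=AUTHKEY)
        self.address = self._listener.address
        self._thread = threading.Thread(target=self._serve, daemon=True)
        self._thread.start()

    def _serve(self):
        while True:
            with self._listener.accept() as conn:
                msg = conn.recv()
                if msg is None:      # sentinel sent by stop()
                    return
                conn.send(msg)       # placeholder service: echo the request

    def stop(self, timeout=None):
        """Ask the thread to exit, join it, and report whether it stopped."""
        # The thread is blocked in accept(), so wake it with a real connection.
        with Client(self.address, authkey=AUTHKEY) as c:
            c.send(None)
        self._thread.join(timeout)
        self._listener.close()
        return not self._thread.is_alive()


class SharingTests(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        cls.server = BackgroundServer()

    @classmethod
    def tearDownClass(cls):
        # Stop the helper so it cannot outlive this class and disturb later tests.
        cls.server.stop(timeout=5)

    def test_echo(self):
        with Client(self.server.address, authkey=AUTHKEY) as c:
            c.send("ping")
            self.assertEqual(c.recv(), "ping")
```

Stopping in tearDownClass() keeps the thread's lifetime scoped to the test class, although, as discussed later in the thread, it does not protect other code that uses multiprocessing.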
msg159267 - Author: Richard Oudkerk (sbt) * (Python committer) Date: 2012-04-25 11:37
New version of patch which does

  signal.pthread_sigmask(signal.SIG_BLOCK, range(1, signal.NSIG))

in the thread (is that right?).

It also uses a timeout when trying to join the thread.
msg159271 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-04-25 12:14
> in the thread (is that right?).

This looks like it.

> It also uses a timeout when trying to join the thread.

Perhaps some kind of warning can be printed if joining fails after the timeout?
msg159279 - Author: Richard Oudkerk (sbt) * (Python committer) Date: 2012-04-25 13:03
Warning added to patch.
msg159284 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-04-25 13:39
Hmm, I thought either multiprocessing's logging facilities or the warnings module could be used. That way, people have control over the verbosity of stderr messages.
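A sketch of "join with a timeout, then report through logging or warnings". `join_or_warn` is a hypothetical helper, not part of multiprocessing; it uses only `Thread.join()`, `multiprocessing.get_logger()` and `warnings.warn()`. Note that `join(None)` blocks until the thread exits, while `join(0)` merely polls, which is the trade-off behind choosing a default timeout.

```python
import multiprocessing
import threading
import warnings


def join_or_warn(thread, timeout=None, use_logger=True):
    """Hypothetical helper: join *thread*; report, rather than raise, if still alive.

    timeout=None blocks until the thread exits; timeout=0 only polls.
    """
    thread.join(timeout)
    if not thread.is_alive():
        return True
    msg = "thread %r did not stop within %r seconds" % (thread.name, timeout)
    if use_logger:
        # multiprocessing's own logger; its level and handlers can be configured
        # with the standard logging API or multiprocessing.log_to_stderr().
        multiprocessing.get_logger().warning(msg)
    else:
        # Warnings can be silenced or turned into errors with -W or filterwarnings().
        warnings.warn(msg, RuntimeWarning, stacklevel=2)
    return False


release = threading.Event()
t = threading.Thread(target=release.wait, name="helper", daemon=True)
t.start()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    stopped_early = join_or_warn(t, timeout=0.05, use_logger=False)
print(stopped_early, len(caught))  # False 1
release.set()
stopped_late = join_or_warn(t)     # blocks until the thread exits
print(stopped_late)                # True
```

Either channel leaves the decision about verbosity to the user instead of writing unconditionally to stderr.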
msg159286 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-04-25 13:57
mp_resource_sharer_stop.patch: this patch changes two different
things, so it should be split: one patch to fix test_socket,
and one patch to call pthread_sigmask().

I don't think that you should call pthread_sigmask(). It looks like a
workaround for this issue, whereas resource_sharer.stop() is the
correct fix.
msg159297 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-04-25 15:16
> I don't think that you should call pthread_sigmask(). It looks like a
> workaround for this issue, whereas resource_sharer.stop() is the
> correct fix.

The problem is not only with test_multiprocessing and test_socket; any test which uses multiprocessing could have side effects on any subsequent tests which use signals. Also, applicative code could be affected.

So I think pthread_sigmask() *is* the solution.
msg159321 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-04-25 17:01
mp_resource_sharer_stop.patch: you should add a timeout argument to stop() instead of hardcoding a timeout of 5 seconds. It may be safer to block until the thread exits by default (so timeout=None by default).

For the new method: it may be nice to document it. Having to import resource_sharer from multiprocessing.reduction is maybe not the best possible API :-/

+        from multiprocessing.reduction import resource_sharer
+        resource_sharer.stop()

> Also, applicative code could be affected.

What is the effect of the patch? For example, on CTRL+c? I don't know the multiprocessing module nor this "resource sharer" thread.
msg159323 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-04-25 17:25
> For the new method: it may be nice to document it. Having to import
> resource_sharer from multiprocessing.reduction is maybe not the best
> possible API :-/

resource_sharer is a private API, it's not meant to be used by anyone
outside of the stdlib.

> What is the effect of the patch? For example, on CTRL+c?

Why should it have an effect on CTRL+c? Please explain yourself better.

> I don't know the multiprocessing module nor this "resource sharer"
> thread.

Time to learn about them perhaps :)
msg159382 - Author: Richard Oudkerk (sbt) * (Python committer) Date: 2012-04-26 15:27
New patch which adds a timeout argument to ResourceSharer.stop(); it defaults to 0.

When stop() fails, it now reports this through the logger.

pthread_sigmask() only stops this background thread from receiving signals.  Signals will still be delivered to other threads, so it should not have any effect on the handling of Ctrl-C.
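A sketch of why blocking signals in the helper thread should not affect signal handling elsewhere, assuming a POSIX system and that it runs in the main thread (signal.signal() requires that). It approximates what test_sendall_interrupted appears to rely on, a blocking call in the main thread being interrupted by SIGALRM, with `os.read()` on an empty pipe standing in for `sendall()`. `Interrupted`, `quiet_helper` and `blocking_read_is_interrupted` are hypothetical names.

```python
import os
import signal
import threading


class Interrupted(Exception):
    """Raised by the SIGALRM handler below (hypothetical, for this sketch)."""


def on_alarm(signum, frame):
    raise Interrupted


def quiet_helper(ready, stop):
    # Same idea as the patch: this thread never receives signals.
    signal.pthread_sigmask(signal.SIG_BLOCK, signal.valid_signals())
    ready.set()
    stop.wait()


def blocking_read_is_interrupted(delay=0.2):
    ready, stop = threading.Event(), threading.Event()
    helper = threading.Thread(target=quiet_helper, args=(ready, stop), daemon=True)
    helper.start()
    ready.wait()

    old_handler = signal.signal(signal.SIGALRM, on_alarm)
    r, w = os.pipe()
    interrupted = False
    try:
        signal.setitimer(signal.ITIMER_REAL, delay)
        os.read(r, 1)  # the pipe is empty, so only a signal can end this call
    except Interrupted:
        interrupted = True
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, old_handler)
        os.close(r)
        os.close(w)
        stop.set()
        helper.join()
    return interrupted


if __name__ == "__main__":
    print(blocking_read_is_interrupted())
```

Because the helper blocks SIGALRM, the kernel can only deliver it to the main thread, so the blocked read fails with EINTR and the Python handler's exception surfaces there. If the helper did not block it, an OS is free to pick the helper instead, which would leave the main thread's call blocked indefinitely, matching the hang in the traceback.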
msg159499 - Author: Roundup Robot (python-dev) (Python triager) Date: 2012-04-27 21:52
New changeset f163c4731c58 by Antoine Pitrou in branch 'default':
Issue #14666: stop multiprocessing's resource-sharing thread after the tests are done.
http://hg.python.org/cpython/rev/f163c4731c58
msg159526 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-04-28 14:38
This should have fixed it. If not, someone reopen the issue :)
msg159537 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-04-28 20:33
> This should have fixed it. If not, someone reopen the issue :)

Thanks!
History
Date                 User        Action  Args
2012-04-28 20:33:54  vstinner    set     messages: + msg159537
2012-04-28 14:38:48  pitrou      set     status: open -> closed
                                         resolution: fixed
                                         messages: + msg159526
                                         stage: needs patch -> resolved
2012-04-27 21:52:36  python-dev  set     nosy: + python-dev
                                         messages: + msg159499
2012-04-26 15:27:53  sbt         set     files: + mp_resource_sharer_stop.patch
                                         messages: + msg159382
2012-04-25 17:25:58  pitrou      set     messages: + msg159323
2012-04-25 17:01:59  vstinner    set     messages: + msg159321
2012-04-25 15:16:54  pitrou      set     messages: + msg159297
2012-04-25 13:57:16  vstinner    set     messages: + msg159286
2012-04-25 13:39:23  pitrou      set     messages: + msg159284
2012-04-25 13:03:02  sbt         set     files: + mp_resource_sharer_stop.patch
                                         messages: + msg159279
2012-04-25 12:14:38  pitrou      set     messages: + msg159271
2012-04-25 11:37:26  sbt         set     files: + mp_resource_sharer_stop.patch
                                         messages: + msg159267
2012-04-25 00:19:00  sbt         set     files: + mp_resource_sharer_stop.patch
                                         keywords: + patch
                                         messages: + msg159241
2012-04-24 23:36:28  pitrou      set     messages: + msg159233
2012-04-24 23:34:53  pitrou      set     type: behavior
                                         components: + Library (Lib)
                                         stage: needs patch
2012-04-24 23:34:18  pitrou      set     nosy: + sbt
                                         messages: + msg159232
2012-04-24 23:26:09  vstinner    set     messages: + msg159231
2012-04-24 23:01:57  vstinner    create