classification
Title: test_multiprocessing.test_notify_all() hangs on "AMD64 Snow Leopard 02 03.x"
Type: Stage: resolved
Components: Library (Lib) Versions: Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: neologix, python-dev, vstinner
Priority: normal Keywords:

Created on 2011-12-09 12:58 by vstinner, last changed 2012-01-02 12:34 by neologix. This issue is now closed.

Files
File name        Uploaded                    Description
test_backlog.py  neologix, 2011-12-19 09:16
Messages (7)
msg149089 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-12-09 12:58
[333/363] test_multiprocessing
Timeout (1:00:00)!
Thread 0x0000000112d0b000:
  File "/Users/buildbot/buildarea/3.x.parc-snowleopard-1/build/Lib/multiprocessing/connection.py", line 411 in _recv
  File "/Users/buildbot/buildarea/3.x.parc-snowleopard-1/build/Lib/multiprocessing/connection.py", line 432 in _recv_bytes
  File "/Users/buildbot/buildarea/3.x.parc-snowleopard-1/build/Lib/multiprocessing/connection.py", line 275 in recv
  File "/Users/buildbot/buildarea/3.x.parc-snowleopard-1/build/Lib/multiprocessing/managers.py", line 758 in _callmethod
  File "/Users/buildbot/buildarea/3.x.parc-snowleopard-1/build/Lib/multiprocessing/managers.py", line 994 in wait
  File "/Users/buildbot/buildarea/3.x.parc-snowleopard-1/build/Lib/test/test_multiprocessing.py", line 734 in f
  File "/Users/buildbot/buildarea/3.x.parc-snowleopard-1/build/Lib/threading.py", line 682 in run
  File "/Users/buildbot/buildarea/3.x.parc-snowleopard-1/build/Lib/threading.py", line 729 in _bootstrap_inner
  File "/Users/buildbot/buildarea/3.x.parc-snowleopard-1/build/Lib/threading.py", line 702 in _bootstrap

Thread 0x0000000112908000:
  File "/Users/buildbot/buildarea/3.x.parc-snowleopard-1/build/Lib/multiprocessing/connection.py", line 411 in _recv
  File "/Users/buildbot/buildarea/3.x.parc-snowleopard-1/build/Lib/multiprocessing/connection.py", line 432 in _recv_bytes
  File "/Users/buildbot/buildarea/3.x.parc-snowleopard-1/build/Lib/multiprocessing/connection.py", line 275 in recv
  File "/Users/buildbot/buildarea/3.x.parc-snowleopard-1/build/Lib/multiprocessing/managers.py", line 758 in _callmethod
  File "/Users/buildbot/buildarea/3.x.parc-snowleopard-1/build/Lib/multiprocessing/managers.py", line 994 in wait
  File "/Users/buildbot/buildarea/3.x.parc-snowleopard-1/build/Lib/test/test_multiprocessing.py", line 734 in f
  File "/Users/buildbot/buildarea/3.x.parc-snowleopard-1/build/Lib/threading.py", line 682 in run
  File "/Users/buildbot/buildarea/3.x.parc-snowleopard-1/build/Lib/threading.py", line 729 in _bootstrap_inner
  File "/Users/buildbot/buildarea/3.x.parc-snowleopard-1/build/Lib/threading.py", line 702 in _bootstrap

Thread 0x00007fff7022ccc0:
  File "/Users/buildbot/buildarea/3.x.parc-snowleopard-1/build/Lib/multiprocessing/connection.py", line 411 in _recv
  File "/Users/buildbot/buildarea/3.x.parc-snowleopard-1/build/Lib/multiprocessing/connection.py", line 432 in _recv_bytes
  File "/Users/buildbot/buildarea/3.x.parc-snowleopard-1/build/Lib/multiprocessing/connection.py", line 275 in recv
  File "/Users/buildbot/buildarea/3.x.parc-snowleopard-1/build/Lib/multiprocessing/managers.py", line 758 in _callmethod
  File "/Users/buildbot/buildarea/3.x.parc-snowleopard-1/build/Lib/multiprocessing/managers.py", line 982 in acquire
  File "/Users/buildbot/buildarea/3.x.parc-snowleopard-1/build/Lib/test/test_multiprocessing.py", line 833 in test_notify_all
  File "/Users/buildbot/buildarea/3.x.parc-snowleopard-1/build/Lib/unittest/case.py", line 385 in _executeTestPart
  File "/Users/buildbot/buildarea/3.x.parc-snowleopard-1/build/Lib/unittest/case.py", line 440 in run
  File "/Users/buildbot/buildarea/3.x.parc-snowleopard-1/build/Lib/unittest/case.py", line 492 in __call__
  File "/Users/buildbot/buildarea/3.x.parc-snowleopard-1/build/Lib/unittest/suite.py", line 105 in run
  File "/Users/buildbot/buildarea/3.x.parc-snowleopard-1/build/Lib/unittest/suite.py", line 67 in __call__
  File "/Users/buildbot/buildarea/3.x.parc-snowleopard-1/build/Lib/unittest/suite.py", line 105 in run
  File "/Users/buildbot/buildarea/3.x.parc-snowleopard-1/build/Lib/unittest/suite.py", line 67 in __call__
  File "/Users/buildbot/buildarea/3.x.parc-snowleopard-1/build/Lib/unittest/suite.py", line 105 in run
  File "/Users/buildbot/buildarea/3.x.parc-snowleopard-1/build/Lib/unittest/suite.py", line 67 in __call__
  File "/Users/buildbot/buildarea/3.x.parc-snowleopard-1/build/Lib/unittest/runner.py", line 168 in run
  File "/Users/buildbot/buildarea/3.x.parc-snowleopard-1/build/Lib/test/support.py", line 1368 in _run_suite
  File "/Users/buildbot/buildarea/3.x.parc-snowleopard-1/build/Lib/test/support.py", line 1402 in run_unittest
  File "/Users/buildbot/buildarea/3.x.parc-snowleopard-1/build/Lib/test/test_multiprocessing.py", line 2392 in test_main
  File "/Users/buildbot/buildarea/3.x.parc-snowleopard-1/build/Lib/test/regrtest.py", line 1221 in runtest_inner
  File "/Users/buildbot/buildarea/3.x.parc-snowleopard-1/build/Lib/test/regrtest.py", line 907 in runtest
  File "/Users/buildbot/buildarea/3.x.parc-snowleopard-1/build/Lib/test/regrtest.py", line 710 in main
  File "/Users/buildbot/buildarea/3.x.parc-snowleopard-1/build/Lib/test/__main__.py", line 13 in <module>
  File "/Users/buildbot/buildarea/3.x.parc-snowleopard-1/build/Lib/runpy.py", line 73 in _run_code
  File "/Users/buildbot/buildarea/3.x.parc-snowleopard-1/build/Lib/runpy.py", line 160 in _run_module_as_main
make: *** [buildbottest] Error 1

command timed out: 3900 seconds without output, attempting to kill
program finished with exit code 2
elapsedTime=9486.934561
msg149824 - Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-12-19 09:16
I think this could be due to the multiprocessing manager's server socket backlog value, which is a little too low: by default, it's set to 5, and the tests launch up to 3 threads and 3 processes in parallel, so if we're unlucky with the scheduling, some connect() calls could fail with ECONNREFUSED.
Unless otherwise specified, the server uses Unix domain sockets: on Linux, when the server's socket backlog is full, connect() blocks instead of failing, which could explain why the hang doesn't happen on Linux. It would be nice to check what happens when the socket backlog is full on the affected OSes (for example OS X or FreeBSD).

Here's a run on Linux:
"""
$ ./python ~/test_backlog.py 
0
1
2
3
4
[blocks]
"""

If we get ECONNREFUSED on OS X or FreeBSD, then there's a chance it's the culprit. If not, well, no idea what's going on :-)
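The attached test_backlog.py is not reproduced in this report. Below is a minimal sketch of the same kind of probe, written only from the description above; the function name `probe_backlog` and its parameters are hypothetical. It listens on a temporary Unix domain socket with a small backlog, never calls accept(), and keeps connecting clients until connect() fails. A per-socket timeout keeps the probe from hanging where connect() would otherwise block.

```python
import os
import socket
import tempfile


def probe_backlog(backlog=5, max_clients=64, timeout=1.0):
    """Hypothetical probe: fill the accept queue of an AF_UNIX listener.

    Returns (number_of_clients_connected, outcome), where outcome names
    what stopped the loop.
    """
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "backlog.sock")
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    clients = []
    try:
        server.bind(path)
        server.listen(backlog)  # accept() is deliberately never called
        for _ in range(max_clients):
            c = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
            # A timeout makes Python use a non-blocking connect(), so a
            # full queue surfaces as an exception instead of a hang.
            c.settimeout(timeout)
            try:
                c.connect(path)
            except ConnectionRefusedError:
                c.close()
                return len(clients), "ECONNREFUSED"
            except (BlockingIOError, socket.timeout):
                # EAGAIN (or a timeout): a blocking socket would have waited.
                c.close()
                return len(clients), "would block"
            clients.append(c)
        return len(clients), "all connected"
    finally:
        for c in clients:
            c.close()
        server.close()
        if os.path.exists(path):
            os.unlink(path)
        os.rmdir(tmpdir)


if __name__ == "__main__":
    print(probe_backlog(backlog=5))
```

Under the hypothesis above, one would expect "would block" on Linux and "ECONNREFUSED" on OS X or FreeBSD. The exact count may be slightly larger than the requested backlog, since kernels interpret the listen() argument differently.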
msg150120 - Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-12-22 23:01
Victor, could you try the attached script on FreeBSD, to see if you get ECONNREFUSED?
msg150189 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-12-23 17:49
> Victor, could you try the attached script on FreeBSD,
> to see if you get ECONNREFUSED?

Yes, I get an ECONNREFUSED. I tested backlog.py on FreeBSD 8.2.
msg150190 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-12-23 18:10
New changeset 94494a779c20 by Charles-François Natali in branch '2.7':
Issue #13565: Increase multiprocessing's server socket backlog, to avoid
http://hg.python.org/cpython/rev/94494a779c20

New changeset 9b99adef3c78 by Charles-François Natali in branch '3.2':
Issue #13565: Increase multiprocessing's server socket backlog, to avoid
http://hg.python.org/cpython/rev/9b99adef3c78

New changeset 29cad1ac828c by Charles-François Natali in branch 'default':
Issue #13565: Increase multiprocessing's server socket backlog, to avoid
http://hg.python.org/cpython/rev/29cad1ac828c
msg150191 - Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-12-23 18:23
> Yes, I get an ECONNREFUSED. I tested backlog.py on FreeBSD 8.2.

Thanks.
I bumped the backlog; hopefully this will fix it.
We can leave this report open for a couple of days, to see how the buildbots behave.
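A minimal sketch of why a larger backlog helps, using the documented `backlog` argument of `multiprocessing.connection.Listener` (it is passed to the socket's listen() call). The helper name `serve_concurrent_clients` is hypothetical, and 16 is an illustrative value: the new backlog chosen in the changesets above is not quoted in this report. Six clients (mirroring the test's 3 threads plus 3 processes) connect at once before the server accepts any of them.

```python
import threading
from multiprocessing.connection import Client, Listener


def serve_concurrent_clients(n_clients=6, backlog=16):
    """Hypothetical helper: n_clients connect concurrently, then get served."""
    # address=None with family="AF_UNIX" lets multiprocessing pick a
    # temporary socket path; backlog is handed to socket.listen().
    listener = Listener(family="AF_UNIX", backlog=backlog)
    results = []
    lock = threading.Lock()

    def client():
        # No authkey, so no authentication handshake takes place.
        with Client(listener.address) as conn:
            value = conn.recv()
        with lock:
            results.append(value)

    threads = [threading.Thread(target=client) for _ in range(n_clients)]
    try:
        for t in threads:
            t.start()  # all clients race to connect before any accept()
        for i in range(n_clients):
            with listener.accept() as conn:
                conn.send(i)
        for t in threads:
            t.join()
    finally:
        listener.close()
    return sorted(results)


if __name__ == "__main__":
    print(serve_concurrent_clients())
```

With the backlog at least as large as the number of simultaneous connection attempts, every pending connect() fits in the queue, so a platform that refuses connections on a full queue (as FreeBSD did above) should not reject any of them.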
msg150453 - Author: Charles-François Natali (neologix) * (Python committer) Date: 2012-01-02 12:34
Alright, it seems to be fixed.
We can still reopen if this happens again.
History
Date                 User        Action  Args
2012-01-02 12:34:53  neologix    set     status: open -> closed
                                         resolution: fixed
                                         messages: + msg150453
                                         stage: resolved
2011-12-23 18:23:02  neologix    set     messages: + msg150191
2011-12-23 18:10:53  python-dev  set     nosy: + python-dev
                                         messages: + msg150190
2011-12-23 17:49:35  vstinner    set     messages: + msg150189
2011-12-22 23:01:14  neologix    set     messages: + msg150120
2011-12-19 09:16:54  neologix    set     files: + test_backlog.py
                                         nosy: + neologix
                                         messages: + msg149824
2011-12-09 12:58:58  vstinner    create