classification
Title: test_multiprocessing failure
Type: behavior
Stage: resolved
Components: Library (Lib), Tests
Versions: Python 3.1, Python 3.2, Python 2.7, Python 2.6
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: flox
Nosy List: flox, jnoller, pitrou, skrah, vstinner
Priority: normal
Keywords: buildbot, patch

Created on 2010-01-29 14:02 by pitrou, last changed 2011-04-05 00:06 by vstinner. This issue is now closed.

Files
File name Uploaded Description
issue7805_problem_outline.patch skrah, 2010-02-20 10:43 Just to show where the problem is.
Messages (11)
msg98509 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2010-01-29 14:02
This is a fresh py3k checkout on a fresh Debian Lenny install:

======================================================================
ERROR: test_pool_worker_lifetime (test.test_multiprocessing.WithProcessesTestPoolWorkerLifetime)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/antoine/py3k/py3k/Lib/test/test_multiprocessing.py", line 1076, in test_pool_worker_lifetime
    self.assertNotEqual(sorted(origworkerpids), sorted(finalworkerpids))
TypeError: unorderable types: NoneType() < int()

----------------------------------------------------------------------
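The TypeError above is what Python 3 raises when `sorted()` has to compare `None` with an `int`. A minimal sketch, assuming a hypothetical pid list in which one worker has not reported its pid yet (the values are made up for illustration):

```python
# Hypothetical pid list: the middle worker has not been assigned a pid yet.
pids = [4242, None, 4243]

try:
    sorted(pids)
    raised = False
except TypeError:
    # Python 3 refuses to order None against int; Python 2 sorted None first.
    raised = True

# Positions of the workers whose pid is still None.
missing = [i for i, pid in enumerate(pids) if pid is None]
print(raised, missing)
```

The exact wording of the exception message differs between Python 3 releases, so the sketch only checks the exception type.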
msg98510 - Author: Florent Xicluna (flox) (Python committer) Date: 2010-01-29 14:07
Confirmed.

I got a Py3k warning on 2.7 about "unorderable types".
msg98667 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2010-02-01 15:50
Also witnessed on one of the buildbots:

test test_multiprocessing failed -- Traceback (most recent call last):
  File "/home/pybot/buildarea/3.x.klose-debian-ia64/build/Lib/test/test_multiprocessing.py", line 1076, in test_pool_worker_lifetime
    self.assertNotEqual(sorted(origworkerpids), sorted(finalworkerpids))
TypeError: unorderable types: NoneType() < int()

http://www.python.org/dev/buildbot/3.x.stable/builders/ia64%20Ubuntu%203.x/builds/434/steps/test/logs/stdio
msg99270 - Author: Stefan Krah (skrah) (Python committer) Date: 2010-02-12 16:04
I can reproduce it almost always under these conditions:

System: Ubuntu Intrepid 64-bit, running on the actual hardware.
CPU: Core 2 Duo.
Load: At least one core maxed out by another process.

On an idle machine I have not been able to reproduce it yet.
msg99271 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2010-02-12 16:08
Indeed, I forgot to mention that the Debian Lenny install on which I reproduce the bug is a single-core (virtual) machine.
msg99614 - Author: Stefan Krah (skrah) (Python committer) Date: 2010-02-20 10:43
I found the problem. On a loaded machine, an attempt is made to
get the pid of the worker before the process has properly started.

There are several places where a sleep could be inserted. Perhaps
Process.start() should wait until the child's pid is not None?
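One alternative to a fixed sleep is to poll with a bounded countdown, the same shape as the `while countdown and not all(w.is_alive() for w in p._pool)` loop that appears in a later traceback in this issue. A minimal sketch, assuming a hypothetical helper name `wait_for_workers` and made-up default values for the countdown and delay:

```python
import time


def wait_for_workers(workers, countdown=50, delay=0.1):
    """Poll until every worker reports is_alive(), or the countdown runs out.

    `workers` is any iterable of objects with an is_alive() method.
    Returns True if all workers were alive before giving up, else False.
    """
    workers = list(workers)
    while countdown and not all(w.is_alive() for w in workers):
        countdown -= 1
        time.sleep(delay)
    return all(w.is_alive() for w in workers)
```

In the test this could be called on the pool's worker list before the pids are collected. Note that `_pool` is a private attribute of `multiprocessing.Pool`, so relying on it outside the test suite is an assumption rather than supported API.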
msg100381 - Author: Florent Xicluna (flox) (Python committer) Date: 2010-03-04 08:48
Thanks Stefan for the analysis.
This patch should fix the issue, and make the buildbots happy.
msg100395 - Author: Florent Xicluna (flox) (Python committer) Date: 2010-03-04 16:13
Fixed with r78653 and r78654.
msg100428 - Author: Florent Xicluna (flox) (Python committer) Date: 2010-03-04 22:42
With the patch applied, a different failure sometimes shows up (ia64 Ubuntu trunk):

test_multiprocessing
test test_multiprocessing failed -- Traceback (most recent call last):
  File "/home/pybot/buildarea/trunk.klose-debian-ia64/build/Lib/test/test_multiprocessing.py", line 1075, in test_pool_worker_lifetime
    while countdown and not all(w.is_alive() for w in p._pool):
  File "/home/pybot/buildarea/trunk.klose-debian-ia64/build/Lib/test/test_multiprocessing.py", line 1075, in <genexpr>
    while countdown and not all(w.is_alive() for w in p._pool):
  File "/home/pybot/buildarea/trunk.klose-debian-ia64/build/Lib/multiprocessing/process.py", line 132, in is_alive
    self._popen.poll()
  File "/home/pybot/buildarea/trunk.klose-debian-ia64/build/Lib/multiprocessing/forking.py", line 106, in poll
    pid, sts = os.waitpid(self.pid, flag)
OSError: [Errno 10] No child processes
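Errno 10 is `ECHILD` on Linux: `os.waitpid()` raises it when the pid is not a child that can still be waited for, for example because it has already been reaped elsewhere. A minimal sketch of a poll that tolerates this case, assuming a hypothetical helper name `poll_exit_status` and a POSIX system; it is not taken from the committed fix:

```python
import errno
import os


def poll_exit_status(pid):
    """Non-blocking check on a child process.

    Returns the raw wait status if the child has just been reaped here,
    or None if it is still running or is no longer waitable (ECHILD).
    """
    try:
        reaped_pid, status = os.waitpid(pid, os.WNOHANG)
    except OSError as exc:
        if exc.errno != errno.ECHILD:
            raise
        # The child is gone already, e.g. reaped by another waitpid() call.
        return None
    # With WNOHANG, waitpid() returns (0, 0) while the child is still running.
    return status if reaped_pid == pid else None
```

The sketch deliberately folds "still running" and "already reaped" into one return value; a caller that needs to tell them apart would have to track the exit status when the child is first reaped.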
msg100429 - Author: Florent Xicluna (flox) (Python committer) Date: 2010-03-04 22:53
It looks like a new incarnation of the long-standing #1731717.
msg100644 - Author: Florent Xicluna (flox) (Python committer) Date: 2010-03-08 14:20
This last bug is fixed, too.
http://bugs.python.org/issue1731717#msg100643
History
Date User Action Args
2011-04-05 00:06:29 vstinner set nosy: + vstinner
2010-03-08 14:20:55 flox set status: pending -> closed; messages: + msg100644
2010-03-05 08:20:49 flox set status: open -> pending
2010-03-05 08:20:41 flox set files: - issue7805_process_is_alive_py3.diff
2010-03-04 22:53:30 flox set messages: + msg100429
2010-03-04 22:42:36 flox set status: pending -> open; messages: + msg100428
2010-03-04 16:13:18 flox set status: open -> pending; resolution: accepted -> fixed; messages: + msg100395; stage: patch review -> resolved
2010-03-04 08:54:14 flox set assignee: jnoller -> flox; resolution: accepted
2010-03-04 08:48:56 flox set files: + issue7805_process_is_alive_py3.diff; messages: + msg100381; stage: needs patch -> patch review
2010-02-20 10:43:27 skrah set files: + issue7805_problem_outline.patch; keywords: + patch; messages: + msg99614
2010-02-15 11:38:47 flox set keywords: + buildbot
2010-02-12 16:08:39 pitrou set messages: + msg99271
2010-02-12 16:04:53 skrah set nosy: + skrah; messages: + msg99270
2010-02-01 15:50:50 pitrou set versions: + Python 2.6, Python 2.7
2010-02-01 15:50:38 pitrou set assignee: jnoller; messages: + msg98667; components: + Tests; versions: + Python 3.1, - Python 2.7
2010-01-29 14:07:34 flox set nosy: + flox; messages: + msg98510; versions: + Python 2.7
2010-01-29 14:02:37 pitrou create