classification
Title: test_wait4 error on AIX
Type: behavior Stage: resolved
Components: Versions: Python 3.4, Python 3.3, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: neologix Nosy List: David.Edelsohn, neologix, pitrou, python-dev, sable
Priority: normal Keywords:

Created on 2011-02-11 11:39 by sable, last changed 2013-07-04 19:30 by pitrou. This issue is now closed.

Messages (10)
msg128375 - Author: Sébastien Sablé (sable) Date: 2011-02-11 11:39
I get an error when running test_wait4 with trunk on AIX:

test_wait (__main__.Wait4Test) ... FAIL

======================================================================
FAIL: test_wait (__main__.Wait4Test)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/san_cis/home/cis/.buildbot/python-aix6/3.x.phenix.xlc/build/Lib/test/fork_wait.py", line 72, in test_wait
    self.wait_impl(cpid)
  File "./Lib/test/test_wait4.py", line 23, in wait_impl
    self.assertEqual(spid, cpid)
AssertionError: 0 != 1486954

----------------------------------------------------------------------
Ran 1 test in 12.030s

FAILED (failures=1)
Traceback (most recent call last):
  File "./Lib/test/test_wait4.py", line 32, in <module>
    test_main()
  File "./Lib/test/test_wait4.py", line 28, in test_main
    run_unittest(Wait4Test)
  File "/san_cis/home/cis/.buildbot/python-aix6/3.x.phenix.xlc/build/Lib/test/support.py", line 1145, in run_unittest
    _run_suite(suite)
  File "/san_cis/home/cis/.buildbot/python-aix6/3.x.phenix.xlc/build/Lib/test/support.py", line 1128, in _run_suite
    raise TestFailed(err)
test.support.TestFailed: Traceback (most recent call last):
  File "/san_cis/home/cis/.buildbot/python-aix6/3.x.phenix.xlc/build/Lib/test/fork_wait.py", line 72, in test_wait
    self.wait_impl(cpid)
  File "./Lib/test/test_wait4.py", line 23, in wait_impl
    self.assertEqual(spid, cpid)
AssertionError: 0 != 1486954

Thanks in advance
msg128727 - Author: Sébastien Sablé (sable) Date: 2011-02-17 15:19
This issue already existed on Python 2.5.2 with AIX 5.2:

http://www.mail-archive.com/python-list@python.org/msg192219.html

The documentation for WNOHANG says:
http://docs.python.org/library/os.html#os.WNOHANG
"""
The option for waitpid() to return immediately if no child process status is available immediately. The function returns (0, 0) in this case.
"""

It seems wait4 always returns 0 on AIX when WNOHANG is specified.

Removing WNOHANG will make the test succeed.

waitpid does not have the same limitation.
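The symptom can be reproduced outside the test suite with a short probe. This is a sketch, not part of the patch: the function name `wait4_nohang_sees_exit` and its parameters are hypothetical, it assumes a POSIX system where `os.fork()` is available, and it reuses the test's budget of 10 attempts one second apart. On a conforming platform it should return True; going by the behaviour reported here, it would return False on AIX.

```python
import os
import time


def wait4_nohang_sees_exit(attempts=10, delay=1.0):
    """Hypothetical probe: does os.wait4(pid, os.WNOHANG) ever report an exited child?

    Assumes POSIX with os.fork(). Per this report, on AIX every call returns
    pid 0, so the probe would give up and return False there.
    """
    pid = os.fork()
    if pid == 0:
        # Child: exit at once so the parent has something to reap.
        os._exit(0)
    for _ in range(attempts):
        spid, _status, _rusage = os.wait4(pid, os.WNOHANG)
        if spid == pid:
            # The non-blocking call reported (and reaped) the child.
            return True
        # spid == 0: the call claims no status is available yet.
        time.sleep(delay)
    # Never reported: reap with a blocking wait4(), which works on AIX
    # according to this report, so the child does not linger as a zombie.
    os.wait4(pid, 0)
    return False
```

Calling `wait4_nohang_sees_exit()` on an affected machine should take the full ten seconds before returning False, which mirrors the 12-second run time in the original failure.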

I suppose this is a bug in AIX, although there is not even a man page describing wait4 on this platform.

Here is a proposed patch that works around this bug:

Index: Lib/test/test_wait4.py
===================================================================
--- Lib/test/test_wait4.py      (revision 88430)
+++ Lib/test/test_wait4.py      (working copy)
@@ -3,6 +3,7 @@
 
 import os
 import time
+import sys
 from test.fork_wait import ForkWait
 from test.support import run_unittest, reap_children, get_attribute
 
@@ -13,10 +14,14 @@
 
 class Wait4Test(ForkWait):
     def wait_impl(self, cpid):
+        option = os.WNOHANG
+        if sys.platform.startswith('aix'):
+            # wait4 is broken on AIX and will always return 0 with WNOHANG
+            option = 0
         for i in range(10):
             # wait4() shouldn't hang, but some of the buildbots seem to hang
             # in the forking tests.  This is an attempt to fix the problem.
-            spid, status, rusage = os.wait4(cpid, os.WNOHANG)
+            spid, status, rusage = os.wait4(cpid, option)
             if spid == cpid:
                 break
             time.sleep(1.0)
msg130164 - Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-03-06 10:57
If test_wait3 and test_fork1 pass, then yes, it's probably an issue with AIX's wait4.
See http://fixunix.com/aix/84872-sigchld-recursion.html:

"""
Replace the wait4() call with a waitpid() call...
....like this:
for(n=0;waitpid(-1, &status, WNOHANG) > 0; n++) ;

Or, compile the existing code with the BSD library:
cc -o demo demo.c -D_BSD -lbsd

Both will work...

The current problem is that child process is not "seen" by the wait4() call,
so that when "signal" is rearmed, it immediately goes (recursively) into the
child_handler() function.
"""

So it seems that under AIX, posix_wait4 should be compiled with -D_BSD -lbsd.
Could you try this?

If this doesn't do the trick, then avoiding passing WNOHANG could be the second option.
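For comparison, the `waitpid()` loop from the quoted post can be written in Python roughly as follows. This is a sketch; `reap_exited_children` is a hypothetical name. It relies on Python raising `ChildProcessError` where the C call would return -1 with `ECHILD`, and on `os.waitpid(-1, os.WNOHANG)` returning pid 0 when children exist but none has exited.

```python
import os


def reap_exited_children():
    """Reap every child that has already exited, without blocking.

    Rough Python equivalent of the quoted C loop:
        for (n = 0; waitpid(-1, &status, WNOHANG) > 0; n++) ;
    Returns the number of children reaped.
    """
    reaped = 0
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            # waitpid() failed with ECHILD: there are no children left at all.
            break
        if pid == 0:
            # Children remain, but none of them has exited yet.
            break
        reaped += 1
    return reaped
```

In a SIGCHLD handler this drains every pending exit in one pass, which is what stops the handler from being re-entered for children that a single wait call failed to "see".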
msg130247 - Author: Sébastien Sablé (sable) Date: 2011-03-07 10:15
I had already seen the post you mentioned and tested -lbsd, without success.

wait4 is not even present in libbsd.

phenix:~$ nm /usr/lib/libbsd.a  | grep wait
phenix:~$ 

Maybe it was present in older versions of the system, but I couldn't find any documentation mentioning wait4 and -lbsd anywhere.

In fact, wait4 is never mentioned in IBM's AIX documentation.

wait4 without WNOHANG works fine. waitpid works fine even with WNOHANG.
I don't know which workaround is better.
I will also try to report this bug to IBM so that a future version of AIX could work correctly.
msg130291 - Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-03-07 20:53
> wait4 without WNOHANG works fine. waitpid works fine even with WNOHANG.
> I don't know which workaround is better.

As far as the test is concerned, it's of course better to use wait4
without WNOHANG in a test named test_wait4 (especially since waitpid
is tested elsewhere)...
msg130308 - Author: Sébastien Sablé (sable) Date: 2011-03-08 09:30
Yes; as I noted in msg128727, the test passes once WNOHANG is removed.

However, I should add a note to the AIX-NOTES file explaining that wait4 is broken with WNOHANG on AIX and suggesting the two workarounds.
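The two workarounds discussed in this thread could be summarised for application code with a small helper like the one below. It is only a sketch: `collect_child` and its `need_rusage` flag are hypothetical names, and the AIX behaviour it relies on (blocking `wait4()` works, `waitpid()` honours WNOHANG) comes from this report rather than from IBM documentation.

```python
import os


def collect_child(pid, need_rusage):
    """Hypothetical helper applying the two AIX workarounds from this issue.

    need_rusage=True  -> wait4() without WNOHANG: blocks until the child exits,
                         but also returns its resource usage.
    need_rusage=False -> waitpid() with WNOHANG: never blocks, but provides no
                         resource usage (None is returned in its place).
    Returns (pid, status, rusage); pid is 0 if the child has not exited yet.
    """
    if need_rusage:
        # Workaround 1: drop WNOHANG so wait4() reports the child reliably.
        return os.wait4(pid, 0)
    # Workaround 2: switch to waitpid(), which does honour WNOHANG.
    spid, status = os.waitpid(pid, os.WNOHANG)
    return spid, status, None
```

The trade-off is the one described above: callers must choose between non-blocking behaviour and getting an rusage structure, since on AIX a single `wait4()` call apparently cannot provide both.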
msg192255 - Author: David Edelsohn (David.Edelsohn) * Date: 2013-07-03 20:46
The patch in msg128727 is correct for AIX and should be applied.
msg192305 - Author: Roundup Robot (python-dev) Date: 2013-07-04 19:05
New changeset b3ea1b5a1617 by Antoine Pitrou in branch '3.3':
Issue #11185: Fix test_wait4 under AIX.  Patch by Sébastien Sablé.
http://hg.python.org/cpython/rev/b3ea1b5a1617

New changeset 8055521e372f by Antoine Pitrou in branch 'default':
Issue #11185: Fix test_wait4 under AIX.  Patch by Sébastien Sablé.
http://hg.python.org/cpython/rev/8055521e372f
msg192306 - Author: Roundup Robot (python-dev) Date: 2013-07-04 19:11
New changeset e3fd5fc5dc47 by Antoine Pitrou in branch '2.7':
Issue #11185: Fix test_wait4 under AIX.  Patch by Sébastien Sablé.
http://hg.python.org/cpython/rev/e3fd5fc5dc47
msg192307 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-07-04 19:30
Thank you. This should be fixed now. Please reopen if not.
History
Date                 User            Action  Args
2013-07-04 19:30:09  pitrou          set     status: open -> closed
                                             versions: + Python 2.7, Python 3.3, Python 3.4, - Python 3.2
                                             nosy: + pitrou
                                             messages: + msg192307
                                             resolution: fixed
                                             stage: resolved
2013-07-04 19:11:19  python-dev      set     messages: + msg192306
2013-07-04 19:05:42  python-dev      set     nosy: + python-dev
                                             messages: + msg192305
2013-07-04 11:25:43  neologix        set     assignee: neologix
2013-07-03 20:46:27  David.Edelsohn  set     messages: + msg192255
2013-06-19 21:20:44  David.Edelsohn  set     nosy: + David.Edelsohn
                                             type: behavior
2011-03-08 09:30:37  sable           set     messages: + msg130308
2011-03-07 20:53:53  neologix        set     messages: + msg130291
2011-03-07 10:15:32  sable           set     messages: + msg130247
2011-03-06 10:57:10  neologix        set     nosy: + neologix
                                             messages: + msg130164
2011-02-17 15:19:37  sable           set     messages: + msg128727
2011-02-11 11:40:00  sable           create