classification
Title: test_multiprocessing_spawn crashes under PowerLinux
Type: crash
Stage: resolved
Components: Library (Lib), Tests
Versions: Python 3.4
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: sbt
Nosy List: David.Edelsohn, neologix, pitrou, python-dev, sbt
Priority: critical
Keywords: patch

Created on 2013-08-19 21:33 by pitrou, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name: eintr_alarm.diff
Uploaded: neologix, 2013-08-28 13:54
Messages (11)
msg195673 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-19 21:33
http://buildbot.python.org/all/builders/PPC64%20PowerLinux%203.x

[319/379] test_multiprocessing_spawn
/home/shager/cpython-buildarea/3.x.edelsohn-powerlinux-ppc64/build/Lib/multiprocessing/semaphore_tracker.py:121: UserWarning: semaphore_tracker: There appear to be 2 leaked semaphores to clean up at shutdown
  len(cache))
/home/shager/cpython-buildarea/3.x.edelsohn-powerlinux-ppc64/build/Lib/multiprocessing/semaphore_tracker.py:133: UserWarning: semaphore_tracker: '/mp-t89tlie_': [Errno 2] No such file or directory
  warnings.warn('semaphore_tracker: %r: %s' % (name, e))
make: *** [buildbottest] User defined signal 1
Process PoolWorker-777:
Traceback (most recent call last):
  File "/home/shager/cpython-buildarea/3.x.edelsohn-powerlinux-ppc64/build/Lib/multiprocessing/pool.py", line 123, in worker
    put((job, i, result))
  File "/home/shager/cpython-buildarea/3.x.edelsohn-powerlinux-ppc64/build/Lib/multiprocessing/queues.py", line 368, in put
    self._writer.send_bytes(obj)
  File "/home/shager/cpython-buildarea/3.x.edelsohn-powerlinux-ppc64/build/Lib/multiprocessing/connection.py", line 202, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/home/shager/cpython-buildarea/3.x.edelsohn-powerlinux-ppc64/build/Lib/multiprocessing/connection.py", line 401, in _send_bytes
    self._send(struct.pack("!i", n))
  File "/home/shager/cpython-buildarea/3.x.edelsohn-powerlinux-ppc64/build/Lib/multiprocessing/connection.py", line 371, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/shager/cpython-buildarea/3.x.edelsohn-powerlinux-ppc64/build/Lib/multiprocessing/process.py", line 255, in _bootstrap
    self.run()
  File "/home/shager/cpython-buildarea/3.x.edelsohn-powerlinux-ppc64/build/Lib/multiprocessing/process.py", line 92, in run
    self._target(*self._args, **self._kwargs)
  File "/home/shager/cpython-buildarea/3.x.edelsohn-powerlinux-ppc64/build/Lib/multiprocessing/pool.py", line 128, in worker
    put((job, i, (False, wrapped)))
  File "/home/shager/cpython-buildarea/3.x.edelsohn-powerlinux-ppc64/build/Lib/multiprocessing/queues.py", line 368, in put
    self._writer.send_bytes(obj)
  File "/home/shager/cpython-buildarea/3.x.edelsohn-powerlinux-ppc64/build/Lib/multiprocessing/connection.py", line 202, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/home/shager/cpython-buildarea/3.x.edelsohn-powerlinux-ppc64/build/Lib/multiprocessing/connection.py", line 401, in _send_bytes
    self._send(struct.pack("!i", n))
  File "/home/shager/cpython-buildarea/3.x.edelsohn-powerlinux-ppc64/build/Lib/multiprocessing/connection.py", line 371, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
/home/shager/cpython-buildarea/3.x.edelsohn-powerlinux-ppc64/build/Lib/multiprocessing/semaphore_tracker.py:121: UserWarning: semaphore_tracker: There appear to be 4 leaked semaphores to clean up at shutdown
  len(cache))
program finished with exit code 2
elapsedTime=4624.019498
msg195696 - Author: David Edelsohn (David.Edelsohn) * Date: 2013-08-20 15:16
I am not certain what is going on. Only 3.x appears to be affected, but the problems seem somewhat intermittent. There were some strange processes of another user running on the buildslave, which was driving the load up very high. I have killed the processes and blocked the user. We can see if that affects the test results.
msg195703 - Author: David Edelsohn (David.Edelsohn) * Date: 2013-08-20 17:10
The crash seems to have been due to another user abusing the buildslave system.

The remaining failure is a mismatch in the expected GDB output.

AssertionError: "{<object at remote 0x3fffb176a040>, 'b'}" != "{'b'}"
- {<object at remote 0x3fffb176a040>, 'b'}
+ {'b'}
msg195706 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-20 18:03
Thanks David! The test_gdb failure is another issue (not PowerLinux-specific), see issue18772.
msg196374 - Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-08-28 11:01
The PPC64 buildbot is still failing intermittently.
msg196375 - Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-08-28 11:38
It looks like the main process keeps getting killed by SIGUSR1.  Don't know why.
msg196377 - Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-08-28 11:54
> It looks like the main process keeps getting killed by SIGUSR1.
> Don't know why.

In Lib/test/_test_multiprocessing.py:
"""
    def test_poll_eintr(self):
        got_signal = [False]
        def record(*args):
            got_signal[0] = True
        pid = os.getpid()
        oldhandler = signal.signal(signal.SIGUSR1, record)
        try:
            killer = self.Process(target=self._killer, args=(pid,))
            killer.start()
            p = self.Process(target=time.sleep, args=(1,))
            p.start()
            p.join()
            self.assertTrue(got_signal[0])
            self.assertEqual(p.exitcode, 0)
            killer.join()
        finally:
            signal.signal(signal.SIGUSR1, oldhandler)
"""

If the _killer process takes too long to start, it won't send SIGUSR1 before the p process returns (0.5s vs 1s), which means that the default SIGUSR1 handler will be restored before SIGUSR1 is sent. Then SIGUSR1 comes in and, since the default action for SIGUSR1 is to terminate the process, kills the main process, resulting in the failure above.
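One way to close that window, sketched here as a standalone script rather than as the committed change, is to join the killer before the outer `finally` restores the old handler, so the signal can only arrive while `record` is still installed. The module-level `_killer` helper, the function name `run_with_late_signal`, the delays, and the use of the "fork" start method (POSIX only) are assumptions made for illustration; the delays are deliberately chosen so that the killer fires after `p` has already exited.

```python
import multiprocessing
import os
import signal
import time


def _killer(pid, delay):
    # Hypothetical stand-in for the test's _killer helper:
    # wait, then send SIGUSR1 to the given process.
    time.sleep(delay)
    os.kill(pid, signal.SIGUSR1)


def run_with_late_signal(killer_delay=0.5, child_sleep=0.1):
    # "fork" is an assumption so the sketch runs as a plain script on POSIX.
    ctx = multiprocessing.get_context("fork")
    got_signal = [False]

    def record(*args):
        got_signal[0] = True

    oldhandler = signal.signal(signal.SIGUSR1, record)
    try:
        killer = ctx.Process(target=_killer, args=(os.getpid(), killer_delay))
        killer.start()
        try:
            p = ctx.Process(target=time.sleep, args=(child_sleep,))
            p.start()
            p.join()
        finally:
            # Wait for the killer while `record` is still installed, so a
            # late SIGUSR1 cannot hit the default (terminating) action.
            killer.join()
    finally:
        signal.signal(signal.SIGUSR1, oldhandler)
    return got_signal[0], p.exitcode


if __name__ == "__main__":
    print(run_with_late_signal())
```

With the join placed after the handler is restored instead, the same delays would reproduce the race described above.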
msg196380 - Author: Roundup Robot (python-dev) (Python triager) Date: 2013-08-28 12:57
New changeset fa23e49c7dd3 by Richard Oudkerk in branch 'default':
Issue #18786: Don't reinstall old SIGUSR1 handler prematurely.
http://hg.python.org/cpython/rev/fa23e49c7dd3
msg196381 - Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-08-28 12:58
> If the _killer process takes too long to start, it won't send SIGUSR1 
> before the p process returns...

Thanks!
msg196383 - Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-08-28 13:54
> Thanks!

You're welcome :)

BTW, I don't know if that would fulfill the goal of your test here,
but when I want to check for EINTR handling, I just use alarm (see
attached patch). The only downside is that the minimum delay is 1
second.
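A rough sketch of the alarm idea, not the contents of the attached patch: the process arms `signal.alarm(1)` on itself and then blocks in `Connection.poll()`, so no helper process is needed and there is no start-up race. The function name `poll_interrupted_by_alarm` and the 2-second poll timeout are assumptions, and the sketch relies on Python 3.5+ retrying interrupted system calls when the handler does not raise.

```python
import multiprocessing
import signal


def poll_interrupted_by_alarm(poll_timeout=2.0):
    got_signal = [False]

    def record(*args):
        got_signal[0] = True

    oldhandler = signal.signal(signal.SIGALRM, record)
    reader, writer = multiprocessing.Pipe(duplex=False)
    try:
        # alarm() only takes whole seconds, hence the 1 second minimum.
        signal.alarm(1)
        # Nothing is ever written, so poll() blocks until the timeout;
        # SIGALRM arrives in the middle of the wait.
        ready = reader.poll(poll_timeout)
    finally:
        signal.alarm(0)  # cancel the alarm if it has not fired yet
        signal.signal(signal.SIGALRM, oldhandler)
        reader.close()
        writer.close()
    return got_signal[0], ready


if __name__ == "__main__":
    print(poll_interrupted_by_alarm())
```

If the 1 second granularity matters, `signal.setitimer(signal.ITIMER_REAL, ...)` accepts fractional seconds and also delivers SIGALRM.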
msg196386 - Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-08-28 14:28
It should be fixed now so I will close.
History
Date                 User            Action  Args
2022-04-11 14:57:49  admin           set     github: 62986
2013-08-28 14:28:47  sbt             set     status: open -> closed; resolution: fixed; messages: + msg196386; stage: resolved
2013-08-28 13:54:38  neologix        set     files: + eintr_alarm.diff; keywords: + patch; messages: + msg196383
2013-08-28 12:58:29  sbt             set     messages: + msg196381
2013-08-28 12:57:29  python-dev      set     nosy: + python-dev; messages: + msg196380
2013-08-28 11:54:58  neologix        set     nosy: + neologix; messages: + msg196377
2013-08-28 11:38:46  sbt             set     messages: + msg196375
2013-08-28 11:01:05  sbt             set     status: closed -> open; resolution: not a bug -> (no value); messages: + msg196374
2013-08-20 18:03:36  pitrou          set     status: open -> closed; resolution: not a bug; messages: + msg195706
2013-08-20 17:10:18  David.Edelsohn  set     messages: + msg195703
2013-08-20 15:16:41  David.Edelsohn  set     messages: + msg195696
2013-08-19 21:33:37  pitrou          create