classification
Title: test multiprocessing: test_rapid_restart() crash on AIX when using XLC compiler
Type: Stage: resolved
Components: Tests Versions: Python 3.8
process
Status: closed Resolution: out of date
Dependencies: Superseder:
Assigned To: Nosy List: Michael.Felt, vstinner
Priority: normal Keywords:

Created on 2019-04-29 13:07 by vstinner, last changed 2019-10-21 11:33 by vstinner. This issue is now closed.

Messages (4)
msg341076 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-04-29 13:07
POWER6 AIX 3.x:
https://buildbot.python.org/all/#/builders/161/builds/1050

======================================================================
ERROR: test_rapid_restart (test.test_multiprocessing_forkserver.WithManagerTestManagerRestart)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/test/_test_multiprocessing.py", line 2872, in test_rapid_restart
    queue = manager.get_queue()
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/multiprocessing/managers.py", line 737, in temp
    token, exp = self._create(typeid, *args, **kwds)
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/multiprocessing/managers.py", line 620, in _create
    conn = self._Client(self._address, authkey=self._authkey)
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/multiprocessing/connection.py", line 796, in XmlClient
    return ConnectionWrapper(Client(*args, **kwds), _xml_dumps, _xml_loads)
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/multiprocessing/connection.py", line 502, in Client
    c = SocketClient(address)
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/multiprocessing/connection.py", line 629, in SocketClient
    s.connect(address)
ConnectionRefusedError: [Errno 79] Connection refused

======================================================================
ERROR: test_remote (test.test_multiprocessing_forkserver.WithManagerTestRemoteManager)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/test/_test_multiprocessing.py", line 2835, in test_remote
    manager2.connect()
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/multiprocessing/managers.py", line 545, in connect
    conn = Client(self._address, authkey=self._authkey)
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/multiprocessing/connection.py", line 796, in XmlClient
    return ConnectionWrapper(Client(*args, **kwds), _xml_dumps, _xml_loads)
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/multiprocessing/connection.py", line 508, in Client
    answer_challenge(c, authkey)
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/multiprocessing/connection.py", line 751, in answer_challenge
    message = connection.recv_bytes(256)         # reject large message
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/multiprocessing/connection.py", line 414, in _recv_bytes
    buf = self._recv(4)
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
ConnectionResetError: [Errno 73] Connection reset by peer

======================================================================
ERROR: test_rapid_restart (test.test_multiprocessing_forkserver.WithProcessesTestManagerRestart)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/test/_test_multiprocessing.py", line 2872, in test_rapid_restart
    queue = manager.get_queue()
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/multiprocessing/managers.py", line 737, in temp
    token, exp = self._create(typeid, *args, **kwds)
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/multiprocessing/managers.py", line 620, in _create
    conn = self._Client(self._address, authkey=self._authkey)
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/multiprocessing/connection.py", line 796, in XmlClient
    return ConnectionWrapper(Client(*args, **kwds), _xml_dumps, _xml_loads)
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/multiprocessing/connection.py", line 502, in Client
    c = SocketClient(address)
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/multiprocessing/connection.py", line 629, in SocketClient
    s.connect(address)
ConnectionRefusedError: [Errno 79] Connection refused

======================================================================
ERROR: test_rapid_restart (test.test_multiprocessing_forkserver.WithThreadsTestManagerRestart)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/test/_test_multiprocessing.py", line 2872, in test_rapid_restart
    queue = manager.get_queue()
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/multiprocessing/managers.py", line 737, in temp
    token, exp = self._create(typeid, *args, **kwds)
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/multiprocessing/managers.py", line 620, in _create
    conn = self._Client(self._address, authkey=self._authkey)
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/multiprocessing/connection.py", line 796, in XmlClient
    return ConnectionWrapper(Client(*args, **kwds), _xml_dumps, _xml_loads)
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/multiprocessing/connection.py", line 502, in Client
    c = SocketClient(address)
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/multiprocessing/connection.py", line 629, in SocketClient
    s.connect(address)
ConnectionRefusedError: [Errno 79] Connection refused

----------------------------------------------------------------------

Ran 345 tests in 268.109s

FAILED (errors=4, skipped=29)
Warning -- files was modified by test_multiprocessing_forkserver
  Before: []
  After:  ['core'] 
test test_multiprocessing_forkserver failed


======================================================================
ERROR: test_rapid_restart (test.test_multiprocessing_spawn.WithManagerTestManagerRestart)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/test/_test_multiprocessing.py", line 2872, in test_rapid_restart
    queue = manager.get_queue()
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/multiprocessing/managers.py", line 737, in temp
    token, exp = self._create(typeid, *args, **kwds)
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/multiprocessing/managers.py", line 620, in _create
    conn = self._Client(self._address, authkey=self._authkey)
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/multiprocessing/connection.py", line 796, in XmlClient
    return ConnectionWrapper(Client(*args, **kwds), _xml_dumps, _xml_loads)
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/multiprocessing/connection.py", line 502, in Client
    c = SocketClient(address)
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/multiprocessing/connection.py", line 629, in SocketClient
    s.connect(address)
ConnectionRefusedError: [Errno 79] Connection refused

======================================================================
ERROR: test_remote (test.test_multiprocessing_spawn.WithManagerTestRemoteManager)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/test/_test_multiprocessing.py", line 2835, in test_remote
    manager2.connect()
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/multiprocessing/managers.py", line 545, in connect
    conn = Client(self._address, authkey=self._authkey)
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/multiprocessing/connection.py", line 796, in XmlClient
    return ConnectionWrapper(Client(*args, **kwds), _xml_dumps, _xml_loads)
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/multiprocessing/connection.py", line 508, in Client
    answer_challenge(c, authkey)
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/multiprocessing/connection.py", line 751, in answer_challenge
    message = connection.recv_bytes(256)         # reject large message
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/multiprocessing/connection.py", line 414, in _recv_bytes
    buf = self._recv(4)
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
ConnectionResetError: [Errno 73] Connection reset by peer

======================================================================
ERROR: test_rapid_restart (test.test_multiprocessing_spawn.WithProcessesTestManagerRestart)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/test/_test_multiprocessing.py", line 2872, in test_rapid_restart
    queue = manager.get_queue()
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/multiprocessing/managers.py", line 737, in temp
    token, exp = self._create(typeid, *args, **kwds)
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/multiprocessing/managers.py", line 620, in _create
    conn = self._Client(self._address, authkey=self._authkey)
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/multiprocessing/connection.py", line 796, in XmlClient
    return ConnectionWrapper(Client(*args, **kwds), _xml_dumps, _xml_loads)
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/multiprocessing/connection.py", line 502, in Client
    c = SocketClient(address)
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/multiprocessing/connection.py", line 629, in SocketClient
    s.connect(address)
ConnectionRefusedError: [Errno 79] Connection refused

======================================================================
ERROR: test_rapid_restart (test.test_multiprocessing_spawn.WithThreadsTestManagerRestart)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/test/_test_multiprocessing.py", line 2872, in test_rapid_restart
    queue = manager.get_queue()
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/multiprocessing/managers.py", line 737, in temp
    token, exp = self._create(typeid, *args, **kwds)
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/multiprocessing/managers.py", line 620, in _create
    conn = self._Client(self._address, authkey=self._authkey)
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/multiprocessing/connection.py", line 796, in XmlClient
    return ConnectionWrapper(Client(*args, **kwds), _xml_dumps, _xml_loads)
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/multiprocessing/connection.py", line 502, in Client
    c = SocketClient(address)
  File "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build/Lib/multiprocessing/connection.py", line 629, in SocketClient
    s.connect(address)
ConnectionRefusedError: [Errno 79] Connection refused

----------------------------------------------------------------------

Ran 345 tests in 632.619s

FAILED (errors=4, skipped=32)
Warning -- files was modified by test_multiprocessing_spawn
  Before: []
  After:  ['core'] 
test test_multiprocessing_spawn failed
msg343011 - Author: Michael Felt (Michael.Felt) * Date: 2019-05-21 11:12
I believe (or hope) this is related to issue35828.

This is, as far as I can tell, a compiler issue.

It appears "always" in the bot situation (not building as root) when using xlc-v11, but not when using gcc-4.7.4.

So, when the test failure "disappears" on the bot, it is because I have switched from CC=xlc_r to CC=gcc.

I am quite willing to continue searching (I just removed over three GBytes of core dumps I had collected previously).

As to analysis: it appears the "server" side core-dumps, and the client side is then refused a connection (obviously).
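The client-side symptom can be reproduced without multiprocessing at all. The following is a minimal sketch, not taken from the test suite: it assumes a loopback TCP socket, and the helper name connect_to_dead_server is made up for illustration. It connects to a port that nothing is listening on, which is what the client sees once the server process has died.

```python
import errno
import socket


def connect_to_dead_server():
    """Return the errno raised when connecting to a port with no listener."""
    # Bind to an ephemeral port only to learn a free port number, then close
    # the socket without ever listening on it. This stands in for a server
    # process that has crashed (hypothetical setup, for illustration only).
    placeholder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    placeholder.bind(("127.0.0.1", 0))
    address = placeholder.getsockname()
    placeholder.close()

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        client.connect(address)
    except ConnectionRefusedError as exc:
        # The numeric value is platform specific; compare against
        # errno.ECONNREFUSED rather than a hard-coded number.
        return exc.errno
    finally:
        client.close()
    return None


if __name__ == "__main__":
    print(connect_to_dead_server() == errno.ECONNREFUSED)
```

The numeric errno differs between platforms (the traceback above shows 79 on AIX), so the sketch compares against the symbolic constant.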
msg354927 - Author: Michael Felt (Michael.Felt) * Date: 2019-10-18 23:21
Please let me be much more specific.

This specific bot failure is from when I ran the bot using XLC as the compiler. Because I could not solve it on my own, did not get any hints in time (see issue35828), and my work schedule intensified, I switched the bot to use gcc, and this test failure disappeared.

Considering that the bot no longer uses XLC, it may make more sense to close this one. No AIX bots are using the XLC (aka vac(pp)) compiler.

Should I have time, I'll look into it again as issue35828. Note also that I still build manually using XLC, and there the issue rarely occurs. That is part of what makes debugging so difficult.

Michael
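For a failure that only shows up occasionally, one way to get a rough failure rate is to rerun the same test many times and count the non-zero exits. This is a minimal sketch and not something used in this issue: the helper name count_failures is hypothetical, and the example command assumes the CPython test package is installed for the interpreter being run.

```python
import subprocess
import sys


def count_failures(command, runs):
    """Run `command` `runs` times and return how many runs exited non-zero."""
    failures = 0
    for _ in range(runs):
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        # On POSIX a negative return code means the child was killed by a
        # signal (for example a crash that leaves a core file); it is
        # counted as a failure as well.
        if result.returncode != 0:
            failures += 1
    return failures


if __name__ == "__main__":
    # Assumed invocation: rerun only test_rapid_restart from one test module.
    cmd = [sys.executable, "-m", "test",
           "test_multiprocessing_spawn", "-m", "test_rapid_restart"]
    runs = 20
    print(count_failures(cmd, runs), "failures out of", runs, "runs")
```

Running the same loop once with an XLC build and once with a GCC build would give two counts to compare, although a small number of runs says little about a rare failure.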
msg355057 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-21 11:33
According to Michael Felt, the bug is specific to the XLC compiler, but the POWER6 AIX buildbot worker has switched to GCC.

Michael Felt: Maybe the Python documentation or build system should somewhere discourage the use of XLC on AIX because of this bug. But that's a different issue.

Since it seems that nobody is available to debug the XLC-specific issue, and the buildbot worker has worked around it, I am closing the issue.
History
Date User Action Args
2019-10-21 11:33:21 vstinner set status: open -> closed
title: test multiprocessing: test_rapid_restart() crash on AIX -> test multiprocessing: test_rapid_restart() crash on AIX when using XLC compiler
messages: + msg355057
resolution: out of date
stage: resolved
2019-10-18 23:21:40 Michael.Felt set messages: + msg354927
2019-05-21 11:12:24 Michael.Felt set nosy: + Michael.Felt
messages: + msg343011
2019-04-29 13:07:20 vstinner create