msg316771 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2018-05-16 06:30 |
https://travis-ci.org/python/cpython/jobs/379560387
======================================================================
ERROR: test_ignore (test.test_multiprocessing_forkserver.TestIgnoreEINTR)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/travis/build/python/cpython/Lib/test/_test_multiprocessing.py", line 4359, in test_ignore
os.kill(p.pid, signal.SIGUSR1)
ProcessLookupError: [Errno 3] No such process
|
msg317985 - (view) |
Author: Ned Deily (ned.deily) * |
Date: 2018-05-29 06:19 |
FWIW, Issue29589 seems to be an older example of this failure.
|
msg318008 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2018-05-29 11:08 |
> FWIW, Issue29589 seems to be an older example of this failure.
It was a failure on Mageia 5 (Linux) at 2017-02-17:
== CPython 3.6.0 (default, Feb 17 2017, 15:26:31) [GCC 4.9.2]
== Linux-4.4.13-desktop-1.mga5-x86_64-with-mageia-5-Official little-endian
== hash algorithm: siphash24 64bit
== cwd: /home/dima/bin/Python-3.6.0/build/test_python_8244
== encodings: locale=UTF-8, FS=utf-8
Run tests sequentially
0:00:00 [1/1] test_multiprocessing_forkserver
...
======================================================================
ERROR: test_ignore (test.test_multiprocessing_forkserver.TestIgnoreEINTR)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/dima/bin/Python-3.6.0/Lib/test/_test_multiprocessing.py", line 3728, in test_ignore
os.kill(p.pid, signal.SIGUSR1)
ProcessLookupError: [Errno 3] No such process
|
msg318010 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2018-05-29 11:28 |
I am unable to reproduce the issue on Fedora 28 (Linux kernel 4.16.11, glibc 2.27):
I ran 6 jobs in parallel during 10 minutes:
* 2: ./python -m test test_multiprocessing_forkserver -m test_ignore -F
* 2: ./python -m test test_multiprocessing_forkserver -F
* 2: ./python -m test -j16 -r -F
The system load was around 30 which is very high (my CPU has 8 logicial threads, 4 physical cores).
|
msg318215 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2018-05-30 22:13 |
> https://travis-ci.org/python/cpython/jobs/379560387
More info about this failure:
== Linux-4.4.0-112-generic-x86_64-with-debian-jessie-sid little-endian
== CPU count: 48
Run tests in parallel using 4 child processes
(...)
0:07:06 load avg: 24.76 [356/416/3] test_multiprocessing_forkserver failed -- running: test_concurrent_futures (61 sec)
|
msg318222 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2018-05-30 23:22 |
Davin, Antoine: any idea on this bug?
I ran the full test suite and test_multiprocessing_forkserver in parallel in a Trust VM, but I failed to reproduce the bug.
I did a similar test in a Ubuntu Trusty docker container: again, I'm unable to reproduce the bug.
An Ubuntu Trusty docker container is supposed to be as close as possible to Travis CI, except that my laptop doesn't have 48 CPUs.
|
msg318225 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2018-05-30 23:37 |
test_ignore() starts to fail more and more often on Travis CI for an unknown reason.
https://travis-ci.org/python/cpython/jobs/385562187
== CPU count: 48
(...)
Run tests in parallel using 4 child processes
(...)
0:07:38 load avg: 128.04 [415/416/1] test_multiprocessing_forkserver failed -- running: test_concurrent_futures (378 sec)
(...)
======================================================================
ERROR: test_ignore (test.test_multiprocessing_forkserver.TestIgnoreEINTR)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/travis/build/python/cpython/Lib/test/_test_multiprocessing.py", line 4359, in test_ignore
os.kill(p.pid, signal.SIGUSR1)
ProcessLookupError: [Errno 3] No such process
|
msg318226 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2018-05-30 23:38 |
https://travis-ci.org/python/cpython/jobs/379560387
Re-running test 'test_multiprocessing_forkserver' in verbose mode
(...)
======================================================================
ERROR: test_ignore (test.test_multiprocessing_forkserver.TestIgnoreEINTR)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/travis/build/python/cpython/Lib/test/_test_multiprocessing.py", line 4359, in test_ignore
os.kill(p.pid, signal.SIGUSR1)
ProcessLookupError: [Errno 3] No such process
|
msg318227 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2018-05-30 23:40 |
Python 3.7:
https://travis-ci.org/python/cpython/jobs/385474104
Re-running test 'test_multiprocessing_forkserver' in verbose mode
======================================================================
ERROR: test_ignore (test.test_multiprocessing_forkserver.TestIgnoreEINTR)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/travis/build/python/cpython/Lib/test/_test_multiprocessing.py", line 4324, in test_ignore
os.kill(p.pid, signal.SIGUSR1)
ProcessLookupError: [Errno 3] No such process
|
msg318228 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2018-05-30 23:40 |
Python 3.7:
https://travis-ci.org/python/cpython/jobs/385458840
Re-running test 'test_multiprocessing_forkserver' in verbose mode
======================================================================
ERROR: test_ignore (test.test_multiprocessing_forkserver.TestIgnoreEINTR)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/travis/build/python/cpython/Lib/test/_test_multiprocessing.py", line 4324, in test_ignore
os.kill(p.pid, signal.SIGUSR1)
ProcessLookupError: [Errno 3] No such process
|
msg318232 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2018-05-31 00:44 |
I added debug traces in PR 7260. test_ignore() failed, but it may be related to my debug traces, since the failure is different.
test_multiprocessing_fork.test_ignore() failure:
test_ignore (test.test_multiprocessing_fork.TestIgnoreEINTR) ...
parent: pid=20159
parent: start child
parent: p=<Process(Process-3, started daemon)>
parent: p.ident=20176
ERROR
======================================================================
ERROR: test_ignore (test.test_multiprocessing_fork.TestIgnoreEINTR)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/travis/build/python/cpython/Lib/test/_test_multiprocessing.py", line 4389, in test_ignore
self.assertEqual(conn.recv(), 'ready')
File "/home/travis/build/python/cpython/Lib/multiprocessing/connection.py", line 250, in recv
buf = self._recv_bytes()
File "/home/travis/build/python/cpython/Lib/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/home/travis/build/python/cpython/Lib/multiprocessing/connection.py", line 383, in _recv
raise EOFError
EOFError
test_multiprocessing_fork.test_ignore() success:
test_ignore (test.test_multiprocessing_fork.TestIgnoreEINTR) ...
parent: pid=11713
parent: start child
parent: p=<Process(Process-3, started daemon)>
parent: p.ident=28998
[INFO/Process-3] child process calling self.run()
child: pid=28998
child: register SIGUSR1 handler, send ready
child: wait 1234
parent: ready received
parent: first SIGUSR1
child: got SIGUSR1
parent: send 1234
parent: wait 1234
child: send back 1234
child: big blocking send
parent: second SIGUSR1
parent: wait bytes
child: got SIGUSR1
child: done
[INFO/Process-3] process shutting down
[INFO/Process-3] process shutting down
[INFO/Process-3] process exiting with exitcode 0
[INFO/Process-3] process exiting with exitcode 0
parent: join child
parent: done
ok
|
msg318233 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2018-05-31 00:45 |
> I added debug traces in PR 7260. test_ignore() failed, but it may be related to my debug traces, since the failure is different.
https://travis-ci.org/python/cpython/jobs/385958924
|
msg318234 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2018-05-31 02:04 |
Notes:
* test_ignore() creates a daemonic process (it cannot spawn new processes)
* the fork server process is shared between test methods of test_forkserver
* multiprocessing.util._exit_function() terminates all daemon processes
|
msg318236 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2018-05-31 02:36 |
test_multiprocessing_forkserver.test_ignore() failed on Travis CI using PR 7261: --randseed=7474929. The method failed once when run in parallel, and then failed again when re-reun in verbose mode.
|
msg318237 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2018-05-31 02:40 |
> test_multiprocessing_forkserver.test_ignore() failed on Travis CI using PR 7261: --randseed=7474929. The method failed once when run in parallel, and then failed again when re-reun in verbose mode.
https://travis-ci.org/python/cpython/jobs/385986803
0:04:57 load avg: 25.81 [342/415/1] test_multiprocessing_forkserver failed -- running: test_concurrent_futures (72 sec)
|
msg318238 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2018-05-31 02:48 |
I added more traces to PR 7261 and the bug still occurred on Travis CI.
https://travis-ci.org/python/cpython/jobs/385990848
0:05:04 load avg: 42.62 [342/415/1] test_multiprocessing_forkserver failed -- running: test_concurrent_futures (69 sec)
child: pid=19401 ppid=19348
child: send BIG
child: done
test_answer_challenge_auth_failure (test.test_multiprocessing_forkserver.OtherTest) ... ok
test_deliver_challenge_auth_failure (test.test_multiprocessing_forkserver.OtherTest) ... ok
test_closefd (test.test_multiprocessing_forkserver.TestCloseFds) ... ok
test_flags (test.test_multiprocessing_forkserver.TestFlags) ... ok
test_lock (test.test_multiprocessing_forkserver.TestForkAwareThreadLock) ... ok
test_ignore (test.test_multiprocessing_forkserver.TestIgnoreEINTR) ... parent: pid=19342
parent: second SIGUSR1
ERROR
Hum... maybe the child exited, before the parent sent SIGUSR1: the child didn't block on sending 1 MB?
|
msg318239 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2018-05-31 03:08 |
New changeset 5d6c7ed5e340b2311a15f34e968d4bef09c71922 by Victor Stinner in branch 'master':
bpo-33532: Fix multiprocessing test_ignore() (#7262)
https://github.com/python/cpython/commit/5d6c7ed5e340b2311a15f34e968d4bef09c71922
|
msg318240 - (view) |
Author: miss-islington (miss-islington) |
Date: 2018-05-31 03:28 |
New changeset 9d273344238330d634acb4a7d427a8fb732ae62e by Miss Islington (bot) in branch '3.7':
bpo-33532: Fix multiprocessing test_ignore() (GH-7262)
https://github.com/python/cpython/commit/9d273344238330d634acb4a7d427a8fb732ae62e
|
msg318254 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2018-05-31 05:35 |
New changeset c2870b699eb899f358b843c2d706ce8ca89b2bf4 by Victor Stinner in branch '3.6':
bpo-33532: Fix multiprocessing test_ignore() (GH-7262) (#7266)
https://github.com/python/cpython/commit/c2870b699eb899f358b843c2d706ce8ca89b2bf4
|
msg318255 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2018-05-31 05:35 |
New changeset 82c456fa37d4b87da0d64e574bd8adc665abc4c8 by Victor Stinner in branch '2.7':
bpo-33532: Fix multiprocessing test_ignore() (GH-7265)
https://github.com/python/cpython/commit/82c456fa37d4b87da0d64e574bd8adc665abc4c8
|
msg318265 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2018-05-31 08:07 |
Did your PR fix the issue?
|
msg318301 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2018-05-31 13:08 |
> Did your PR fix the issue?
The bug was that *sometimes* on Travis CI, and only on Travis CI (!?), writing 1 MiB into the multiprocessing pipe didn't block. The bug is really strange because it is only reproduced on the clang Linux job of Travis CI which runs tests in parallel. Not on the Linux gcc which runs tests sequentially in coverage. Moreover, the failure only occurs for a specific order of tests.
You can easily reproduce the issue if you reduce the size of the data written into the pipe at the end of _test_ignore(). If the write (send_bytes) doesn't block, you get the same error.
I'm confident that writing 4 MiB instead of 1 MiB will fix the issue. I saw the test passing with 4 MiB whereas it failed with 1 MiB, when I fixed the test order.
|
msg318409 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2018-06-01 13:05 |
I saw one new test_ignore() failure on Travis CI in my 3.6 PR, whereas 3.6 already uses PIPE_MAX_SIZE for test_ignore().
https://github.com/python/cpython/pull/7315/
https://travis-ci.org/python/cpython/jobs/386637729
Hum, test_ignore() uses PIPE_MAX_SIZE, whereas the test fails on Linux where we use a pair of sockets, no pipes. Maybe we should use SOCK_MAX_SIZE?
Notes on pipe size:
* /proc/sys/fs/pipe-max-size
* bpo-17835: add support.PIPE_MAX_SIZE (4 MiB +1 B)
* bpo-18643: add support.SOCK_MAX_SIZE (16 MiB)
|
msg318416 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2018-06-01 13:34 |
> https://travis-ci.org/python/cpython/jobs/386637729
I added some debug traces:
test_ignore (test.test_multiprocessing_forkserver.TestIgnoreEINTR) ...
socket[fd=4].SO_RCVBUF 212992
socket[fd=4].SO_SNDBUF 212992
socket[fd=7].SO_RCVBUF 212992
socket[fd=7].SO_SNDBUF 212992
test_ignore: PIPE_MAX_SIZE = 4194305
The socket pair uses a buffer of 208 KiB (212,992 B) in both directions and the test sends 4 MiB (4,194,305 B) for a blocking call... and sometimes the send doesn't block.
|
msg318420 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2018-06-01 14:48 |
New changeset 252f6abe0a9430f4ae7588c0cb50a6ff141bebe3 by Victor Stinner in branch 'master':
bpo-33532: Fix test_multiprocessing_forkserver.test_ignore() (GH-7319)
https://github.com/python/cpython/commit/252f6abe0a9430f4ae7588c0cb50a6ff141bebe3
|
msg318421 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2018-06-01 14:50 |
I modified test_ignore() to use support.SOCK_MAX_SIZE, but honestly, I'm not convinced that it will be the issue. I applied my change anyway, just to check if the issue comes from the size, or if it's something else.
|
msg318426 - (view) |
Author: miss-islington (miss-islington) |
Date: 2018-06-01 15:21 |
New changeset 63fa8db58c349f985d75139e6545cfddf067c07b by Miss Islington (bot) in branch '3.7':
bpo-33532: Fix test_multiprocessing_forkserver.test_ignore() (GH-7319)
https://github.com/python/cpython/commit/63fa8db58c349f985d75139e6545cfddf067c07b
|
msg318438 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2018-06-01 17:39 |
New changeset 1d4be0a65f66c2eecefb8b8bd4d115a82fa61067 by Victor Stinner in branch '2.7':
bpo-33532: Fix test_multiprocessing_forkserver.test_ignore() (GH-7323)
https://github.com/python/cpython/commit/1d4be0a65f66c2eecefb8b8bd4d115a82fa61067
|
msg318441 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2018-06-01 17:39 |
New changeset 64e538bc703e423a04ab435c4eab6b950b8aef7e by Victor Stinner in branch '3.6':
bpo-33532: Fix test_multiprocessing_forkserver.test_ignore() (GH-7322)
https://github.com/python/cpython/commit/64e538bc703e423a04ab435c4eab6b950b8aef7e
|
msg320051 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2018-06-20 10:12 |
It didn't see this super annoying failure recently, so it seems like it has been fixed for real. Great!
|
|
Date |
User |
Action |
Args |
2022-04-11 14:59:00 | admin | set | github: 77713 |
2018-06-20 10:12:32 | vstinner | set | status: open -> closed resolution: fixed messages:
+ msg320051
stage: patch review -> resolved |
2018-06-01 17:39:58 | vstinner | set | messages:
+ msg318441 |
2018-06-01 17:39:19 | vstinner | set | messages:
+ msg318438 |
2018-06-01 15:38:24 | vstinner | set | pull_requests:
+ pull_request6952 |
2018-06-01 15:36:49 | vstinner | set | pull_requests:
+ pull_request6951 |
2018-06-01 15:21:25 | miss-islington | set | messages:
+ msg318426 |
2018-06-01 14:50:49 | vstinner | set | messages:
+ msg318421 |
2018-06-01 14:48:50 | miss-islington | set | pull_requests:
+ pull_request6949 |
2018-06-01 14:48:36 | vstinner | set | messages:
+ msg318420 |
2018-06-01 14:05:05 | vstinner | set | stage: resolved -> patch review pull_requests:
+ pull_request6948 |
2018-06-01 13:34:45 | vstinner | set | messages:
+ msg318416 |
2018-06-01 13:05:41 | vstinner | set | status: closed -> open resolution: fixed -> (no value) messages:
+ msg318409
|
2018-06-01 12:58:43 | vstinner | set | pull_requests:
+ pull_request6947 |
2018-05-31 13:08:32 | vstinner | set | status: open -> closed stage: patch review -> resolved resolution: fixed versions:
+ Python 2.7, Python 3.6 |
2018-05-31 13:08:19 | vstinner | set | messages:
+ msg318301 |
2018-05-31 08:07:28 | pitrou | set | messages:
+ msg318265 |
2018-05-31 05:35:36 | vstinner | set | messages:
+ msg318255 |
2018-05-31 05:35:05 | vstinner | set | messages:
+ msg318254 |
2018-05-31 03:28:08 | miss-islington | set | nosy:
+ miss-islington messages:
+ msg318240
|
2018-05-31 03:19:00 | vstinner | set | pull_requests:
+ pull_request6894 |
2018-05-31 03:15:43 | vstinner | set | pull_requests:
+ pull_request6893 |
2018-05-31 03:09:59 | miss-islington | set | pull_requests:
+ pull_request6892 |
2018-05-31 03:08:45 | vstinner | set | messages:
+ msg318239 |
2018-05-31 02:53:38 | vstinner | set | pull_requests:
+ pull_request6889 |
2018-05-31 02:48:30 | vstinner | set | messages:
+ msg318238 |
2018-05-31 02:40:59 | vstinner | set | messages:
+ msg318237 |
2018-05-31 02:36:03 | vstinner | set | messages:
+ msg318236 |
2018-05-31 02:06:19 | vstinner | set | pull_requests:
+ pull_request6888 |
2018-05-31 02:04:03 | vstinner | set | messages:
+ msg318234 |
2018-05-31 00:45:43 | vstinner | set | messages:
+ msg318233 |
2018-05-31 00:44:32 | vstinner | set | messages:
+ msg318232 |
2018-05-31 00:17:22 | vstinner | set | pull_requests:
+ pull_request6887 |
2018-05-30 23:40:54 | vstinner | set | messages:
+ msg318228 |
2018-05-30 23:40:17 | vstinner | set | messages:
+ msg318227 |
2018-05-30 23:38:17 | vstinner | set | messages:
+ msg318226 |
2018-05-30 23:37:14 | vstinner | set | messages:
+ msg318225 |
2018-05-30 23:22:33 | vstinner | set | nosy:
+ pitrou, davin messages:
+ msg318222
|
2018-05-30 22:13:58 | vstinner | set | messages:
+ msg318215 |
2018-05-30 15:14:52 | vstinner | set | keywords:
+ patch stage: patch review pull_requests:
+ pull_request6873 |
2018-05-29 11:28:19 | vstinner | set | messages:
+ msg318010 |
2018-05-29 11:08:53 | vstinner | set | messages:
+ msg318008 |
2018-05-29 06:19:01 | ned.deily | set | nosy:
+ ned.deily
messages:
+ msg317985 versions:
+ Python 3.7 |
2018-05-16 06:30:24 | vstinner | create | |