PEP 475 - EINTR handling #67474
Comments
The test runs fine on Linux, but hangs in test_send() on OS-X and *BSD. |
Perhaps Ned can help on the OS X side of things. |
The review diff is weird: it seems it contains changes that aren't EINTR-related (see e.g. argparse.rst). |
(It may be several days before I can spend much time on it but I will take a look.) |
Here's a manually generated diff. |
I added a few prints to the send and receive loops of _test_send.
When running on a reasonably current Debian testing Linux:
./python Lib/test/eintrdata/eintr_tester.py
----------------------------------------------------------------------
OK
When run on OS X (10.10.1):
test_read (__main__.OSEINTRTest) ... ok
When run standalone, the tests do eventually finish without error but take a *long* time to do so. |
Thanks, that's what I was suspecting, but I really don't understand Could you try by increasing signal_period to e.g. 0.5, and sleep_time to 1? |
It turns out the times are not important; the hangup is the default size of the socket buffers on OS X and possibly BSD in general. In my case, the send and receive buffers are 8192, which explains why the chunks written are so small. I somewhat arbitrarily changed the sizes of the buffers in _test_send with:
rd.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, support.SOCK_MAX_SIZE // 3)
wr.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, support.SOCK_MAX_SIZE // 3)
The results were:
test_send (__main__.SocketEINTRTest) ... rd SO_RCVBUF default was 8192, wr SO_SNDBUF default was 8192
I dunno if a value that large will work in all environments, so the code to call setsockopt might need to be smarter. |
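One way the setsockopt call could be made "smarter", as a rough sketch only: ask for a size, fall back to smaller requests if the OS refuses, and report what the OS actually granted. The helper name try_set_buffer_sizes and the 256 KiB request are made up for illustration; they are not from the patch under review.

```python
import socket

def try_set_buffer_sizes(rd, wr, wanted):
    """Hypothetical helper: request `wanted` bytes for both buffers,
    halving the request whenever the OS rejects it, and return the
    (SO_RCVBUF, SO_SNDBUF) sizes the OS reports afterwards."""
    size = wanted
    while size >= 1024:
        try:
            rd.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
            wr.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size)
            break
        except OSError:
            # Some platforms refuse requests above a system limit.
            size //= 2
    return (rd.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF),
            wr.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))

if __name__ == "__main__":
    rd, wr = socket.socketpair()
    with rd, wr:
        before = (rd.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF),
                  wr.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
        after = try_set_buffer_sizes(rd, wr, 256 * 1024)
        print("defaults:", before, "after:", after)
```

The reported sizes are platform dependent (the kernel may round, double, or clamp the request), so a test should read them back rather than assume the requested value.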
Hmmm... I'll try to increase the socket buffer size and see what the buildbots |
OK, actually the receiver is completely CPU-bound, because of memory |
OK, it turns out the culprit was repeated calls to BytesIO.getvalue(), |
eintr-1.diff doesn't seem to make any significant difference from eintr.diff on my system. It's still pegging a CPU at 100% and takes 7 minutes wall time to complete.
$ time ./python ~/Projects/PyDev/active/dev/3x/source/Lib/test/eintrdata/eintr_tester.py
test_read (__main__.OSEINTRTest) ... ok
test_wait (__main__.OSEINTRTest) ... ok
test_wait3 (__main__.OSEINTRTest) ... ok
test_wait4 (__main__.OSEINTRTest) ... ok
test_waitpid (__main__.OSEINTRTest) ... ok
test_write (__main__.OSEINTRTest) ... ok
test_accept (__main__.SocketEINTRTest) ... ok
test_recv (__main__.SocketEINTRTest) ... ok
test_recvmsg (__main__.SocketEINTRTest) ... ok
test_send (__main__.SocketEINTRTest) ... ok
test_sendall (__main__.SocketEINTRTest) ... ok
test_sendmsg (__main__.SocketEINTRTest) ... ok
Ran 12 tests in 439.966s
OK
real 7m20.276s |
Alright, enough played: the patch attached uses a memoryview and |
With eintr-2.diff, fast!:
$ time ./python ~/Projects/PyDev/active/dev/3x/source/Lib/test/eintrdata/eintr_tester.py
test_read (__main__.OSEINTRTest) ... ok
test_wait (__main__.OSEINTRTest) ... ok
test_wait3 (__main__.OSEINTRTest) ... ok
test_wait4 (__main__.OSEINTRTest) ... ok
test_waitpid (__main__.OSEINTRTest) ... ok
test_write (__main__.OSEINTRTest) ... ok
test_accept (__main__.SocketEINTRTest) ... ok
test_recv (__main__.SocketEINTRTest) ... ok
test_recvmsg (__main__.SocketEINTRTest) ... ok
test_send (__main__.SocketEINTRTest) ... ok
test_sendall (__main__.SocketEINTRTest) ... ok
test_sendmsg (__main__.SocketEINTRTest) ... ok
Ran 12 tests in 7.652s
OK
real 0m7.959s
Instrumented test_send, 3 socket.send calls, many socket.recv_into calls:
test_send (__main__.SocketEINTRTest) ... rd SO_RCVBUF default was 8192, wr SO_SNDBUF default was 8192
sent = 8192, total written = 8192
received = 8192, total read = 8192 |
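For readers following along, the memoryview plus recv_into approach mentioned above can be sketched roughly as below. This is not the code from eintr-2.diff; recv_exactly and demo are hypothetical names. The idea is to fill one preallocated buffer in place rather than rebuilding a bytes object after every small chunk.

```python
import socket
import threading

def recv_exactly(sock, nbytes):
    """Hypothetical helper: read exactly nbytes into one preallocated buffer."""
    buf = bytearray(nbytes)
    view = memoryview(buf)
    received = 0
    while received < nbytes:
        # recv_into writes straight into the remaining slice; no copy is made.
        n = sock.recv_into(view[received:])
        if n == 0:
            raise EOFError("peer closed the connection early")
        received += n
    return bytes(buf)

def demo(size=1000000):
    """Send `size` bytes over a socketpair and check that they arrive intact."""
    rd, wr = socket.socketpair()
    payload = b"x" * size
    with rd, wr:
        sender = threading.Thread(target=wr.sendall, args=(payload,))
        sender.start()
        data = recv_exactly(rd, size)
        sender.join()
    return data == payload

if __name__ == "__main__":
    print("payload intact:", demo())
```

With 8192-byte socket buffers the loop runs many times, but each iteration only advances an offset, so the receiver is no longer dominated by copying.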
Victory \°/.
Yep, that's expected. Antoine, I'm now happy with the patch, so we'll be waiting for your |
Victor, do you think there's anything left to do? |
I just realized I didn't retry upon EINTR for open(): eintr-4.diff Also, I added comments explaining why we don't retry upon close() (see |
Charles-François Natali added the comment:
I didn't read these articles yet, but I will. IMO the PEP must be |
PEP is now updated. |
Would it be possible to push the latest patch right now, and fix remaining issues (if there are known issues?), before Python 3.5 alpha 1? According to PEP 478, alpha 1 is scheduled for this Sunday (February 8, 2015). |
It's ok for me. Please watch the buildbots :) |
Cool, I'll push on Friday evening or Saturday. |
New changeset 5b63010be19e by Charles-François Natali in branch 'default': |
New changeset 000bbdf0ea76 by Ned Deily in branch 'default': |
The change on Modules/_io/fileio.c is wrong: functions may return None with an exception set. It is wrong because a function must return a result with no exception set, or NULL and an exception set. Attached patch fixes this issue. |
Note: I found the bug while working on a patch for bpo-22181. My test is this shell script:
$ while true; do
    ./python -c 'import os, signal; signal.setitimer(signal.ITIMER_REAL, 0.001, 0.0001); signal.signal(signal.SIGALRM, lambda *args: print(".", end="")); print("urandom"); x=os.urandom(5000); print("ok"); signal.alarm(0)'
    if [ $? -ne 142 -a $? -ne 0 ]; then break; fi
  done
The test calls print() in a signal handler, which can likely create a reentrant call, which is forbidden. But Python handles this case, it's fine. After removing all debug prints and applying the fix on fileio.c, the test doesn't crash with the assertion error anymore. Before, an assertion failed because fileio_write() returned Py_None with an exception set. |
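A rough Python-only variant of this kind of stress test, for illustration: it fires SIGALRM every millisecond while the main thread blocks in os.read() on a pipe. The function name read_under_signal_storm and the timing values are made up. The sketch assumes Python 3.5 or later on a Unix system and must run in the main thread; on earlier versions the os.read() call would be expected to raise InterruptedError instead of being retried.

```python
import os
import signal
import threading
import time

def read_under_signal_storm(delay=0.05, period=0.001):
    """Hypothetical helper: block in os.read() while SIGALRM keeps firing.

    Returns (data, hits), where hits counts how many times the Python-level
    handler ran."""
    hits = 0

    def on_alarm(signum, frame):
        nonlocal hits
        hits += 1

    rd, wr = os.pipe()

    def writer():
        # Write only after a delay so the reader really blocks for a while.
        time.sleep(delay)
        os.write(wr, b"ok")

    old_handler = signal.signal(signal.SIGALRM, on_alarm)
    thread = threading.Thread(target=writer)
    thread.start()
    signal.setitimer(signal.ITIMER_REAL, period, period)
    try:
        # With PEP 475, an interrupted read is retried after the handler runs.
        data = os.read(rd, 2)
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0, 0)   # cancel the timer first
        signal.signal(signal.SIGALRM, old_handler)   # then restore the handler
        thread.join()
        os.close(rd)
        os.close(wr)
    return data, hits

if __name__ == "__main__":
    data, hits = read_under_signal_storm()
    print(data, "handler calls:", hits)
```

The number of handler calls depends on scheduling and on which thread the kernel picks for delivery, so only the returned data is a reliable thing to check.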
@victor: please commit. |
There are already some tests on EINTR in test_file_eintr and test_subprocess (test_communicate_eintr()). |
New changeset cad6eac598ec by Victor Stinner in branch 'default': |
For the record, it seems test_eintr sometimes left zombie processes on the machine where I run reference leak tests every night. I didn't investigate and just disabled the tests. |
(I'll add that that machine is hosted on an OpenVZ-based VPS, so perhaps there are issues with the old patched kernel and whatnot?) |
On my Fedora 21 (on a physical PC, not virtualized), I ran "./python -m test -R 3:3: test_eintr" 3 times. After that, I didn't see any zombie Python process. If I cannot reproduce the issue, I cannot fix it. I bet that it's related to OpenVZ. IMO this issue can be closed. It already has a long history. I prefer to open new issues. See the issue bpo-23648 "PEP-475 meta issue", which lists all issues related to PEP 475. I opened a new issue for each module that doesn't completely handle PEP 475 yet. By the way: great job Charles-François! I like your changeset, it works well. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.