classification
Title: test_socket.SendfileUsingSendfileTest fails on illumos
Type: Stage: resolved
Components: IO, Library (Lib) Versions: Python 3.10, Python 3.9, Python 3.8
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: asvetlov, jstasiak, kulikjak, miss-islington, wiedi
Priority: normal Keywords: patch

Created on 2020-11-01 20:07 by wiedi, last changed 2020-11-12 12:46 by asvetlov. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 23085 closed wiedi, 2020-11-01 20:21
PR 23154 merged jstasiak, 2020-11-04 21:12
PR 23244 merged miss-islington, 2020-11-12 09:50
PR 23246 merged jstasiak, 2020-11-12 11:09
Messages (8)
msg380160 - (view) Author: Sebastian Wiedenroth (wiedi) * Date: 2020-11-01 20:07
I'm investigating some test failures related to sendfile on illumos:


testCount (test.test_socket.SendfileUsingSendfileTest) ... FAIL
testCountSmall (test.test_socket.SendfileUsingSendfileTest) ... ok
testCountWithOffset (test.test_socket.SendfileUsingSendfileTest) ... ok
testEmptyFileSend (test.test_socket.SendfileUsingSendfileTest) ... ok
testNonBlocking (test.test_socket.SendfileUsingSendfileTest) ... ok
testNonRegularFile (test.test_socket.SendfileUsingSendfileTest) ... ok
testOffset (test.test_socket.SendfileUsingSendfileTest) ... ERROR
testRegularFile (test.test_socket.SendfileUsingSendfileTest) ... ok
testWithTimeout (test.test_socket.SendfileUsingSendfileTest) ... FAIL
testWithTimeoutTriggeredSend (test.test_socket.SendfileUsingSendfileTest) ... ok
test_errors (test.test_socket.SendfileUsingSendfileTest) ... ok

======================================================================
ERROR: testOffset (test.test_socket.SendfileUsingSendfileTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/root/cpython/Lib/socket.py", line 386, in _sendfile_use_sendfile
    sent = os_sendfile(sockno, fileno, offset, blocksize)
OSError: [Errno 22] Invalid argument

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/cpython/Lib/test/test_socket.py", line 374, in _tearDown
    raise exc
  File "/root/cpython/Lib/test/test_socket.py", line 392, in clientRun
    test_func()
  File "/root/cpython/Lib/test/test_socket.py", line 6057, in _testOffset
    sent = meth(file, offset=5000)
  File "/root/cpython/Lib/socket.py", line 399, in _sendfile_use_sendfile
    raise _GiveupOnSendfile(err)
socket._GiveupOnSendfile: [Errno 22] Invalid argument

======================================================================
FAIL: testCount (test.test_socket.SendfileUsingSendfileTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/root/cpython/Lib/test/test_socket.py", line 6085, in testCount
    self.assertEqual(len(data), count)
AssertionError: 5405948743 != 5000007

======================================================================
FAIL: testWithTimeout (test.test_socket.SendfileUsingSendfileTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/root/cpython/Lib/test/test_socket.py", line 6159, in testWithTimeout
    self.assertEqual(len(data), self.FILESIZE)
AssertionError: 429006848 != 10485760

----------------------------------------------------------------------
Ran 11 tests in 33.603s

FAILED (failures=2, errors=1)


Looking at the testCount case I could observe repeated calls to sendfile() with out_fd=7, in_fd=6 off=0, len=5000007 which returned -1 with errno set to EAGAIN.
msg380348 - (view) Author: Jakub Stasiak (jstasiak) * Date: 2020-11-04 18:39
I thought I'd give it a shot and I believe i found the issue. Let's use the testCount test as an example.

The client side (or the data sending side) looks like this:

    def _testCount(self):
        address = self.serv.getsockname()
        file = open(os_helper.TESTFN, 'rb')
        sock = socket.create_connection(address,
                                        timeout=support.LOOPBACK_TIMEOUT)
        with sock, file:
            count = 5000007
            meth = self.meth_from_sock(sock)
            sent = meth(file, count=count)
            self.assertEqual(sent, count)
            self.assertEqual(file.tell(), count)

So we're sending 5000007 bytes of data at once to a socket that has a timeout of 5.0 seconds set (default in those tests).

Somewhere along the way socket._sendfile_use_sendfile() is called and in it there's a loop:

            try:
                while True:
                    (...)
                    try:
                        sent = os_sendfile(sockno, fileno, offset, blocksize)
                    except BlockingIOError:
                        if not timeout:
                            # Block until the socket is ready to send some
                            # data; avoids hogging CPU resources.
                            selector_select()
                        continue
                (...)
                return total_sent

On my test VM running openindiana 5.11 (I think that's the version number?) this is basically an infinite loop (I think it'll end at some point, but I didn't have the patience to verify this). That's because trying to send 5000007 bytes to that socket with 5 seconds timeout will trigger BlockingIOError. Why?

This is the relevant part of os.sendfile() implementation from posixmodule.c:

    do {
        Py_BEGIN_ALLOW_THREADS
        ret = sendfile(out_fd, in_fd, &offset, count);
        Py_END_ALLOW_THREADS
    } while (ret < 0 && errno == EINTR && !(async_err = PyErr_CheckSignals()));
    if (ret < 0)
        return (!async_err) ? posix_error() : NULL;
    return Py_BuildValue("n", ret);


offset is 0 in our case, but its exact value doesn't matter. The trouble is this is what illumos sendfile man page[1] says:


RETURN VALUES

       (...) In some error cases sendfile() may still write some
       data before encountering an error and returning -1.  When that occurs,
       off is updated to point to the byte that follows the last byte copied and
       should be compared with its value before calling sendfile() to determine
       how much data was sent.

After some input from Jakub Kulik I believe this is a unique behavior (Linux, Oracle Solaris, Mac OS, FreeBSD and likely all or almost all other OSes don't do this) and it lacks handling on the Python side.

I tested this and indeed the sendfile(out_fd, in_fd, &offset, count) call *does* do partial writes in our case which gets reported as EAGAIN which gets converted to BlockingIOError which makes socket._sendfile_use_sendfile() retry again and again, each time resending the very beginning of the data and not going any further, therefore accumulating a lot of garbage on the receiving socket's side.

This patch works for me and I run the whole test_socket test suite with it, no more failures:

diff --git a/Modules/posixmodule.c b/Modules/posixmodule.c
index 203f98515dfd..eeff11b5b8ea 100644
--- a/Modules/posixmodule.c
+++ b/Modules/posixmodule.c
@@ -9475,13 +9475,20 @@ os_sendfile_impl(PyObject *module, int out_fd, int in_fd, PyObject *offobj,
    }
#endif

+    off_t original_offset = offset;
    do {
        Py_BEGIN_ALLOW_THREADS
        ret = sendfile(out_fd, in_fd, &offset, count);
        Py_END_ALLOW_THREADS
    } while (ret < 0 && errno == EINTR && !(async_err = PyErr_CheckSignals()));
-    if (ret < 0)
-        return (!async_err) ? posix_error() : NULL;
+    if (ret < 0) {
+       if (offset != original_offset) {
+           ret = offset - original_offset;
+       }
+       else {
+           return (!async_err) ? posix_error() : NULL;
+       }
+    }
    return Py_BuildValue("n", ret);
#endif
}

If it's verified to be a good change I'm happy to resubmit it in a PR with appropriate illumos-specific #ifndefs.


[1] https://illumos.org/man/3ext/sendfile
msg380356 - (view) Author: Sebastian Wiedenroth (wiedi) * Date: 2020-11-04 20:01
Excellent analysis, that's it! I've also tested your patch on SmartOS and it works great.
msg380358 - (view) Author: Jakub Kulik (kulikjak) * Date: 2020-11-04 20:34
I did some further digging, and this is indeed not possible on Oracle Solaris (that is, sendfile() cannot write bytes and fail during the same call). We considered this a bug and changed/fixed it several years ago.

Manual page doesn't mention that either:
https://docs.oracle.com/cd/E88353_01/html/E37843/sendfile-3c.html

I am not sure whether Illumos will want to change this as well (probably not since it is mentioned in the man page as expected behavior), but either way, the proposed change doesn't harm Oracle Solaris because 'if (offset != original_offset)' can never be true.
msg380362 - (view) Author: Jakub Stasiak (jstasiak) * Date: 2020-11-04 21:15
Thank you! I submitted a PR with a slightly modified patch (the comparison only happens on Solaris family of systems), I'd appreciate your confirmation that it still works (it's working for me on openindiana).
msg380810 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2020-11-12 09:49
New changeset fd4ed57674c675e05bd5d577dd5047a333c76c78 by Jakub Stasiak in branch 'master':
bpo-42237: Fix os.sendfile() on illumos (GH-23154)
https://github.com/python/cpython/commit/fd4ed57674c675e05bd5d577dd5047a333c76c78
msg380812 - (view) Author: miss-islington (miss-islington) Date: 2020-11-12 10:14
New changeset 7ae19ef5cf5d4f464588133bd48f81d80ea54aee by Miss Islington (bot) in branch '3.9':
bpo-42237: Fix os.sendfile() on illumos (GH-23154)
https://github.com/python/cpython/commit/7ae19ef5cf5d4f464588133bd48f81d80ea54aee
msg380817 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2020-11-12 12:24
New changeset f37628eb7117f222de24a6931aa7639e668cb7b0 by Jakub Stasiak in branch '3.8':
[3.8] bpo-42237: Fix os.sendfile() on illumos (GH-23154). (GH-23246)
https://github.com/python/cpython/commit/f37628eb7117f222de24a6931aa7639e668cb7b0
History
Date User Action Args
2020-11-12 12:46:46asvetlovsetstatus: open -> closed
stage: patch review -> resolved
resolution: fixed
versions: + Python 3.8
2020-11-12 12:24:00asvetlovsetmessages: + msg380817
2020-11-12 11:09:09jstasiaksetpull_requests: + pull_request22143
2020-11-12 10:14:19miss-islingtonsetmessages: + msg380812
2020-11-12 09:50:04miss-islingtonsetnosy: + miss-islington
pull_requests: + pull_request22141
2020-11-12 09:49:35asvetlovsetnosy: + asvetlov
messages: + msg380810
2020-11-05 07:24:50kulikjaksetversions: + Python 3.9
2020-11-04 21:15:07jstasiaksetmessages: + msg380362
2020-11-04 21:12:39jstasiaksetpull_requests: + pull_request22066
2020-11-04 20:34:06kulikjaksetnosy: + kulikjak
messages: + msg380358
2020-11-04 20:01:31wiedisetmessages: + msg380356
2020-11-04 18:39:26jstasiaksetnosy: + jstasiak
messages: + msg380348
2020-11-01 20:21:33wiedisetkeywords: + patch
stage: patch review
pull_requests: + pull_request22002
2020-11-01 20:07:47wiedicreate