This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: shutil._fastcopy_sendfile() makes wrong (?) assumption about sendfile() return value
Type: Stage:
Components: Library (Lib) Versions: Python 3.10, Python 3.9, Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: andrei.avk, lhuedepohl
Priority: normal Keywords:

Created on 2021-03-03 16:06 by lhuedepohl, last changed 2022-04-11 14:59 by admin.

Messages (3)
msg388027 - (view) Author: Lorenz Hüdepohl (lhuedepohl) Date: 2021-03-03 16:06
Since version 3.8, the shutil.copyfile() function tries to make use of os.sendfile(). Currently, this is done in a loop that aborts as soon as sendfile() returns 0, as it is assumed that this signifies the end of the file.

The problem: the return value of sendfile() is simply the amount of bytes that could be successfully copied in that one syscall, with no guarantee that a return value of 0 only happens when there is truly no data left to copy. Some other urgent task could have interrupted the kernel before any data was copied. At least, I could not find documentation to the contrary.

(Note: This might or might not be actual behavior of current Linux kernels today, but the spec of sendfile() certainly allows this)


In any case, in that same routine the size of the source file is anyway requested in an os.fstat() call, one could thus restructure the loop like this, for example:


filesize = os.fstat(infd).st_size
offset = 0
while offset < filesize:
   sent = os.sendfile(outfd, infd, offset, blocksize)
   offset += sent

(Error handling etc. left out for clarity, just to point out the new structure)

This would also optimize the case of an empty input file, in that case the loop is never entered and no os.sendfile() call is issued, at all.

In the normal case, it would also save the unnecessary last os.sendfile() call, when 'offset' has already grown to 'filesize'. (This was the actual reason for me to look into this in the first place, a filesystem bug where sendfile() called with an offset set to the file's size returns "EAGAIN" in certain cases. But this is another topic entirely and has nothing to do with Python, of course.)


Note that in Modules/posixmodule.c os_sendfile_impl() there is also another loop around individual actual sendfile() system call, but a return value of 0 there would also exit that loop and be passed up:

    do {
        Py_BEGIN_ALLOW_THREADS
#ifdef __APPLE__
        ret = sendfile(in_fd, out_fd, offset, &sbytes, &sf, flags);
#else
        ret = sendfile(in_fd, out_fd, offset, count, &sf, &sbytes, flags);
#endif
        Py_END_ALLOW_THREADS
    } while (ret < 0 && errno == EINTR && !(async_err = PyErr_CheckSignals()));


Kind regards,
  Lorenz
msg396995 - (view) Author: Andrei Kulakov (andrei.avk) * (Python triager) Date: 2021-07-05 15:19
Wouldn't this create infinite loop if a file cannot be fully copied for some reason?
msg396996 - (view) Author: Lorenz Hüdepohl (lhuedepohl) Date: 2021-07-05 15:28
Sure, as written it would. As I said: Error handling etc. left out for clarity, just to point out the new structure.
History
Date User Action Args
2022-04-11 14:59:42adminsetgithub: 87554
2021-07-05 15:28:45lhuedepohlsetmessages: + msg396996
2021-07-05 15:19:26andrei.avksetnosy: + andrei.avk
messages: + msg396995
2021-03-03 16:06:53lhuedepohlcreate