classification
Title: shutil.copyfileobj should internally use os.sendfile when possible
Type: enhancement Stage: resolved
Components: Library (Lib) Versions:
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: desbma, giampaolo.rodola, martin.panter, python-dev, r.david.murray, vstinner
Priority: normal Keywords:

Created on 2015-09-10 18:38 by desbma, last changed 2018-05-19 11:27 by giampaolo.rodola. This issue is now closed.

Messages (7)
msg250401 - (view) Author: desbma (desbma) * Date: 2015-09-10 18:38
Sorry if it has already been discussed, or if this is a stupid idea.

By looking at the signature, my thought was that the only use of shutil.copyfileobj was to wrap the use of sendfile, and use a fallback if it is not available on the system or not usable with "fake" Python files (not having a file descriptor, eg. like gzip.GzipFile).

By looking at the implementation, I was surprised that it does not try to call os.sendfile.
msg250406 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-09-10 19:10
No, it's to copy file like objects that aren't necessarily true file system objects.  The only requirement is that they support a minimal file-like object interface.  There is no suggestion or requirement that the object supports file descriptors (which would make it a real os file object).

I think changing it to use os.sendfile if possible would be too risky in terms of causing unexpected breakage (objects that look like they have a file descriptor but really don't), especially since handling file system files is explicitly not the purpose of copyfileobj.
msg250426 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-09-10 22:31
I agree it is a nice idea, but have the same concern as David. Many pseudo file objects wrap real file descriptors. E.g. BufferedReader hides unread data in a buffer, GzipFile decompresses data, HTTPResponse decodes chunks. Detecting when sendfile() is appropriate would be hard to do. And I understand some platforms only support sockets.

Also I notice that 3.5 has a new socket.sendfile() method (Issue 17552), which partly overlaps with this proposal. BTW the 3.4 os documentation should not mention the socket method, but it does!
msg250503 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-09-11 23:56
New changeset 3d9cbbad8a04 by Martin Panter <vadmium> in branch '3.4':
Issue #25063: socket.sendfile() does not exist in 3.4
https://hg.python.org/cpython/rev/3d9cbbad8a04
msg250578 - (view) Author: desbma (desbma) * Date: 2015-09-13 17:18
If I understand you correctly, a naive implementation like this:

use_chunk_copy = False
try:
  fdsrc = fsrc.fileno()
  fddst = fdst.fileno()
except OSError:
  # fallback on current chunk based Python copy
  use_chunk_copy = True
else:
  # we have 2 fd, try sendfile
  try:
    os.sendfile(fdsrc, fddst)
  except AttributeError:
    # sendfile not available
    use_chunk_copy = True
if use_chunk_copy:
  # use current Python based chunk copy

...would not work because some file like objects expose fileno(), even though it does not accurately represent the file because they wrap it to add special behavior (like GzipFile).

I don't understand, if they alter the file behavior, why do they expose fileno()?
msg250586 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-09-13 20:50
I don’t know the exact reasoning, but fileno() could be useful for other purposes, e.g. passing to stat() to find the last modification time, passing to select() to see if data is available.
msg250657 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-09-14 12:02
I'm going to reject this.  If you think there is enough value in this to warrant the complications that would be required to ensure backward compatibility (assuming that is even possible), you could bring it up again on python-ideas.
History
Date User Action Args
2018-05-19 11:27:44giampaolo.rodolasetnosy: + giampaolo.rodola
2015-09-14 12:02:59r.david.murraysetstatus: open -> closed
resolution: rejected
messages: + msg250657

stage: resolved
2015-09-13 20:50:40martin.pantersetmessages: + msg250586
2015-09-13 20:22:19vstinnersetnosy: + vstinner
2015-09-13 17:18:00desbmasetmessages: + msg250578
2015-09-11 23:56:37python-devsetnosy: + python-dev
messages: + msg250503
2015-09-10 22:31:57martin.pantersettype: enhancement

messages: + msg250426
nosy: + martin.panter
2015-09-10 19:10:04r.david.murraysetnosy: + r.david.murray
messages: + msg250406
2015-09-10 18:38:50desbmacreate