classification
Title: Call readinto in shutil.copyfileobj
Type: performance Stage: resolved
Components: Versions: Python 3.7
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: YoSTEALTH, christian.heimes, martin.panter, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2018-01-10 22:11 by YoSTEALTH, last changed 2018-01-17 18:07 by serhiy.storchaka. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 5158 closed YoSTEALTH, 2018-01-11 13:18
PR 5147 closed YoSTEALTH, 2018-01-11 13:42
Messages (7)
msg309784 - (view) Author: (YoSTEALTH) * Date: 2018-01-10 22:11
improved "copyfileobj" function to use less memory
msg309787 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2018-01-11 00:58
Looks like you want to use a "readinto" method to reduce data copying.

One problem is that it is not specified exactly what kind of object "copyfileobj" supports reading from. The documentation only says "file-like". According to the glossary, this means io.RawIOBase, BufferedIOBase, or TextIOBase. However TextIOBase doesn't have a "readinto" method. And it wouldn't be hard to find that someone has written their own class that doesn't have "readinto" either.

The other problem is you still need to support a negative "length" value, which is easier to do by calling "read".
msg309803 - (view) Author: (YoSTEALTH) * Date: 2018-01-11 12:17
Martin, your points got me thinking... to make a proper copy of a file, it should be done using bytes! Text IO could easily lead to corrupting your file.

for example (current function):
    with open(old_path, 'r', encoding='latin-1') as fsrc:
        with open(new_path, 'w') as fdst:
            copyfileobj(fsrc, fdst)

This would lead you to have wrongly encoded file with different filesize.
msg309804 - (view) Author: (YoSTEALTH) * Date: 2018-01-11 13:32
Ok, updated the patch to account for:
- improved memory usage for bytes io using readinto
- still supporting negative length
- potential encoding mismatch bug fix while using text io

did i miss anything?
msg309807 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-01-11 14:02
Please provide benchmark results to demonstrate the benefits of this change.
msg309808 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2018-01-11 14:16
I already blocked the PR with a request for benchmarks and proper tests.
msg309817 - (view) Author: (YoSTEALTH) * Date: 2018-01-11 17:36
here is the links to benchmark:
    https://repl.it/@altendky/timeit-of-proposed-shutilfileobjcopy
    https://gist.github.com/altendky/ff5ccee2baf9822dce69ae8aa66a0fdf
    https://paste.pound-python.org/show/urORPXztcbDlqXKTORAj/  # time ./file.py

the problem is the result times are so sparitic that not sure what to post as the result! I have tried with actual file as well and the results are all over the place.

If any of you out there are can do benchmark that can produce accurate results and post the result it would help.
History
Date User Action Args
2018-01-17 18:07:32serhiy.storchakasetresolution: rejected
2018-01-17 18:04:42YoSTEALTHsetstatus: open -> closed
stage: patch review -> resolved
2018-01-11 17:36:26YoSTEALTHsetmessages: + msg309817
2018-01-11 14:16:50christian.heimessetnosy: + christian.heimes
messages: + msg309808
2018-01-11 14:02:58serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg309807
2018-01-11 13:42:36YoSTEALTHsetpull_requests: + pull_request5014
2018-01-11 13:32:15YoSTEALTHsetmessages: + msg309804
2018-01-11 13:26:06YoSTEALTHsetpull_requests: - pull_request5004
2018-01-11 13:18:01YoSTEALTHsetkeywords: + patch
stage: patch review
pull_requests: + pull_request5013
2018-01-11 12:17:11YoSTEALTHsetmessages: + msg309803
2018-01-11 00:58:03martin.pantersetnosy: + martin.panter

messages: + msg309787
title: improved shutil.py function -> Call readinto in shutil.copyfileobj
2018-01-10 22:11:02YoSTEALTHcreate