Increase shutil.COPY_BUFSIZE #80284
Comments
shutil.COPY_BUFSIZE is 16 KiB on non-Windows platforms. According to this article [1], 128 KiB gives the best performance on common systems. Another resource: the EBS documentation [2] uses 128 KiB I/O for throughput. Can we increase shutil.COPY_BUFSIZE to 128 KiB by default? Note that 128 KiB is still small compared with Windows (1 MiB by default).
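For readers skimming the thread, a minimal sketch (file names are hypothetical) of how a caller can already pick a larger buffer per call, independent of the module default:

import shutil

# copyfileobj() accepts an explicit buffer length as its third argument,
# so code that cares can pass 128 KiB today regardless of COPY_BUFSIZE.
with open("src.bin", "rb") as fsrc, open("dst.bin", "wb") as fdst:
    shutil.copyfileobj(fsrc, fdst, 128 * 1024)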
Your first link explains why a 128 kB buffer size is faster in the context of cp: it's due to fadvise and kernel readahead. None of the shutil functions call fadvise, so the benchmark and its conclusions are irrelevant to the Python buffer size, IMO. In general, the bigger the buffer, the better, in order to reduce syscall frequency (also explained in the article), but going from 16 kB to 128 kB is clearly in micro-optimization territory and unlikely to make any significant difference. Also, with 3.8, sendfile will be used in many typical file-copy cases (but not all), which makes the buffer size even less important.
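To make the distinction concrete, a hedged sketch of the sequential-read hint that cp issues and shutil does not (Unix-only API; the path is hypothetical):

import os

# Tell the kernel reads will be sequential; on Linux this can enlarge
# the readahead window, which is what the cp benchmark benefited from.
fd = os.open("src.bin", os.O_RDONLY)
try:
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
    # ... perform the sequential reads here ...
finally:
    os.close(fd)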
Even without fadvise, readahead works automatically; fadvise doubles the readahead size on Linux. But I don't know whether it really doubles the readahead size when the block device advertises its own readahead size.
COPY_BUFSIZE is still used by copyfileobj(), so a better default value may be worth it. My Linux box uses a SATA SSD (Samsung 860 EVO, 500 GB). Here is a quick test:

$ dd if=/dev/urandom of=f1 bs=1M count=1k

$ ./python -m timeit -n1 -r5 -v -s 'import shutil;' -- 'f1=open("f1","rb"); f2=open("/dev/null", "wb"); shutil.copyfileobj(f1, f2, 8*1024); f1.close(); f2.close()'
raw times: 301 msec, 302 msec, 301 msec, 301 msec, 300 msec
1 loop, best of 5: 300 msec per loop

$ ./python -m timeit -n1 -r5 -v -s 'import shutil;' -- 'f1=open("f1","rb"); f2=open("/dev/null", "wb"); shutil.copyfileobj(f1, f2, 16*1024); f1.close(); f2.close()'
raw times: 194 msec, 194 msec, 193 msec, 193 msec, 193 msec
1 loop, best of 5: 193 msec per loop

$ ./python -m timeit -n1 -r5 -v -s 'import shutil;' -- 'f1=open("f1","rb"); f2=open("/dev/null", "wb"); shutil.copyfileobj(f1, f2, 32*1024); f1.close(); f2.close()'
raw times: 140 msec, 140 msec, 140 msec, 140 msec, 140 msec
1 loop, best of 5: 140 msec per loop

$ ./python -m timeit -n1 -r5 -v -s 'import shutil;' -- 'f1=open("f1","rb"); f2=open("/dev/null", "wb"); shutil.copyfileobj(f1, f2, 64*1024); f1.close(); f2.close()'
raw times: 112 msec, 112 msec, 112 msec, 112 msec, 112 msec
1 loop, best of 5: 112 msec per loop

$ ./python -m timeit -n1 -r5 -v -s 'import shutil;' -- 'f1=open("f1","rb"); f2=open("/dev/null", "wb"); shutil.copyfileobj(f1, f2, 128*1024); f1.close(); f2.close()'
raw times: 101 msec, 101 msec, 101 msec, 101 msec, 101 msec

Based on this result, I think 64 KiB is the best balance.
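Collected, the best-of-5 timings above read:

buffer size    best of 5
8 KiB          300 msec
16 KiB         193 msec
32 KiB         140 msec
64 KiB         112 msec
128 KiB        101 msec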
If you do a benchmark by reading from a file and then writing to /dev/null several times, without clearing caches, you are measuring *only* the syscall overhead.
Your current command line also measures open/close timings; without those, I think the speed should increase linearly when doubling the buffer size. But of course this is misleading, because it's a synthetic benchmark. Also, if you clear caches between tests and write the output file to the SSD itself, sendfile will be used and should be even faster. So again, I'm not sure this means much compared to real-world usage.
Yes. I measured syscall overhead to determine a reasonable buffer size.
As I said before, my SSD doesn't have stable write performance. I'm not measuring the speed of my cheap SSD; the goal of this benchmark is finding a reasonable default buffer size.
No. sendfile is not used by shutil.copyfileobj, even if dst is a real file.
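For contrast, a hedged sketch (file names hypothetical) of the zero-copy sendfile path that shutil.copyfile(), not copyfileobj(), can take on Linux in 3.8:

import os

# Copy src to dst in-kernel, without bouncing data through user space.
# A robust copy loops, since sendfile() may send fewer bytes than asked.
with open("src.bin", "rb") as fsrc, open("dst.bin", "wb") as fdst:
    remaining = os.fstat(fsrc.fileno()).st_size
    offset = 0
    while remaining > 0:
        sent = os.sendfile(fdst.fileno(), fsrc.fileno(), offset, remaining)
        if sent == 0:
            break  # unexpected EOF
        offset += sent
        remaining -= sent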
"Real world usage" is vary. Sometime it is not affected. Sometime it affects. On the other hand, what is the cons of changing 16KiB to 64KiB? |
@inada: having played with this in the past, I seem to remember that on Linux a bigger bufsize doesn't make a noticeable difference (but I may be wrong), which is why I suggest trying some benchmarks. In bpo-33671 I pasted some one-liners you can use (and you should target copyfileobj() instead of copyfile() in order to skip the os.sendfile() path). Also, on Linux, "echo 3 | sudo tee /proc/sys/vm/drop_caches" is supposed to clear the page cache.
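As a per-file alternative to the global drop_caches knob, here is a sketch under the assumption that POSIX_FADV_DONTNEED drops clean cached pages (as it does on Linux); the helper name is made up:

import os

def evict_from_page_cache(path):
    # Drop one file's cached pages between benchmark runs; no root needed.
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)  # write back dirty pages so they can actually be dropped
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)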
As I said already, shutil is not used only with a cold cache. And if the cache is cold, disk speed will be the upper bound in most cases.
Read this file too: coreutils chose 128 KiB as the *minimum* buffer size to reduce syscall overhead. I think 128 KiB is the best value, but I'm OK with 64 KiB as a conservative choice.
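Worth noting: assuming a 3.8-style shutil, where copyfileobj() falls back to the COPY_BUFSIZE module attribute when no explicit length is passed, an application can already mirror the coreutils figure process-wide:

import shutil

# Assumption: Python 3.8+, where COPY_BUFSIZE is a plain module-level
# attribute consulted by copyfileobj() when no length is given.
shutil.COPY_BUFSIZE = 128 * 1024  # match coreutils' 128 KiB minimum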
I chose 64 KiB because the performance difference between 64 KiB and 128 KiB was small.