
Author methane
Recipients desbma, methane
Date 2019-02-25.15:52:19
Message-id <1551109939.9.0.780725143751.issue36103@roundup.psfhosted.org>
Content
> Your first link explains why 128kB buffer size is faster in the context of cp: it's due to fadvise and kernel read ahead.
> 
> None of the shutil functions call fadvise, so the benchmark and conclusions are irrelevant to the Python buffer size IMO.

Even without fadvise, readahead works automatically.  fadvise doubles the readahead size on Linux, but I don't know whether it really doubles it when the block device advertises its own readahead size.
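
For reference, a minimal sketch of what an explicit sequential-access hint looks like from Python. The helper name `read_sequential` is hypothetical, and nothing in shutil does this; `os.posix_fadvise` exists only on some Unix platforms, so the call is guarded. Whether the hint changes the readahead window on a given device is exactly the open question above.

```python
import os


def read_sequential(path, bufsize=64 * 1024):
    """Read a file to the end, hinting sequential access where supported.

    Returns the total number of bytes read. Hypothetical helper, for
    illustration only.
    """
    total = 0
    with open(path, "rb") as f:
        # os.posix_fadvise is not available on every platform.
        if hasattr(os, "posix_fadvise"):
            try:
                # offset=0 and len=0 apply the advice to the whole file.
                os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_SEQUENTIAL)
            except OSError:
                # The hint is advisory; reading works without it.
                pass
        while True:
            chunk = f.read(bufsize)
            if not chunk:
                break
            total += len(chunk)
    return total
```

Readahead still happens without the hint, so dropping the `posix_fadvise` call only changes how aggressively the kernel may prefetch.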


> In general, the bigger buffer, the better, to reduce syscall frequency (also explained in the article), but going from 16kB to 128kB is clearly in the micro optimization range, unlikely to do any significant difference.
>
> Also with 3.8, in many typical file copy cases (but not all), sendfile will be used, which makes buffer size even less important.

The buffer size is still used by copyfileobj, so a better default value may be worthwhile.
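
To make the syscall-frequency argument concrete, here is a small sketch that counts how many `read()` calls `shutil.copyfileobj` issues for a given buffer size. `CountingReader` and `count_reads` are hypothetical names, and an in-memory `io.BytesIO` stands in for a real file, so this counts Python-level reads rather than measuring kernel behaviour.

```python
import io
import shutil


class CountingReader:
    """Wrap a binary file object and count calls to read()."""

    def __init__(self, raw):
        self._raw = raw
        self.calls = 0

    def read(self, size=-1):
        self.calls += 1
        return self._raw.read(size)


def count_reads(data, bufsize):
    """Copy data with copyfileobj and return the number of read() calls."""
    src = CountingReader(io.BytesIO(data))
    dst = io.BytesIO()
    shutil.copyfileobj(src, dst, bufsize)
    # Sanity check: the copy must be byte-for-byte identical.
    assert dst.getvalue() == data
    return src.calls


if __name__ == "__main__":
    payload = bytes(1024 * 1024)  # 1 MiB of zeros
    for kib in (8, 16, 32, 64, 128):
        print(f"{kib:>4} KiB buffer: {count_reads(payload, kib * 1024)} reads")
```

For 1 MiB of data, a 16 KiB buffer needs 64 data reads plus one final empty read, while a 64 KiB buffer needs 16 plus one.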

My Linux box uses a SATA SSD (Samsung SSD 500GB 860EVO).
Its sequential write performance is unstable, so the test below writes to /dev/null.

Here is a quick test:

$ dd if=/dev/urandom of=f1 bs=1M count=1k

$ ./python -m timeit -n1 -r5 -v -s 'import shutil;' -- 'f1=open("f1","rb"); f2=open("/dev/null", "wb"); shutil.copyfileobj(f1, f2, 8*1024); f1.close(); f2.close()'
raw times: 301 msec, 302 msec, 301 msec, 301 msec, 300 msec

1 loop, best of 5: 300 msec per loop

$ ./python -m timeit -n1 -r5 -v -s 'import shutil;' -- 'f1=open("f1","rb"); f2=open("/dev/null", "wb"); shutil.copyfileobj(f1, f2, 16*1024); f1.close(); f2.close()'
raw times: 194 msec, 194 msec, 193 msec, 193 msec, 193 msec

1 loop, best of 5: 193 msec per loop

$ ./python -m timeit -n1 -r5 -v -s 'import shutil;' -- 'f1=open("f1","rb"); f2=open("/dev/null", "wb"); shutil.copyfileobj(f1, f2, 32*1024); f1.close(); f2.close()'
raw times: 140 msec, 140 msec, 140 msec, 140 msec, 140 msec

1 loop, best of 5: 140 msec per loop

$ ./python -m timeit -n1 -r5 -v -s 'import shutil;' -- 'f1=open("f1","rb"); f2=open("/dev/null", "wb"); shutil.copyfileobj(f1, f2, 64*1024); f1.close(); f2.close()'
raw times: 112 msec, 112 msec, 112 msec, 112 msec, 112 msec

1 loop, best of 5: 112 msec per loop

$ ./python -m timeit -n1 -r5 -v -s 'import shutil;' -- 'f1=open("f1","rb"); f2=open("/dev/null", "wb"); shutil.copyfileobj(f1, f2, 128*1024); f1.close(); f2.close()'
raw times: 101 msec, 101 msec, 101 msec, 101 msec, 101 msec


Judging from this result, I think 64KiB is the best balance.
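
The same comparison can be scripted instead of repeating the timeit command by hand. This is only a sketch: `best_copy_time` and `compare_buffer_sizes` are hypothetical helpers, the demo uses a small temporary file rather than the 1 GiB f1 above, and the timing covers only the copy loop, not the open/close calls that the timeit statements include. Its numbers are therefore not comparable to the ones reported here.

```python
import os
import shutil
import tempfile
import time


def best_copy_time(src_path, bufsize, repeat=5):
    """Return the best wall-clock time (seconds) of `repeat` copies to the null device."""
    best = float("inf")
    for _ in range(repeat):
        with open(src_path, "rb") as fsrc, open(os.devnull, "wb") as fdst:
            start = time.perf_counter()
            shutil.copyfileobj(fsrc, fdst, bufsize)
            elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best


def compare_buffer_sizes(src_path, sizes_kib=(8, 16, 32, 64, 128), repeat=5):
    """Map each buffer size in KiB to its best copy time in seconds."""
    return {kib: best_copy_time(src_path, kib * 1024, repeat) for kib in sizes_kib}


if __name__ == "__main__":
    # Small stand-in input; point src_path at a larger file for a real test.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(8 * 1024 * 1024))
    try:
        for kib, secs in compare_buffer_sizes(tmp.name).items():
            print(f"{kib:>4} KiB: {secs * 1000:.2f} msec")
    finally:
        os.unlink(tmp.name)
```

After the first pass the file is likely served from the page cache, so this mostly measures per-call overhead rather than disk throughput.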