
Author giampaolo.rodola
Recipients StyXman, desbma, facundobatista, giampaolo.rodola, martin.panter, ncoghlan, neologix, petr.viktorin, python-dev, r.david.murray, vstinner
Date 2018-05-24.20:15:29
Message-id <1527192929.89.0.682650639539.issue33639@psf.upfronthosting.co.za>
Content
This is a follow-up to #25063 and is similar to socket.sendfile() (#17552). It provides a 20-25% speedup when copying files with shutil.copyfile(), shutil.copy() and shutil.copy2(). Unlike #25063, this is used for filesystem files only, and copyfileobj() is left alone.

The unmerged #26826 is also related. I applied the #26826 patch and built a wrapper around copy_file_range(), and the speedup is basically the same. Nevertheless, even once #26826 is merged it probably makes sense to rely on sendfile() when copy_file_range() is not available (it was introduced in 2016) or when the UNIX platform supports file-to-file copy via sendfile(2) (I'm not aware of any besides Linux, so in practice this would be Linux-only).
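
For readers unfamiliar with the approach, here is a minimal sketch of a file-to-file copy built on os.sendfile(). It is not the code in the patch: the helper name copy_with_sendfile, the blocksize default and the fallback policy are assumptions made for illustration. It assumes a platform where sendfile(2) accepts a regular file as the output descriptor (Linux), and falls back to shutil.copyfileobj() otherwise.

```python
import os
import shutil


def copy_with_sendfile(src, dst, blocksize=8 * 1024 * 1024):
    """Copy src to dst using os.sendfile() where possible.

    Hypothetical helper for illustration only; names and defaults
    are assumptions, not the patch's implementation.
    """
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        if not hasattr(os, "sendfile"):
            # No sendfile() on this platform: plain userspace copy.
            shutil.copyfileobj(fsrc, fdst)
            return dst
        infd, outfd = fsrc.fileno(), fdst.fileno()
        offset = 0
        while True:
            try:
                # The kernel moves the bytes; nothing passes through
                # a Python-level buffer.
                sent = os.sendfile(outfd, infd, offset, blocksize)
            except OSError:
                if offset != 0:
                    # Part of the file was already copied: a real error.
                    raise
                # sendfile() refused this file pair (e.g. the output
                # must be a socket here): fall back before any data
                # has been written.
                shutil.copyfileobj(fsrc, fdst)
                return dst
            if sent == 0:
                break  # end of file
            offset += sent
    return dst
```

Passing an explicit offset keeps the input file position untouched, which is what lets the fallback start reading from the beginning of the file.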

Some benchmarks:

    $ dd if=/dev/urandom of=/tmp/f1 bs=1K count=128
    $ time ./python -m timeit -s 'import shutil; p1 = "/tmp/f1"; p2 = "/tmp/f2"' 'shutil.copyfile(p1, p2)'

128K copy
=========

--- without patch:

    2000 loops, best of 5: 160 usec per loop

    real    0m2.353s
    user    0m0.454s
    sys     0m1.435s

--- with patch:

    2000 loops, best of 5: 187 usec per loop

    real    0m2.724s
    user    0m0.627s
    sys     0m1.634s

8MB copy
========

    $ dd if=/dev/urandom of=/tmp/f1 bs=1M count=8

--- without patch:

    50 loops, best of 5: 9.51 msec per loop

    real    0m3.392s
    user    0m0.343s
    sys     0m2.478s

--- with patch:

    50 loops, best of 5: 7.75 msec per loop

    real    0m2.878s
    user    0m0.105s
    sys     0m2.187s

512MB copy
==========

--- without patch:

    1 loop, best of 5: 872 msec per loop

    real    0m5.574s
    user    0m0.402s
    sys     0m3.115s

--- with patch:

    1 loop, best of 5: 646 msec per loop

    real    0m5.475s
    user    0m0.037s
    sys     0m2.959s
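
To make the comparison easier to read, the relative change per copy size can be computed from the timeit figures above. This is only arithmetic on the numbers reported in this message; the helper name relative_change is made up for the example. A positive value means the patched build was faster.

```python
# Per-loop timeit results from this message, in seconds:
# (without patch, with patch)
timings = {
    "128K": (160e-6, 187e-6),
    "8MB": (9.51e-3, 7.75e-3),
    "512MB": (872e-3, 646e-3),
}


def relative_change(before, after):
    """Fraction of time saved by the patch (negative means slower)."""
    return (before - after) / before


for label, (before, after) in timings.items():
    print(f"{label}: {relative_change(before, after):+.1%}")
```

By these figures the gain shows up on the 8MB and 512MB copies, while the 128K run was slower with the patch.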