Efficient zero-copy for shutil.copy* functions (Linux, OSX and Win) #77852

giampaolo · 2018-05-28T16:17:24Z

BPO	33671
Nosy	@facundobatista, @ncoghlan, @pitrou, @scoder, @vstinner, @giampaolo, @tarekziade, @merwok, @bitdancer, @encukou, @asvetlov, @methane, @socketpair, @vadmium, @MojoVampire, @asottile
PRs	bpo-33671: efficient zero-copy for shutil.copy* functions (Linux, OSX and Win) #7160 bpo-33671 / shutil.copyfile: use memoryview() with dynamic size on Windows #7681 bpo-33671: allow setting shutil.copyfile() bufsize globally #12016 bpo-33671 fix orphaned comment in shutil.copyfileobj #27516
Files	shutil-zero-copy.diff

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2018-06-12.21:48:06.592>
created_at = <Date 2018-05-28.16:17:23.634>
labels = ['3.8', 'library', 'performance']
title = 'Efficient zero-copy for shutil.copy* functions (Linux, OSX and Win)'
updated_at = <Date 2021-07-31.19:15:48.402>
user = 'https://github.com/giampaolo'

bugs.python.org fields:

activity = <Date 2021-07-31.19:15:48.402>
actor = 'eric.araujo'
assignee = 'none'
closed = True
closed_date = <Date 2018-06-12.21:48:06.592>
closer = 'giampaolo.rodola'
components = ['Library (Lib)']
creation = <Date 2018-05-28.16:17:23.634>
creator = 'giampaolo.rodola'
dependencies = []
files = ['47621']
hgrepos = []
issue_num = 33671
keywords = ['patch', 'needs review']
message_count = 18.0
messages = ['317878', '317880', '317905', '317906', '317932', '317989', '319401', '319405', '319484', '319486', '319980', '320247', '320456', '336492', '336502', '336514', '336516', '398660']
nosy_count = 21.0
nosy_names = ['facundobatista', 'ncoghlan', 'pitrou', 'scoder', 'vstinner', 'giampaolo.rodola', 'gps', 'StyXman', 'tarek', 'eric.araujo', 'r.david.murray', 'petr.viktorin', 'asvetlov', 'methane', 'SilentGhost', 'neologix', 'socketpair', 'python-dev', 'martin.panter', 'josh.r', 'Anthony Sottile']
pr_nums = ['7160', '7681', '12016', '27516']
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'performance'
url = 'https://bugs.python.org/issue33671'
versions = ['Python 3.8']

giampaolo · 2018-05-28T16:17:23Z

Patch in attachment uses platform specific zero-copy syscalls on Linux and Solaris (os.sendfile(2)), Windows (CopyFileW) and OSX (fcopyfile(2)) speeding up shutil.copyfile() and other functions using it (copy(), copy2(), copytree(), move()).

Average speedup for a 512MB file copy is +24% on Linux, +50% on OSX and +48% on Windows by copying file on the same partition (SSD disk was used).

Follows some benchmarks.

Setup
=====

Create 128K, 8M, 512M file:

    $ python -c "import os; f = open('f1', 'wb'); f.write(os.urandom(128 * 1024))"
    $ python -c "import os; f = open('f1', 'wb'); f.write(os.urandom(8 * 1024 * 1024))"
    $ python -c "import os; f = open('f1', 'wb'); f.write(os.urandom(512 * 1024 * 1024))"

Benchmark:

    $ time ./python -m timeit -s 'import shutil; p1 = "f1"; p2 = "f2"' 'shutil.copyfile(p1, p2)'

Linux
=====

128K copy (+13%):

without patch:
    1000 loops, best of 5: 228 usec per loop
    real    0m1.756s
    user    0m0.386s
    sys     0m1.116s

    with patch:
        1000 loops, best of 5: 198 usec per loop
        real    0m1.464s
        user    0m0.281s
        sys     0m0.958s

8MB copy (+24%):

without patch:
    50 loops, best of 5: 10.1 msec per loop
    real    0m2.703s
    user    0m0.316s
    sys     0m1.847s

    with patch:
        50 loops, best of 5: 7.78 msec per loop
        real    0m2.447s
        user    0m0.086s
        sys     0m1.682s

512MB copy (+26%):

without patch:
    1 loop, best of 5: 872 msec per loop
    real    0m5.574s
    user    0m0.402s
    sys     0m3.115s

    with patch:
        1 loop, best of 5: 646 msec per loop
        real    0m5.475s
        user    0m0.037s
        sys     0m2.959s

OSX
===

128K copy (+8.5%):

without patch:
    500 loops, best of 5: 508 usec per loop
    real    0m2.971s
    user    0m0.442s
    sys     0m2.168s

    with patch:
        500 loops, best of 5: 464 usec per loop
        real    0m2.798s
        user    0m0.379s
        sys     0m2.031s

8MB copy (+67%):

without patch:
    20 loops, best of 5: 32.8 msec per loop
    real    0m3.672s
    user    0m0.357s
    sys     0m1.434s

    with patch:
        20 loops, best of 5: 10.8 msec per loop
        real    0m1.860s
        user    0m0.079s
        sys     0m0.719s

512MB copy (+50%):

without patch:
    1 loop, best of 5: 953 msec per loop
    real    0m5.930s
    user    0m1.021s
    sys     0m4.835s

with patch:
    1 loop, best of 5: 480 msec per loop
    real    0m3.150s
    user    0m0.067s
    sys     0m2.740s

Windows
=======

128K copy (+69%):

without patch:
    50 loops, best of 5: 6.45 msec per loop
with patch:
    50 loops, best of 5: 1.99 msec per loop

8M copy (+64%):

without patch:
    10 loops, best of 5: 22.6 msec per loop
with patch:
    50 loops, best of 5: 7.95 msec per loop

512M copy (+48%):

without patch:
    1 loop, best of 5: 1.21 sec per loop
with patch:
    1 loop, best of 5: 629 msec per loop

giampaolo · 2018-05-28T16:22:16Z

PR: #7160

scoder · 2018-05-28T19:36:41Z

Nice, I really like this.

Apart from the usual bit of minor style issues, I couldn't see anything inherently wrong with the PR, but I'll leave the detailed reviews to those who'd have to maintain the code in the future. :)

scoder · 2018-05-28T19:38:51Z

Regarding the benchmarks, just to be sure, did you try reversing the run order to make sure you don't get unfair caching effects for the later runs?

socketpair · 2018-05-28T22:05:47Z

http://man7.org/linux/man-pages/man2/ioctl_ficlonerange.2.html

That possibly should be used under Linux in order to really acheive zero-copying. Just like modern cp command.

giampaolo · 2018-05-29T08:43:22Z

Yes, I tried changing benchmarks order and zero-copy variants are always faster. As for instantaneous CoW copy, it is debatable. E.g. "cp" command does not do it by default:
https://unix.stackexchange.com/questions/80351/why-is-cp-reflink-auto-not-the-default-behaviour
I think shutil should follow the same lead, and perhaps provide a cow=bool argument in the future.

giampaolo · 2018-06-12T21:04:58Z

New changeset 4a172cc by Giampaolo Rodola in branch 'master':
bpo-33671: efficient zero-copy for shutil.copy* functions (Linux, OSX and Win) (bpo-7160)
4a172cc

giampaolo · 2018-06-12T21:48:06Z

For future reference, as per #7160 discussion, we decided not to use CopyFileEx on Windows and instead increase read() buffer size from 16KB to 1MB (Windows only) resulting in a 40.8% speedup (instead of 48%). Also copyfileobj() has been optimized on all platforms by using readinto()/memoryview()/bytearray().
Updated benchmarks on Windows:

128KB copy (+27%)

without patch:
    50 loops, best of 5: 7.69 sec per loop
with patch:
    50 loops, best of 5: 5.61 sec per loop

8MB copy (+45.6%)

without patch:
    10 loops, best of 5: 20.8 sec per loop
with patch:
    20 loops, best of 5: 11.3 sec per loop

512MB copy (+40.8%)

without patch:
    1 loop, best of 5: 1.26 sec per loop
with patch:
    1 loop, best of 5: 646 msec per loop

StyXman · 2018-06-13T19:50:35Z

Thanks Gianpaolo for pushing for this. Great job.

vstinner · 2018-06-13T21:20:16Z

Thanks Gianpaolo for pushing for this. Great job.

I concur: great job! Cool optimization.

giampaolo · 2018-06-19T15:27:33Z

New changeset c7f02a9 by Giampaolo Rodola in branch 'master':
bpo-33671 / shutil.copyfile: use memoryview() with dynamic size on Windows (bpo-7681)
c7f02a9

vstinner · 2018-06-22T17:25:47Z

New changeset 8fbbdf0 by Victor Stinner in branch 'master':
bpo-33671: Add support.MS_WINDOWS and support.MACOS (GH-7800)
8fbbdf0

vstinner · 2018-06-26T00:11:09Z

New changeset 937ee9e by Victor Stinner in branch 'master':
Revert "bpo-33671: Add support.MS_WINDOWS and support.MACOS (GH-7800)" (GH-7919)
937ee9e

methane · 2019-02-24T23:46:45Z

New changeset 3b0abb0 by Inada Naoki (Giampaolo Rodola) in branch 'master':
bpo-33671: allow setting shutil.copyfile() bufsize globally (GH-12016)
3b0abb0

vstinner · 2019-02-25T08:11:47Z

shutil.COPY_BUFSIZE isn't documented. Is it a deliberate choice?
https://docs.python.org/dev/library/shutil.html

giampaolo · 2019-02-25T12:12:31Z

Yes, it's deliberate, see PR-12016.

vstinner · 2019-02-25T12:44:25Z

Yes, it's deliberate, see PR-12016.

"I decided not to document shutil.COPY_BUFSIZE because I consider it a corner case."

Ok. I have no opinion on that, I just wanted to ask the question :-)

merwok · 2021-07-31T19:15:48Z

New changeset b08c48e by Anthony Sottile in branch 'main':
bpo-33671 fix orphaned comment in shutil.copyfileobj (GH-27516)
b08c48e

giampaolo added 3.8 only security fixes stdlib Python modules in the Lib dir performance Performance or resource usage labels May 28, 2018

giampaolo changed the title ~~Efficient efficient zero-copy syscalls for shutil.copy* functions (Linux, OSX and Win)~~ Efficient zero-copy for shutil.copy* functions (Linux, OSX and Win) May 28, 2018

giampaolo closed this as completed Jun 12, 2018

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Efficient zero-copy for shutil.copy* functions (Linux, OSX and Win) #77852

Efficient zero-copy for shutil.copy* functions (Linux, OSX and Win) #77852

giampaolo commented May 28, 2018

giampaolo commented May 28, 2018

giampaolo commented May 28, 2018

scoder commented May 28, 2018

scoder commented May 28, 2018

socketpair mannequin commented May 28, 2018

giampaolo commented May 29, 2018

giampaolo commented Jun 12, 2018

giampaolo commented Jun 12, 2018

StyXman mannequin commented Jun 13, 2018

vstinner commented Jun 13, 2018

giampaolo commented Jun 19, 2018

vstinner commented Jun 22, 2018

vstinner commented Jun 26, 2018

methane commented Feb 24, 2019

vstinner commented Feb 25, 2019

giampaolo commented Feb 25, 2019

vstinner commented Feb 25, 2019

merwok commented Jul 31, 2021

Efficient zero-copy for shutil.copy* functions (Linux, OSX and Win) #77852

Efficient zero-copy for shutil.copy* functions (Linux, OSX and Win) #77852

Comments

giampaolo commented May 28, 2018

giampaolo commented May 28, 2018

giampaolo commented May 28, 2018

scoder commented May 28, 2018

scoder commented May 28, 2018

socketpair mannequin commented May 28, 2018

giampaolo commented May 29, 2018

giampaolo commented Jun 12, 2018

giampaolo commented Jun 12, 2018

StyXman mannequin commented Jun 13, 2018

vstinner commented Jun 13, 2018

giampaolo commented Jun 19, 2018

vstinner commented Jun 22, 2018

vstinner commented Jun 26, 2018

methane commented Feb 24, 2019

vstinner commented Feb 25, 2019

giampaolo commented Feb 25, 2019

vstinner commented Feb 25, 2019

merwok commented Jul 31, 2021