Efficient zero-copy for shutil.copy* functions (Linux, OSX and Win) #77852
The attached patch uses platform-specific zero-copy syscalls on Linux and Solaris (os.sendfile(2)), Windows (CopyFileW) and OSX (fcopyfile(2)), speeding up shutil.copyfile() and the other functions that use it (copy(), copy2(), copytree(), move()). The average speedup for a 512MB file copy is +24% on Linux, +50% on OSX and +48% on Windows when copying a file on the same partition (an SSD disk was used). Some benchmarks follow.

Setup

Create a 128K, 8M or 512M file:

$ python -c "import os; f = open('f1', 'wb'); f.write(os.urandom(128 * 1024))"
$ python -c "import os; f = open('f1', 'wb'); f.write(os.urandom(8 * 1024 * 1024))"
$ python -c "import os; f = open('f1', 'wb'); f.write(os.urandom(512 * 1024 * 1024))"

Benchmark:

$ time ./python -m timeit -s 'import shutil; p1 = "f1"; p2 = "f2"' 'shutil.copyfile(p1, p2)'

Linux

128K copy (+13%):
with patch:
1000 loops, best of 5: 198 usec per loop
real 0m1.464s
user 0m0.281s
sys 0m0.958s

8MB copy (+24%):
with patch:
50 loops, best of 5: 7.78 msec per loop
real 0m2.447s
user 0m0.086s
sys 0m1.682s

512MB copy (+26%):
with patch:
1 loop, best of 5: 646 msec per loop
real 0m5.475s
user 0m0.037s
sys 0m2.959s

OSX

128K copy (+8.5%):
with patch:
500 loops, best of 5: 464 usec per loop
real 0m2.798s
user 0m0.379s
sys 0m2.031s

8MB copy (+67%):
with patch:
20 loops, best of 5: 10.8 msec per loop
real 0m1.860s
user 0m0.079s
sys 0m0.719s

512MB copy (+50%):

Windows

128K copy (+69%):
8M copy (+64%):
512M copy (+48%):
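For illustration, the Linux/Solaris path described above (copying with os.sendfile() instead of a userspace read()/write() loop) can be sketched roughly as follows. This is a minimal sketch, not the code from the patch: the helper name `copyfile_sendfile` and the 8 MiB default chunk size are hypothetical, and the fallback to shutil.copyfileobj() assumes that an unsupported sendfile() fails before any data has been copied.

```python
import os
import shutil


def copyfile_sendfile(src, dst, blocksize=8 * 1024 * 1024):
    """Copy src to dst with os.sendfile(), letting the kernel move the data.

    Hypothetical helper; the 8 MiB blocksize is an arbitrary assumption.
    """
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        infd, outfd = fsrc.fileno(), fdst.fileno()
        offset = 0
        while True:
            try:
                # An explicit offset leaves infd's file position untouched;
                # data is written at outfd's current position.
                sent = os.sendfile(outfd, infd, offset, blocksize)
            except (AttributeError, OSError):
                # os.sendfile() is missing (e.g. Windows) or the kernel refuses
                # a file-to-file transfer (e.g. macOS requires a socket).
                if offset == 0:
                    # Nothing copied yet: fall back to a userspace copy.
                    shutil.copyfileobj(fsrc, fdst)
                    return
                raise
            if sent == 0:  # EOF reached
                break
            offset += sent
```

The loop advances `offset` by however many bytes each call actually transferred, since sendfile() may copy less than requested.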
PR: #7160
Nice, I really like this. Apart from the usual minor style issues, I couldn't see anything inherently wrong with the PR, but I'll leave the detailed reviews to those who'd have to maintain the code in the future. :)
Regarding the benchmarks, just to be sure, did you try reversing the run order to make sure you don't get unfair caching effects for the later runs?
http://man7.org/linux/man-pages/man2/ioctl_ficlonerange.2.html This should possibly be used on Linux in order to achieve real zero-copy, just like the modern cp command does.
Yes, I tried changing the benchmark order, and the zero-copy variants are always faster. As for instantaneous CoW copy, it is debatable; e.g. the "cp" command does not do it by default.
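The clone ioctl suggested above could be tried first, with a regular copy as fallback, similar in spirit to `cp --reflink=auto`. This is a hedged sketch, not part of the patch: the helper name `reflink_or_copy` is hypothetical, `fcntl.FICLONE` is only exposed on Python 3.12+ (Linux), and the numeric fallback 0x40049409 (`_IOW(0x94, 9, int)`) is an assumption that holds for the common Linux ioctl encoding (e.g. x86, arm) but not for every architecture.

```python
import shutil
import sys

try:
    import fcntl
except ImportError:  # fcntl is not available on Windows
    fcntl = None

# Assumed FICLONE value for the generic Linux ioctl encoding (see lead-in).
FICLONE = getattr(fcntl, "FICLONE", 0x40049409)


def reflink_or_copy(src, dst):
    """Try an instantaneous copy-on-write clone, else do a regular copy.

    Returns True if dst was cloned, False if it was copied normally.
    """
    if fcntl is not None and sys.platform.startswith("linux"):
        with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
            try:
                # ioctl(dest_fd, FICLONE, src_fd) makes dst share src's extents.
                fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
                return True
            except OSError:
                # No reflink support (e.g. ext4, tmpfs) or a cross-device copy.
                pass
    shutil.copyfile(src, dst)
    return False
```

Whether cloning should be the default is exactly what is debated here: a clone completes almost instantly, but both files then share storage blocks until one of them is modified.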
For future reference, as per the #7160 discussion, we decided not to use CopyFileEx on Windows and instead to increase the read() buffer size from 16KB to 1MB (Windows only), resulting in a 40.8% speedup (instead of 48%). Also, copyfileobj() has been optimized on all platforms by using readinto()/memoryview()/bytearray().

128KB copy (+27%)
8MB copy (+45.6%)
512MB copy (+40.8%)
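The readinto()/memoryview()/bytearray() optimization mentioned above avoids allocating a new bytes object for every chunk by reading into one reusable buffer. A minimal sketch of the technique follows; the name `copyfileobj_readinto` and the `BUFSIZE` constant are hypothetical, with 1 MiB on Windows and 16 KiB elsewhere simply following the figures in the comment above.

```python
import sys

# Hypothetical constant: 1 MiB on Windows, 16 KiB elsewhere (per the comment).
BUFSIZE = 1024 * 1024 if sys.platform == "win32" else 16 * 1024


def copyfileobj_readinto(fsrc, fdst, length=BUFSIZE):
    """Copy between binary file objects, reusing one preallocated buffer."""
    with memoryview(bytearray(length)) as mv:
        while True:
            n = fsrc.readinto(mv)
            if not n:  # EOF (0) or no data available (None)
                break
            if n < length:
                # Short read (usually the last chunk): write only the filled part.
                with mv[:n] as smv:
                    fdst.write(smv)
            else:
                fdst.write(mv)
```

Slicing to `mv[:n]` matters on the final short read: writing the whole buffer there would append stale bytes left over from the previous chunk.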
Thanks Gianpaolo for pushing for this. Great job.
I concur: great job! Cool optimization.
shutil.COPY_BUFSIZE isn't documented. Is it a deliberate choice?
Yes, it's deliberate, see PR-12016.
"I decided not to document..."
Ok. I have no opinion on that; I just wanted to ask the question :-)
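Since COPY_BUFSIZE is not documented, code that cares about the chunk size can pass it through the documented `length` parameter of shutil.copyfileobj() instead of relying on the module constant (shutil.copyfile() itself takes no buffer-size argument). A small sketch; the 1 MiB value is an arbitrary illustrative choice.

```python
import io
import shutil

src = io.BytesIO(b"x" * 3_000_000)
dst = io.BytesIO()

# length is part of copyfileobj()'s documented signature;
# 1 MiB is only an example value, not a recommendation.
shutil.copyfileobj(src, dst, 1024 * 1024)
```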