Author vstinner
Recipients giampaolo.rodola, pablogsal, vstinner
Date 2019-06-04.21:36:40
Message-id <1559684200.61.0.18008315195.issue37157@roundup.psfhosted.org>
Content
bpo-26826 added a new os.copy_file_range() function:
https://docs.python.org/dev/library/os.html#os.copy_file_range

Like os.sendfile(), this new Linux syscall avoids copying data between kernel space and user space. This matters for performance, especially since the Meltdown vulnerability required Windows, Linux, FreeBSD, etc. to use a separate address space for the kernel (like Linux kernel page-table isolation, KPTI), which makes every switch between user space and the kernel more expensive.
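A minimal sketch of copying a whole file with os.copy_file_range() (Python 3.8+, Linux only). It assumes a fallback to a plain os.read()/os.write() loop when the syscall is missing or fails (for example ENOSYS on old kernels, or EXDEV when the files are on different filesystems, depending on the kernel version). The function name copy_via_copy_file_range and the 1 MiB chunk size are illustrative choices, not part of any API.

```python
import os

def copy_via_copy_file_range(src_path, dst_path, chunk_size=1024 * 1024):
    """Copy src_path to dst_path, letting the kernel move the bytes when possible."""
    with open(src_path, "rb") as fsrc, open(dst_path, "wb") as fdst:
        infd, outfd = fsrc.fileno(), fdst.fileno()
        use_syscall = hasattr(os, "copy_file_range")  # Linux + Python 3.8+ only
        while True:
            if use_syscall:
                try:
                    # With no offsets given, both file positions advance,
                    # so a later fallback resumes where this call left off.
                    copied = os.copy_file_range(infd, outfd, chunk_size)
                except OSError:
                    use_syscall = False  # e.g. ENOSYS, EXDEV, EINVAL
                    continue
                if copied == 0:
                    break  # end of file
            else:
                data = os.read(infd, chunk_size)
                if not data:
                    break  # end of file
                view = memoryview(data)
                while view:  # os.write() may perform a short write
                    view = view[os.write(outfd, view):]
```

The data never passes through a Python bytes object on the fast path: the loop only tells the kernel how many bytes to move per call.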

shutil has been modified in Python 3.8 to use os.sendfile() on Linux:
https://docs.python.org/dev/whatsnew/3.8.html#optimizations
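For comparison, a simplified sketch of a sendfile()-based copy loop of the kind shutil now uses on Linux; it is not shutil's actual implementation. The name copy_via_sendfile and the chunk size are illustrative. It assumes Linux, where os.sendfile() accepts a regular file as the output descriptor; when os.sendfile() is missing, or the call fails (macOS, for instance, only sends to sockets), it copies the remainder through userspace with shutil.copyfileobj().

```python
import os
import shutil

def copy_via_sendfile(src_path, dst_path, chunk_size=1024 * 1024):
    """Copy src_path to dst_path with os.sendfile(), else through userspace."""
    with open(src_path, "rb") as fsrc, open(dst_path, "wb") as fdst:
        infd, outfd = fsrc.fileno(), fdst.fileno()
        offset = 0  # bytes already copied; also the next read offset in src
        if hasattr(os, "sendfile"):
            try:
                while True:
                    # The kernel reads src at `offset` (without moving src's
                    # file position) and writes at dst's current position.
                    sent = os.sendfile(outfd, infd, offset, chunk_size)
                    if sent == 0:
                        return  # end of file
                    offset += sent
            except OSError:
                pass  # e.g. macOS: out_fd must be a socket
        # Fallback: resume both files at `offset` and copy in userspace.
        fsrc.seek(offset)
        fdst.seek(offset)
        shutil.copyfileobj(fsrc, fdst)
```

Unlike copy_file_range(), sendfile() is called with an explicit source offset here, so the loop has to track the position itself.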

But according to Pablo Galindo Salgado, copy_file_range() goes further:
"But copy_file_range can leverage more filesystem features like deduplication and copy offload stuff."

https://bugs.python.org/issue26826#msg344582

Giampaolo Rodola' added:

"I think data deduplication / CoW / reflink copy is better implemented via FICLONE. "cp --reflink" uses it, I presume because it's older than copy_file_range(). I have a working patch adding CoW copy support for Linux and OSX (but not Windows). I think that should be exposed as a separate shutil.reflink() though, and copyfile() should just do a standard copy."

"Actually "man copy_file_range" claims it can do server-side copy, meaning no network traffic between client and server if *src* and *dst* live on the same network fs. So I agree copy_file_range() should be preferred over sendfile() after all. =)
I have a wrapper for copy_file_range() similar to what I did in shutil in issue33671 which I can easily integrate, but I wanted to land this one first:
https://bugs.python.org/issue37096
Also, I suppose we cannot land this in time for 3.8?"

https://bugs.python.org/issue26826#msg344586

--

There was already a discussion about switching shutil to copy-on-write:
https://bugs.python.org/issue33671#msg317989

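One problem is that modifying the "copied" file can suddenly become slower if it was copied using "cp --reflink": its blocks are still shared with the source, so the first write to them triggers the deferred copy-on-write.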

To prevent bad surprises, it seems that file copy functions need a new reflink=False parameter to control clone/CoW copies, so that cloning is opt-in.
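The proposed parameter could look roughly like this. This is a sketch assuming Linux; copyfile_maybe_reflink() is a hypothetical name, not a shutil function. With reflink=True it requests a clone through the FICLONE ioctl, the mechanism Giampaolo mentions for "cp --reflink". The numeric value 0x40049409 is an assumption that holds on x86-64 and arm64, and the code prefers fcntl.FICLONE when the running Python defines it. When the filesystem cannot clone (for example ext4, or src and dst on different filesystems), it falls back to shutil.copyfile().

```python
import shutil
import sys

try:
    import fcntl  # not available on Windows
except ImportError:
    fcntl = None

# FICLONE is _IOW(0x94, 9, int); 0x40049409 is its value on x86-64 and arm64
# (assumed here). Use the fcntl constant instead if this Python provides one.
FICLONE = getattr(fcntl, "FICLONE", 0x40049409)
_CAN_TRY_CLONE = fcntl is not None and sys.platform.startswith("linux")


def copyfile_maybe_reflink(src, dst, *, reflink=False):
    """Hypothetical API: copy src to dst; clone (CoW) only if reflink=True."""
    if reflink and _CAN_TRY_CLONE:
        with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
            try:
                # Make dst share src's extents instead of copying the data.
                fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
                return dst
            except OSError:
                pass  # e.g. EOPNOTSUPP (ext4) or EXDEV (different filesystems)
    # Default, and fallback when cloning is impossible: a regular copy.
    return shutil.copyfile(src, dst)
```

Keeping reflink=False as the default means that, unless the caller opts in, later writes to dst never pay the deferred copy-on-write cost.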