Author vstinner
Recipients giampaolo.rodola, pablogsal, vstinner
Date 2019-06-04.21:36:40
bpo-26826 added a new os.copy_file_range() function:

Like os.sendfile(), this new Linux syscall avoids memory copies between kernel space and user space. This matters for performance, especially since the Meltdown vulnerability required Windows, Linux, FreeBSD, etc. to use a separate address space for the kernel (for example, Linux kernel page-table isolation, KPTI).
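
To make the idea concrete, here is a minimal sketch of a copy loop built on os.copy_file_range(). The helper name copy_in_kernel() and the chunk size are only illustrative assumptions, not what shutil does or will do. If the call is missing (non-Linux) or fails before any byte has been copied, it falls back to a plain userspace copy.

```python
import os
import shutil

def copy_in_kernel(src_path, dst_path, chunk=1 << 20):
    """Copy src_path to dst_path and return the number of bytes copied.

    Hypothetical helper: tries os.copy_file_range() first, then falls back
    to a regular read/write copy if the syscall is unavailable or refused.
    """
    copied = 0
    with open(src_path, "rb") as fsrc, open(dst_path, "wb") as fdst:
        try:
            while True:
                # The kernel moves the data; nothing passes through user space.
                n = os.copy_file_range(fsrc.fileno(), fdst.fileno(), chunk)
                if n == 0:  # end of file
                    return copied
                copied += n
        except (AttributeError, OSError):
            if copied:
                # Failing midway is a real error, do not hide it.
                raise
            # Nothing was copied yet, so both file offsets are still 0.
            shutil.copyfileobj(fsrc, fdst)
            return os.fstat(fsrc.fileno()).st_size
```

The fallback is only taken when the very first call fails, so the two code paths never mix partial results.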

shutil has been modified in Python 3.8 to use os.sendfile() on Linux:

But according to Pablo Galindo Salgado, copy_file_range() goes further:
"But copy_file_range can leverage more filesystem features like deduplication and copy offload stuff."

Giampaolo Rodola' added:

"I think data deduplication / CoW / reflink copy is better implemented via FICLONE. "cp --reflink" uses it, I presume because it's older than copy_file_range(). I have a working patch adding CoW copy support for Linux and OSX (but not Windows). I think that should be exposed as a separate shutil.reflink() though, and copyfile() should just do a standard copy."

"Actually "man copy_file_range" claims it can do server-side copy, meaning no network traffic between client and server if *src* and *dst* live on the same network fs. So I agree copy_file_range() should be preferred over sendfile() after all. =)
I have a wrapper for copy_file_range() similar to what I did in shutil in issue33671 which I can easily integrate, but I wanted to land this one first:
Also, I suppose we cannot land this in time for 3.8?"


There was already a discussion about switching shutil to copy-on-write:

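One problem is that modifying the "copied" file can suddenly become slower if it was copied using "cp --reflink": the copy shares its data blocks with the original, so the first write to each shared block forces the filesystem to allocate a new block.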

It seems like a new reflink=False parameter on the file copy functions is required to control clone/CoW copies and prevent bad surprises.
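
A rough sketch of what such an opt-in parameter could look like, assuming Linux and the FICLONE ioctl mentioned above. The function name copyfile_sketch() and its return values are hypothetical and only for illustration; this is not the shutil API. With reflink=False (the default) it always does a standard copy, and with reflink=True it tries a clone and silently falls back when the filesystem does not support it.

```python
import fcntl
import shutil
import sys

# Linux ioctl request number for FICLONE, i.e. _IOW(0x94, 9, int).
FICLONE = 0x40049409

def copyfile_sketch(src, dst, *, reflink=False):
    """Copy src to dst; return "reflink" or "copy" (hypothetical helper)."""
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        if reflink and sys.platform.startswith("linux"):
            try:
                # Ask the filesystem to share src's blocks with dst (CoW).
                fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
                return "reflink"
            except OSError:
                # Not supported here (e.g. ext4, tmpfs, or a cross-device
                # copy): dst is still empty, so fall through.
                pass
        # Standard copy: dst gets its own data blocks.
        shutil.copyfileobj(fsrc, fdst)
        return "copy"
```

Keeping the default at False means existing callers get independent data blocks, and only callers who explicitly ask for a clone accept the slower first write later.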
