Issue 37157
This issue tracker has been migrated to GitHub, and is currently read-only. For more information, see the GitHub FAQs in the Python Developer's Guide.
Created on 2019-06-04 21:36 by vstinner, last changed 2022-04-11 14:59 by admin.
Files
File name | Uploaded
---|---
cow.diff | giampaolo.rodola, 2019-06-05 05:02
cow2.diff | giampaolo.rodola, 2019-06-05 10:45
Messages (14)
msg344648 | Author: STINNER Victor (vstinner) | Date: 2019-06-04 21:36

bpo-26826 added a new os.copy_file_range() function: https://docs.python.org/dev/library/os.html#os.copy_file_range

Like os.sendfile(), this new Linux syscall avoids memory copies between kernel space and user space. This matters for performance, especially since the Meltdown vulnerability required Windows, Linux, FreeBSD, etc. to use a separate address space for the kernel (for example, Linux kernel page-table isolation, KPTI).

shutil was modified in Python 3.8 to use os.sendfile() on Linux: https://docs.python.org/dev/whatsnew/3.8.html#optimizations

But according to Pablo Galindo Salgado, copy_file_range() goes further: "But copy_file_range can leverage more filesystem features like deduplication and copy offload stuff." https://bugs.python.org/issue26826#msg344582

Giampaolo Rodola' added: "I think data deduplication / CoW / reflink copy is better implemented via FICLONE. "cp --reflink" uses it, I presume because it's older than copy_file_range(). I have a working patch adding CoW copy support for Linux and OSX (but not Windows). I think that should be exposed as a separate shutil.reflink() though, and copyfile() should just do a standard copy."

"Actually "man copy_file_range" claims it can do server-side copy, meaning no network traffic between client and server if *src* and *dst* live on the same network fs. So I agree copy_file_range() should be preferred over sendfile() after all. =) I have a wrapper for copy_file_range() similar to what I did in shutil in issue33671 which I can easily integrate, but I wanted to land this one first: https://bugs.python.org/issue37096 Also, I suppose we cannot land this in time for 3.8?" https://bugs.python.org/issue26826#msg344586

--

There was already a discussion about switching shutil to copy-on-write: https://bugs.python.org/issue33671#msg317989

One problem is that modifying the "copied" file can suddenly become slower if it was copied using "cp --reflink". It seems that adding a new reflink=False parameter to the file copy functions, to control clone/CoW copies, is required to avoid bad surprises.
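A minimal sketch of what "prefer copy_file_range() over sendfile()" could look like from Python, assuming Linux and Python 3.8+ (where os.copy_file_range() exists). The helper name copy_with_copy_file_range() and the set of errno values treated as "not usable here" are assumptions for illustration; this is neither the wrapper mentioned above nor shutil's actual implementation.

```python
import errno
import os
import shutil

# errno values assumed to mean "copy_file_range() is not usable here"
# (EPERM can come from seccomp sandboxes that block the syscall).
_FALLBACK_ERRNOS = {errno.ENOSYS, errno.EXDEV, errno.EINVAL,
                    errno.EOPNOTSUPP, errno.EPERM}


def copy_with_copy_file_range(src, dst, chunk=8 * 1024 * 1024):
    """Copy src to dst, preferring os.copy_file_range() (Linux, Python 3.8+).

    Returns True if copy_file_range() did the copy, False if the
    userspace read()/write() fallback was used.
    """
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        if hasattr(os, "copy_file_range"):
            copied = 0
            try:
                while True:
                    # Without explicit offsets, the kernel reads/writes at the
                    # current file positions and advances them.
                    n = os.copy_file_range(fsrc.fileno(), fdst.fileno(), chunk)
                    if n == 0:  # end of src reached
                        return True
                    copied += n
            except OSError as exc:
                # Fall back only if nothing was written yet; a failure in the
                # middle of the copy is reported to the caller.
                if copied or exc.errno not in _FALLBACK_ERRNOS:
                    raise
        # Plain userspace copy.
        shutil.copyfileobj(fsrc, fdst)
        return False
```

Falling back only when nothing has been copied yet keeps the two code paths from each writing a part of dst.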
msg344651 | Author: STINNER Victor (vstinner) | Date: 2019-06-04 21:51

Random notes.

Extract of the Linux manual page of "cp":

    --reflink[=WHEN]
        control clone/CoW copies. See below

    When --reflink[=always] is specified, perform a lightweight copy, where the data blocks are copied only when modified. If this is not possible the copy fails, or if --reflink=auto is specified, fall back to a standard copy. Use --reflink=never to ensure a standard copy is performed.

--

"Why is cp --reflink=auto not the default behaviour?": https://unix.stackexchange.com/questions/80351/why-is-cp-reflink-auto-not-the-default-behaviour

--

Reflinks are supported by Btrfs and OCFS2. XFS seems to have experimental support for reflink (a 2-year-old article): https://strugglers.net/~andy/blog/2017/01/10/xfs-reflinks-and-deduplication/

The Linux version of ZFS doesn't support reflink yet: https://github.com/zfsonlinux/zfs/issues/405

--

Python binding using cffi to get reflink: https://gitlab.com/rubdos/pyreflink "Btrfs, XFS, OCFS2 reflink support. Btrfs is tested the most. Apple macOS APFS clonefile support. Little testing, be careful. It might eat data."

--

"reflink for Windows": https://github.com/0xbadfca11/reflink "Windows Server 2016 introduce Block Cloning feature." => https://docs.microsoft.com/en-us/windows-server/storage/refs/block-cloning "ReFS v2 is only available in Windows Server 2016 and Windows 10 version 1703 (build 15063) or later. Windows 10 version 1607 (build 14393) and earlier Windows only can use ReFS v1."

--

Linux has 2 ioctls:

    #include <sys/ioctl.h>
    #include <linux/fs.h>

    int ioctl(int dest_fd, FICLONERANGE, struct file_clone_range *arg);
    int ioctl(int dest_fd, FICLONE, int src_fd);

http://man7.org/linux/man-pages/man2/ioctl_ficlonerange.2.html
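The FICLONE ioctl listed above can be called from Python with fcntl.ioctl(). A minimal sketch with "cp --reflink=always" semantics (clone or fail, no fallback), assuming Linux. fcntl.FICLONE only exists in Python 3.12+, so the fallback constant is an assumption that holds for the generic Linux ioctl encoding (x86, ARM) but not for every architecture; reflink_always() is a hypothetical name.

```python
import fcntl

# Python 3.12+ defines fcntl.FICLONE; the fallback is _IOW(0x94, 9, int) in
# the generic Linux ioctl encoding (assumed; differs on e.g. PowerPC, MIPS).
FICLONE = getattr(fcntl, "FICLONE", 0x40049409)


def reflink_always(src, dst):
    """Clone src into dst, like "cp --reflink=always": no fallback copy.

    Raises OSError (e.g. EOPNOTSUPP on ext4, EXDEV across filesystems)
    if the filesystem cannot share the data blocks.
    """
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        # C equivalent: int ioctl(int dest_fd, FICLONE, int src_fd);
        fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
```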
msg344667 | Author: Giampaolo Rodola' (giampaolo.rodola) | Date: 2019-06-05 05:02

I'm attaching an initial PoC using FICLONE on Linux and clonefile(3) on OSX. It is also possible to support Windows, but that requires a ReFS partition to test against, which I currently don't have.

I opted for exposing reflink() as a separate function, mostly because:

- conceptually, standard copy and CoW copy are 2 different things
- shutil already provides a distinction between copy functions (copy(), copy2(), copyfile()) which can be used as callbacks for copytree() and move(). As such, one can follow the same approach and do:

    >>> copytree(src, dst, copy_function=reflink)

This initial patch provides a callback=None parameter in case the CoW operation fails because it is not supported by the underlying filesystem, but this is debatable because we can get different errors depending on the platform (which is not good). As such, a more generic ReflinkNotSupportedError exception is probably a better choice.
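A sketch of the proposed API shape, assuming Linux and FICLONE. reflink() and ReflinkNotSupportedError are hypothetical names taken from the discussion (they are not part of shutil), and the errno values mapped to the new exception are an assumption. Because the function takes (src, dst) like shutil.copy2(), it can be passed as copy_function.

```python
import errno
import fcntl
import shutil

# Assumed generic-Linux value of FICLONE when fcntl lacks it (Python < 3.12).
FICLONE = getattr(fcntl, "FICLONE", 0x40049409)

# errno values assumed to mean "cloning is not possible here".
_UNSUPPORTED = {errno.EOPNOTSUPP, errno.EXDEV, errno.EINVAL, errno.ENOTTY}


class ReflinkNotSupportedError(OSError):
    """Hypothetical: raised when the filesystem cannot do a CoW copy."""


def reflink(src, dst):
    """Hypothetical shutil-style reflink(); no fallback, no cleanup of dst."""
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        try:
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
        except OSError as exc:
            if exc.errno in _UNSUPPORTED:
                raise ReflinkNotSupportedError(
                    exc.errno, exc.strerror, src, None, dst) from exc
            raise
    return dst


# Usage, following the copy_function idea:
#     shutil.copytree("photos", "photos-clone", copy_function=reflink)
```

Note that copytree() collects per-file OSError exceptions and raises a single shutil.Error at the end, so a caller of copytree() sees the ReflinkNotSupportedError messages as strings inside that error rather than the exception type itself.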
msg344692 | Author: STINNER Victor (vstinner) | Date: 2019-06-05 10:00

cow.diff: I'm not sure that attempting to call unlink() if FICLONE fails is a good idea. unlink() can raise a new exception, which can be confusing. IMHO it's up to the caller to deal with that. Said differently, I dislike the *fallback* parameter of reflink().

Why not expose clonefile() as os.clonefile() rather than os._clonefile()?

    +#if defined(MAC_OS_X_VERSION_10_12)
    +#include <sys/clonefile.h>
    +#define HAVE_CLONEFILE
    +#endif

Is Python compiled to target macOS 10.12 and newer? Mac/BuildScript/build-installer.py contains:

    # $MACOSX_DEPLOYMENT_TARGET -> minimum OS X level
    DEPTARGET = '10.5'

But I don't know macOS well. "#if defined(MAC_OS_X_VERSION_10_12)" is a build-time check. Does it depend on DEPTARGET? Would it be possible to use a runtime check?

You might open a dedicated issue to expose clonefile(), since it seems like every tiny detail of this issue is very subtle and should be properly discussed ;-) (I like the idea of exposing native functions like clonefile() directly in the os module!)
msg344694 | Author: STINNER Victor (vstinner) | Date: 2019-06-05 10:10

> This initial patch provides a callback=None parameter in case the CoW operation fails because not supported by the underlying filesystems but this is debatable because we can get different errors depending on the platform (which is not good). As such a more generic ReflinkNotSupportedError exception is probably a better choice.

(Oh, my laptop only uses btrfs. Hum, I created a loop device to test an ext4 partition :-))

On an ext4 partition, cp --reflink simply fails with an error: it doesn't fall back to a regular copy.

    vstinner@apu$ dd if=/dev/urandom of=urandom bs=1k count=1k
    1024+0 records in
    1024+0 records out
    1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0123142 s, 85.2 MB/s
    vstinner@apu$ cp --reflink urandom urandom2
    'urandom' -> 'urandom2'
    cp: failed to clone 'urandom2' from 'urandom': Operation not supported
    vstinner@apu$ file urandom2
    urandom2: empty
    vstinner@apu$ stat urandom2
      File: urandom2
      Size: 0           Blocks: 2          IO Block: 1024   regular empty file
    Device: 700h/1792d  Inode: 13          Links: 1
    Access: (0664/-rw-rw-r--)  Uid: ( 1000/vstinner)   Gid: ( 1000/vstinner)
    Context: unconfined_u:object_r:unlabeled_t:s0
    Access: 2019-06-05 12:08:23.000000000 +0200
    Modify: 2019-06-05 12:08:23.000000000 +0200
    Change: 2019-06-05 12:08:23.000000000 +0200
     Birth: -

Not only does it fail, it also leaves an empty file.

I suggest mimicking the Linux cp command: don't automatically fall back (there are too many error conditions, and too many risks of raising a new error while handling the previous one), and don't try to remove the created empty file if reflink() fails.
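The ext4 experiment can be reproduced from Python with a hypothetical try_reflink() helper that follows the suggestion: no automatic fallback and no removal of dst. It assumes Linux, and the FICLONE fallback value is an assumption for Python < 3.12. On a filesystem without reflink support it should report an OSError (such as EOPNOTSUPP, "Operation not supported") and a destination size of 0, matching the empty urandom2 left by cp.

```python
import fcntl
import os

# Assumed generic-Linux value of FICLONE when fcntl lacks it (Python < 3.12).
FICLONE = getattr(fcntl, "FICLONE", 0x40049409)


def try_reflink(src, dst):
    """Attempt a clone; on failure, return the error and leave dst as is.

    No fallback copy and no attempt to remove dst: cleanup is left to the
    caller. Returns None on success, or the OSError that was raised.
    """
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        try:
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
        except OSError as exc:
            return exc
    return None


if __name__ == "__main__":
    import tempfile

    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "urandom")
    dst = os.path.join(workdir, "urandom2")
    with open(src, "wb") as f:
        f.write(os.urandom(1024 * 1024))  # 1 MiB, like the dd command
    error = try_reflink(src, dst)
    print("error:", error)
    print("dst size:", os.path.getsize(dst))  # 0 if the clone failed
```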
msg344697 | Author: Giampaolo Rodola' (giampaolo.rodola) | Date: 2019-06-05 10:22

> I'm not sure that attempt to call unlink() if FICLONE fails is a good idea

Agreed.

> I dislike the *fallback* parameter of reflink().

Me too. A specific exception is better.

> Why not exposing clonefile() as os.clonefile() but os._clonefile()?

Mmm... I'm not sure it's worth it. The only reason one may want to use clonefile() directly is to pass the CLONE_NOFOLLOW and CLONE_NOOWNERCOPY flags (the only possible ones):

- CLONE_NOFOLLOW can be exposed via "follow_symlinks=True" (like other shutil.* functions) and used internally
- CLONE_NOOWNERCOPY should also be passed internally by default, because all other shutil functions do not copy ownership (there's a warning at the top of the doc), so I think it makes sense for reflink() to do the same.

> +#if defined(MAC_OS_X_VERSION_10_12): Would it be possible to use a runtime check?

Good point. It should definitely be loaded at runtime. I will look into that (but not soon).
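A sketch of the runtime check, done from Python with ctypes rather than in C: clonefile() is looked up in the running C library, so its absence (older macOS, Linux) is detected at run time instead of build time. load_clonefile() and reflink_macos() are hypothetical names; the flag values are those of macOS <sys/clonefile.h>, and the option mapping follows the proposal above (CLONE_NOOWNERCOPY always, CLONE_NOFOLLOW when follow_symlinks=False).

```python
import ctypes
import os

# Flag values from macOS <sys/clonefile.h>.
CLONE_NOFOLLOW = 0x0001
CLONE_NOOWNERCOPY = 0x0002


def load_clonefile():
    """Look up clonefile() at runtime instead of relying on a build-time #if.

    Returns the C function, or None if the running libc does not provide it.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    func = getattr(libc, "clonefile", None)
    if func is not None:
        # int clonefile(const char *src, const char *dst, int flags);
        func.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_int]
        func.restype = ctypes.c_int
    return func


def reflink_macos(src, dst, *, follow_symlinks=True):
    """Hypothetical macOS reflink() mapping options to clonefile() flags.

    clonefile() fails with EEXIST if dst already exists.
    """
    clonefile = load_clonefile()
    if clonefile is None:
        raise NotImplementedError("clonefile() is not available on this system")
    flags = CLONE_NOOWNERCOPY  # like other shutil functions: no ownership copy
    if not follow_symlinks:
        flags |= CLONE_NOFOLLOW
    if clonefile(os.fsencode(src), os.fsencode(dst), flags) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), src, None, dst)
```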
msg344702 | Author: Giampaolo Rodola' (giampaolo.rodola) | Date: 2019-06-05 10:45
Adding a new patch (still a PoC, will create a PR when I have something more solid). |
msg344709 | Author: STINNER Victor (vstinner) | Date: 2019-06-05 11:58

I'm curious: is it possible to query the filesystem to check whether a file was copied using CoW? I guess that even if it's possible, it will be non-portable. So I guess it's better to avoid checking that in unit tests.

    vstinner@apu$ dd if=/dev/urandom of=urandom bs=1k count=1k
    1024+0 records in
    1024+0 records out
    1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0309671 s, 33.9 MB/s
    vstinner@apu$ cp --reflink urandom urandom2
    'urandom' -> 'urandom2'
    vstinner@apu$ stat urandom
      File: urandom
      Size: 1048576     Blocks: 2048       IO Block: 4096   regular file
    Device: 31h/49d     Inode: 16265363    Links: 1
    Access: (0664/-rw-rw-r--)  Uid: ( 1000/vstinner)   Gid: ( 1000/vstinner)
    Context: unconfined_u:object_r:user_home_t:s0
    Access: 2019-06-05 13:56:21.381196972 +0200
    Modify: 2019-06-05 13:56:21.412197007 +0200
    Change: 2019-06-05 13:56:21.412197007 +0200
     Birth: 2019-06-05 13:56:21.381196972 +0200
    vstinner@apu$ stat urandom2
      File: urandom2
      Size: 1048576     Blocks: 2048       IO Block: 4096   regular file
    Device: 31h/49d     Inode: 16265364    Links: 1
    Access: (0664/-rw-rw-r--)  Uid: ( 1000/vstinner)   Gid: ( 1000/vstinner)
    Context: unconfined_u:object_r:user_home_t:s0
    Access: 2019-06-05 13:56:24.487200453 +0200
    Modify: 2019-06-05 13:56:24.496200463 +0200
    Change: 2019-06-05 13:56:24.496200463 +0200
     Birth: 2019-06-05 13:56:24.487200453 +0200

Using the stat command-line tool, I don't see anything obvious saying that the two files share the same data on disk.
msg350623 | Author: Kubilay Kocak (koobs) | Date: 2019-08-27 09:40
See Also: #26826 |
msg384015 | Author: Albert Zeyer (Albert.Zeyer) | Date: 2020-12-29 17:05

Is FICLONE really needed? Doesn't copy_file_range already support the same? I posted the same question here: https://stackoverflow.com/questions/65492932/ficlone-vs-ficlonerange-vs-copy-file-range-for-copy-on-write-support
msg384105 | Author: Albert Zeyer (Albert.Zeyer) | Date: 2020-12-31 09:20

I did some further research (with all details here: https://stackoverflow.com/a/65518879/133374).

See vfs_copy_file_range in the Linux kernel. It first tries to call remap_file_range, if possible.

FICLONE calls ioctl_file_clone. ioctl_file_clone calls vfs_clone_file_range. vfs_clone_file_range calls remap_file_range. I.e. FICLONE == remap_file_range.

So using copy_file_range (if available) should be the most generic solution, which includes copy-on-write support and server-side copy support.
msg403414 | Author: Dulanic (dulanic) | Date: 2021-10-07 14:24

As a note, cp in coreutils 9.0 now defaults to reflink=auto. https://www.phoronix.com/scan.php?page=news_item&px=GNU-Coreutils-9.0
msg403418 | Author: Giampaolo Rodola' (giampaolo.rodola) | Date: 2021-10-07 14:37

> So using copy_file_range (if available) should be the most generic solution, which includes copy-on-write support, and server-side copy support.

Doesn't this imply passing some flag to copy_file_range()? "man copy_file_range" says:

> The flags argument is provided to allow for future extensions and currently must be set to 0.

How is a CoW copy supposed to be done using copy_file_range(), exactly?
msg403423 | Author: Albert Zeyer (Albert.Zeyer) | Date: 2021-10-07 15:07

> How is CoW copy supposed to be done by using copy_file_range() exactly?

I think copy_file_range() will just always use copy-on-write and/or server-side copy when available. You cannot even turn that off.
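Putting the thread's conclusions together, a string variant of the proposed reflink=False parameter could map cp's three modes as sketched here. Assumptions: Linux; "auto" relies on copy_file_range() doing a clone or server-side copy on its own when the kernel and filesystem allow it, as described above; "never" uses plain read()/write() because copy_file_range() offers no flag to disable CoW. copy_file() and its reflink= parameter are illustrative only, not an existing shutil API, and the errno set that triggers the fallback is an assumption.

```python
import errno
import fcntl
import os
import shutil

# Python 3.12+ defines fcntl.FICLONE; the fallback is the generic Linux
# encoding of _IOW(0x94, 9, int) and is an assumption for other arches.
FICLONE = getattr(fcntl, "FICLONE", 0x40049409)

# errno values assumed to mean "copy_file_range() is not usable here".
_FALLBACK_ERRNOS = {errno.ENOSYS, errno.EXDEV, errno.EINVAL,
                    errno.EOPNOTSUPP, errno.EPERM}


def copy_file(src, dst, *, reflink="auto"):
    """Hypothetical copy helper whose modes mirror "cp --reflink=WHEN".

    "always": clone with FICLONE or raise OSError (no fallback).
    "auto":   os.copy_file_range(); the kernel may clone or do a server-side
              copy on its own. Userspace copy if the syscall is unusable.
    "never":  plain read()/write(), since copy_file_range() has no flag to
              opt out of CoW.
    """
    if reflink not in ("always", "auto", "never"):
        raise ValueError(f"invalid reflink mode: {reflink!r}")
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        if reflink == "always":
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
            return
        if reflink == "auto" and hasattr(os, "copy_file_range"):
            copied = 0
            try:
                while True:
                    n = os.copy_file_range(fsrc.fileno(), fdst.fileno(), 1 << 30)
                    if n == 0:  # end of src reached
                        return
                    copied += n
            except OSError as exc:
                # Fall back only if nothing has been written yet.
                if copied or exc.errno not in _FALLBACK_ERRNOS:
                    raise
        shutil.copyfileobj(fsrc, fdst)
```

Making "never" skip copy_file_range() entirely is the only way, in this sketch, to honor an explicit request for a standard copy, given that the syscall may share blocks without being asked.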
History
Date | User | Action | Args
---|---|---|---
2022-04-11 14:59:16 | admin | set | github: 81338
2021-10-07 19:23:42 | vstinner | set | nosy: - vstinner
2021-10-07 15:07:02 | Albert.Zeyer | set | messages: + msg403423
2021-10-07 14:37:05 | giampaolo.rodola | set | messages: + msg403418
2021-10-07 14:24:25 | dulanic | set | nosy: + dulanic; messages: + msg403414
2020-12-31 09:20:13 | Albert.Zeyer | set | messages: + msg384105
2020-12-29 17:05:38 | Albert.Zeyer | set | nosy: + Albert.Zeyer; messages: + msg384015
2020-04-22 20:02:40 | desbma | set | nosy: + desbma
2019-08-27 09:40:19 | koobs | set | nosy: + koobs; messages: + msg350623
2019-06-05 11:58:37 | vstinner | set | messages: + msg344709
2019-06-05 10:45:13 | giampaolo.rodola | set | files: + cow2.diff; messages: + msg344702
2019-06-05 10:22:17 | giampaolo.rodola | set | messages: + msg344697
2019-06-05 10:10:06 | vstinner | set | messages: + msg344694
2019-06-05 10:00:25 | vstinner | set | messages: + msg344692
2019-06-05 05:02:24 | giampaolo.rodola | set | files: + cow.diff; keywords: + patch; messages: + msg344667
2019-06-04 21:51:31 | vstinner | set | messages: + msg344651
2019-06-04 21:36:40 | vstinner | create |