New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expose new copy_file_range() syscall in os module. #71013
Comments
copy_file_range() has been introduced in the Linux kernel since version 4.5 (mid march 2016). This new syscall allows to copy data from one fd to another without passing by user space, improving speed in most cases. You can read more about it here: https://lwn.net/Articles/659523/ I intend to start working on adding a binding for it in the os module and then, if it's available, use it in shutils.copy() to improve its efficiency. I have a couple of questions: If the syscall is not available, should I implement a user space alternative or should the method not exist at all? |
Thanks, looks interesting. We usually wait until syscalls are generally available in common distros and have bindings in glibc. It makes it easier to test the feature. |
Usually, yeah. os.urandom() uses syscall() to use the new getrandom() of Linux since it's still not exposed in the GNU libc... |
Tangentially related: bpo-25156, about using sendfile() to copy files in shutil. |
Debian Sid, arch Linux and Fedora FC24 (alpha) already have linux-4.5, others would certainly follow soon. Meanwhile, I could start developing the patch and we could review it when you think it's appropriate. |
Kerbel support ok, but what about the libc support? Do you know if it is |
Already there: |
That's the manual page of the Linux kernel, not the glibc. It doesn't |
Then I don't know. My only worry is whether having the method local to os would allow shutils.copy() to use it or not. I will start by looking at other similar functions there to see to do it. urandom() looks like a good starting point. |
If we add a new private function to the os module (ex: |
Ok, I have a preliminary version of the patch. It has several parts:
Several points:
Talking about tests, I tried copying a 325MiB on an SSD, f2fs. Here are the times: Old user space copy: $ time ./python -m timeit -n 10 -s 'import shutil' 'a = open ("a.mp4", "rb"); b = open ("b.mp4", "wb+"); shutil.copyfileobj (a, b, 16*1024*1024)'
10 loops, best of 3: 259 msec per loop
real 0m7.915s
user 0m0.104s
sys 0m7.792s New copy_file_range: $ time ./python -m timeit -n 10 -s 'import shutil' 'a = open ("a.mp4", "rb"); b = open ("b.mp4", "wb+"); shutil.copyfileobj (a, b, 16*1024*1024)'
10 loops, best of 3: 193 msec per loop
real 0m5.926s
user 0m0.080s
sys 0m5.836s Some 20% improvement, but notice that the buffer size is 1024 times Python's default size (16MiB vs. 16KiB). One difference that I notice in semantics is that if the file is not open in binary form, but the file is binary, you get no UnicodeDecodeError (because the data never reaches userspace). Let me know what you think. |
Version without the NEWS and ACKS change. |
Hmm, I just noticed that it doesn't fallback to normal copy if the arguments are not valid (EBADF, EXDEV). Back to the drawing board... |
New version. Factoring the old method in a nested function also paves the way to implement https://bugs.python.org/issue25156 . |
Yes, the RST documentation has to be done by hand. It usually has more detail than the doc strings. I didn’t see any changes to the configure script in your patches. Did you make that change to define HAVE_COPY_FILE_RANGE yet? In /Modules/posixmodule.c (all three of your patches have an indented diff header, so the review doesn’t pick it up): +#ifdef HAVE_COPY_FILE_RANGE What name are you referring to? The file posixmodule? I think the file name is a bit misleading; according to the comment at the top, it is also used on Windows. +return (!async_err) ? posix_error() : NULL; This would make more sense with the logic swapped around: async_err? NULL : posix_error() Regarding copyfileobj(), I think we should continue to support file objects that do not directly wrap FileIO (includes your Unicode transcoding case). See the points given in bpo-25063. Perhaps we could keep the new version as a high-level copy_file_range() wrapper, but retain brute_force_copy() under the original copyfileobj() name. A bit like the socket.sendfile() method vs os.sendfile(). |
I'm not really sure how to make the test for configure.ac. Other functions are checked differently (availability of header files), but in this case it would need a compile test. I will have to investigate further.
indented diff header?
Posix, The function is not Posix at all. I can remove that comment.
I have to be honest, I just copied it from posix_sendfile(), but I agree. I'll answer the last paragraph when I finished understanding it, but I think you mean things like zipFile. |
Updated the patch withe most of Martin Panter's and all vadium's comments. |
FYI Martin Panter and vadmium are both me :) I’m not a big fan or expert on configure.ac and Autoconf, but I guess in this case it is the simplest solution. Maybe look at some of the existing AC_CHECK_DECL and AC_COMPILE_IFELSE invocations. I guess you need to see if __NR_copy_file_range is available. In the earlier patches, there were four space characters at the start of the file. In the 2016-04-27 09:16 patch, there is a completely empty line (after +#endif /* HAVE_COPY_FILE_RANGE */), which may also be interfering. FWIW, I don’t think you have to have the posix_ prefix on your function if you don’t want it. It is a static function, so the naming is fairly unrestricted. About changing copyfileobj(), here are some test cases which may help explain the compatibility problems: # Transcoding a file to a different character encoding # Input is a BufferedReader and may hold extra buffered data
with open("data", "rb") as input, open("output", "wb") as output:
header = input.read(100) # Actually reads more bytes from OS
process_header(header)
copyfileobj(input, output) # Copy starting from offset 100 |
I see, I was expecting something like that after reading https://bugs.python.org/issue25063 's conclusions. I removed that part.
You mean having always a copy_file_range() method that in the worst case would fallback to copyfileobj()? I'm considering using it to optimize copyfile() (and indirectly copy() and copy2()) instead. And sorry for the patch wrong format. |
I'll do the copyfile() part once I'm convinced it doesn't break anything else. |
I managed to modify the congigure.ac file so it includes the proper test. after I run autoconf it generated the proper configure script, but I also needed to run autoheaders (both run by make autoconf). This last command modified a generated file that's in the repo, so I'm adding its diff too, but I'm not sure if that's ok. |
Yes, having a high-level version of copy_file_range() that falls back to copyfileobj() should be okay. I’m not sure if it should be a public API of shutil, or just an internal detail. I am wondering if it would be nice to rearrange the os.copy_file_range() signature and make more parameters optional, or is that getting too high level? copy_file_range(in, out, count, offset_in=None, offset_out=None, flags=0)
copy_file_range(f1, f2, size) # Try to copy a whole file
copy_file_range(f1, f2, 30, 100, 200) # Try 30 bytes at given offsets Also left some more review comments. Also, we should eventually add test case(s) for the new functionality, and an entry to Doc/whatsnew/3.6.rst. |
I'm not sure about this. For the moment c_f_o() is available only if the syscall is there.
I agree with this, most of the time you will want to just advance both offsets, and providing None all the time can be tiring. I fixed this, modified a little the doc, but now I'll read about integer types and sizes. |
I fixed most of the type problems, except that I'm not sure how to convert to size_t. Someone suggested to convert with 'n', then check if it's negative and correct. I'll ask the mailing list for better suggestions. I also noted that running 'hg diff' shows the modifications to 'configure', which I didn't include. This leads me to doubt whether including the modification to 'pyconfig.h' is ok, given that it's generated by 'autoheader'. |
For the generated files, it doesn’t matter much either way for review (I can just ignore them). As long as the eventual committer remembers to regenerate them. (Personally I’d prefer not to keep these files in the respository, but that’s a different can of worms :) |
Sorry for the delay. Based on suggestions in the mailing list, I changed the *count* handling as if it were a ssize_t, and I added a note about ignored output parameters. |
There’s still something funny about your patches: the last one has a bit of configure script at the end of the posixmodule.c diff. One other thing I thought of: “in” is not a practical keyword argument name in Python, because it is a reserved word. Yes, sendfile(**{"in": ...}) is already there, but I think we should find some other name for copy_file_range() before it is too late. Some ideas: copy_file_range(input, output, count, offset_in, offset_out, flags) # Spell them out
copy_file_range(fd_in, fd_out, len, off_in, off_out, flags) # Direct from man page
copy_file_range(src, dst, count, offset_src, offset_dst, flags) # Like os.replace(), shutil.copyfile(), etc
copy_file_range(fsrc, fdst, count, offset_src, offset_dst, flags) # Like shutil.copyfileobj() My favourites are probably “input”, or “src”. |
I settled for s/in/src/ and s/out/dst/, fixed typos, made sure the docs are in sync (parameters in the docstring were out of order), rephrased paragraph about flags parameter and included the configure.ac code for detecting availability of the syscall. I'm also thinking of leaving flags out, what do you think? |
Another option could be to use Serhiy’s proposed partial keywords support in bpo-26282. It is not yet committed, but it looks like it will go ahead, and then you could make “in” and “out” positional-only parameters, and the rest keywords. But IMO “src” and “dst” is fine. Also, dropping “flags” support seems reasonable. One benefit is that if Linux added an unexpected flag in the future that conflicts with Python’s assumptions, it could do weird stuff or crash Python. See bpo-24933 for example, where socket.recv(n, MSG_TRUNC) returns uninitialized data. The latest patch looks pretty good to me. Is there a consensus yet to add it as a public function? Before this goes in, it would be good to have a test case. |
I added a couple of unit tests, which lead me to fix a couple of bugs (yay!). I discarded the idea of removing any reference to flags. |
Fixed the last comments, including comparing what was written to the original data, but only to the length of what was actually written. I'm just not sure if the way to handle the syscall not existing is ok. |
It’s a bit ugly, but I would write the test so that it is recorded as skipped: try: |
ENOSYS catching fixed. |
|
It looked ok to me (I couldn't try it, as I still have 4.4 kernel). One thing to the be done is to improve the test coverage (trying the usage of all the parameters, at least). |
New version:
|
Another version:
|
Fixed extra space and semicolon (?!?!). Also, I'm getting 500 errors from riedvelt when I try to reply to comments. |
Victor, due to the mention of getrandom() early on, this came up when I was looking for issues that could potentially be closed based on the PEP-524 implementation landing. Any chance you or someone else could take another look during the pre-beta sprint this week? (Petr, FYI as a possible new feature that could be interesting from a Fedora system Python perspective) |
This is a great addition. I have a working patch adding sendfile() support for shutil.copyfileobj() which speeds it up by a factor of 1.3x on Linux. According to this https://lists.kernelnewbies.org/pipermail/kernelnewbies/2016-March/015999.html copy_file_range() may result in even better performances (but we may still want to use sendfile() for other UNIXes where file->file copy is supported - not sure which ones at this point). |
IMHO the next step would be to create a pull request on GitHub. |
I'm really sorry, but I don't have time to continue with this (new daughter!). Can someone else pick it up? |
Check for availability in configure.ac appears to be broken. |
Little update about this. According to: #if defined(__linux__) && \
defined(SYS_copy_file_range) && \
defined(__GLIBC_PREREQ) && \
__GLIBC_PREREQ(2, 27)
...
#endif I think (but not sure) this is supposed to fix this condition, which appears to be the major blocker here: |
shutil copy functions would definitively benefit of using copy_file_range() if available. Can someone please open a separated issue for shutil? |
shutil.copyfile() already uses sendfile() which basically provides the same performances. sendfile() should be preferred though because it’s supported since Linux 2.6.33. |
But copy_file_rane can leverage more filesystem features like deduplication and copy offload stuff. |
Giampaolo Rodola':
Pablo Galindo Salgado:
We can use copy_file_range() if available, or fallback to sendfile(). |
I think data deduplication / CoW / reflink copy is better implemented via FICLONE. "cp --reflink" uses it, I presume because it's older than copy_file_range(). I have a working patch adding CoW copy support for Linux and OSX (but not Windows). I think that should be exposed as a separate shutil.reflink() though, and copyfile() should just do a standard copy. |
Actually "man copy_file_range" claims it can do server-side copy, meaning no network traffic between client and server if *src* and *dst* live on the same network fs. So I agree copy_file_range() should be preferred over sendfile() after all. =) |
Please open a new issue to discuss how it can used in shutil ;-) |
I created bpo-37157: "shutil: add reflink=False to file copy functions to control clone/CoW copies (use copy_file_range)". -- The new os.copy_file_range() should be documented at: |
Use copy_file_range() in shutil.copyfile(): |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: