classification
Title: Add splice() to the os module
Type: Stage: patch review
Components: Library (Lib) Versions: Python 3.10
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: pablogsal Nosy List: corona10, pablogsal, serhiy.storchaka, shihai1991, vstinner
Priority: normal Keywords: patch

Created on 2020-08-24 15:57 by pablogsal, last changed 2020-10-13 21:59 by pablogsal.

Pull Requests
URL Status Linked Edit
PR 21947 open pablogsal, 2020-08-24 16:01
Messages (10)
msg375851 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-08-24 15:57
The splice system call moves data between two file descriptors without copying between kernel address space and user address space.  This can be a very useful addition for libraries implementing low-level file management.
msg375852 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-08-24 16:02
I don't recall the subtle differences between sendfile() and splice(). I recall that in early Linux versions, one was limited to sockets, and only on one side. But later, it became possible to pass two sockets, or one file on disk and one socket, etc.

Python exposes sendfile() as os.sendfile() since Python 3.3:
https://docs.python.org/dev/library/os.html#os.sendfile
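
For comparison, a minimal sketch of the existing os.sendfile() call sending a regular file over a connected socket. The helper name send_file_over_socket is hypothetical and only illustrates the call; it assumes a blocking socket and a platform where os.sendfile() accepts the offset and count arguments.

```python
import os


def send_file_over_socket(sock, path):
    """Send a whole regular file over a connected socket (hypothetical helper)."""
    sent = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        while sent < size:
            # os.sendfile(out_fd, in_fd, offset, count) returns the bytes sent;
            # the data never passes through a Python bytes object.
            n = os.sendfile(sock.fileno(), f.fileno(), sent, size - sent)
            if n == 0:  # unexpected end of file
                break
            sent += n
    return sent
```

The explicit offset means the file's own position is not used, so the loop tracks progress through the return value.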
msg375857 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-08-24 17:30
> I don't recall the subtle differences between sendfile() and splice().

Basically, splice() is specialized for pipes:


splice() only works if one of the file descriptors refers to a pipe. So you can use it for, e.g., socket-to-pipe or pipe-to-file transfers without copying the data into userspace. But you can't do direct file-to-file copies with it.

sendfile() only works if the source file descriptor refers to something that can be mmap()ed (i.e. mostly regular files), and before Linux 2.6.33 the destination had to be a socket.
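
As a rough illustration of the pipe requirement, here is a sketch that copies between two regular files by routing the data through an intermediate pipe. It assumes the signature os.splice(src, dst, count) as it later shipped in Python 3.10 (the API in the PR under review may differ), and copy_via_pipe is a hypothetical helper name.

```python
import os


def copy_via_pipe(src_fd, dst_fd, count, chunk=64 * 1024):
    """Copy up to `count` bytes from src_fd to dst_fd through a pipe.

    Hypothetical helper: neither descriptor is a pipe, so an intermediate
    pipe is created and os.splice() is called once per direction.
    Assumes os.splice() is available (Linux, Python 3.10+).
    """
    pipe_r, pipe_w = os.pipe()
    copied = 0
    try:
        while copied < count:
            # file -> pipe: the destination is the pipe's write end
            moved = os.splice(src_fd, pipe_w, min(chunk, count - copied))
            if moved == 0:  # end of file
                break
            # pipe -> file: drain exactly what was just queued
            drained = 0
            while drained < moved:
                drained += os.splice(pipe_r, dst_fd, moved - drained)
            copied += moved
    finally:
        os.close(pipe_r)
        os.close(pipe_w)
    return copied
```

Each splice() call has a pipe on one side, which is why a file-to-file copy needs two calls per chunk.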
msg375872 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-08-25 07:17
The API of splice() looks complicated. How would you use it in Python?

Are off_in and off_out adjusted as in copy_file_range() and sendfile()? It is not clear from the man page. If they are, how would you return updated values?

Are you going to add vmsplice() and tee() too? Since it is a Linux-specific API, would it not be better to add a dedicated linux module?
msg375873 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-08-25 09:21
> Are you going to add vmsplice() and tee() too? Since it is a Linux-specific API, would it not be better to add a dedicated linux module?

It's not uncommon that a syscall added to the Linux kernel is later added to other platforms.

Example: getrandom() exists in Linux and Solaris.

Example: memfd_create() was designed in Linux, and added later to FreeBSD: https://github.com/freebsd/freebsd/commit/575e351fdd996f72921b87e71c2c26466e887ed2 (see bpo-41013).
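
Because such wrappers only exist in the os module on platforms that provide the underlying call, portable code usually probes for them at runtime. A small sketch of that pattern follows; the helper name available_syscall_wrappers is hypothetical, and which names are present depends on the platform and Python version.

```python
import os


def available_syscall_wrappers(names):
    """Map each name to whether this build of the os module exposes it."""
    return {name: hasattr(os, name) for name in names}


# The names below are real os-module functions on platforms that support them.
support = available_syscall_wrappers(
    ["getrandom", "memfd_create", "sendfile", "copy_file_range", "splice"]
)
for name, present in support.items():
    print(f"os.{name}: {'available' if present else 'not available'}")
```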
msg375875 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-08-25 09:25
OpenBSD uses a different API:
https://man.openbsd.org/sosplice.9

int sosplice(struct socket *so, int fd, off_t max, struct timeval *tv);
int somove(struct socket *so, int wait);

"The function sosplice() is used to splice together a source and a drain socket."

"The function somove() transfers data from the source's receive buffer to the drain's send buffer."

"Socket splicing can be invoked from userland via the setsockopt(2) system-call at the SOL_SOCKET level with the socket option SO_SPLICE."
msg375876 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-08-25 09:47
> The API of splice() looks complicated. How would you use it in Python?

It has the same API as copy_file_range and other similar system calls that we already expose, so we just need to do the same thing we do there.

> Are off_in and off_out adjusted as in copy_file_range() and sendfile()? It is not clear from the man page. If they are, how would you return updated values?

It behaves the same as in copy_file_range(), with the exception that one offset has to be None (the one associated with the pipe file descriptor). We don't return the updated values (nor do we in copy_file_range()).

> Are you going to add vmsplice() and tee() too? Since it is a Linux-specific API, would it not be better to add a dedicated linux module?

We can certainly discuss adding vmsplice() and tee() (tee() is probably more interesting), but in my humble opinion that would be a different discussion.
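
To make the offset handling described in this reply concrete, here is a sketch that reads a byte range from a regular file through a pipe. It assumes the keyword name offset_src from the os.splice() that later shipped in Python 3.10 (the PR under review may use different names), and peek_range is a hypothetical helper.

```python
import os


def peek_range(fd, offset, count):
    """Return up to `count` bytes of a regular file starting at `offset`.

    Hypothetical helper. Assumes the keyword name offset_src from the
    os.splice() that shipped in Python 3.10.
    """
    pipe_r, pipe_w = os.pipe()
    try:
        # Explicit offset for the file; the pipe side keeps its offset as None.
        moved = os.splice(fd, pipe_w, count, offset_src=offset)
        return os.read(pipe_r, moved) if moved else b""
    finally:
        os.close(pipe_r)
        os.close(pipe_w)
```

Since the updated offset is not returned, a caller that wants to continue would add the number of bytes obtained to its own offset. With an explicit offset, the file position of fd itself is left unchanged.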
msg375877 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-08-25 09:49
> OpenBSD uses a different API:

The semantics are considerably different (splice() is about pipes while sosplice() talks about general sockets). Also, the point of splice() is to skip copying from kernel buffers, but the sosplice() documentation does not say that it avoids copying between userspace and kernel space.
msg375878 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-08-25 09:51
> Since it is a Linux-specific API, would it not be better to add a dedicated linux module?

This is an interesting point, but I think that right now it would be more confusing for users than not (normally people go to the os module for system calls) and, as Victor mentions, we would need to update the os module if some other operating system adds the system call later.
msg378581 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-10-13 21:59
Heads up: I plan to land this next week, in case someone wants to do a review or has any objections.
History
Date User Action Args
2020-10-13 21:59:23 pablogsal set messages: + msg378581
2020-08-25 16:24:49 shihai1991 set nosy: + shihai1991
2020-08-25 09:51:34 pablogsal set messages: + msg375878
2020-08-25 09:49:50 pablogsal set messages: + msg375877
2020-08-25 09:47:04 pablogsal set messages: + msg375876
2020-08-25 09:25:17 vstinner set messages: + msg375875
2020-08-25 09:21:45 vstinner set messages: + msg375873
2020-08-25 07:17:35 serhiy.storchaka set nosy: + serhiy.storchaka
messages: + msg375872
2020-08-25 06:03:29 corona10 set nosy: + corona10
2020-08-24 17:30:18 pablogsal set messages: + msg375857
2020-08-24 16:02:11 vstinner set nosy: + vstinner
messages: + msg375852
2020-08-24 16:01:14 pablogsal set keywords: + patch
stage: patch review
pull_requests: + pull_request21057
2020-08-24 15:57:28 pablogsal set assignee: pablogsal
components: + Library (Lib)
versions: + Python 3.10
2020-08-24 15:57:18 pablogsal create