Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add os.sendfile() #55091

Closed
rosslagerwall mannequin opened this issue Jan 10, 2011 · 41 comments
Closed

Add os.sendfile() #55091

rosslagerwall mannequin opened this issue Jan 10, 2011 · 41 comments
Labels
extension-modules C modules in the Modules dir type-feature A feature request or enhancement

Comments

@rosslagerwall
Copy link
Mannequin

rosslagerwall mannequin commented Jan 10, 2011

BPO 10882
Nosy @loewis, @birkenfeld, @pitrou, @giampaolo, @skrah, @anacrolix
Files
  • sendfile.patch: Patch + test for issue
  • sendfile_v2.patch: Updated to build on solaris
  • sendfile_v2.patch: adds entries in pyconfig.h.in and configure and works on Linux
  • sendfile_v3.patch: Updated for non-blocking sockets on certain platforms
  • sendfile_v4.patch: use keyword args
  • sendfile_v5.patch: updated documentation and tests
  • sendfile_v6.patch: provides full test suite; fixes BSD issue with non-blocking sockets
  • sendfile_v7.patch: skip headers/trailers test on linux and solaris; adds flags argument test
  • sendfile_v8.patch: return bytes sent only instead of (bsent, offset)
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2011-03-03.14:37:57.125>
    created_at = <Date 2011-01-10.20:27:45.652>
    labels = ['extension-modules', 'type-feature']
    title = 'Add os.sendfile()'
    updated_at = <Date 2011-09-02.02:14:55.132>
    user = 'https://bugs.python.org/rosslagerwall'

    bugs.python.org fields:

    activity = <Date 2011-09-02.02:14:55.132>
    actor = 'Yury.Selivanov'
    assignee = 'none'
    closed = True
    closed_date = <Date 2011-03-03.14:37:57.125>
    closer = 'giampaolo.rodola'
    components = ['Extension Modules']
    creation = <Date 2011-01-10.20:27:45.652>
    creator = 'rosslagerwall'
    dependencies = []
    files = ['20338', '20352', '20357', '20573', '20578', '20597', '20688', '20708', '20733']
    hgrepos = []
    issue_num = 10882
    keywords = ['patch']
    message_count = 41.0
    messages = ['125924', '125927', '125983', '125999', '126000', '126002', '126008', '126015', '126039', '126049', '126072', '126076', '126982', '126998', '127293', '127299', '127326', '127335', '127362', '127364', '127369', '127382', '127564', '128024', '128040', '128089', '128098', '128115', '128117', '128223', '128226', '128309', '129077', '129303', '129374', '129381', '129402', '129555', '129971', '136257', '136260']
    nosy_count = 9.0
    nosy_names = ['loewis', 'georg.brandl', 'pitrou', 'giampaolo.rodola', 'skrah', 'anacrolix', 'Yury.Selivanov', 'rosslagerwall', 'python-dev']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'patch review'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue10882'
    versions = ['Python 3.3']

    @rosslagerwall
    Copy link
    Mannequin Author

    rosslagerwall mannequin commented Jan 10, 2011

    Attached is a patch which implements os.sendfile for unix systems (linux, freebsd, apple, solaris, dragonfly).

    It takes the iov initialization code and off_t parsing from i10812.

    It encapsulates all the functionality from the various sendfiles which means a fair amount of #ifdefs but the basic case works for all of them.

    Tested on Linux & FreeBSD - it should work on solaris but since it needs to link with the sendfile library and I have no idea how to link the posix module with the sendfile library only on Solaris, i couldn't test it. If someone could please contribute this...

    I think it might be possible to get a Windows equivalent of this - i'll leave it for someone else to do ;-)

    @rosslagerwall rosslagerwall mannequin added extension-modules C modules in the Modules dir type-feature A feature request or enhancement labels Jan 10, 2011
    @pitrou
    Copy link
    Member

    pitrou commented Jan 10, 2011

    Tested on Linux & FreeBSD - it should work on solaris but since it
    needs to link with the sendfile library and I have no idea how to link
    the posix module with the sendfile library only on Solaris, i couldn't
    test it.

    Since the posix module is linked statically inside the interpreter, it probably needs some change in the base ldflags. I'll take a look when I have some time.

    @rosslagerwall
    Copy link
    Mannequin Author

    rosslagerwall mannequin commented Jan 11, 2011

    Ok, I figured it out to link with sendfile on solaris. Here is the updated patch.

    @giampaolo
    Copy link
    Contributor

    Thanks for writing this.
    Follows my comments.

    I would focus on trying to provide a unique interface across all platforms. Being sendfile() not a standard POSIX I think we should not worry about providing a strict one-to-one interface.

    "headers" and "trailers" arguments should be available everywhere as they are crucial in different protocols such as HTTP.
    On Linux this is possible by using the TCP_CORK option.
    "man sendfile" on Linux provides some information. Also you might find useful to see how medusa did this (/medusa-20010416/sendfile/sendfilemodule.c):
    http://www.nightmare.com/medusa/

    The "offset" parameter should be available everywhere, Linux included (in your patch you dropped Linux support).
    Also, I think it should be optional since when it's NULL, sendfile() implicitly assumes the current offset (file's tell() return value).
    This is true on Linux, at least. Not sure about other platforms but my best guess is that it should not be mandatory.

    It turns out the only peculiar argument is "flags", available only on *BSD.
    I'm not sure what's best to do here. Maybe it makes sense to provide it across all platforms, defaulting to 0 and raise ValueError when specified on systems != *BSD.
    In summary, the function might look like this:

    sendfile(out, in, count, offset=None, headers=None, trailers=None, flags=0)

    @giampaolo
    Copy link
    Contributor

    Also, I think it should be optional since when it's NULL, sendfile()
    implicitly assumes the current offset (file's tell() return value).
    This is true on Linux, at least. Not sure about other platforms but my
    best guess is that it should not be mandatory.

    I'm not so sure about this anymore. Please ignore this.

    @rosslagerwall
    Copy link
    Mannequin Author

    rosslagerwall mannequin commented Jan 11, 2011

    Just to be clear:

    There are 3 different interfaces.
    The basic one with the offset included & no headers/trailers is supported by all the platforms, including Linux.
    The one with offset as None is only supported by Linux.
    The one with headers/trailers/flags is supported by FreeBSD & OS X.

    So it does provide a unique interface across all platforms while still providing the ability to access the native functionality.

    Preferably, I'd like to see a thin wrapper like this remain and then have a sendfile() method added to the socket object which takes a file-like object (not a file descriptor) and optional headers/trailers. This method can then figure out how best to do it depending on the platform. (i.e. using TCP_CORK if necessary, etc). It could even be made to work with file-like objects that cannot be mmap()ed.

    Why not put it straight in socket anyway? Well, some of the implementations allow sendfile() to have a normal fd as the output. Putting it in socket then would't make sense.

    @giampaolo
    Copy link
    Contributor

    I agree then, although I'm not sure having a function with a variable number of args depending on the platform is acceptable.
    I wanted to try your patch but it does not apply cleanly (python 3.2, revision 87930).

    @rosslagerwall
    Copy link
    Mannequin Author

    rosslagerwall mannequin commented Jan 11, 2011

    I've just tried it against r87935 and it applies cleanly.

    Perhaps you didn't apply the patch correctly (it requires "-p1" since it was a Mercurial diff), try:
    patch -p1 < sendfile_v2.patch

    With regards to the different arguments, I don't know if that's acceptable or not or if there is a better way. Since you can have mmap.mmap() with differing args between Windows & Unix, maybe it is acceptable. And, Python exposes differing functionality via the posix module since the available functions differs widely between platforms.

    @giampaolo
    Copy link
    Contributor

    The patch as-is did not work on Linux.
    I had to add entries in pyconfig.h.in and configure files in order to make os.sendfile available.
    Patch is in attachment.

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Jan 11, 2011

    I would focus on trying to provide a unique interface across all
    platforms. Being sendfile() not a standard POSIX I think we should
    not worry about providing a strict one-to-one interface.

    We absolutely need to expose sendfile as-is. If we want to provide
    some unifying layer, we must not call it sendfile.

    @rosslagerwall
    Copy link
    Mannequin Author

    rosslagerwall mannequin commented Jan 12, 2011

    Oh sorry, that was because it changed configure.in so "autoreconf" needs to be run to regenerate configure & pyconfig.h.in.

    I thought that patches weren't meant to include the regenerated files. Especially since differences in the versions between autoconf 2.65 & 2.67 can produce hundreds of differences in "configure" making the patch large and obfuscating the actual changes.

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Jan 12, 2011

    I thought that patches weren't meant to include the regenerated files.

    Correct. Not including them is perfectly fine.

    @anacrolix
    Copy link
    Mannequin

    anacrolix mannequin commented Jan 25, 2011

    I notice Linux is described as not taking count=0 to mean to send until the end of "in" is reached. Is it possible to pass (size_t)-1 to this field if None is given, or do a loop on non-zero counts from sendfile to emulate this?

    I poked around the linux source for 2.6.32, and it appears sendfile() is emulated on top of splice(), but this doesn't provide undocumented count=0 handling as I was hoping.

    @giampaolo
    Copy link
    Contributor

    Please note that on FreeBSD things work a little bit differently for non-blocking sockets:
    http://www.freebsd.org/cgi/man.cgi?query=sendfile&sektion=2

    In details I'm talking about:

    When using a socket marked for non-blocking I/O, sendfile() may send
    fewer bytes than requested. In this case, the number of bytes
    success-fully written is returned in *sbytes (if specified), and the
    error EAGAIN is returned.

    ...and the similar note about EBUSY, later in the page.
    Something like this should work:

    ret = sendfile(in, out, offset, &sbytes, &sf, flags);

    ...

    if (ret == -1) {
        if ((errno == EAGAIN) || (errno == EBUSY)) {
            return Py_BuildValue("ll", sbytes, offset + sbytes);
        } 
        return posix_error();
    }

    @rosslagerwall
    Copy link
    Mannequin Author

    rosslagerwall mannequin commented Jan 28, 2011

    Attached is a new sendfile patch which fixes the issue with FreeBSD (and Mac OS X & DragonFly BSD from what I can see).

    With regards to anacrolix's request, I think what Martin said in msg126049. i.e. if we want to provide a unifying layer on top of sendfile we can, but this should just expose sendfile() as is.

    @giampaolo
    Copy link
    Contributor

    Could you please also add support for offset argument on Linux?
    Also, "headers", "trailers" and "flags" could be turned in keyword args for simplicity.

    @rosslagerwall
    Copy link
    Mannequin Author

    rosslagerwall mannequin commented Jan 28, 2011

    Attached is an updated patch that uses keyword arguments.

    Using an offset with Linux was always supported although I have cleaned up the documentation a bit to make that clearer.
    E.g. the following script sends part of a file over a socket (shows using an offset and None).
    This requires listening with a socket on the same computer on port 8001 (e.g. nc -l 8001).

    import os
    import socket
    
    with open("/tmp/test", "wb") as fp:
        fp.write(b"testdata\n" * 1000000)
    
    cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    cli.connect(('localhost', 8010))
    c = cli.fileno()
    f = os.open("/tmp/test", os.O_RDONLY)
    
    os.sendfile(c, f, 1, 7) # "estdata"
    cli.send(b"\n\n")
    os.sendfile(c, f, None, 4) # "test"
    cli.send(b"\n\n")
    os.sendfile(c, f, None, 4) # "data"
    cli.send(b"\n\n")

    @giampaolo
    Copy link
    Contributor

    Copy *count* bytes from file descriptor *in* to file descriptor
    *out*, starting at *offset* and continuing for *count* bytes.

    The latter part is incorrect as it is not guaranteed that all bytes specified in "count" argument are going to be sent and it also sounds like the function is blocking.
    I'd change that in just "Copy *count* bytes from file descriptor *in* to file descriptor *out*."

    Also, I'd be for using the BSD notation and rename the "count" argument to "nbytes", which is more clear.

    Docstring should be changed to reflect the keyword arguments:

    • sendfile(out, in, offset, count, headers, trailers, flags)\n\
      + sendfile(out, in, offset, count[, headers[, trailers], flags]]])\n\

    Finally, tests for header and trailer arguments should be written.

    The rest of the patch looks ok to me.

    @anacrolix
    Copy link
    Mannequin

    anacrolix mannequin commented Jan 29, 2011

    I have a few problems with these parts of the latest patch:

    + The second case may be used on Mac OS X and FreeBSD where *headers*
    + and *trailers* are arbitrary sequences of buffers that are written before and
    + after the data from *in* is written. It returns the same as the first case.

    Why special case these? Why can't Mac OS X and FreeBSD write those manually into the output file descriptor. It's presumptious but I don't see why something so easy to do explicitly is mashed into the interface here, just pretend it doesn't exist. For the sake of simplicity (and sendfile might become very popular in future code), just drop this "feature".

    for h in headers: out.write(h)
    os.sendfile(out, in, offset, count)
    for t in trailers: out.write(t)

    + On Mac OS X and FreeBSD, a value of 0 for *count* specifies to send until
    + the end of *in* is reached.

    Again this should be emulated where it's not available, as it's very common, and a logical feature to have. However as indicated earlier, if os.sendfile is a minimal syscall wrapper, some thought needs to be given to a "higher-level" function, that also includes my other suggestions.

    + On Solaris, *out* may be the file descriptor of a regular file or the file
    + descriptor of a socket. On all other platforms, *out* must be the file
    + descriptor of an open socket

    I'm pretty sure that Solaris isn't the only platform that supports non-socket file descriptors here, Linux (the platform I'm using), is one such case. As a general rule, we want to allow any file descriptors, and not restrict to sockets (except for awful platforms like Windows).

    @giampaolo
    Copy link
    Contributor

    I'm pretty sure that Solaris isn't the only platform that supports non-
    socket file descriptors here, Linux (the platform I'm using), is one
    such case.

    No, sendfile() on Linux supports only socket file descriptors:
    http://linux.die.net/man/2/sendfile

    @anacrolix
    Copy link
    Mannequin

    anacrolix mannequin commented Jan 29, 2011

    Thanks for catching that:

    Presently (Linux 2.6.9): in_fd, must correspond to a file which sup‐
    ports mmap(2)-like operations (i.e., it cannot be a socket); and out_fd
    must refer to a socket.

    Despite the fact the manpage hasn't changed since 2004, sendfile still only works for sockets. :(

    @rosslagerwall
    Copy link
    Mannequin Author

    rosslagerwall mannequin commented Jan 29, 2011

    OK, updated documentation and tests.

    Why special case these? Why can't Mac OS X and FreeBSD write those manually into the output file descriptor.

    These can be a crucial part of certain protocols such as HTTP to ensure that a minimal amount of TCP packets are sent. Also, the posix module is supposed to expose the OS functionality transparently. Besides, if you don't want to use headers/trailers, they can simply be left out.

    @giampaolo
    Copy link
    Contributor

    In case someone is interested in statistics, I wrote a sendfile() wrapper by using ctypes for pyftpdlib and benchmark results are quite impressive:
    http://code.google.com/p/pyftpdlib/issues/detail?id=152#c5

    @giampaolo
    Copy link
    Contributor

    Patch in attachment provides a complete test suite.
    It also fixes a problem which occurred on BSD platforms when using non-blocking sockets: EAGAIN/EBUSY are now raised if the transmitted data == 0 bytes reflecting socket's send() behavior:

    + if (ret < 0) {
    + if ((errno == EAGAIN) || (errno == EBUSY)) {
    + if (sbytes != 0) {
    + // some data has been sent
    + goto done;
    + }
    + else {
    + // no data has been sent; upper application is supposed
    + // to retry on EAGAIN or EBUSY
    + return posix_error();
    + }
    + }
    + return posix_error();
    + }
    + goto done;

    The test suite shows that "trailer" argument does not work.
    I couldn't manage to figure out what's wrong though.

    @rosslagerwall
    Copy link
    Mannequin Author

    rosslagerwall mannequin commented Feb 6, 2011

    For trailers to work, I think the line:
    self.assertEqual(data, "abcde12345")
    should be:
    self.assertEqual(data, b"abcde12345")

    Also not that tests like this:
    if not sys.platform.startswith('linux'):

    perhaps should also include solaris since it doesn't support headers/trailers either.

    @giampaolo
    Copy link
    Contributor

    You're right, that line is certainly wrong as it should compare for bytes but the trailer data still doesn't get appended.
    Do you have a BSD box in order to figure out what's wrong?

    @rosslagerwall
    Copy link
    Mannequin Author

    rosslagerwall mannequin commented Feb 7, 2011

    With no changes, I get:
    ======================================================================
    FAIL: test_trailers (test.test_os.TestSendfile)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/usr/home/ross/py3k_sftest/Lib/test/test_os.py", line 1531, in test_trailers
        self.assertEqual(data, "abcde12345")
    AssertionError: b'abcde12345' != 'abcde12345'

    After changing, the tests work perfectly. Perhaps its the FreeBSD version? Here's the output from uname:
    FreeBSD fbds 8.1-RELEASE FreeBSD 8.1-RELEASE #0: Mon Jul 19 02:36:49 UTC 2010 root@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64

    Maybe its a bug on that platform, you could try building a sendfile program in C with trailer data. Or try a simple test in the interpreter like the one in msg127326 but with added trailer data.

    @giampaolo
    Copy link
    Contributor

    I was testing against FreeBSD 7.0 RC1 and I confirm the problem doesn't occur on 8.1 version. Patch in attachment adds test for "flags" argument and skips headers/trailers tests on linux and solaris.
    I'm ok for committing this as-is in 3.3.

    @pitrou
    Copy link
    Member

    pitrou commented Feb 7, 2011

    Please leave open until a patch is actually committed ;)

    @giampaolo
    Copy link
    Contributor

    On a second thought I have two complaints.

    There is no reason to return the file offset, also because this is only supported on Linux. On all other platforms we are calculating the file offset by making a sum of offset + number of bytes sent. Despite this would normally work it no longer makes sense when headers/trailers arguments are specified.

    >>> sendfile(in, out, offset, 4096, headers=[b"xxxx"])
    (5000, 5000)

    In this case the right offset is supposed to be 4096.
    We might adjust this in the code but in my opinion knowing the file offset is not really necessary as what the user really needs is the number of bytes sent, as when using socket.send().

    My second complaint is about headers/trailers data type.
    Currently we are passing a sequence of buffers (tipically a list) but I think we should pass plain bytes instead.

    @rosslagerwall
    Copy link
    Mannequin Author

    rosslagerwall mannequin commented Feb 9, 2011

    OK, I'm happy to not return the file offset. However, I still think that headers and trailers should remain as is since this matches the native interface very closely.

    @giampaolo
    Copy link
    Contributor

    Patch in attachment removes offset and return just the number of bytes sent as an integer.
    Tests and documentation are updated accordingly.

    @rosslagerwall
    Copy link
    Mannequin Author

    rosslagerwall mannequin commented Feb 22, 2011

    Yes I agree it can go in now. Unless someone wants to do some tests on more OS's like FreeBSD 7.2, Solaris, etc. (I've only checked on Linux 2.6, FreeBSD 8.1, OpenIndiana and OS X 10.5).

    @giampaolo
    Copy link
    Contributor

    I'm going to commit the patch and then watch whether some of the buildbots turn red.

    @giampaolo
    Copy link
    Contributor

    Committed in r88580.

    @birkenfeld
    Copy link
    Member

    Three comments:

    • When changing configure.in, you also should regenerate configure. (Now done in r88584).

    • PyParse_off_t is a bad name for this function. It is not a new C API, so it should be static, and therefore there is no need for the Py prefix.
      Changed this to _parse_off_t in r88585.

    • The whitespace cleanup should have been committed separately.

    @giampaolo
    Copy link
    Contributor

    Thanks Georg.

    It seems we have a failure on Leopard:
    http://www.python.org/dev/buildbot/all/builders/PPC%20Leopard%203.x/builds/1411/steps/test/logs/stdio

    Also, I think I can add support for AIX if someone gives me SSH access over an AIX box.
    I'm maintaining a separate project to port sendfile() on python 2:
    http://code.google.com/p/py-sendfile/

    @pitrou
    Copy link
    Member

    pitrou commented Feb 26, 2011

    It seems we have a failure on Leopard:

    I think the issue is you request a 4096 transfer near the end of file, but there aren't that many bytes remaining. Apparently OS X then doesn't transfer anything and returns 0. Hint:

    >>> 10485760. / 4096
    2560.0
    >>> 10486272. / 4096
    2560.125

    @giampaolo
    Copy link
    Contributor

    OSX failure is tracked in bpo-11351.
    Closing this out as fixed.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented May 18, 2011

    test_os fails on the Fedora bot (--without-threads):

    test test_os crashed -- Traceback (most recent call last):
      File "./Lib/test/regrtest.py", line 1037, in runtest_inner
      File "/home/buildbot/buildarea/3.x.krah-fedora/build/Lib/importlib/_bootstrap.py", line 437, in load_module
        return self._load_module(fullname)
      File "/home/buildbot/buildarea/3.x.krah-fedora/build/Lib/importlib/_bootstrap.py", line 141, in decorated
        return fxn(self, module, *args, **kwargs)
      File "/home/buildbot/buildarea/3.x.krah-fedora/build/Lib/importlib/_bootstrap.py", line 342, in _load_module
        exec(code_object, module.__dict__)
      File "/home/buildbot/buildarea/3.x.krah-fedora/build/Lib/test/test_os.py", line 1312, in <module>
        class SendfileTestServer(asyncore.dispatcher, threading.Thread):
    AttributeError: 'NoneType' object has no attribute 'Thread'

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented May 18, 2011

    New changeset c45e92bd4d81 by Giampaolo Rodola' in branch 'default':
    os.sendfile() test: fix "AttributeError: 'NoneType' object has no attribute 'Thread'" when running tests with --without-threads option.
    http://hg.python.org/cpython/rev/c45e92bd4d81

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    extension-modules C modules in the Modules dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants