Add a new socket.sendfile() method #61752

giampaolo · 2013-03-26T19:27:53Z

BPO	17552
Nosy	@gvanrossum, @pitrou, @giampaolo, @tiran, @asvetlov, @4kir4, @1st1, @MojoVampire
Files	socket-sendfile1.patch: initial draft socket-sendfile2.patch: update file offset add 'offset' parameter socket-sendfile3.patch: docs, selectors module, always seek() file, use memoryview(), ssl tests bench.py socket-sendfile4.patch: return no. of bytes sent instead of (ok, exc); "sleep" on BlockingIOError socket-sendfile5.patch: fix some issues on windows socket-sendfile6.patch: get rid of "blocksize" and "use_fallback" args, add "count" arg, use fstat() socket-sendfile7.patch: adds send_blocksize argument

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = 'https://github.com/giampaolo'
closed_at = <Date 2014-06-11.02:04:53.134>
created_at = <Date 2013-03-26.19:27:53.402>
labels = ['type-feature', 'library']
title = 'Add a new socket.sendfile() method'
updated_at = <Date 2014-06-11.02:04:53.054>
user = 'https://github.com/giampaolo'

bugs.python.org fields:

activity = <Date 2014-06-11.02:04:53.054>
actor = 'giampaolo.rodola'
assignee = 'giampaolo.rodola'
closed = True
closed_date = <Date 2014-06-11.02:04:53.134>
closer = 'giampaolo.rodola'
components = ['Library (Lib)']
creation = <Date 2013-03-26.19:27:53.402>
creator = 'giampaolo.rodola'
dependencies = []
files = ['29583', '29734', '34986', '34994', '34997', '35004', '35039', '35507']
hgrepos = []
issue_num = 17552
keywords = ['3.2regression']
message_count = 35.0
messages = ['185292', '185294', '185295', '185303', '186305', '216915', '216968', '216973', '216975', '216979', '217015', '217058', '217080', '217081', '217082', '217090', '217099', '217104', '217120', '217121', '217128', '217143', '217144', '217156', '217165', '219926', '219927', '219928', '219929', '220173', '220191', '220195', '220202', '220220', '220226']
nosy_count = 12.0
nosy_names = ['gvanrossum', 'pitrou', 'giampaolo.rodola', 'christian.heimes', 'josiah.carlson', 'asvetlov', 'neologix', 'akira', 'rosslagerwall', 'python-dev', 'yselivanov', 'josh.r']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = None
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue17552'
versions = ['Python 3.5']

giampaolo · 2013-03-26T19:27:52Z

This is based on progress made in bpo-13564 and aims to provide a new sendfile() method for the socket class taking advantage of high-performance "zero-copy" os.sendfile() available on most POSIX platforms.
A wrapper on top of os.sendfile() appears to be convenient because getting everything right is a bit tricky. Also we can avoid code duplication in case we want to add sendfile() support to ftplib (bpo-13564) and httplib.

=== THE API ===

Attached is a draft patch proposing an API which:

uses os.sendfile() whenever available and usable (a mmap-like file is required)
if not falls back on using plain socket.sendall()
returns a tuple of two elements which indicates whether sendfile() was used and an exception instance in case it wasn't on account of an internal error, if any
does not support non-blocking sockets (will raise ValueError)

=== ALTERNATE API ===

In case the transfer is interrupted halfway because of an error the user has no clue on how many bytes were sent (and file.tell() won't tell =)). As such it probably makes sense to provide a custom socket.SendfileError exception encapsulating the original exception and the bytes sent as arguments.
For the same reason the returned tuple should probably include the number of bytes transmitted.

=== WINDOWS SUPPORT ===

Further development may include Windows support by using TransmitFile (http://msdn.microsoft.com/en-us/library/windows/desktop/ms740565(v=vs.85).aspx). Once we agree on an API I might start looking into Windows code (which appears to be quite tricky btw).

Any feedback is very welcome.

pitrou · 2013-03-26T19:43:25Z

A couple of comments:

I don't understand the point of the second member in the tuple
the timeout logic should be fixed so that the total operation time doesn't exceed the timeout, rather than each iteration (in other words, a deadline should be computed at the start and then the poll / select calls be issued in accordance with that deadline)
non-blocking sockets could be supported by returning partial writes, and letting the caller pass an offset argument
as a vocabulary nit, I don't really understand what "mmap-like file" means

neologix · 2013-03-26T19:53:29Z

I don't understand the "running out of FDs" thing. select() is limited to FDs less than FD_SETSIZE, but I don't see how you could get EMFILE (select() will return a ValueError)
is there any platform with sendfile() which doesn't support poll()?
I still don't see the point of calling sendfile() with a block size. Everybody just uses st_size, and that's how it's supposed to be done.

giampaolo · 2013-03-26T21:05:31Z

I don't understand the point of the second member in the tuple

The 'exception' member can be useful to know the reason why sendfile() failed and send() was used as fallback.

the timeout logic should be fixed so that the total operation
time doesn't exceed the timeout, rather than each iteration

Do you mean that if the user sets a timeout=2 and the whole transfer takes longer than that you expect a timeout exception?
That's pretty unusual (e.g. ftplib does not behave like that).

non-blocking sockets could be supported by returning partial writes,
and letting the caller pass an offset argument

I see little value in supporting non-blocking sockets because someone willing to do that usually wants to use sendfile() directly (no wrappers involved) as they are forced to handle errors and take count of transmitted bytes in their own code anyway.
Here's an example of how sendfile is supposed to be used with non-blocking sockets:
https://code.google.com/p/pyftpdlib/source/browse/tags/release-0.7.0/pyftpdlib/ftpserver.py#1035
An extra layer such as the one proposed in my patch is completely useless and also slow considering the amount of instructions before the actual sendfile() call.
It is however a good idea to provide an 'offset' argument.

as a vocabulary nit, I don't really understand what "mmap-like file" means

Will turn that into "regular files".

I don't see how you could get EMFILE (select() will return a ValueError)

Right. Will fix that.

is there any platform with sendfile() which doesn't support poll()?

No idea. I made a quick tour on snakebite and apparently all of the provided platforms have poll().

I still don't see the point of calling sendfile() with a block size

I will make some actual benchmarks and fix that later if makes sense.
Assuming providing a high blocksize (say 1MB) makes no difference in terms of CPU usage it's better to keep that for consistency with sendall() in case we fallback on using it.

giampaolo · 2013-04-08T15:14:11Z

New patch in attachment includes a new 'offset' parameter, new tests and also update file offset on return or in case of error so that file.tell() can be used to tell how many bytes were transmitted at any time.
This way we'll avoid using a custom exception.

In summary, the API looks like this.

Transfer ok:

>>> file = open('somefile', 'rb')
>>> s = socket.socket()
>>> sock.sendfile(file)
(True, None)
>>> file.tell()
20000000
>>>

...and in case sendfile() could not be used internally because file was not a regular file:

>>> file = io.BytesIO(b'x' * 1*1024*1024)
>>> sock.sendfile(file)
(False, UnsupportedOperation('fileno',))
>>> file.tell()
20000000
>>>

I still haven't looked into TransmitFile on Windows as I have to figure out how to compile Python 3.4 on Windows.

giampaolo · 2014-04-20T18:56:57Z

New patch in attachment. Changes:

docs
replaced select() / poll() with the new selectors module
file position is always updated both on return and on error; this means file.tell() is the designated way to know how many bytes were sent
replaced sendall() with send() so that we can count the number of bytes transmitted (related and rejected proposal: https://mail.python.org/pipermail/python-ideas/2014-April/027689.html)
send() now uses memoryview() for better performances to re-transmit data which was not sent by the first send() call
pre-emptively raise exception if file is not opened in binary mode
tests for ssl module

I've tried to work on Windows TransmitFile support but I got stuck as I'm not sure how to convert a file object into a HANDLE in C. I suppose Windows support can also be added later as a separate ticket and in the meantime I'd like to push this forward.

Open questions:

Is the current return value desirable (do we really care if os.sendfile() was used internally?)? Should the returned tuple also include the number transmitted bytes?
default blocksize: Charles-François was suggesting to remove the blocksize argument; FWIW I've made some quick benchmarks by using "time" cmdline utility with different blocksizes and I didn't notice substantial difference. I still think a blocksize parameter is necessary in case we fallback on using send() and also for consistency with ftplib's storbinary() method which will be involved later (bpo-13564).

giampaolo · 2014-04-21T20:33:32Z

Attached is a simple benchmark script transmitting a 100MB file.
On my Linux box sendfile() is almost twice as fast as send():

send()
real     0.0613s
user     0.0100s
sys      0.0900s
total    0.1000s

sendfile()
real     0.0318s
user     0.0000s
sys      0.0500s
total    0.0500s

MojoVampire · 2014-04-21T22:46:34Z

For TransmitFile support, the Windows function to turn an integer file descriptor into a WinAPI file HANDLE should be _get_osfhandle: http://msdn.microsoft.com/en-us/library/ks2530z6.aspx

4kir4 · 2014-04-21T23:34:51Z

Should socket.sendfile() always return number of bytes sent because file.tell() may be changed by something else that uses the same file descriptor?

What happens if the file grows?

Instead of returning (was_os_sendfile_used, os_sendfile_error), you could specify no_fallback=False that could be set to True to assert that the fallback is not used (with no_fallback=True the os_sendfile_error is raised instead of using socket.send() as a fallback).

If possible; always include number of bytes sent in any error that is raised.

giampaolo · 2014-04-22T02:13:17Z

Instead of returning [...] you could specify no_fallback=False that
could be set to True to assert that the fallback is not used
[...] and return the number of bytes sent.

Good idea, thanks, that is much better indeed. Updated patch is in attachment.

[...] file.tell() may be changed by something else that uses
the same file descriptor? What happens if the file grows?

I would say that is a use case we should explicitly not support as it probably implies you're doing something you're not supposed to.

If possible always include the number of bytes sent in any error that is raised.

That's similar to my recent (rejected) proposal for socket.sendall():
https://mail.python.org/pipermail/python-ideas/2014-April/027689.html
IMO the patch as it stands is fine as you can determine the number of bytes which were sent either by using the function return value or file.tell() (in case of error).
Also, updating the file offset on exit makes the sendfile() implementation behave exactly like send().

giampaolo · 2014-04-22T17:49:45Z

Yet another patch fixing some problems on Windows.
Hopefully this should be the last one as for what concerns the POSIX systems. As such I would kindly ask for a review and/or further suggestions.

neologix · 2014-04-23T07:44:45Z

I really don't like the use_fallback argument: as a user, I don't
care if it's using sendfile/splice/whatever WIndows uses.
I view this as a channel transfer (like Java's
http://docs.oracle.com/javase/7/docs/api/java/nio/channels/FileChannel.html#transferFrom(java.nio.channels.ReadableByteChannel,
long, long)), which moves bytes around from one FD to another.
If the user want precise control, he can just go ahead and call the
syscall itself.
Apart from complicating the prototype, what do this bring?
Just returning the number of bytes sent is fine
I really don't like the blocksize argument. Just use a really large
value internally to minimize the number of syscalls, that's all that
matters. I've *never* seen code which explicitly uses a blocksize: in
99% of cases, it just uses stat to find out the file size, and call
sendfile with it. Trying to be consistent with ftplib is IMO a bad
idea, since the API is just completely broken (I mean, just
sending/retrieving a file is complex). A useful parameter instead
would be to support sending only part of the file, so adding a count
argument.

You can have a look at
http://docs.oracle.com/javase/7/docs/api/java/nio/channels/FileChannel.html#transferFrom(java.nio.channels.ReadableByteChannel,
long, long) for an example many people bash Java, but they've designed
some great APIs :-)

giampaolo · 2014-04-23T17:26:38Z

I really don't like the use_fallback argument
Apart from complicating the prototype, what do this bring?

My initial thought was that the user might want to know *why* a file cannot be sent by using the fastest method and hence wants to see the original exception. Anyway, I have not strong opinions about this so I guess we can also drop it.

A useful parameter instead would be to support sending only part of the file,
so adding a count argument.

Have you read my patch? This is already provided by the "offset" parameter.

I really don't like the blocksize argument.
I've *never* seen code which explicitly uses a blocksize

Both sendfile() and TransmitFile provide a "blocksize" parameter for very good reasons therefore it seems natural that an API built on top of them exposes the same parameter as well.
Some examples in the stdlib which comes to mind using a blocksize are asynchat.async_chat.ac_out_buffer_size and ftplib.FTP.storbinary and I'm sure if you grep for "blocksize" you'll find others.
Providing a blocksize is also necessary to tell how many bytes to read from file in case send() is used, 'cause it's *crucial* that you don't read the whole file into memory.
I will also give a real world example: if your app wants to limit the transfer speed you will want to explicitly transmit smaller chunks of data and then "sleep", and the only way to do that is by manipulating the blocksize. So yes: a blocksize parameter *is* necessary, so please stop beating that horse.
As for using a bigger value: I made some benchmarks by using different sizes and I didn't notice any noticeable difference.

giampaolo · 2014-04-23T17:53:08Z

Note: my example about limiting the transfer speed does not really apply 'cause as this stands right now it cannot be used with non-blocking sockets. Other arguments do though and I hope it's clear that we need "blocksize".

4kir4 · 2014-04-23T18:21:35Z

I really don't like the use_fallback argument ..

I initially also thought so. But I've suggested the parameter to replace (was_os_sendfile_used, os_sendfile_error) returned value as a trade off between a slight complexity in the interface vs. allowing to detect performance bugs easily e.g., when someone passed a file-like object incompatible with os.sendfile by accident (it is not enough to have a valid fileno. What mmap-like means?).

.. as a user, I don't care if it's using sendfile/splice/whatever WIndows uses.
.. what do this bring?

The reason sendfile exists is performance. Otherwise socket.makefile and shutil.copyfileobj could be used instead.

use_fallback parameter provides a way to assert that an ineffective fallback is not used by accident. It may be ignored by most users.

An alternative is a new separate public method that doesn't use the fallback.

giampaolo · 2014-04-23T21:44:13Z

Considering the current indecision about certain design aspects I started a discussion on python-ideas: https://mail.python.org/pipermail/python-ideas/2014-April/027752.html

gvanrossum · 2014-04-23T22:47:28Z

Can you also think about how this would be wrapped in asyncio?

giampaolo · 2014-04-23T23:48:49Z

I think asyncio would be better off using os.sendfile() / TransmitFile directly, in fact the current patch explicitly does not support non-blocking sockets (I couldn't see any sane approach to do that).
Here's an example of how os.sendfile() should be used in an async manner:
https://code.google.com/p/pyftpdlib/source/browse/tags/release-0.7.0/pyftpdlib/ftpserver.py#1035
asyncio should be doing something very similar to that and do the necessary stuff so that you can "yield from transport.sendfile(file)".
Actually I can see what I can do in that regard after I'm finished with this.

neologix · 2014-04-24T07:02:02Z

> A useful parameter instead would be to support sending only part of the file,
> so adding a count argument.

Have you read my patch? This is already provided by the "offset" parameter.

Of course I read your patch ;-)
I mean I'd like a parameter for the offset, and another one for the
number of bytes to send, like in Java's version (and sendfile(), see
below).

> I really don't like the blocksize argument.
> I've *never* seen code which explicitly uses a blocksize

Both sendfile() and TransmitFile provide a "blocksize" parameter for very good reasons therefore it seems natural that an API built on top of them exposes the same parameter as well.

No, they expose a *count* parameter:
http://linux.die.net/man/2/sendfile
"""
ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);
count is the number of bytes to copy between the file descriptors.
"""

You're mixing up blocksize, which is the maximum number of bytes to
transfer in one syscall and only makes sense in the context of
repeated syscalls, and count, which is the total amount of data you
want the function to transfer.
No sensible sendfile-like API exposes a maximum "blocksize" to send at
once, since the goal is to limit copies and system calls: you just
pass a source and destination FD, a starting offset, and a number of
bytes to transfer, and the syscall takes care of the rest.

Here, you basically implement sendall() on top of sendfile() (in
pseudo-code, using a buffer instead of a file and not taking into
account short writes but the idea if the same):

while <remaining data to send>:
    socket.sendfile(data[offset:offset+chunksize)

The way it's supposed to be used is simply:
socket.sendfile(data[offset:offset+count])

That's how everyone one uses sendfile(), that's how Java exposes it,
and that's IMO how it should be exposed.

To sum up, I think there's a fundamental confusion between blocksize
and count in this API: that's what I've been saying since the
beginning of this thread: if you disagree, that's OK, I just want to
make sure we're talking about the same thing ;-)

4kir4 · 2014-04-24T11:05:12Z

use_fallback parameter is mostly a debugging tool. If it helps to avoid the
indecision; I would side with neologix' remarks and also suggest to drop
the use_fallback parameter.

It seems the patch assumes *offset == nbytes_sent* that is false in general
e.g., if offset > 0 at the start of the function.

::

    _SEND_BLOCKSIZE = 262144 # ???

    def sendfile(self, file, offset=None, nbytes=None,
             *, nbytes_per_send=_SEND_BLOCKSIZE) -> nbytes_sent:
        """
        Send *nbytes* bytes from regular *file* starting at *offset* position.

    Return the number of bytes sent.

    If *offset* is ``None``; start at the current file position.
    If *nbytes* is ``None``; send file until EOF is reached.

    The socket should be connection-oriented e.g., SOCK_STREAM

    *nbytes_per_send* is used by a *send()*-based fallback code.
    *os.sendfile()* ignores it.

    - if socket is blocking (timeout is None) then it may keep
      trying to send data until an error occurs.

      Even on success it may return less than *nbytes* if there is
      not enough data available in *file*.

    - if socket is non-blocking and timeout == 0 then fail if
      even a single byte can't be sent immediately

    - if socket has timeout > 0 then raise the timeout error if
      more than *timeout* seconds pass since the call is started
      and nothing is sent i.e., use a single deadline for all
      system calls (like *socket.send()*).

    If timeout is not None then *socket.sendfile()* may send less
    bytes than specified.

    *file* position after the call is unspecified.

    """
    # pseudo-code
    total = 0
    if offset is None
        offset = file.tell()
    if nbytes is None:
        nbytes = os.path.getsize(file.name)
    interval = self.timeout
    if interval is not None:
         deadline = now() + interval
    while select([], [self], [], interval)[1]: # writable
        try:
            sent = os.sendfile(self, file, offset, nbytes)
        except BlockingIOError as e:
            assert getattr(e, 'characters_written', 0) == 0
            if interval is not None: # update interval
                interval = deadline - now()
                if interval < 0:
                    break
            continue # ignore
        else:
            if sent == 0:
                return total
        total += sent
        offset += sent
        nbytes -= sent
        if nbytes == 0:
            return total
        if interval is not None: # update interval
            interval = deadline - now()
            if interval < 0:
                break
    # timeout
    if total == 0:
        raise TimeoutError
    else:
        return total

giampaolo · 2014-04-24T15:56:50Z

[...] I'd like a parameter for the offset, and another one for the
number of bytes to send.
To sum up, I think there's a fundamental confusion between blocksize
and count in this API.

Ah OK, I see what you mean now. If seems we didn't understand each other. =)
And yes, I suppose you're right: if possible we should pass a high value and let sendfile() do its thing.
Note that we still have to implement an internal loop ourselves though because if the socket has a timeout sendfile() will return before EOF (I've checked this just now).

As for what to do, here's what I propose:

we provide a blocksize argument defaulting to None
in case of send() and blocksize == None we set it to 262144
in case of sendfile() we set it to a very high value (4M or something)
using os.path.getsize(file.name) looks risky to me as the user might have changed CWD in the meantime or something

I'm -1 about adding "count" *and* "blocksize" parameters. "blocksize" alone is good enough IMO and considering what I've just described it is a better name than "count".

4kir4 · 2014-04-25T00:59:10Z

count and blocksize are completely different.

*count* specifies how many bytes at most socket.sendfile should sent overall. It may change the result i.e., it may not be necessary that the file is read until EOF.

It has the same meaning as *nbytes* parameter for os.sendfile or *nbytes* in msg217121

*blocksize* doesn't change how many bytes is read in the end by socket.sendfile. At most it may affect time performance. It is *nbytes_per_send* in msg217121

pitrou · 2014-04-25T01:10:54Z

I'm -1 about adding "count" *and* "blocksize" parameters. "blocksize" > alone is good enough IMO and considering what I've just described it
is a better name than "count".

I'm confused. Why is "blocksize" necessary at all?

using os.path.getsize(file.name) looks risky to me

Why not fstat(fd) ?

4kir4 · 2014-04-25T08:40:56Z

I'm confused. Why is "blocksize" necessary at all?

My guess, it may be used to implement socket.send()-based fallback. Its meaning could be the same as *length* parameter in shutil.copyfileobj

The fallback is useful if os.sendfile doesn't exists or it doesn't accept given parameters e.g., if *file* is not mmap-like enough for os.sendfile.

> using os.path.getsize(file.name) looks risky to me

Why not fstat(fd) ?

os.path.getsize(file.name) in msg217121 is a pseudo-code (as said
in the comment) that expresses the intent that if *nbytes* parameter
is not specified (None) then socket.sendfile should send bytes from
the file until EOF is reached.

In real code, if *nbytes is None*; I would just call os.sendfile
repeatedly with a large constant *nbytes* parameter
until os.sendfile returns 0 (meaning EOF)
without asking the file size explicitly

It assumes socket.sendfile doesn't specify its behaviour if the file
size changes between the calls.

The pseudo-code in msg217121 is my opinion about the public interface for socket.sendfile -- It is different from the one in the current socket-sendfile5.patch

giampaolo · 2014-04-25T17:45:54Z

Given the opinions expressed so far I:

got rid of the "blocksize" parameter
got rid of the "use_fallback" parameter
added a "count" parameter
used os.fstat() to figure out the total file size and passed it directly to sendfile()

I'm attaching socket-sendfile6.patch which includes docs and many new tests.
Hopefully this should be the final one (yet to review though).

giampaolo · 2014-06-07T10:22:09Z

Looking back at this I think a "send_blocksize" argument is necessary after all. shutil.copyfileobj() has it, so is ftplib.FTP.storbinary() and httplib (bpo-13559) which will both be using socket.sendfile() once it gets included.
Updated patch is in attachment. If there are no other objections I'd be for committing this next week or something as I'm pretty confident with the current implementation.

neologix · 2014-06-07T11:10:37Z

Looking back at this I think a "send_blocksize" argument is necessary after all. shutil.copyfileobj() has it, so is ftplib.FTP.storbinary() and httplib (bpo-13559) which will both be using socket.sendfile() once it gets included.

Those APIs are really poor, please don't cripple sendfile() to mirror them.

Once again, a send_blocksize argument doesn't make sense, you won't
find it anywhere else. All that's needed is start offset and a
size/length argument.

giampaolo · 2014-06-07T11:22:53Z

I agree it is not necessary for sendfile() (you were right).
Do not introducing it for send(), though, poses some questions.
For instance, should we deprecate or ignore 'blocksize' argument in ftplib as well?
Generally speaking, when using send() there are circumstances where you might want to adjust the number of bytes to read from the file, for instance:

1: set a small blocksize (say 512 bytes) on systems where you have a limited amount of memory
2: set a big blocksize (say 256000) in order to speed up the transfer / use less CPU cycles; on very fast networks (e.g. LANs) this may result in a considerable speedup (I know 'cause I conducted these kind of tests in pyftpdlib: https://code.google.com/p/pyftpdlib/issues/detail?id=94).

giampaolo · 2014-06-07T11:25:32Z

...speaking of which, now that I look back at those benchmarks it looks like 65536 bytes is the best compromise (in my latest patch I used 16348).

giampaolo · 2014-06-10T16:53:00Z

Charles, Antoine, any final thought about this given the reasons I stated above? If you're still -1 about adding 'send_blocksize' argument I guess I can get rid of it and perhaps reintroduce it later if we see there's demand for it.

neologix · 2014-06-10T19:00:14Z

I agree it is not necessary for sendfile() (you were right).

Good we agree :-)

Do not introducing it for send(), though, poses some questions.
For instance, should we deprecate or ignore 'blocksize' argument in ftplib as well?

Honestly, we should deprecate the whole ftplib module :-)
More seriously, it's really low-level, I don't understand the point of
this whole callback-based API:
FTP.storbinary(command, file[, blocksize, callback, rest])Why not simply a:
FTP.store(source, target, binary=True)

If you have time and it interest you, trying to improve this module
would be great :-)

Generally speaking, when using send() there are circumstances where you might want to adjust the number of bytes to read from the file, for instance:

1: set a small blocksize (say 512 bytes) on systems where you have a limited amount of memory

2: set a big blocksize (say 256000) in order to speed up the transfer / use less CPU cycles; on very fast networks (e.g. LANs) this may result in a considerable speedup (I know 'cause I conducted these kind of tests in pyftpdlib: https://code.google.com/p/pyftpdlib/issues/detail?id=94).

I agree, but both points are addressed by sendfile(): internally, the
kernel will use whatever block size it likes, depending on the source
file type, page size, target NIC ring buffer size.

So to reply to your above question, I wouldn't feel too bad about
simply ignoring the blocksize argument in e.g. shutil.copyfile(). For
ftplib, it's a little harder since we're supposed to support an
optional callback, but calling a callback for every block will drive
performance down...

So I'd really like it if you could push the version without the
blocksize, and then we'll see (and hopefully fix the non-sensical
libraries that depend on it).

giampaolo · 2014-06-10T20:41:23Z

I agree, but both points are addressed by sendfile()

I'm talking about send(), not sendfile().
Please remember that send() will be used as the default on Windows or when non-regular files are passed to the function. My argument is about introducing an argument to use specifically with send(), not sendfile().
In summary:

sendfile(self, file, offset=0, count=None, send_blocksize=16384):
"""
[...]
If os.sendfile() is not available (e.g. Windows) or file is not
a regular file socket.send() will be used instead.
[...]
*send_blocksize* is the maximum number of bytes to transmit at
one time in case socket.send() is used.
"""

Honestly, we should deprecate the whole ftplib module :-)
More seriously, it's really low-level, I don't understand the point of
this whole callback-based API:
FTP.storbinary(command, file[, blocksize, callback, rest])
Why not simply a:
FTP.store(source, target, binary=True)

ftplib module API may be a bit rusty but there's a reason why it was designed like that.
'callback' and 'blocksize' arguments can be used to implement progress bars, in-place transformations of the source file's data and bandwidth throttling (by having your callback 'sleep' if more than N bytes were sent in the last second). 'rest' argument is necessary for resuming uploads.
I'm not saying ftplib API cannot be improved: maybe we can provide two higher level "put" and "get" methods but please let's discuss that into another thread.

neologix · 2014-06-10T22:12:09Z

> I agree, but both points are addressed by sendfile()

I'm talking about send(), not sendfile().
Please remember that send() will be used as the default on Windows or when non-regular files are passed to the function. My argument is about introducing an argument to use specifically with send(), not sendfile().

Which makes even less sense if it's not needed for sendfile() :-)
I really don't see why we're worrying so much about allocating 8K or
16K buffers, it's really ridiculous. For example, buffered file I/O
uses a default block size of 8K. We're not targeting embedded systems.

giampaolo · 2014-06-11T01:14:07Z

OK then, I'll trust your judgement. I'll use 8K as the default and will commit the patch soon.

python-dev · 2014-06-11T01:54:43Z

New changeset 001895c39fea by Giampaolo Rodola' in branch 'default':
fix issue bpo-17552: add socket.sendfile() method allowing to send a file over a socket by using high-performance os.sendfile() on UNIX. Patch by Giampaolo Rodola'·
http://hg.python.org/cpython/rev/001895c39fea

giampaolo added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Mar 26, 2013

giampaolo changed the title ~~socket.sendfile()~~ create_server Apr 8, 2013

giampaolo changed the title ~~create_server~~ socket.sendfile() Apr 8, 2013

vstinner changed the title ~~socket.sendfile()~~ Add a new socket.sendfile() method Apr 27, 2014

giampaolo self-assigned this Jun 11, 2014

giampaolo closed this as completed Jun 11, 2014

ezio-melotti transferred this issue from another repository Apr 10, 2022

Add a new socket.sendfile() method #61752

Add a new socket.sendfile() method #61752

Comments

giampaolo commented Mar 26, 2013

giampaolo commented Mar 26, 2013

pitrou commented Mar 26, 2013

neologix mannequin commented Mar 26, 2013

giampaolo commented Mar 26, 2013

giampaolo commented Apr 8, 2013

giampaolo commented Apr 20, 2014

giampaolo commented Apr 21, 2014

MojoVampire mannequin commented Apr 21, 2014

4kir4 mannequin commented Apr 21, 2014

giampaolo commented Apr 22, 2014

giampaolo commented Apr 22, 2014

neologix mannequin commented Apr 23, 2014

giampaolo commented Apr 23, 2014

giampaolo commented Apr 23, 2014

4kir4 mannequin commented Apr 23, 2014

giampaolo commented Apr 23, 2014

gvanrossum commented Apr 23, 2014

giampaolo commented Apr 23, 2014

neologix mannequin commented Apr 24, 2014

4kir4 mannequin commented Apr 24, 2014

giampaolo commented Apr 24, 2014

4kir4 mannequin commented Apr 25, 2014

pitrou commented Apr 25, 2014

4kir4 mannequin commented Apr 25, 2014

giampaolo commented Apr 25, 2014

giampaolo commented Jun 7, 2014

neologix mannequin commented Jun 7, 2014

giampaolo commented Jun 7, 2014

giampaolo commented Jun 7, 2014

giampaolo commented Jun 10, 2014

neologix mannequin commented Jun 10, 2014

giampaolo commented Jun 10, 2014

neologix mannequin commented Jun 10, 2014

giampaolo commented Jun 11, 2014

python-dev mannequin commented Jun 11, 2014