classification
Title: Add a new socket.sendfile() method
Type: enhancement Stage:
Components: Library (Lib) Versions: Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: giampaolo.rodola Nosy List: akira, asvetlov, christian.heimes, giampaolo.rodola, gvanrossum, josh.rosenberg, josiah.carlson, neologix, pitrou, python-dev, rosslagerwall, yselivanov
Priority: normal Keywords: 3.2regression

Created on 2013-03-26 19:27 by giampaolo.rodola, last changed 2014-06-11 02:04 by giampaolo.rodola. This issue is now closed.

Files
File name Uploaded Description Edit
socket-sendfile1.patch giampaolo.rodola, 2013-03-26 19:27 initial draft review
socket-sendfile2.patch giampaolo.rodola, 2013-04-08 15:21 update file offset add 'offset' parameter review
socket-sendfile3.patch giampaolo.rodola, 2014-04-20 18:56 docs, selectors module, always seek() file, use memoryview(), ssl tests review
bench.py giampaolo.rodola, 2014-04-21 20:33
socket-sendfile4.patch giampaolo.rodola, 2014-04-22 02:13 return no. of bytes sent instead of (ok, exc); "sleep" on BlockingIOError review
socket-sendfile5.patch giampaolo.rodola, 2014-04-22 17:49 fix some issues on windows review
socket-sendfile6.patch giampaolo.rodola, 2014-04-25 17:45 get rid of "blocksize" and "use_fallback" args, add "count" arg, use fstat() review
socket-sendfile7.patch giampaolo.rodola, 2014-06-07 10:22 adds send_blocksize argument review
Messages (35)
msg185292 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2013-03-26 19:27
This is based on progress made in issue 13564 and aims to provide a new sendfile() method for the socket class taking advantage of high-performance "zero-copy" os.sendfile() available on most POSIX platforms.
A wrapper on top of os.sendfile() appears to be convenient because getting everything right is a bit tricky. Also we can avoid code duplication in case we want to add sendfile() support to ftplib (issue 13564) and httplib.

=== THE API ===

Attached is a draft patch proposing an API which:

- uses os.sendfile() whenever available and usable (a mmap-like file is required)
- if not falls back on using plain socket.sendall()
- returns a tuple of two elements which indicates whether sendfile() was used and an exception instance in case it wasn't on account of an internal error, if any 
- does not support non-blocking sockets (will raise ValueError)


=== ALTERNATE API === 

1) In case the transfer is interrupted halfway because of an error the user has no clue on how many bytes were sent (and file.tell() won't tell =)). As such it probably makes sense to provide a custom socket.SendfileError exception encapsulating the original exception and the bytes sent as arguments.

2) For the same reason the returned tuple should probably include the number of bytes transmitted.

=== WINDOWS SUPPORT ===

Further development may include Windows support by using TransmitFile (http://msdn.microsoft.com/en-us/library/windows/desktop/ms740565(v=vs.85).aspx). Once we agree on an API I might start looking into Windows code (which appears to be quite tricky btw).

Any feedback is very welcome.
msg185294 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-03-26 19:43
A couple of comments:

- I don't understand the point of the second member in the tuple
- the timeout logic should be fixed so that the total operation time doesn't exceed the timeout, rather than each iteration (in other words, a deadline should be computed at the start and then the poll / select calls be issued in accordance with that deadline)
- non-blocking sockets could be supported by returning partial writes, and letting the caller pass an offset argument
- as a vocabulary nit, I don't really understand what "mmap-like file" means
msg185295 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-03-26 19:53
- I don't understand the "running out of FDs" thing. select() is limited to FDs less than FD_SETSIZE, but I don't see how you could get EMFILE (select() will return a ValueError)

- is there any platform with sendfile() which doesn't support poll()?

- I still don't see the point of calling sendfile() with a block size. Everybody just uses st_size, and that's how it's supposed to be done.
msg185303 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2013-03-26 21:05
> I don't understand the point of the second member in the tuple

The 'exception' member can be useful to know the reason why sendfile() failed and send() was used as fallback.


> the timeout logic should be fixed so that the total operation
> time doesn't exceed the timeout, rather than each iteration

Do you mean that if the user sets a timeout=2 and the whole transfer takes longer than that you expect a timeout exception?
That's pretty unusual (e.g. ftplib does not behave like that).


> non-blocking sockets could be supported by returning partial writes,
> and letting the caller pass an offset argument

I see little value in supporting non-blocking sockets because someone willing to do that usually wants to use sendfile() directly (no wrappers involved) as they are forced to handle errors and take count of transmitted bytes in their own code anyway.
Here's an example of how sendfile is supposed to be used with non-blocking sockets:
#1035">https://code.google.com/p/pyftpdlib/source/browse/tags/release-0.7.0/pyftpdlib/ftpserver.py#1035
An extra layer such as the one proposed in my patch is completely useless and also slow considering the amount of instructions before the actual sendfile() call.
It is however a good idea to provide an 'offset' argument.


> as a vocabulary nit, I don't really understand what "mmap-like file" means

Will turn that into "regular files".


> I don't see how you could get EMFILE (select() will return a ValueError)

Right. Will fix that.

> is there any platform with sendfile() which doesn't support poll()?

No idea. I made a quick tour on snakebite and apparently all of the provided platforms have poll(). 


> I still don't see the point of calling sendfile() with a block size

I will make some actual benchmarks and fix that later if makes sense.
Assuming providing a high blocksize (say 1MB) makes no difference in terms of CPU usage it's better to keep that for consistency with sendall() in case we fallback on using it.
msg186305 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2013-04-08 15:14
New patch in attachment includes a new 'offset' parameter, new tests and also update file offset on return or in case of error so that file.tell() can be used to tell how many bytes were transmitted at any time.
This way we'll avoid using a custom exception.

In summary, the API looks like this.

Transfer ok:

>>> file = open('somefile', 'rb')
>>> s = socket.socket()
>>> sock.sendfile(file)
(True, None)
>>> file.tell()
20000000
>>>


...and in case sendfile() could not be used internally because file was not a regular file:

>>> file = io.BytesIO(b'x' * 1*1024*1024)
>>> sock.sendfile(file)
(False, UnsupportedOperation('fileno',))
>>> file.tell()
20000000
>>>


I still haven't looked into TransmitFile on Windows as I have to figure out how to compile Python 3.4 on Windows.
msg216915 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2014-04-20 18:56
New patch in attachment. Changes:

- docs

- replaced select() / poll() with the new selectors module

- file position is always updated both on return and on error; this means file.tell() is the designated way to know how many bytes were sent

- replaced sendall() with send() so that we can count the number of bytes transmitted (related and rejected proposal: https://mail.python.org/pipermail/python-ideas/2014-April/027689.html)

- send() now uses memoryview() for better performances to re-transmit data which was not sent by the first send() call

- pre-emptively raise exception if file is not opened in binary mode

- tests for ssl module

I've tried to work on Windows TransmitFile support but I got stuck as I'm not sure how to convert a file object into a HANDLE in C. I suppose Windows support can also be added later as a separate ticket and in the meantime I'd like to push this forward.

Open questions: 

- Is the current return value desirable (do we really care if os.sendfile() was used internally?)? Should the returned tuple also include the number transmitted bytes?

- default blocksize: Charles-François was suggesting to remove the blocksize argument; FWIW I've made some quick benchmarks by using "time" cmdline utility with different blocksizes and I didn't notice substantial difference. I still think a blocksize parameter is necessary in case we fallback on using send() and also for consistency with ftplib's storbinary() method which will be involved later (issue 13564).
msg216968 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2014-04-21 20:33
Attached is a simple benchmark script transmitting a 100MB file. 
On my Linux box sendfile() is almost twice as fast as send():

send()
real     0.0613s
user     0.0100s
sys      0.0900s
total    0.1000s

sendfile()
real     0.0318s
user     0.0000s
sys      0.0500s
total    0.0500s
msg216973 - (view) Author: Josh Rosenberg (josh.rosenberg) * Date: 2014-04-21 22:46
For TransmitFile support, the Windows function to turn an integer file descriptor into a WinAPI file HANDLE should be _get_osfhandle: http://msdn.microsoft.com/en-us/library/ks2530z6.aspx
msg216975 - (view) Author: Akira Li (akira) * Date: 2014-04-21 23:34
Should socket.sendfile() always return number of bytes sent because file.tell() may be changed by something else that uses the same file descriptor?

What happens if the file grows?

Instead of returning `(was_os_sendfile_used, os_sendfile_error)`, you could specify `no_fallback=False` that could be set to `True` to assert that the fallback is not used (with `no_fallback=True` the `os_sendfile_error` is raised instead of using socket.send() as a fallback). 

If possible; always include number of bytes sent in any error that is raised.
msg216979 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2014-04-22 02:13
> Instead of returning [...] you could specify `no_fallback=False` that 
> could be set to `True` to assert that the fallback is not used
> [...] and return the number of bytes sent.

Good idea, thanks, that is much better indeed. Updated patch is in attachment.


> [...] file.tell() may be changed by something else that uses 
> the same file descriptor? What happens if the file grows?

I would say that is a use case we should explicitly not support as it probably implies you're doing something you're not supposed to. 


> If possible always include the number of bytes sent in any error that is raised.

That's similar to my recent (rejected) proposal for socket.sendall():
https://mail.python.org/pipermail/python-ideas/2014-April/027689.html
IMO the patch as it stands is fine as you can determine the number of bytes which were sent either by using the function return value or file.tell() (in case of error).
Also, updating the file offset on exit makes the sendfile() implementation behave exactly like send().
msg217015 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2014-04-22 17:49
Yet another patch fixing some problems on Windows.
Hopefully this should be the last one as for what concerns the POSIX systems. As such I would kindly ask for a review and/or further suggestions.
msg217058 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2014-04-23 07:44
1) I really don't like the use_fallback argument: as a user, I don't
care if it's using sendfile/splice/whatever WIndows uses.
I view this as a channel transfer (like Java's
http://docs.oracle.com/javase/7/docs/api/java/nio/channels/FileChannel.html#transferFrom(java.nio.channels.ReadableByteChannel,
long, long)), which moves bytes around from one FD to another.
If the user want precise control, he can just go ahead and call the
syscall itself.
Apart from complicating the prototype, what do this bring?

2) Just returning the number of bytes sent is fine

3) I really don't like the blocksize argument. Just use a really large
value internally to minimize the number of syscalls, that's all that
matters. I've *never* seen code which explicitly uses a blocksize: in
99% of cases, it just uses stat to find out the file size, and call
sendfile with it. Trying to be consistent with ftplib is IMO a bad
idea, since the API is just completely broken (I mean, just
sending/retrieving a file is complex). A useful parameter instead
would be to support sending only part of the file, so adding a count
argument.

You can have a look at
http://docs.oracle.com/javase/7/docs/api/java/nio/channels/FileChannel.html#transferFrom(java.nio.channels.ReadableByteChannel,
long, long) for an example many people bash Java, but they've designed
some great APIs :-)
msg217080 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2014-04-23 17:26
> 1) I really don't like the use_fallback argument
> Apart from complicating the prototype, what do this bring?

My initial thought was that the user might want to know *why* a file cannot be sent by using the fastest method and hence wants to see the original exception. Anyway, I have not strong opinions about this so I guess we can also drop it.


> A useful parameter instead would be to support sending only part of the file, 
> so adding a count argument.

Have you read my patch? This is already provided by the "offset" parameter.


> I really don't like the blocksize argument.
> I've *never* seen code which explicitly uses a blocksize

Both sendfile() and TransmitFile provide a "blocksize" parameter for very good reasons therefore it seems natural that an API built on top of them exposes the same parameter as well.
Some examples in the stdlib which comes to mind using a blocksize are asynchat.async_chat.ac_out_buffer_size and ftplib.FTP.storbinary and I'm sure if you grep for "blocksize" you'll find others.
Providing a blocksize is also necessary to tell how many bytes to read from file in case send() is used, 'cause it's *crucial* that you don't read the whole file into memory.
I will also give a real world example: if your app wants to limit the transfer speed you will want to explicitly transmit smaller chunks of data and then "sleep", and the only way to do that is by manipulating the blocksize. So yes: a blocksize parameter *is* necessary, so please stop beating that horse. 
As for using a bigger value: I made some benchmarks by using different sizes and I didn't notice any noticeable difference.
msg217081 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2014-04-23 17:53
Note: my example about limiting the transfer speed does not really apply 'cause as this stands right now it cannot be used with non-blocking sockets. Other arguments do though and I hope it's clear that we need "blocksize".
msg217082 - (view) Author: Akira Li (akira) * Date: 2014-04-23 18:21
> I really don't like the use_fallback argument ..

I initially also thought so. But I've suggested the parameter to replace `(was_os_sendfile_used, os_sendfile_error)` returned value as a *trade off* between a slight complexity in the interface vs. allowing to detect performance bugs easily e.g., when someone passed a file-like object incompatible with os.sendfile by accident (it is not enough to have a valid fileno. What mmap-like means?).

> .. as a user, I don't care if it's using sendfile/splice/whatever WIndows uses.
> .. what do this bring?

The reason sendfile exists is performance. Otherwise socket.makefile and shutil.copyfileobj could be used instead.

use_fallback parameter provides a way to assert that an ineffective fallback is not used by accident. It may be ignored by most users.

An alternative is a new separate public method that doesn't use the fallback.
msg217090 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2014-04-23 21:44
Considering the current indecision about certain design aspects I started a discussion on python-ideas: https://mail.python.org/pipermail/python-ideas/2014-April/027752.html
msg217099 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2014-04-23 22:47
Can you also think about how this would be wrapped in asyncio?
msg217104 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2014-04-23 23:48
I think asyncio would be better off using os.sendfile() / TransmitFile directly, in fact the current patch explicitly does not support non-blocking sockets (I couldn't see any sane approach to do that).
Here's an example of how os.sendfile() should be used in an async manner:
#1035">https://code.google.com/p/pyftpdlib/source/browse/tags/release-0.7.0/pyftpdlib/ftpserver.py#1035
asyncio should be doing something very similar to that and do the necessary stuff so that you can "yield from transport.sendfile(file)".
Actually I can see what I can do in that regard after I'm finished with this.
msg217120 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2014-04-24 07:02
>> A useful parameter instead would be to support sending only part of the file,
>> so adding a count argument.
>
> Have you read my patch? This is already provided by the "offset" parameter.

Of course I read your patch ;-)
I mean I'd like a parameter for the offset, and another one for the
number of bytes to send, like in Java's version (and sendfile(), see
below).

>> I really don't like the blocksize argument.
>> I've *never* seen code which explicitly uses a blocksize
>
> Both sendfile() and TransmitFile provide a "blocksize" parameter for very good reasons therefore it seems natural that an API built on top of them exposes the same parameter as well.

No, they expose a *count* parameter:
http://linux.die.net/man/2/sendfile
"""
ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);
count is the number of bytes to copy between the file descriptors.
"""

You're mixing up blocksize, which is the maximum number of bytes to
transfer in one syscall and only makes sense in the context of
repeated syscalls, and count, which is the total amount of data you
want the function to transfer.
No sensible sendfile-like API exposes a maximum "blocksize" to send at
once, since the goal is to limit copies and system calls: you just
pass a source and destination FD, a starting offset, and a number of
bytes to transfer, and the syscall takes care of the rest.

Here, you basically implement sendall() on top of sendfile() (in
pseudo-code, using a buffer instead of a file and not taking into
account short writes but the idea if the same):

while <remaining data to send>:
    socket.sendfile(data[offset:offset+chunksize)

The way it's supposed to be used is simply:
socket.sendfile(data[offset:offset+count])

That's how everyone one uses sendfile(), that's how Java exposes it,
and that's IMO how it should be exposed.

To sum up, I think there's a fundamental confusion between blocksize
and count in this API: that's what I've been saying since the
beginning of this thread: if you disagree, that's OK, I just want to
make sure we're talking about the same thing ;-)
msg217121 - (view) Author: Akira Li (akira) * Date: 2014-04-24 11:05
use_fallback parameter is mostly a debugging tool.  If it helps to avoid the
indecision; I would side with neologix' remarks and also suggest to drop
the use_fallback parameter.

It seems the patch assumes *offset == nbytes_sent* that is false in general
e.g., if offset > 0 at the start of the function.

::

    _SEND_BLOCKSIZE = 262144 # ???

    def sendfile(self, file, offset=None, nbytes=None,
             *, nbytes_per_send=_SEND_BLOCKSIZE) -> nbytes_sent:
        """
        Send *nbytes* bytes from regular *file* starting at *offset* position.

        Return the number of bytes sent.

        If *offset* is ``None``; start at the current file position.
        If *nbytes* is ``None``; send file until EOF is reached.

        The socket should be connection-oriented e.g., SOCK_STREAM

        *nbytes_per_send* is used by a *send()*-based fallback code.
        *os.sendfile()* ignores it.

        - if socket is blocking (timeout is None) then it may keep
          trying to send data until an error occurs.

          Even on success it may return less than *nbytes* if there is
          not enough data available in *file*.

        - if socket is non-blocking and timeout == 0 then fail if
          even a single byte can't be sent immediately

        - if socket has timeout > 0 then raise the timeout error if
          more than *timeout* seconds pass since the call is started
          and nothing is sent i.e., use a single deadline for all
          system calls (like *socket.send()*).

        If timeout is not None then *socket.sendfile()* may send less
        bytes than specified.

        *file* position after the call is unspecified.

        """
        # pseudo-code
        total = 0
        if offset is None
            offset = file.tell()
        if nbytes is None:
            nbytes = os.path.getsize(file.name)
        interval = self.timeout
        if interval is not None:
             deadline = now() + interval
        while select([], [self], [], interval)[1]: # writable
            try:
                sent = os.sendfile(self, file, offset, nbytes)
            except BlockingIOError as e:
                assert getattr(e, 'characters_written', 0) == 0
                if interval is not None: # update interval
                    interval = deadline - now()
                    if interval < 0:
                        break
                continue # ignore
            else:
                if sent == 0:
                    return total
            total += sent
            offset += sent
            nbytes -= sent
            if nbytes == 0:
                return total
            if interval is not None: # update interval
                interval = deadline - now()
                if interval < 0:
                    break
        # timeout
        if total == 0:
            raise TimeoutError
        else:
            return total
msg217128 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2014-04-24 15:56
> [...] I'd like a parameter for the offset, and another one for the
> number of bytes to send.
> To sum up, I think there's a fundamental confusion between blocksize
> and count in this API.

Ah OK, I see what you mean now. If seems we didn't understand each other. =)
And yes, I suppose you're right: if possible we should pass a high value and let sendfile() do its thing.
Note that we still have to implement an internal loop ourselves though because if the socket has a timeout sendfile() will return before EOF (I've checked this just now).

As for what to do, here's what I propose:
- we provide a blocksize argument defaulting to None
- in case of send() and blocksize == None we set it to 262144
- in case of sendfile() we set it to a very high value (4M or something)
- using os.path.getsize(file.name) looks risky to me as the user might have changed CWD in the meantime or something

I'm -1 about adding "count" *and* "blocksize" parameters. "blocksize" alone is good enough IMO and considering what I've just described it is a better name than "count".
msg217143 - (view) Author: Akira Li (akira) * Date: 2014-04-25 00:59
count and blocksize are completely different.

*count* specifies how many bytes at most socket.sendfile should sent overall. It may change the result i.e., it may not be necessary that the file is read until EOF.

It has the same meaning as *nbytes* parameter for os.sendfile or *nbytes* in msg217121

*blocksize* doesn't change how many bytes is read in the end by socket.sendfile. At most it may affect time performance. It is *nbytes_per_send* in msg217121
msg217144 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-04-25 01:10
> I'm -1 about adding "count" *and* "blocksize" parameters. "blocksize" > alone is good enough IMO and considering what I've just described it
> is a better name than "count".

I'm confused. Why is "blocksize" necessary at all?

> using os.path.getsize(file.name) looks risky to me

Why not fstat(fd) ?
msg217156 - (view) Author: Akira Li (akira) * Date: 2014-04-25 08:40
> I'm confused. Why is "blocksize" necessary at all?

My guess, it may be used to implement socket.send()-based fallback. Its meaning could be the same as *length* parameter in shutil.copyfileobj

The fallback is useful if os.sendfile doesn't exists or it doesn't accept given parameters e.g., if *file* is not mmap-like enough for os.sendfile.

> > using os.path.getsize(file.name) looks risky to me

> Why not fstat(fd) ?

os.path.getsize(file.name) in msg217121 is a pseudo-code (as said
in the comment) that expresses the intent that if *nbytes* parameter
is not specified (None) then socket.sendfile should send bytes from
the file until EOF is reached. 

In real code, if *nbytes is None*;  I would just call os.sendfile
repeatedly with a large constant *nbytes* parameter
until os.sendfile returns 0 (meaning EOF) 
without asking the file size explicitly

It assumes socket.sendfile doesn't specify its behaviour if the file
size changes between the calls.

The pseudo-code in msg217121 is my opinion about the public interface for socket.sendfile -- It is different from the one in the current socket-sendfile5.patch
msg217165 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2014-04-25 17:45
Given the opinions expressed so far I:
- got rid of the "blocksize" parameter
- got rid of the "use_fallback" parameter
- added a "count" parameter
- used os.fstat() to figure out the total file size and passed it directly to sendfile()

I'm attaching socket-sendfile6.patch which includes docs and many new tests.
Hopefully this should be the final one (yet to review though).
msg219926 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2014-06-07 10:22
Looking back at this I think a "send_blocksize" argument is necessary after all. shutil.copyfileobj() has it, so is ftplib.FTP.storbinary() and  httplib (issue 13559) which will both be using socket.sendfile() once it gets included.
Updated patch is in attachment. If there are no other objections I'd be for committing this next week or something as I'm pretty confident with the current implementation.
msg219927 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2014-06-07 11:10
> Looking back at this I think a "send_blocksize" argument is necessary after all. shutil.copyfileobj() has it, so is ftplib.FTP.storbinary() and  httplib (issue 13559) which will both be using socket.sendfile() once it gets included.

Those APIs are really poor, please don't cripple sendfile() to mirror them.

Once again, a send_blocksize argument doesn't make sense, you won't
find it anywhere else. All that's needed is start offset and a
size/length argument.
msg219928 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2014-06-07 11:22
I agree it is not necessary for sendfile() (you were right).
Do not introducing it for send(), though, poses some questions. 
For instance, should we deprecate or ignore 'blocksize' argument in ftplib as well?
Generally speaking, when using send() there are circumstances where you might want to adjust the number of bytes to read from the file, for instance:

- 1: set a small blocksize (say 512 bytes) on systems where you have a limited amount of memory

- 2: set a big blocksize (say 256000) in order to speed up the transfer / use less CPU cycles; on very fast networks (e.g. LANs) this may result in a considerable speedup (I know 'cause I conducted these kind of tests in pyftpdlib: https://code.google.com/p/pyftpdlib/issues/detail?id=94).
msg219929 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2014-06-07 11:25
...speaking of which, now that I look back at those benchmarks it looks like 65536 bytes is the best compromise (in my latest patch I used 16348).
msg220173 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2014-06-10 16:53
Charles, Antoine, any final thought about this given the reasons I stated above? If you're still -1 about adding 'send_blocksize' argument I guess I can get rid of it and perhaps reintroduce it later if we see there's demand for it.
msg220191 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2014-06-10 19:00
> I agree it is not necessary for sendfile() (you were right).

Good we agree :-)

> Do not introducing it for send(), though, poses some questions.
> For instance, should we deprecate or ignore 'blocksize' argument in ftplib as well?

Honestly, we should deprecate the whole ftplib module :-)
More seriously, it's really low-level, I don't understand the point of
this whole callback-based API:
FTP.storbinary(command, file[, blocksize, callback, rest])Why not simply a:
FTP.store(source, target, binary=True)

If you have time and it interest you, trying to improve this module
would be great :-)

> Generally speaking, when using send() there are circumstances where you might want to adjust the number of bytes to read from the file, for instance:
>
> - 1: set a small blocksize (say 512 bytes) on systems where you have a limited amount of memory
>
> - 2: set a big blocksize (say 256000) in order to speed up the transfer / use less CPU cycles; on very fast networks (e.g. LANs) this may result in a considerable speedup (I know 'cause I conducted these kind of tests in pyftpdlib: https://code.google.com/p/pyftpdlib/issues/detail?id=94).

I agree, but both points are addressed by sendfile(): internally, the
kernel will use whatever block size it likes, depending on the source
file type, page size, target NIC ring buffer size.

So to reply to your above question, I wouldn't feel too bad about
simply ignoring the blocksize argument in e.g. shutil.copyfile(). For
ftplib, it's a little harder since we're supposed to support an
optional callback, but calling a callback for every block will drive
performance down...

So I'd really like it if you could push the version without the
blocksize, and then we'll see (and hopefully fix the non-sensical
libraries that depend on it).
msg220195 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2014-06-10 20:41
> I agree, but both points are addressed by sendfile()

I'm talking about send(), not sendfile(). 
Please remember that send() will be used as the default on Windows or when non-regular files are passed to the function. My argument is about introducing an argument to use specifically with send(), not sendfile().
In summary:

sendfile(self, file, offset=0, count=None, send_blocksize=16384):
    """ 
    [...]
    If os.sendfile() is not available (e.g. Windows) or file is not 
    a regular file socket.send() will be used instead.
    [...]
    *send_blocksize* is the maximum number of bytes to transmit at 
    one time in case socket.send() is used.
    """

> Honestly, we should deprecate the whole ftplib module :-)
> More seriously, it's really low-level, I don't understand the point of
> this whole callback-based API:
> FTP.storbinary(command, file[, blocksize, callback, rest])
> Why not simply a:
> FTP.store(source, target, binary=True)

ftplib module API may be a bit rusty but there's a reason why it was designed like that.
'callback' and 'blocksize' arguments can be used to implement progress bars, in-place transformations of the source file's data and bandwidth throttling (by having your callback 'sleep' if more than N bytes were sent in the last second). 'rest' argument is necessary for resuming uploads.
I'm not saying ftplib API cannot be improved: maybe we can provide two higher level "put" and "get" methods but please let's discuss that into another thread.
msg220202 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2014-06-10 22:12
>> I agree, but both points are addressed by sendfile()
>
> I'm talking about send(), not sendfile().
> Please remember that send() will be used as the default on Windows or when non-regular files are passed to the function. My argument is about introducing an argument to use specifically with send(), not sendfile().

Which makes even less sense if it's not needed for sendfile() :-)
I really don't see why we're worrying so much about allocating 8K or
16K buffers, it's really ridiculous. For example, buffered file I/O
uses a default block size of 8K. We're not targeting embedded systems.
msg220220 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2014-06-11 01:14
OK then, I'll trust your judgement. I'll use 8K as the default and will commit the patch soon.
msg220226 - (view) Author: Roundup Robot (python-dev) Date: 2014-06-11 01:54
New changeset 001895c39fea by Giampaolo Rodola' in branch 'default':
fix issue #17552: add socket.sendfile() method allowing to send a file over a socket by using high-performance os.sendfile() on UNIX. Patch by Giampaolo Rodola'·
http://hg.python.org/cpython/rev/001895c39fea
History
Date User Action Args
2014-06-11 02:04:53giampaolo.rodolasetkeywords: + 3.2regression, - patch, needs review
status: open -> closed
resolution: fixed
2014-06-11 01:54:43python-devsetnosy: + python-dev
messages: + msg220226
2014-06-11 01:14:07giampaolo.rodolasetassignee: giampaolo.rodola
messages: + msg220220
2014-06-10 22:12:08neologixsetmessages: + msg220202
2014-06-10 20:41:23giampaolo.rodolasetmessages: + msg220195
2014-06-10 19:00:14neologixsetmessages: + msg220191
2014-06-10 16:53:00giampaolo.rodolasetmessages: + msg220173
2014-06-07 11:25:32giampaolo.rodolasetmessages: + msg219929
2014-06-07 11:22:54giampaolo.rodolasetmessages: + msg219928
2014-06-07 11:10:36neologixsetmessages: + msg219927
2014-06-07 10:22:11giampaolo.rodolasetfiles: + socket-sendfile7.patch

messages: + msg219926
2014-04-28 10:08:25giampaolo.rodolalinkissue13559 dependencies
2014-04-27 00:31:49hayposettitle: socket.sendfile() -> Add a new socket.sendfile() method
2014-04-25 17:45:56giampaolo.rodolasetfiles: + socket-sendfile6.patch

messages: + msg217165
2014-04-25 08:40:55akirasetmessages: + msg217156
2014-04-25 01:10:54pitrousetmessages: + msg217144
2014-04-25 00:59:10akirasetmessages: + msg217143
2014-04-24 15:56:50giampaolo.rodolasetmessages: + msg217128
2014-04-24 11:05:12akirasetmessages: + msg217121
2014-04-24 07:02:02neologixsetmessages: + msg217120
2014-04-23 23:48:49giampaolo.rodolasetmessages: + msg217104
2014-04-23 22:47:28gvanrossumsetnosy: + gvanrossum
messages: + msg217099
2014-04-23 21:46:29yselivanovsetnosy: + yselivanov
2014-04-23 21:44:13giampaolo.rodolasetmessages: + msg217090
2014-04-23 18:21:35akirasetmessages: + msg217082
2014-04-23 17:53:08giampaolo.rodolasetmessages: + msg217081
2014-04-23 17:26:38giampaolo.rodolasetmessages: + msg217080
2014-04-23 07:44:46neologixsetmessages: + msg217058
2014-04-22 17:49:45giampaolo.rodolasetfiles: + socket-sendfile5.patch

messages: + msg217015
2014-04-22 02:19:46giampaolo.rodolasetnosy: + josiah.carlson
2014-04-22 02:13:21giampaolo.rodolasetfiles: + socket-sendfile4.patch

messages: + msg216979
2014-04-21 23:34:51akirasetnosy: + akira
messages: + msg216975
2014-04-21 22:46:33josh.rosenbergsetnosy: + josh.rosenberg
messages: + msg216973
2014-04-21 20:33:32giampaolo.rodolasetfiles: + bench.py

messages: + msg216968
2014-04-20 18:57:38giampaolo.rodolasetversions: + Python 3.5, - Python 3.4
2014-04-20 18:57:00giampaolo.rodolasetfiles: + socket-sendfile3.patch

messages: + msg216915
2013-04-08 15:21:38giampaolo.rodolasetfiles: + socket-sendfile2.patch
2013-04-08 15:21:23giampaolo.rodolasetfiles: - socket-sendfile2.patch
2013-04-08 15:15:13giampaolo.rodolasetfiles: + socket-sendfile2.patch
2013-04-08 15:14:11giampaolo.rodolasetmessages: + msg186305
2013-04-08 14:57:11giampaolo.rodolasettitle: create_server -> socket.sendfile()
2013-04-08 14:56:45giampaolo.rodolasettitle: socket.sendfile() -> create_server
2013-04-07 13:31:33asvetlovsetnosy: + asvetlov
2013-03-27 07:42:04rosslagerwallsetnosy: + rosslagerwall
2013-03-26 21:05:31giampaolo.rodolasetmessages: + msg185303
2013-03-26 20:13:47christian.heimessetnosy: + christian.heimes
2013-03-26 19:53:29neologixsetmessages: + msg185295
2013-03-26 19:43:24pitrousetnosy: + pitrou, neologix
messages: + msg185294
2013-03-26 19:39:46giampaolo.rodolalinkissue13564 dependencies
2013-03-26 19:27:53giampaolo.rodolacreate