New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ftplib and sendfile() #57773
Comments
In attachment.
|
> in case of disconnection OSError is raised instead of socket.error
PEP 3151!
>>> socket.error, OSError, IOError
(<class 'OSError'>, <class 'OSError'>, <class 'OSError'>) :) |
os.fstat wouldn't work since it succeeds with non-"regular" files, e.g. standard I/O: >>> os.fstat(0)
posix.stat_result(st_mode=8592, st_ino=5, st_dev=11, st_nlink=1, st_uid=500, st_gid=5, st_size=0, st_atime=1323629303, st_mtime=1323629303, st_ctime=1323628616) I think the best solution is to call sendfile() and catch OSError, then fallback on the generic loop. However, you must also guard against fileno() failing: >>> io.BytesIO().fileno()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
io.UnsupportedOperation: fileno |
Here's the result of a benchmark sending a 1GB file over a Gb/s ethernet network: sendfile: The total time doesn't vary much (the reduction above is just jitter). Note that is changed Giampaolo's patch to call sendfile on the whole file, not by block. |
That's not compatible across POSIX platforms. |
What do you mean ? |
Ah ok, I misinterpreted what you wrote then. |
But then you make much more syscalls than what's necessary, and your Anyway, here are the numbers, do you think it's interesting to merge |
Specifying a big blocksize doesn't mean the transfer will be faster. Another thing I don't like is that by doing so you implicitly assume that the file is "fstat-eable". I don't know if there are cases where it's not, but the less assumptions we do the better. Note: I'm sure that for both send() and sendfile() blocksize>=65536 is faster than blocksize=8192 (the current default) so it probably makes sense to change that (I'll file a separate issue). |
The transfer won't be faster mainly because it's really I/O bound.
I can perfectly believe this for a send loop, maybe because you're
Well, the file must be mmap-able, so I doubt fstat() is the biggest concern... |
Have you actually measured this? |
""" block-sendfile over Gb/s: full-sendfile over Gb/s: As you can see, the throughput doesn't vary (the difference in "real """ block-sendfile over loopback: full-sendfile over loopback: Same thing for loopback, except that here, zero-copy makes a I don't have access to a 10Gb/s network, but basic math hints that |
It seems you're right, sorry. We need to take that into account then. In the meantime I rewrote the original patch and got rid of the "use_sendfile" explicit argument in order to attempt to use sendfile() by default and fall back on using send() if bytes sent were 0. TSL_FTP related changes are still left out for now as I'm planning a little refactoring first. |
""" I don't get it, why do you use select? |
It's necessary because sendfile() can fail with EAGAIN. As for your "blocksize = filesize" argument I changed my opinion: despite being less CPU consuming we might incur into problems if that number is too big. 'count' parameter on Linux, for example, is expected to be an unsigned int. |
It can fail with EAGAIN if the input FD is non-blocking, exactly like
'count' is size_t, like for mmap() and any other function accepting a |
If you really think a blocksize is necessary, you could choose a much |
Then why 'offset' and 'count' parameters have a different data type? Linux: Solaris: HP-UX:
You can't send a complete file at once in the first place unless it's very small.
It will. Try it yourself.
EAGAIN is caused by the socket fd not being ready yet, not the file fd. |
Because offsets can be negative (e.g. for lseek), while a size can't.
I didn't see the socket could be set to non-blocking. In that case, there's a problem with the patch, since select can block Also, apparently socket.sendall() doesn't retry on EAGAIN, it doesn't
I'm leaving this topic, you can do as you like... |
On Linux (and presumably on all POSIX platforms) passing a negative offset results in EINVAL.
Right. I will fix that.
socket.sendall() is not supposed to return EAGAIN in the first place. And again, I don't see how this is related with the issue at hand.
Bye. |
A much larger patch which should address all issues is in attachment.
|
I don't understand why you must put the socket in non-blocking mode for sendfile(). |
I did that mainly because I'm using select() / poll() and it seems kind of "natural" to set the socket in non-blocking mode (proftpd does the same).
Agreed and turned them into methods.
Perhaps. The only thing which is not clear is how to deal with blocking vs. non-blocking sockets. |
But why do you need to use select() / poll() ? Can't you just call |
Because otherwise sendfile() fails with EAGAIN many times before sending any actual data. |
EAGAIN on a blocking fd? Is it documented somewhere?
write would block. FreeBSD apparently says something similar:
|
I see. Well, what I'm experiencing right now if I remove the select() / poll() call is a sequence of EAGAIN errors alternated by successful sendfile() calls. |
Weird. I guess it would be nice to dig a bit more :-) |
After digging a bit further it seems EAGAIN occurs in case a timeout was previously set against the socket as in ftplib.FTP(..., timeout=2) (at least on Linux, FWICT). As such, we can debate whether avoid using select/poll if timeout was not set. Other than that, the patch is reasonably ok to me and can be committed as-is and blocksize argument tuning can be discussed in a separate ticket. |
Ah, indeed. That's because socket timeout makes the underlying fd non-blocking. |
I created bpo-17552 in order to discuss about socket.sendfile() addition. |
Updated patch which uses the newly added socket.sendfile() method (bpo-17552). |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: