Issue17552
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2013-03-26 19:27 by giampaolo.rodola, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
socket-sendfile1.patch | giampaolo.rodola, 2013-03-26 19:27 | initial draft | review | |
socket-sendfile2.patch | giampaolo.rodola, 2013-04-08 15:21 | update file offset add 'offset' parameter | review | |
socket-sendfile3.patch | giampaolo.rodola, 2014-04-20 18:56 | docs, selectors module, always seek() file, use memoryview(), ssl tests | review | |
bench.py | giampaolo.rodola, 2014-04-21 20:33 | |||
socket-sendfile4.patch | giampaolo.rodola, 2014-04-22 02:13 | return no. of bytes sent instead of (ok, exc); "sleep" on BlockingIOError | review | |
socket-sendfile5.patch | giampaolo.rodola, 2014-04-22 17:49 | fix some issues on windows | review | |
socket-sendfile6.patch | giampaolo.rodola, 2014-04-25 17:45 | get rid of "blocksize" and "use_fallback" args, add "count" arg, use fstat() | review | |
socket-sendfile7.patch | giampaolo.rodola, 2014-06-07 10:22 | adds send_blocksize argument | review |
Messages (35) | |||
---|---|---|---|
msg185292 - (view) | Author: Giampaolo Rodola' (giampaolo.rodola) * ![]() |
Date: 2013-03-26 19:27 | |
This is based on progress made in issue 13564 and aims to provide a new sendfile() method for the socket class taking advantage of high-performance "zero-copy" os.sendfile() available on most POSIX platforms. A wrapper on top of os.sendfile() appears to be convenient because getting everything right is a bit tricky. Also we can avoid code duplication in case we want to add sendfile() support to ftplib (issue 13564) and httplib. === THE API === Attached is a draft patch proposing an API which: - uses os.sendfile() whenever available and usable (a mmap-like file is required) - if not falls back on using plain socket.sendall() - returns a tuple of two elements which indicates whether sendfile() was used and an exception instance in case it wasn't on account of an internal error, if any - does not support non-blocking sockets (will raise ValueError) === ALTERNATE API === 1) In case the transfer is interrupted halfway because of an error the user has no clue on how many bytes were sent (and file.tell() won't tell =)). As such it probably makes sense to provide a custom socket.SendfileError exception encapsulating the original exception and the bytes sent as arguments. 2) For the same reason the returned tuple should probably include the number of bytes transmitted. === WINDOWS SUPPORT === Further development may include Windows support by using TransmitFile (http://msdn.microsoft.com/en-us/library/windows/desktop/ms740565(v=vs.85).aspx). Once we agree on an API I might start looking into Windows code (which appears to be quite tricky btw). Any feedback is very welcome. |
|||
msg185294 - (view) | Author: Antoine Pitrou (pitrou) * ![]() |
Date: 2013-03-26 19:43 | |
A couple of comments: - I don't understand the point of the second member in the tuple - the timeout logic should be fixed so that the total operation time doesn't exceed the timeout, rather than each iteration (in other words, a deadline should be computed at the start and then the poll / select calls be issued in accordance with that deadline) - non-blocking sockets could be supported by returning partial writes, and letting the caller pass an offset argument - as a vocabulary nit, I don't really understand what "mmap-like file" means |
|||
msg185295 - (view) | Author: Charles-François Natali (neologix) * ![]() |
Date: 2013-03-26 19:53 | |
- I don't understand the "running out of FDs" thing. select() is limited to FDs less than FD_SETSIZE, but I don't see how you could get EMFILE (select() will return a ValueError) - is there any platform with sendfile() which doesn't support poll()? - I still don't see the point of calling sendfile() with a block size. Everybody just uses st_size, and that's how it's supposed to be done. |
|||
msg185303 - (view) | Author: Giampaolo Rodola' (giampaolo.rodola) * ![]() |
Date: 2013-03-26 21:05 | |
> I don't understand the point of the second member in the tuple The 'exception' member can be useful to know the reason why sendfile() failed and send() was used as fallback. > the timeout logic should be fixed so that the total operation > time doesn't exceed the timeout, rather than each iteration Do you mean that if the user sets a timeout=2 and the whole transfer takes longer than that you expect a timeout exception? That's pretty unusual (e.g. ftplib does not behave like that). > non-blocking sockets could be supported by returning partial writes, > and letting the caller pass an offset argument I see little value in supporting non-blocking sockets because someone willing to do that usually wants to use sendfile() directly (no wrappers involved) as they are forced to handle errors and take count of transmitted bytes in their own code anyway. Here's an example of how sendfile is supposed to be used with non-blocking sockets: https://code.google.com/p/pyftpdlib/source/browse/tags/release-0.7.0/pyftpdlib/ftpserver.py#1035 An extra layer such as the one proposed in my patch is completely useless and also slow considering the amount of instructions before the actual sendfile() call. It is however a good idea to provide an 'offset' argument. > as a vocabulary nit, I don't really understand what "mmap-like file" means Will turn that into "regular files". > I don't see how you could get EMFILE (select() will return a ValueError) Right. Will fix that. > is there any platform with sendfile() which doesn't support poll()? No idea. I made a quick tour on snakebite and apparently all of the provided platforms have poll(). > I still don't see the point of calling sendfile() with a block size I will make some actual benchmarks and fix that later if makes sense. Assuming providing a high blocksize (say 1MB) makes no difference in terms of CPU usage it's better to keep that for consistency with sendall() in case we fallback on using it. |
|||
msg186305 - (view) | Author: Giampaolo Rodola' (giampaolo.rodola) * ![]() |
Date: 2013-04-08 15:14 | |
New patch in attachment includes a new 'offset' parameter, new tests and also update file offset on return or in case of error so that file.tell() can be used to tell how many bytes were transmitted at any time. This way we'll avoid using a custom exception. In summary, the API looks like this. Transfer ok: >>> file = open('somefile', 'rb') >>> s = socket.socket() >>> sock.sendfile(file) (True, None) >>> file.tell() 20000000 >>> ...and in case sendfile() could not be used internally because file was not a regular file: >>> file = io.BytesIO(b'x' * 1*1024*1024) >>> sock.sendfile(file) (False, UnsupportedOperation('fileno',)) >>> file.tell() 20000000 >>> I still haven't looked into TransmitFile on Windows as I have to figure out how to compile Python 3.4 on Windows. |
|||
msg216915 - (view) | Author: Giampaolo Rodola' (giampaolo.rodola) * ![]() |
Date: 2014-04-20 18:56 | |
New patch in attachment. Changes: - docs - replaced select() / poll() with the new selectors module - file position is always updated both on return and on error; this means file.tell() is the designated way to know how many bytes were sent - replaced sendall() with send() so that we can count the number of bytes transmitted (related and rejected proposal: https://mail.python.org/pipermail/python-ideas/2014-April/027689.html) - send() now uses memoryview() for better performances to re-transmit data which was not sent by the first send() call - pre-emptively raise exception if file is not opened in binary mode - tests for ssl module I've tried to work on Windows TransmitFile support but I got stuck as I'm not sure how to convert a file object into a HANDLE in C. I suppose Windows support can also be added later as a separate ticket and in the meantime I'd like to push this forward. Open questions: - Is the current return value desirable (do we really care if os.sendfile() was used internally?)? Should the returned tuple also include the number transmitted bytes? - default blocksize: Charles-François was suggesting to remove the blocksize argument; FWIW I've made some quick benchmarks by using "time" cmdline utility with different blocksizes and I didn't notice substantial difference. I still think a blocksize parameter is necessary in case we fallback on using send() and also for consistency with ftplib's storbinary() method which will be involved later (issue 13564). |
|||
msg216968 - (view) | Author: Giampaolo Rodola' (giampaolo.rodola) * ![]() |
Date: 2014-04-21 20:33 | |
Attached is a simple benchmark script transmitting a 100MB file. On my Linux box sendfile() is almost twice as fast as send(): send() real 0.0613s user 0.0100s sys 0.0900s total 0.1000s sendfile() real 0.0318s user 0.0000s sys 0.0500s total 0.0500s |
|||
msg216973 - (view) | Author: Josh Rosenberg (josh.r) * ![]() |
Date: 2014-04-21 22:46 | |
For TransmitFile support, the Windows function to turn an integer file descriptor into a WinAPI file HANDLE should be _get_osfhandle: http://msdn.microsoft.com/en-us/library/ks2530z6.aspx |
|||
msg216975 - (view) | Author: Akira Li (akira) * | Date: 2014-04-21 23:34 | |
Should socket.sendfile() always return number of bytes sent because file.tell() may be changed by something else that uses the same file descriptor? What happens if the file grows? Instead of returning `(was_os_sendfile_used, os_sendfile_error)`, you could specify `no_fallback=False` that could be set to `True` to assert that the fallback is not used (with `no_fallback=True` the `os_sendfile_error` is raised instead of using socket.send() as a fallback). If possible; always include number of bytes sent in any error that is raised. |
|||
msg216979 - (view) | Author: Giampaolo Rodola' (giampaolo.rodola) * ![]() |
Date: 2014-04-22 02:13 | |
> Instead of returning [...] you could specify `no_fallback=False` that > could be set to `True` to assert that the fallback is not used > [...] and return the number of bytes sent. Good idea, thanks, that is much better indeed. Updated patch is in attachment. > [...] file.tell() may be changed by something else that uses > the same file descriptor? What happens if the file grows? I would say that is a use case we should explicitly not support as it probably implies you're doing something you're not supposed to. > If possible always include the number of bytes sent in any error that is raised. That's similar to my recent (rejected) proposal for socket.sendall(): https://mail.python.org/pipermail/python-ideas/2014-April/027689.html IMO the patch as it stands is fine as you can determine the number of bytes which were sent either by using the function return value or file.tell() (in case of error). Also, updating the file offset on exit makes the sendfile() implementation behave exactly like send(). |
|||
msg217015 - (view) | Author: Giampaolo Rodola' (giampaolo.rodola) * ![]() |
Date: 2014-04-22 17:49 | |
Yet another patch fixing some problems on Windows. Hopefully this should be the last one as for what concerns the POSIX systems. As such I would kindly ask for a review and/or further suggestions. |
|||
msg217058 - (view) | Author: Charles-François Natali (neologix) * ![]() |
Date: 2014-04-23 07:44 | |
1) I really don't like the use_fallback argument: as a user, I don't care if it's using sendfile/splice/whatever WIndows uses. I view this as a channel transfer (like Java's http://docs.oracle.com/javase/7/docs/api/java/nio/channels/FileChannel.html#transferFrom(java.nio.channels.ReadableByteChannel, long, long)), which moves bytes around from one FD to another. If the user want precise control, he can just go ahead and call the syscall itself. Apart from complicating the prototype, what do this bring? 2) Just returning the number of bytes sent is fine 3) I really don't like the blocksize argument. Just use a really large value internally to minimize the number of syscalls, that's all that matters. I've *never* seen code which explicitly uses a blocksize: in 99% of cases, it just uses stat to find out the file size, and call sendfile with it. Trying to be consistent with ftplib is IMO a bad idea, since the API is just completely broken (I mean, just sending/retrieving a file is complex). A useful parameter instead would be to support sending only part of the file, so adding a count argument. You can have a look at http://docs.oracle.com/javase/7/docs/api/java/nio/channels/FileChannel.html#transferFrom(java.nio.channels.ReadableByteChannel, long, long) for an example many people bash Java, but they've designed some great APIs :-) |
|||
msg217080 - (view) | Author: Giampaolo Rodola' (giampaolo.rodola) * ![]() |
Date: 2014-04-23 17:26 | |
> 1) I really don't like the use_fallback argument > Apart from complicating the prototype, what do this bring? My initial thought was that the user might want to know *why* a file cannot be sent by using the fastest method and hence wants to see the original exception. Anyway, I have not strong opinions about this so I guess we can also drop it. > A useful parameter instead would be to support sending only part of the file, > so adding a count argument. Have you read my patch? This is already provided by the "offset" parameter. > I really don't like the blocksize argument. > I've *never* seen code which explicitly uses a blocksize Both sendfile() and TransmitFile provide a "blocksize" parameter for very good reasons therefore it seems natural that an API built on top of them exposes the same parameter as well. Some examples in the stdlib which comes to mind using a blocksize are asynchat.async_chat.ac_out_buffer_size and ftplib.FTP.storbinary and I'm sure if you grep for "blocksize" you'll find others. Providing a blocksize is also necessary to tell how many bytes to read from file in case send() is used, 'cause it's *crucial* that you don't read the whole file into memory. I will also give a real world example: if your app wants to limit the transfer speed you will want to explicitly transmit smaller chunks of data and then "sleep", and the only way to do that is by manipulating the blocksize. So yes: a blocksize parameter *is* necessary, so please stop beating that horse. As for using a bigger value: I made some benchmarks by using different sizes and I didn't notice any noticeable difference. |
|||
msg217081 - (view) | Author: Giampaolo Rodola' (giampaolo.rodola) * ![]() |
Date: 2014-04-23 17:53 | |
Note: my example about limiting the transfer speed does not really apply 'cause as this stands right now it cannot be used with non-blocking sockets. Other arguments do though and I hope it's clear that we need "blocksize". |
|||
msg217082 - (view) | Author: Akira Li (akira) * | Date: 2014-04-23 18:21 | |
> I really don't like the use_fallback argument .. I initially also thought so. But I've suggested the parameter to replace `(was_os_sendfile_used, os_sendfile_error)` returned value as a *trade off* between a slight complexity in the interface vs. allowing to detect performance bugs easily e.g., when someone passed a file-like object incompatible with os.sendfile by accident (it is not enough to have a valid fileno. What mmap-like means?). > .. as a user, I don't care if it's using sendfile/splice/whatever WIndows uses. > .. what do this bring? The reason sendfile exists is performance. Otherwise socket.makefile and shutil.copyfileobj could be used instead. use_fallback parameter provides a way to assert that an ineffective fallback is not used by accident. It may be ignored by most users. An alternative is a new separate public method that doesn't use the fallback. |
|||
msg217090 - (view) | Author: Giampaolo Rodola' (giampaolo.rodola) * ![]() |
Date: 2014-04-23 21:44 | |
Considering the current indecision about certain design aspects I started a discussion on python-ideas: https://mail.python.org/pipermail/python-ideas/2014-April/027752.html |
|||
msg217099 - (view) | Author: Guido van Rossum (gvanrossum) * ![]() |
Date: 2014-04-23 22:47 | |
Can you also think about how this would be wrapped in asyncio? |
|||
msg217104 - (view) | Author: Giampaolo Rodola' (giampaolo.rodola) * ![]() |
Date: 2014-04-23 23:48 | |
I think asyncio would be better off using os.sendfile() / TransmitFile directly, in fact the current patch explicitly does not support non-blocking sockets (I couldn't see any sane approach to do that). Here's an example of how os.sendfile() should be used in an async manner: https://code.google.com/p/pyftpdlib/source/browse/tags/release-0.7.0/pyftpdlib/ftpserver.py#1035 asyncio should be doing something very similar to that and do the necessary stuff so that you can "yield from transport.sendfile(file)". Actually I can see what I can do in that regard after I'm finished with this. |
|||
msg217120 - (view) | Author: Charles-François Natali (neologix) * ![]() |
Date: 2014-04-24 07:02 | |
>> A useful parameter instead would be to support sending only part of the file, >> so adding a count argument. > > Have you read my patch? This is already provided by the "offset" parameter. Of course I read your patch ;-) I mean I'd like a parameter for the offset, and another one for the number of bytes to send, like in Java's version (and sendfile(), see below). >> I really don't like the blocksize argument. >> I've *never* seen code which explicitly uses a blocksize > > Both sendfile() and TransmitFile provide a "blocksize" parameter for very good reasons therefore it seems natural that an API built on top of them exposes the same parameter as well. No, they expose a *count* parameter: http://linux.die.net/man/2/sendfile """ ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count); count is the number of bytes to copy between the file descriptors. """ You're mixing up blocksize, which is the maximum number of bytes to transfer in one syscall and only makes sense in the context of repeated syscalls, and count, which is the total amount of data you want the function to transfer. No sensible sendfile-like API exposes a maximum "blocksize" to send at once, since the goal is to limit copies and system calls: you just pass a source and destination FD, a starting offset, and a number of bytes to transfer, and the syscall takes care of the rest. Here, you basically implement sendall() on top of sendfile() (in pseudo-code, using a buffer instead of a file and not taking into account short writes but the idea if the same): while <remaining data to send>: socket.sendfile(data[offset:offset+chunksize) The way it's supposed to be used is simply: socket.sendfile(data[offset:offset+count]) That's how everyone one uses sendfile(), that's how Java exposes it, and that's IMO how it should be exposed. To sum up, I think there's a fundamental confusion between blocksize and count in this API: that's what I've been saying since the beginning of this thread: if you disagree, that's OK, I just want to make sure we're talking about the same thing ;-) |
|||
msg217121 - (view) | Author: Akira Li (akira) * | Date: 2014-04-24 11:05 | |
use_fallback parameter is mostly a debugging tool. If it helps to avoid the indecision; I would side with neologix' remarks and also suggest to drop the use_fallback parameter. It seems the patch assumes *offset == nbytes_sent* that is false in general e.g., if offset > 0 at the start of the function. :: _SEND_BLOCKSIZE = 262144 # ??? def sendfile(self, file, offset=None, nbytes=None, *, nbytes_per_send=_SEND_BLOCKSIZE) -> nbytes_sent: """ Send *nbytes* bytes from regular *file* starting at *offset* position. Return the number of bytes sent. If *offset* is ``None``; start at the current file position. If *nbytes* is ``None``; send file until EOF is reached. The socket should be connection-oriented e.g., SOCK_STREAM *nbytes_per_send* is used by a *send()*-based fallback code. *os.sendfile()* ignores it. - if socket is blocking (timeout is None) then it may keep trying to send data until an error occurs. Even on success it may return less than *nbytes* if there is not enough data available in *file*. - if socket is non-blocking and timeout == 0 then fail if even a single byte can't be sent immediately - if socket has timeout > 0 then raise the timeout error if more than *timeout* seconds pass since the call is started and nothing is sent i.e., use a single deadline for all system calls (like *socket.send()*). If timeout is not None then *socket.sendfile()* may send less bytes than specified. *file* position after the call is unspecified. """ # pseudo-code total = 0 if offset is None offset = file.tell() if nbytes is None: nbytes = os.path.getsize(file.name) interval = self.timeout if interval is not None: deadline = now() + interval while select([], [self], [], interval)[1]: # writable try: sent = os.sendfile(self, file, offset, nbytes) except BlockingIOError as e: assert getattr(e, 'characters_written', 0) == 0 if interval is not None: # update interval interval = deadline - now() if interval < 0: break continue # ignore else: if sent == 0: return total total += sent offset += sent nbytes -= sent if nbytes == 0: return total if interval is not None: # update interval interval = deadline - now() if interval < 0: break # timeout if total == 0: raise TimeoutError else: return total |
|||
msg217128 - (view) | Author: Giampaolo Rodola' (giampaolo.rodola) * ![]() |
Date: 2014-04-24 15:56 | |
> [...] I'd like a parameter for the offset, and another one for the > number of bytes to send. > To sum up, I think there's a fundamental confusion between blocksize > and count in this API. Ah OK, I see what you mean now. If seems we didn't understand each other. =) And yes, I suppose you're right: if possible we should pass a high value and let sendfile() do its thing. Note that we still have to implement an internal loop ourselves though because if the socket has a timeout sendfile() will return before EOF (I've checked this just now). As for what to do, here's what I propose: - we provide a blocksize argument defaulting to None - in case of send() and blocksize == None we set it to 262144 - in case of sendfile() we set it to a very high value (4M or something) - using os.path.getsize(file.name) looks risky to me as the user might have changed CWD in the meantime or something I'm -1 about adding "count" *and* "blocksize" parameters. "blocksize" alone is good enough IMO and considering what I've just described it is a better name than "count". |
|||
msg217143 - (view) | Author: Akira Li (akira) * | Date: 2014-04-25 00:59 | |
count and blocksize are completely different. *count* specifies how many bytes at most socket.sendfile should sent overall. It may change the result i.e., it may not be necessary that the file is read until EOF. It has the same meaning as *nbytes* parameter for os.sendfile or *nbytes* in msg217121 *blocksize* doesn't change how many bytes is read in the end by socket.sendfile. At most it may affect time performance. It is *nbytes_per_send* in msg217121 |
|||
msg217144 - (view) | Author: Antoine Pitrou (pitrou) * ![]() |
Date: 2014-04-25 01:10 | |
> I'm -1 about adding "count" *and* "blocksize" parameters. "blocksize" > alone is good enough IMO and considering what I've just described it > is a better name than "count". I'm confused. Why is "blocksize" necessary at all? > using os.path.getsize(file.name) looks risky to me Why not fstat(fd) ? |
|||
msg217156 - (view) | Author: Akira Li (akira) * | Date: 2014-04-25 08:40 | |
> I'm confused. Why is "blocksize" necessary at all? My guess, it may be used to implement socket.send()-based fallback. Its meaning could be the same as *length* parameter in shutil.copyfileobj The fallback is useful if os.sendfile doesn't exists or it doesn't accept given parameters e.g., if *file* is not mmap-like enough for os.sendfile. > > using os.path.getsize(file.name) looks risky to me > Why not fstat(fd) ? os.path.getsize(file.name) in msg217121 is a pseudo-code (as said in the comment) that expresses the intent that if *nbytes* parameter is not specified (None) then socket.sendfile should send bytes from the file until EOF is reached. In real code, if *nbytes is None*; I would just call os.sendfile repeatedly with a large constant *nbytes* parameter until os.sendfile returns 0 (meaning EOF) without asking the file size explicitly It assumes socket.sendfile doesn't specify its behaviour if the file size changes between the calls. The pseudo-code in msg217121 is my opinion about the public interface for socket.sendfile -- It is different from the one in the current socket-sendfile5.patch |
|||
msg217165 - (view) | Author: Giampaolo Rodola' (giampaolo.rodola) * ![]() |
Date: 2014-04-25 17:45 | |
Given the opinions expressed so far I: - got rid of the "blocksize" parameter - got rid of the "use_fallback" parameter - added a "count" parameter - used os.fstat() to figure out the total file size and passed it directly to sendfile() I'm attaching socket-sendfile6.patch which includes docs and many new tests. Hopefully this should be the final one (yet to review though). |
|||
msg219926 - (view) | Author: Giampaolo Rodola' (giampaolo.rodola) * ![]() |
Date: 2014-06-07 10:22 | |
Looking back at this I think a "send_blocksize" argument is necessary after all. shutil.copyfileobj() has it, so is ftplib.FTP.storbinary() and httplib (issue 13559) which will both be using socket.sendfile() once it gets included. Updated patch is in attachment. If there are no other objections I'd be for committing this next week or something as I'm pretty confident with the current implementation. |
|||
msg219927 - (view) | Author: Charles-François Natali (neologix) * ![]() |
Date: 2014-06-07 11:10 | |
> Looking back at this I think a "send_blocksize" argument is necessary after all. shutil.copyfileobj() has it, so is ftplib.FTP.storbinary() and httplib (issue 13559) which will both be using socket.sendfile() once it gets included. Those APIs are really poor, please don't cripple sendfile() to mirror them. Once again, a send_blocksize argument doesn't make sense, you won't find it anywhere else. All that's needed is start offset and a size/length argument. |
|||
msg219928 - (view) | Author: Giampaolo Rodola' (giampaolo.rodola) * ![]() |
Date: 2014-06-07 11:22 | |
I agree it is not necessary for sendfile() (you were right). Do not introducing it for send(), though, poses some questions. For instance, should we deprecate or ignore 'blocksize' argument in ftplib as well? Generally speaking, when using send() there are circumstances where you might want to adjust the number of bytes to read from the file, for instance: - 1: set a small blocksize (say 512 bytes) on systems where you have a limited amount of memory - 2: set a big blocksize (say 256000) in order to speed up the transfer / use less CPU cycles; on very fast networks (e.g. LANs) this may result in a considerable speedup (I know 'cause I conducted these kind of tests in pyftpdlib: https://code.google.com/p/pyftpdlib/issues/detail?id=94). |
|||
msg219929 - (view) | Author: Giampaolo Rodola' (giampaolo.rodola) * ![]() |
Date: 2014-06-07 11:25 | |
...speaking of which, now that I look back at those benchmarks it looks like 65536 bytes is the best compromise (in my latest patch I used 16348). |
|||
msg220173 - (view) | Author: Giampaolo Rodola' (giampaolo.rodola) * ![]() |
Date: 2014-06-10 16:53 | |
Charles, Antoine, any final thought about this given the reasons I stated above? If you're still -1 about adding 'send_blocksize' argument I guess I can get rid of it and perhaps reintroduce it later if we see there's demand for it. |
|||
msg220191 - (view) | Author: Charles-François Natali (neologix) * ![]() |
Date: 2014-06-10 19:00 | |
> I agree it is not necessary for sendfile() (you were right). Good we agree :-) > Do not introducing it for send(), though, poses some questions. > For instance, should we deprecate or ignore 'blocksize' argument in ftplib as well? Honestly, we should deprecate the whole ftplib module :-) More seriously, it's really low-level, I don't understand the point of this whole callback-based API: FTP.storbinary(command, file[, blocksize, callback, rest])Why not simply a: FTP.store(source, target, binary=True) If you have time and it interest you, trying to improve this module would be great :-) > Generally speaking, when using send() there are circumstances where you might want to adjust the number of bytes to read from the file, for instance: > > - 1: set a small blocksize (say 512 bytes) on systems where you have a limited amount of memory > > - 2: set a big blocksize (say 256000) in order to speed up the transfer / use less CPU cycles; on very fast networks (e.g. LANs) this may result in a considerable speedup (I know 'cause I conducted these kind of tests in pyftpdlib: https://code.google.com/p/pyftpdlib/issues/detail?id=94). I agree, but both points are addressed by sendfile(): internally, the kernel will use whatever block size it likes, depending on the source file type, page size, target NIC ring buffer size. So to reply to your above question, I wouldn't feel too bad about simply ignoring the blocksize argument in e.g. shutil.copyfile(). For ftplib, it's a little harder since we're supposed to support an optional callback, but calling a callback for every block will drive performance down... So I'd really like it if you could push the version without the blocksize, and then we'll see (and hopefully fix the non-sensical libraries that depend on it). |
|||
msg220195 - (view) | Author: Giampaolo Rodola' (giampaolo.rodola) * ![]() |
Date: 2014-06-10 20:41 | |
> I agree, but both points are addressed by sendfile() I'm talking about send(), not sendfile(). Please remember that send() will be used as the default on Windows or when non-regular files are passed to the function. My argument is about introducing an argument to use specifically with send(), not sendfile(). In summary: sendfile(self, file, offset=0, count=None, send_blocksize=16384): """ [...] If os.sendfile() is not available (e.g. Windows) or file is not a regular file socket.send() will be used instead. [...] *send_blocksize* is the maximum number of bytes to transmit at one time in case socket.send() is used. """ > Honestly, we should deprecate the whole ftplib module :-) > More seriously, it's really low-level, I don't understand the point of > this whole callback-based API: > FTP.storbinary(command, file[, blocksize, callback, rest]) > Why not simply a: > FTP.store(source, target, binary=True) ftplib module API may be a bit rusty but there's a reason why it was designed like that. 'callback' and 'blocksize' arguments can be used to implement progress bars, in-place transformations of the source file's data and bandwidth throttling (by having your callback 'sleep' if more than N bytes were sent in the last second). 'rest' argument is necessary for resuming uploads. I'm not saying ftplib API cannot be improved: maybe we can provide two higher level "put" and "get" methods but please let's discuss that into another thread. |
|||
msg220202 - (view) | Author: Charles-François Natali (neologix) * ![]() |
Date: 2014-06-10 22:12 | |
>> I agree, but both points are addressed by sendfile() > > I'm talking about send(), not sendfile(). > Please remember that send() will be used as the default on Windows or when non-regular files are passed to the function. My argument is about introducing an argument to use specifically with send(), not sendfile(). Which makes even less sense if it's not needed for sendfile() :-) I really don't see why we're worrying so much about allocating 8K or 16K buffers, it's really ridiculous. For example, buffered file I/O uses a default block size of 8K. We're not targeting embedded systems. |
|||
msg220220 - (view) | Author: Giampaolo Rodola' (giampaolo.rodola) * ![]() |
Date: 2014-06-11 01:14 | |
OK then, I'll trust your judgement. I'll use 8K as the default and will commit the patch soon. |
|||
msg220226 - (view) | Author: Roundup Robot (python-dev) ![]() |
Date: 2014-06-11 01:54 | |
New changeset 001895c39fea by Giampaolo Rodola' in branch 'default': fix issue #17552: add socket.sendfile() method allowing to send a file over a socket by using high-performance os.sendfile() on UNIX. Patch by Giampaolo Rodola'· http://hg.python.org/cpython/rev/001895c39fea |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:57:43 | admin | set | github: 61752 |
2014-06-11 02:04:53 | giampaolo.rodola | set | keywords:
+ 3.2regression, - patch, needs review status: open -> closed resolution: fixed |
2014-06-11 01:54:43 | python-dev | set | nosy:
+ python-dev messages: + msg220226 |
2014-06-11 01:14:07 | giampaolo.rodola | set | assignee: giampaolo.rodola messages: + msg220220 |
2014-06-10 22:12:08 | neologix | set | messages: + msg220202 |
2014-06-10 20:41:23 | giampaolo.rodola | set | messages: + msg220195 |
2014-06-10 19:00:14 | neologix | set | messages: + msg220191 |
2014-06-10 16:53:00 | giampaolo.rodola | set | messages: + msg220173 |
2014-06-07 11:25:32 | giampaolo.rodola | set | messages: + msg219929 |
2014-06-07 11:22:54 | giampaolo.rodola | set | messages: + msg219928 |
2014-06-07 11:10:36 | neologix | set | messages: + msg219927 |
2014-06-07 10:22:11 | giampaolo.rodola | set | files:
+ socket-sendfile7.patch messages: + msg219926 |
2014-04-28 10:08:25 | giampaolo.rodola | link | issue13559 dependencies |
2014-04-27 00:31:49 | vstinner | set | title: socket.sendfile() -> Add a new socket.sendfile() method |
2014-04-25 17:45:56 | giampaolo.rodola | set | files:
+ socket-sendfile6.patch messages: + msg217165 |
2014-04-25 08:40:55 | akira | set | messages: + msg217156 |
2014-04-25 01:10:54 | pitrou | set | messages: + msg217144 |
2014-04-25 00:59:10 | akira | set | messages: + msg217143 |
2014-04-24 15:56:50 | giampaolo.rodola | set | messages: + msg217128 |
2014-04-24 11:05:12 | akira | set | messages: + msg217121 |
2014-04-24 07:02:02 | neologix | set | messages: + msg217120 |
2014-04-23 23:48:49 | giampaolo.rodola | set | messages: + msg217104 |
2014-04-23 22:47:28 | gvanrossum | set | nosy:
+ gvanrossum messages: + msg217099 |
2014-04-23 21:46:29 | yselivanov | set | nosy:
+ yselivanov |
2014-04-23 21:44:13 | giampaolo.rodola | set | messages: + msg217090 |
2014-04-23 18:21:35 | akira | set | messages: + msg217082 |
2014-04-23 17:53:08 | giampaolo.rodola | set | messages: + msg217081 |
2014-04-23 17:26:38 | giampaolo.rodola | set | messages: + msg217080 |
2014-04-23 07:44:46 | neologix | set | messages: + msg217058 |
2014-04-22 17:49:45 | giampaolo.rodola | set | files:
+ socket-sendfile5.patch messages: + msg217015 |
2014-04-22 02:19:46 | giampaolo.rodola | set | nosy:
+ josiah.carlson |
2014-04-22 02:13:21 | giampaolo.rodola | set | files:
+ socket-sendfile4.patch messages: + msg216979 |
2014-04-21 23:34:51 | akira | set | nosy:
+ akira messages: + msg216975 |
2014-04-21 22:46:33 | josh.r | set | nosy:
+ josh.r messages: + msg216973 |
2014-04-21 20:33:32 | giampaolo.rodola | set | files:
+ bench.py messages: + msg216968 |
2014-04-20 18:57:38 | giampaolo.rodola | set | versions: + Python 3.5, - Python 3.4 |
2014-04-20 18:57:00 | giampaolo.rodola | set | files:
+ socket-sendfile3.patch messages: + msg216915 |
2013-04-08 15:21:38 | giampaolo.rodola | set | files: + socket-sendfile2.patch |
2013-04-08 15:21:23 | giampaolo.rodola | set | files: - socket-sendfile2.patch |
2013-04-08 15:15:13 | giampaolo.rodola | set | files: + socket-sendfile2.patch |
2013-04-08 15:14:11 | giampaolo.rodola | set | messages: + msg186305 |
2013-04-08 14:57:11 | giampaolo.rodola | set | title: create_server -> socket.sendfile() |
2013-04-08 14:56:45 | giampaolo.rodola | set | title: socket.sendfile() -> create_server |
2013-04-07 13:31:33 | asvetlov | set | nosy:
+ asvetlov |
2013-03-27 07:42:04 | rosslagerwall | set | nosy:
+ rosslagerwall |
2013-03-26 21:05:31 | giampaolo.rodola | set | messages: + msg185303 |
2013-03-26 20:13:47 | christian.heimes | set | nosy:
+ christian.heimes |
2013-03-26 19:53:29 | neologix | set | messages: + msg185295 |
2013-03-26 19:43:24 | pitrou | set | nosy:
+ pitrou, neologix messages: + msg185294 |
2013-03-26 19:39:46 | giampaolo.rodola | link | issue13564 dependencies |
2013-03-26 19:27:53 | giampaolo.rodola | create |