Issue 24291: Many servers (wsgiref, http.server, etc) can truncate large output blobs

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/68479

classification

Title:	Many servers (wsgiref, http.server, etc) can truncate large output blobs
Type:	behavior	Stage:	resolved
Components:	IO, Library (Lib)	Versions:	Python 3.5

process

Status:	closed	Resolution:	out of date
Dependencies:		Superseder:
Assigned To:		Nosy List:	Jonathan Kamens, Paolo Veglia, benjamin.peterson, jmadden, marcjofre, martin.panter, pitrou, pje, python-dev, serhiy.storchaka, stutzbach, tarek
Priority:	normal	Keywords:	patch

Created on 2015-05-26 20:18 by Jonathan Kamens, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
wsgi-partial.py	martin.panter, 2016-04-09 10:52	Demo
wfile-partial.patch	martin.panter, 2016-04-09 12:22		review
wfile-partial.v2.patch	martin.panter, 2016-04-11 03:46		review
wfile-partial.v3.patch	martin.panter, 2016-05-16 11:29		review
wsgiref-only.patch	martin.panter, 2016-05-31 10:59	wsgiref fix only	review
wsgiref-only.v2.patch	martin.panter, 2016-06-03 12:59		review
wfile-partial.v4.patch	martin.panter, 2016-06-05 10:48		review
wfile-partial.v5.patch	martin.panter, 2016-07-23 08:40		review

Messages (30)
msg244131 - (view)	Author: Jonathan Kamens (Jonathan Kamens)	Date: 2015-05-26 20:18
The _write method of wsgiref.handlers.SimpleHandler reads as follows: def _write(self,data): self.stdout.write(data) The problem here is that calling write() on a socket is not actually guaranteed to write all of the data in the buffer. If the length of data is large enough, then the kernel will take only part of it. In that case, the rest of the response data will be silently discarded by wsgiref. _write needs to check the return value of self.stdout.write(data), and if it is less than the length of data, repeat the write from the middle of the data buffer, etc., until all the data has been written.
msg244138 - (view)	Author: Martin Panter (martin.panter) *	Date: 2015-05-26 22:01
[UTF-8 error workaround] What kind of object is “stdout”? Plain Python socket objects don’t have a write() method. Perhaps “stdout” is meant to implement the BufferedIOBase.write() interface, which guarantees everything is written, even if it takes multiple raw calls. Wrapping your output stream in BufferedWriter might be a solution.
msg244489 - (view)	Author: Marc Jofre (marcjofre)	Date: 2015-05-30 20:19
When executing, wsgiref produces the following error dump: D:\SixSensoCytometerWebBased\include\bin>D:\SixSensoCytometerWebBased\include\bi n\manage.exe runserver 8765 Validating models... 0 errors found May 26, 2015 - 16:20:56 Django version 1.5.8, using settings 'settings' Development server is running at http://127.0.0.1:8765/ Quit the server with CTRL-BREAK. [26/May/2015 16:20:56] "GET / HTTP/1.1" 200 29661 100 [26/May/2015 16:20:59] "POST /login HTTP/1.1" 200 25980 [26/May/2015 16:20:59] "GET /static/js/ready.js HTTP/1.1" 500 79563 Traceback (most recent call last): File "D:\WinPython-64bit-2.7.6.3\python-2.7.6.amd64\Lib\wsgiref\handlers.py", line 86, in run self.finish_response() File "D:\WinPython-64bit-2.7.6.3\python-2.7.6.amd64\Lib\wsgiref\handlers.py", line 128, in finish_response self.write(data) File "D:\WinPython-64bit-2.7.6.3\python-2.7.6.amd64\Lib\wsgiref\handlers.py", line 217, in write self._write(data) File "D:\WinPython-64bit-2.7.6.3\python-2.7.6.amd64\lib\socket.py", line 324, in write [26/May/2015 16:20:59] "GET /static/js/ClasePrincipalAutobahn.js HTTP/1.1" 500 7 9818 [26/May/2015 16:20:59] "GET /static/img/initialGUIimage.png HTTP/1.1" 500 79777 self.flush() File "D:\WinPython-64bit-2.7.6.3\python-2.7.6.amd64\lib\socket.py", line 303, in flush self._sock.sendall(view[write_offset:write_offset+buffer_size]) error: [Errno 10053] An established connection was aborted by the software in yo ur host machine
msg244505 - (view)	Author: Martin Panter (martin.panter) *	Date: 2015-05-31 00:40
Marc, if you think your connection aborted error is related, can you say what kind of object the “stdout” parameter is in the handler class? In other words, what class’s method is being called in the write() function at line 217?
msg244506 - (view)	Author: Martin Panter (martin.panter) *	Date: 2015-05-31 00:48
Actually, looking closer at Marc’s traceback, it appears to be the ordinary socket._fileobject class. So I suspect it is unrelated to this bug because the Python 2 socket code appears to call sendall() in a loop.
msg244550 - (view)	Author: Marc Jofre (marcjofre)	Date: 2015-05-31 17:23
Hi all, I did more tests, identifying that the main issue I have resides in another script (multiprocessing), when using nuitka compiler. I will get back to you with more info when I solve the multiprocessing issue. The command prompt dump is: data: HTTP/1.0 200 OK self.stdout: <socket._fileobject object at 0x00000000042954F8> data: HTTP/1.0 200 OK self.stdout: <socket._fileobject object at 0x00000000042954F8> data: HTTP/1.0 200 OK self.stdout: <socket._fileobject object at 0x00000000042954F8> data: HTTP/1.0 200 OK self.stdout: <socket._fileobject object at 0x0000000004291A20> data: HTTP/1.0 200 OK self.stdout: <socket._fileobject object at 0x0000000004291B10> data: HTTP/1.0 200 OK self.stdout: <socket._fileobject object at 0x00000000042917C8> data: HTTP/1.0 200 OK self.stdout: <socket._fileobject object at 0x0000000004291930> data: HTTP/1.0 200 OK self.stdout: <socket._fileobject object at 0x00000000042954F8> data: HTTP/1.0 200 OK self.stdout: <socket._fileobject object at 0x00000000042954F8> data: HTTP/1.0 200 OK self.stdout: <socket._fileobject object at 0x00000000042919A8> data: HTTP/1.0 200 OK self.stdout: <socket._fileobject object at 0x0000000004291930> data: HTTP/1.0 200 OK self.stdout: <socket._fileobject object at 0x00000000042954F8> data: HTTP/1.0 500 INTERNAL SERVER ERROR self.stdout: <socket._fileobject object at 0x00000000042954F8>
msg260844 - (view)	Author: Paolo Veglia (Paolo Veglia)	Date: 2016-02-25 09:13
I had the same issue as Jonathan with Python 3.5.1, big chunks of data get truncated (to ~85KB). The problem disappears when splitting original data into smaller chunks: def _write(self,data): chunk_size = 1024 for chunk in (data[i:i+chunk_size] for i in range(0, len(data), chunk_size)): self.stdout.write(chunk)
msg262830 - (view)	Author: Jason Madden (jmadden) *	Date: 2016-04-03 20:12
gevent has another simple reproducer for this. I do believe it's not gevent's fault, the fault is in the standard library; SimpleHandler._write needs to loop until `sent += self.stdeout.write(data)` is `len(data)`. I have written up more on this at https://github.com/gevent/gevent/issues/778#issuecomment-205046001 For convenience I'll reproduce the bulk of that comment here: The user supplied a django application that produced a very large response that was getting truncated when using gevent under Python 3.4. (I believe gevent's non-blocking sockets are simply running into a different buffering behaviour, making it more likely to be hit under those conditions simply because they are faster). This looks like a bug in the standard library's `wsgiref` implementation. I tracked this down to a call to `socket.send(data)`. This method only sends whatever portion of the data it is possible to send at the time, and it returns the count of the data that was sent. The caller of `socket.send()` is responsible for looping to make sure the full `len` of the data is sent. This [is clearly documented](https://docs.python.org/3/library/socket.html#socket.socket.send). In this case, there is a call to `send` trying to send the full response, but only a portion of it is able to be immediately written. Here's a transcript of the first request (I modified gevent's `socket.send` method to print how much data is actually sent each time): ``` Django version 1.9.5, using settings 'gdc.settings' Starting development server at http://127.0.0.1:8000/ Quit the server with CONTROL-C. SENDING 17 SENT 17 OF 17 SENDING 37 SENT 37 OF 37 SENDING 38 SENT 38 OF 38 SENDING 71 SENT 71 OF 71 SENDING 1757905 SENT 555444 OF 1757905 [03/Apr/2016 19:48:31] "GET / HTTP/1.1" 200 1757905 ``` Note that there's no retry on the short send. Here's the stack trace for that short send; we can clearly see that there is no retry loop in place: ``` //3.4/lib/python3.4/wsgiref/handlers.py(138)run() 136 self.setup_environ() 137 self.result = application(self.environ, self.start_response) --> 138 self.finish_response() 139 except: 140 try: //3.4/lib/python3.4/wsgiref/handlers.py(180)finish_response() 178 if not self.result_is_file() or not self.sendfile(): 179 for data in self.result: --> 180 self.write(data) 181 self.finish_content() 182 finally: //3.4/lib/python3.4/wsgiref/handlers.py(279)write() 277 278 # XXX check Content-Length and truncate if too many bytes written? --> 279 self._write(data) 280 self._flush() 281 //3.4/lib/python3.4/wsgiref/handlers.py(453)_write() 451 452 def _write(self,data): --> 453 self.stdout.write(data) 454 455 def _flush(self): //3.4/lib/python3.4/socket.py(398)write() 396 self._checkWritable() 397 try: --> 398 return self._sock.send(b) 399 except error as e: 400 # XXX what about EINTR? > //gevent/_socket3.py(384)send() 382 from IPython.core.debugger import Tracer; Tracer()() ## DEBUG ## 383 --> 384 return count ``` `self.stdout` is an instance of `socket.SocketIO` (which is returned from `socket.makefile`). This is not documented on the web, but the [docstring also clearly documents](https://github.com/python/cpython/blob/3.4/Lib/socket.py#L389) that callers of `write` should loop to make sure all data gets sent.
msg262831 - (view)	Author: Jason Madden (jmadden) *	Date: 2016-04-03 20:24
Django uses a `wsgiref.simple_server` to serve requests, which in turn uses `socketserver.StreamRequestHandler` to implement its `WSGIRequestHandler`. That base class explicitly turns off buffering for writes (`wbufsize = 0` is the class default which gets passed to `socket.makefile`). So that explains how there's no `BufferedWriter` wrapped around the `SocketIO` instance.
msg262833 - (view)	Author: Martin Panter (martin.panter) *	Date: 2016-04-03 22:14
Okay now I see the conflict. The use of WSGIRequestHandler with wbufsize = 0 was the missing key. I see two possible solutions: 1. Change SimpleHandler._write() to allow self.stdout to be a RawIOBase writer, and loop over stdout.write() until all the data is written. This seems to be what most people are suggesting here. But it requires that stdout.write() returns the number of bytes written, so could be a compatibility problem. 2. Document that the SimpleHandler(stdout=...) parameter should be a BufferedIOBase writer, and fix WSGIRequestHandler to pass in a BufferedWriter (by overriding the wbufsize attribute with -1 or io.DEFAULT_BUFFER_SIZE). I am not super familiar with the wsgiref package, but the second option seems more preferable to me and more in line with my understanding of how it was intended to work.
msg262834 - (view)	Author: Jason Madden (jmadden) *	Date: 2016-04-03 22:52
Is there an expected `self.stdout` implementation that doesn't return the number of bytes it writes? `sys.stdout` does, as does `io.RawIOBase`. It doesn't seem clear to me that practically speaking there's a compatibility problem with requiring that for `self.stdout`.
msg262835 - (view)	Author: Jason Madden (jmadden) *	Date: 2016-04-03 23:01
`self.stdin` and `self.stderr` are documented to be `wsgi.input` and `wsgi.errors`, which are both described as "file-like" objects, meaning that the `write` method should return bytes written. It seems like the same could reasonably be said to be true for `self.stdout`, though it isn't strictly documented as such. The WSGI spec says that each chunk the application yields should be written immediately, with no buffering (https://www.python.org/dev/peps/pep-3333/#buffering-and-streaming), so I don't think having the default output stream be buffered would be in compliance. If there is a compatibility problem, writing the loop this way could bypass it (untested): def _write(self, data): written = self.stdout.write(data) while written is not None and written < len(data): written += self.stdout.write(data[written:])
msg262836 - (view)	Author: Martin Panter (martin.panter) *	Date: 2016-04-03 23:56
My worry was that it is easy to make a write() method that does not return anything, but is still useful in most cases. Since BufferedIOBase.write() has to guarantee to write everything, it may not seem important to return a value. But we could explicitly check for None as you suggested. In the BaseHandler class, each chunk yielded by the application is passed to BaseHandler.write(). But that method calls self._flush(), which should avoid any buffering problem. SimpleHandler._flush() implements this by calling self.stdout.flush().
msg262837 - (view)	Author: Jason Madden (jmadden) *	Date: 2016-04-04 00:10
I'm sorry, I'm still not following the argument that `write` is likely to return nothing. `RawIOBase` and `BufferedIOBase` both document that `write` must return the number of written bytes; if you don't return that, you break anything that assumes you do, as documented (because both patterns for checking if you need to keep looping are both TypeErrors: `written = 0; written += None` and `None < len(data)`); and if you ignore the return value, you fail when using any `IOBase` object that isn't buffered (exactly this case). But you are definitely right, explicitly checking for None can be done. It adds a trivial amount of overhead, but this isn't a production server. The only cost is code readability. Good point about the explicit calls to `flush()`, I thought flush was a no-op in some streams, but that's only the case for streams where it doesn't already matter.
msg263079 - (view)	Author: Martin Panter (martin.panter) *	Date: 2016-04-09 10:52
I suspect this bug does not affect Python 2. See my demo script wsgi-partial.py as evidence, which interrupts a 20 MB write with a signal. Jason: If you look at Python 2’s file.write() API <https://docs.python.org/2/library/stdtypes.html#file.write>, perhaps that will convince you that it is easy for someone to write a write() method that has no return value. Anyway, I think this bug is bigger than first thought. It affects most servers based on socketserver.StreamRequestHandler, not just the “wsgiref” one. In Python 3.5 I propose to patch the existing servers to avoid the possibility of truncated writes (patch forthcoming). In 3.6, I am thinking we should change StreamRequestHandler.wfile to prevent write() doing partial writes.
msg263085 - (view)	Author: Martin Panter (martin.panter) *	Date: 2016-04-09 12:22
wfile-partial.patch tries to avoid partial writes in every relevant wfile.write() call I could find. Usually I do this by building a BufferedWriter around wfile. I propose to apply my patch to 3.5 (if people think it is reasonable). For 3.6 I opened Issue 26721 to change the class used for wfile instead.
msg263087 - (view)	Author: Martin Panter (martin.panter) *	Date: 2016-04-09 12:35
I did not write a test case for the test suite. Doing so would probably involve too many non-trivial things, so I thought it wouldn’t be worth it: 1. A background thread or process to run a client in 2. Signal handling to trigger a partial write 3. Some way to synchronize the threads and the signal (sleep, sigwait, etc)
msg263163 - (view)	Author: Martin Panter (martin.panter) *	Date: 2016-04-11 03:46
In patch v2 I integrated a version of the test into test_wsgiref. I swapped the sleep() calls for a threading.Event object. Because my patch is such a large change, it would be good to get other people’s opinions or reviews.
msg263789 - (view)	Author: Martin Panter (martin.panter) *	Date: 2016-04-20 02:22
Widening the scope of the title to attract reviewers :)
msg265528 - (view)	Author: Martin Panter (martin.panter) *	Date: 2016-05-14 13:15
Since there are three different people reporting the problem with wsgiref, but no other reports for other server modules, I might commit the change to Lib/wsgiref/simple_server.py soon, and leave the other changes until they can get a wider review.
msg265615 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2016-05-15 13:47
SimpleHTTPRequestHandler uses shutil.copyfileobj(), and shutil.copyfileobj() expects that write() writes all data.
msg265691 - (view)	Author: Martin Panter (martin.panter) *	Date: 2016-05-16 11:29
Thanks for your comments Serhiy. The main change in v3 is that I added a socketserver._write_exactly() function instead of the _BaseHTTPRequestHandler__write_exactly() method.
msg266750 - (view)	Author: Martin Panter (martin.panter) *	Date: 2016-05-31 10:59
Here is a patch for just the wsgiref module, which I plan to commit in time for 3.5.2. I also updated the test case to avoid hanging due to a (theoretical) race with how signal handling works.
msg266926 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2016-06-02 19:21
It looks to me that there is two opposite ways to resolve this issue. 1. Document the requirement for write() and raise an exception if write() returned not the size of full data. But I think it would be more user-friendly first emit a warning and continue calling write() until full data is written. 2. Don't document any requirement, but just silently handle partial writing. wsgiref-only.patch contains parts from both ways. It documents new requirement, but doesn't deprecate partial writing. What way is chosen?
msg266986 - (view)	Author: Martin Panter (martin.panter) *	Date: 2016-06-02 23:29
I prefer your first solution because it seems to fit in better with how things were intended. I can add in handling of partial writes with a deprecation warning when I get a chance. I guess the documentation would be something like “Since 3.5.2, partial writes are also handled, but this is deprecated.”
msg267097 - (view)	Author: Martin Panter (martin.panter) *	Date: 2016-06-03 12:59
In wsgiref-only.v2.patch I added deprecated support for continuing when a partial write is detected. I don’t think it needs documenting though.
msg267393 - (view)	Author: Roundup Robot (python-dev)	Date: 2016-06-05 06:59
New changeset a57703119f40 by Martin Panter in branch '3.5': Issue #24291: Avoid WSGIRequestHandler doing partial writes https://hg.python.org/cpython/rev/a57703119f40 New changeset d8f021944e0b by Martin Panter in branch 'default': Issue #24291: Merge wsgi partial write fix from 3.5 https://hg.python.org/cpython/rev/d8f021944e0b
msg267406 - (view)	Author: Martin Panter (martin.panter) *	Date: 2016-06-05 10:48
Here is an updated patch with the remaining fixes for other servers. I considered changing the behaviour of shutil.copyfileobj() to handle partial writes better. But I decided against it, because: (a) nobody has complained yet, and (b) it could potentially break stuff, e.g. if fsrc.read() returned a strange kind of object.
msg271066 - (view)	Author: Martin Panter (martin.panter) *	Date: 2016-07-23 08:40
The bug should no longer be a problem in 3.6, as long as my change for Issue 26721 survives. Here is an updated patch against 3.5.
msg305140 - (view)	Author: Martin Panter (martin.panter) *	Date: 2017-10-27 22:16
Closing because I understand it is too late to do anything for 3.5 now.

History
Date	User	Action	Args
2022-04-11 14:58:17	admin	set	github: 68479
2017-10-27 22:16:08	martin.panter	set	status: open -> closed resolution: out of date messages: + msg305140 stage: patch review -> resolved
2016-07-23 08:40:33	martin.panter	set	files: + wfile-partial.v5.patch messages: + msg271066 versions: - Python 3.6
2016-06-05 10:48:19	martin.panter	set	files: + wfile-partial.v4.patch messages: + msg267406
2016-06-05 06:59:12	python-dev	set	nosy: + python-dev messages: + msg267393
2016-06-03 12:59:45	martin.panter	set	files: + wsgiref-only.v2.patch messages: + msg267097
2016-06-02 23:29:02	martin.panter	set	messages: + msg266986
2016-06-02 19:21:18	serhiy.storchaka	set	messages: + msg266926
2016-05-31 10:59:35	martin.panter	set	files: + wsgiref-only.patch messages: + msg266750
2016-05-16 11:29:58	martin.panter	set	files: + wfile-partial.v3.patch messages: + msg265691
2016-05-15 13:47:44	serhiy.storchaka	set	nosy: + pitrou, benjamin.peterson, tarek, stutzbach messages: + msg265615 components: + IO
2016-05-14 13:22:32	serhiy.storchaka	set	nosy: + serhiy.storchaka
2016-05-14 13:15:20	martin.panter	set	messages: + msg265528
2016-04-20 02:22:56	martin.panter	set	messages: + msg263789 title: wsgiref.handlers.SimpleHandler truncates large output blobs -> Many servers (wsgiref, http.server, etc) can truncate large output blobs
2016-04-11 03:46:22	martin.panter	set	files: + wfile-partial.v2.patch messages: + msg263163
2016-04-09 12:35:57	martin.panter	set	messages: + msg263087
2016-04-09 12:22:16	martin.panter	set	files: + wfile-partial.patch keywords: + patch messages: + msg263085 stage: needs patch -> patch review
2016-04-09 10:52:18	martin.panter	set	files: + wsgi-partial.py messages: + msg263079 versions: - Python 2.7
2016-04-04 00:10:24	jmadden	set	messages: + msg262837
2016-04-03 23:56:52	martin.panter	set	messages: + msg262836
2016-04-03 23:01:08	jmadden	set	messages: + msg262835
2016-04-03 22:52:34	jmadden	set	messages: + msg262834
2016-04-03 22:14:08	martin.panter	set	stage: test needed -> needs patch messages: + msg262833 versions: + Python 3.6, - Python 3.4
2016-04-03 20:24:21	jmadden	set	messages: + msg262831
2016-04-03 20:12:33	jmadden	set	nosy: + jmadden messages: + msg262830 versions: + Python 3.4
2016-02-25 11:42:52	martin.panter	set	stage: test needed versions: + Python 2.7
2016-02-25 09:13:47	Paolo Veglia	set	nosy: + Paolo Veglia messages: + msg260844 versions: - Python 2.7, Python 3.4, Python 3.6
2015-05-31 17:23:55	marcjofre	set	messages: + msg244550
2015-05-31 00:48:21	martin.panter	set	messages: + msg244506
2015-05-31 00:40:10	martin.panter	set	messages: + msg244505
2015-05-30 20:19:28	marcjofre	set	nosy: + marcjofre messages: + msg244489
2015-05-26 22:01:09	martin.panter	set	nosy: + martin.panter messages: + msg244138
2015-05-26 20:52:02	ned.deily	set	nosy: + pje versions: + Python 3.6, - Python 3.2, Python 3.3
2015-05-26 20:18:01	Jonathan Kamens	create