classification
Title: POST large file to server (using http.server.CGIHTTPRequestHandler), always reset by server.
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: shajianrui
Priority: normal Keywords:

Created on 2019-06-12 15:24 by shajianrui, last changed 2019-06-12 15:24 by shajianrui.

Messages (1)
msg345370 - Author: shajianrui (shajianrui) Date: 2019-06-12 15:24
Windows 10, Python 3.7

I ran into a problem when using the http.server module. I set up a basic server with the classes HTTPServer and CGIHTTPRequestHandler (no threading or forking) and tried to POST a large file (>2 MB). The server always resets the connection. In some very rare cases the POST completes (very slowly), but the CGI script I'm posting to always reports that it received an incomplete file (I call this the "incomplete file issue").

==========First Try===========

At first I thought (actually a misunderstanding, but it led to a passable workaround) that "self.rfile.read(nbytes)" at line 1199 was not blocking, so it returned before the POST had finished. So I modified the code as follows:

    # http/server.py, CGIHTTPRequestHandler.run_cgi(), lines 1198-1199
    if self.command.lower() == "post" and nbytes > 0:
        # data = self.rfile.read(nbytes)   (the original line 1199, commented out)
        chunk_size = self.request.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
        databuf = bytearray()
        while len(databuf) < nbytes:
            # Never request more than the part of the body that is still missing.
            buf = self.rfile.read(min(chunk_size, nbytes - len(databuf)))
            if not buf:   # connection closed before the whole body arrived
                break
            databuf += buf
        data = bytes(databuf)   # now we have the complete body

In this modification I repeatedly read up to 65536 bytes (the default SO_RCVBUF value on my machine) from rfile until I have nbytes bytes. Now it works well (the correct file is received), and it is much faster than POSTing with the original http.server module (in the cases where the "incomplete file issue" appeared).
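The read loop above can be pulled out into a small standalone helper and exercised without a server. This is a minimal sketch, not the code used in http.server: read_exact and TrickleReader are hypothetical names, and TrickleReader stands in for a socket that delivers only a few bytes per read.

```python
import io


def read_exact(stream, nbytes, chunk_size=65536):
    """Read exactly nbytes from stream, looping over short reads.

    Raises EOFError if the stream ends before nbytes bytes have arrived.
    """
    buf = bytearray()
    while len(buf) < nbytes:
        # Ask only for what is still missing, so we never read past the body.
        chunk = stream.read(min(chunk_size, nbytes - len(buf)))
        if not chunk:
            raise EOFError(f"expected {nbytes} bytes, got {len(buf)}")
        buf += chunk
    return bytes(buf)


class TrickleReader(io.RawIOBase):
    """Hypothetical unbuffered stream that returns at most 3 bytes per read."""

    def __init__(self, data):
        self._data = data
        self._pos = 0

    def readable(self):
        return True

    def readinto(self, b):
        n = min(3, len(b), len(self._data) - self._pos)
        b[:n] = self._data[self._pos:self._pos + n]
        self._pos += n
        return n


print(TrickleReader(b"abcdefgh").read(8))         # b'abc': a single read is short
print(read_exact(TrickleReader(b"abcdefgh"), 8))  # b'abcdefgh': the loop collects it all
```

Raising EOFError instead of silently returning a partial body makes a truncated upload visible to the caller rather than passing it on to the CGI script.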

==========Second Try==========

However, I now know that blocking is not the issue, because "self.rfile.read()" should block until the file has been POSTed completely.

I checked the TCP stream with Wireshark and found that in the middle of the transfer the server's receive window is always 256. So I think the problem lies in the variable "rbufsize", which is passed to makefile() when the rfile of the XXXRequestHandler object is created. At least it explains the low speed. But I don't know whether it also leads to the reset and to the incomplete file issue.
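The rbufsize suspicion can be checked with a small experiment. It relies on a standard-library detail the report does not mention: CGIHTTPRequestHandler sets rbufsize = 0 (per a comment in http/server.py, because the rest of the request may be passed on to a subprocess), which makes rfile an unbuffered socket.SocketIO object. On an unbuffered stream, read(n) performs a single receive and may return fewer than n bytes, while an io.BufferedReader keeps reading until it has n bytes or reaches EOF. The sketch uses socket.socketpair() as a stand-in for a real client; the helper name read_body_in_two_pieces is hypothetical.

```python
import socket
import threading


def read_body_in_two_pieces(buffering):
    """Deliver b"hello" now and b" world" 0.2 s later, then call read(11) once.

    buffering=0 gives an unbuffered SocketIO (what rbufsize = 0 produces);
    buffering=-1 gives an io.BufferedReader with the default buffer size.
    """
    sender, receiver = socket.socketpair()
    with sender, receiver:
        rfile = receiver.makefile("rb", buffering=buffering)
        sender.sendall(b"hello")
        # The second half of the "request body" arrives a little later.
        timer = threading.Timer(0.2, sender.sendall, args=(b" world",))
        timer.start()
        data = rfile.read(11)
        timer.join()
        rfile.close()
        return data


print(read_body_in_two_pieces(0))    # typically b'hello': one recv(), a short read
print(read_body_in_two_pieces(-1))   # b'hello world': buffered read waits for 11 bytes
```

If the same short read happens in run_cgi, the CGI script would receive only the first part of the body, and the rest would stay unread in the socket. Closing a TCP socket that still has unread incoming data commonly makes the operating system send a reset, so this is one possible (unconfirmed) link between the two symptoms.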

I went back to the original version of the http.server module. Then I made a subclass of socketserver.StreamRequestHandler and overrode its setup() method (I copied the code of setup() from StreamRequestHandler and modified line 770; 770 is the line number in the socketserver module, but I created the new subclass in a separate file):

770     #self.rfile = self.connection.makefile('rb', self.rbufsize)
        self.rfile = self.connection.makefile('rb', 65536)

Then the POST became much faster (even faster than with my first modification)!

But the server printed an error:

    File "c:\Users\Administrator\Desktop\cgi-server-test\modified_http_server_bad.py", line 1204, in run_cgi    【A copy of http.server module】
        while select.select([self.rfile._sock], [], [], 0)[0]:          【at line 1204】
    AttributeError: '_io.BufferedReader' object has no attribute '_sock'

Because I knew it wanted the socket of the current request handler, I modified the http.server module to use "self.connection" instead of "self.rfile._sock" (I don't know whether this causes other problems; it is just a workaround).
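The AttributeError follows from what socket.makefile() returns: with buffering=0 it returns the raw socket.SocketIO object, which keeps the socket in a private _sock attribute, while any other buffer size wraps it in an io.BufferedReader that has no such attribute. The sketch below shows this and a drain loop modelled on the one in the traceback, taking the socket itself (as self.connection would be) instead of self.rfile._sock. The function name drain is hypothetical.

```python
import select
import socket


def drain(sock):
    """Discard bytes already waiting on sock, without blocking.

    Modelled on the "throw away additional data" loop in run_cgi, but it
    takes the socket object directly instead of self.rfile._sock.
    """
    discarded = 0
    while select.select([sock], [], [], 0)[0]:
        if not sock.recv(1):   # b"" means the peer closed the connection
            break
        discarded += 1
    return discarded


a, b = socket.socketpair()
with a, b:
    raw = b.makefile("rb", buffering=0)               # what rbufsize = 0 produces
    buffered = b.makefile("rb", buffering=65536)      # the modified setup()
    print(type(raw).__name__, hasattr(raw, "_sock"))            # SocketIO True
    print(type(buffered).__name__, hasattr(buffered, "_sock"))  # BufferedReader False
    a.sendall(b"extra")
    print(drain(b))                                             # 5
    raw.close()
    buffered.close()
```

One caveat with a buffered rfile: select() only sees data still in the operating system's socket buffer, so bytes that the BufferedReader has already pulled into its own buffer are invisible to this loop.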

OK, it works well again. The CGI script receives the correct file (it returns the correct SHA1 of the uploaded file), and the POST is REALLY MUCH FASTER!

========= Question =========

So here are my questions:
1- What causes the server to reset the connection? It seems to be because the default buffer size of rfile is too small.
2- What causes the CGI script to receive an incomplete file? I really have no idea. This problem also seems to disappear if I enlarge the buffer.

Other information:
1- The "incomplete file issue" usually appears on the first POST to the server, and almost all later POST connections are reset.
2- Once the server starts resetting connections, the "incomplete file issue" never appears again (actually it still happens, but Chrome only shows a reset page; see 4- below).
3- Once the server starts resetting connections, it takes a long time to stop the server with Ctrl+C.
4- When the connection is reset, the response printed by the CGI script is actually received correctly, and it shows that the CGI script received an incomplete file: the byte count is much lower than the correct number. (I use Chrome to do the POST, so it just shows a reset message and the real response is ignored.)

Please help.
History
Date User Action Args
2019-06-12 15:24:48 shajianrui create