Issue15991
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2012-09-20 20:03 by theamk, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
http_server_bug.py | theamk, 2012-09-20 20:02 | A test script to reproduce the bug. |
Messages (12) | |||
---|---|---|---|
msg170840 - (view) | Author: Mikhail Afanasyev (theamk) | Date: 2012-09-20 20:02 | |
When using BaseHTTPServer with ThreadingMixIn, sometimes the wrong data is served. The attached script requests normal URL and URLs which are not found in multiple threads using wget (which only saves pages if it gets 200 OK status). Every once in a while responses are incorrect. - sometimes normal URL gets a '404 not found' content with '200 OK' status. - sometimes, a '404 not found' page gets '200 OK' status code The problems become much, much less frequent if ThreadingMixIn is removed. Python 2.7.3 on ubuntu 12.04 x86_64. |
|||
msg170914 - (view) | Author: Catalin Iacob (catalin.iacob) * | Date: 2012-09-21 20:42 | |
Can't reproduce this on openSUSE Factory (development version, what will become 12.3). I tried with openSUSE's Python 2 which is 2.7.3 and with self compiled up to date 2.7 branch from hg. For each of these I bumped the number of files to 3000 instead of 300 and ran the script around 10 times, always got 3000 good 0 bad. |
|||
msg171079 - (view) | Author: Thomas Lee (thomaslee) | Date: 2012-09-23 20:07 | |
I can reproduce this on Debian Wheezy. Both with the system Python (2.7.3rc2) and the 2.7 branch built from source. |
|||
msg171083 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2012-09-24 00:06 | |
I can reproduce under Debian testing, but not Debian stable nor Mageia 1. I think this is really a bug in recent Debian/Ubuntu wget. By instrumenting SocketServer, I see that less requests are received than expected (e.g. ~350 instead of 400). It seems wget is trying to send further requests on the same client socket, even though the server responded with a HTTP/1.0 status line and without a "connection: keep-alive" header, indicating it doesn't want keep-alive. Even explicitly sending a "connection: close" header doesn't seem to change wget's behaviour. If you pass "--no-keep-alive" to wget, though, it solves the issue. Running Wireshark would probably help confirm this. |
|||
msg171084 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2012-09-24 00:13 | |
Note that forcing a content length on error responses also seems to make wget happy: diff --git a/Lib/BaseHTTPServer.py b/Lib/BaseHTTPServer.py --- a/Lib/BaseHTTPServer.py +++ b/Lib/BaseHTTPServer.py @@ -362,14 +362,19 @@ message = short explain = long self.log_error("code %d, message %s", code, message) - # using _quote_html to prevent Cross Site Scripting attacks (see bug #1100201) - content = (self.error_message_format % - {'code': code, 'message': _quote_html(message), 'explain': explain}) + if self.command != 'HEAD' and code >= 200 and code not in (204, 304): + # using _quote_html to prevent Cross Site Scripting attacks (see bug #1100201) + content = (self.error_message_format % + {'code': code, 'message': _quote_html(message), 'explain': explain}) + else: + content = None self.send_response(code, message) self.send_header("Content-Type", self.error_content_type) + if content is not None: + self.send_header("Content-Length", str(len(content))) self.send_header('Connection', 'close') self.end_headers() - if self.command != 'HEAD' and code >= 200 and code not in (204, 304): + if content is not None: self.wfile.write(content) error_message_format = DEFAULT_ERROR_MESSAGE |
|||
msg172402 - (view) | Author: Mikhail Afanasyev (theamk) | Date: 2012-10-08 19:00 | |
The bug is not only wget-specific. It was discovered while making APT proxy, so at least Debian APT fetcher has the same problem. |
|||
msg172404 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2012-10-08 19:04 | |
> The bug is not only wget-specific. It was discovered while making APT > proxy, so at least Debian APT fetcher has the same problem. Well, perhaps it uses wget too? If you really think this is a bug in Python, you have to diagnose it further (for example by using Wireshark and analyzing the bytes exchanged between server and client). I've done all the debugging I can. |
|||
msg172414 - (view) | Author: Charles-François Natali (neologix) * | Date: 2012-10-08 20:58 | |
> The bug is not only wget-specific. It was discovered while making APT proxy, so at least Debian APT fetcher has the same problem. Debian uses wget. |
|||
msg172415 - (view) | Author: Charles-François Natali (neologix) * | Date: 2012-10-08 20:58 | |
>> The bug is not only wget-specific. It was discovered while making APT proxy, so at least Debian APT fetcher has the same problem. > > Debian uses wget. I meant apt-get uses wget. |
|||
msg172542 - (view) | Author: Charles-François Natali (neologix) * | Date: 2012-10-09 22:54 | |
wget trace: Here a 404 is treated correctly: """ socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 5 connect(5, {sa_family=AF_INET, sin_port=htons(4242), sin_addr=inet_addr("127.0.0.1")}, 16) = 0 select(6, NULL, [5], NULL, {900, 0}) = 1 (out [5], left {900, 0}) write(5, "GET /MISSING-8276 HTTP/1.1\r\nUser"..., 126) = 126 select(6, [5], NULL, NULL, {900, 0}) = 1 (in [5], left {899, 999000}) recvfrom(5, "HTTP/1.0 404 Not Found\r\n", 511, MSG_PEEK, NULL, NULL) = 24 read(5, "HTTP/1.0 404 Not Found\r\n", 24) = 24 select(6, [5], NULL, NULL, {900, 0}) = 1 (in [5], left {900, 0}) recvfrom(5, "Server: BaseHTTP/0.3 Python/2.4."..., 487, MSG_PEEK, NULL, NULL) = 35 read(5, "Server: BaseHTTP/0.3 Python/2.4."..., 35) = 35 select(6, [5], NULL, NULL, {900, 0}) = 1 (in [5], left {900, 0}) recvfrom(5, "Date: Tue, 09 Oct 2012 22:09:59 "..., 452, MSG_PEEK, NULL, NULL) = 37 read(5, "Date: Tue, 09 Oct 2012 22:09:59 "..., 37) = 37 select(6, [5], NULL, NULL, {900, 0}) = 1 (in [5], left {900, 0}) recvfrom(5, "Content-Type: text/html\r\n", 415, MSG_PEEK, NULL, NULL) = 25 read(5, "Content-Type: text/html\r\n", 25) = 25 select(6, [5], NULL, NULL, {900, 0}) = 1 (in [5], left {900, 0}) recvfrom(5, "Connection: close\r\n", 390, MSG_PEEK, NULL, NULL) = 19 read(5, "Connection: close\r\n", 19) = 19 select(6, [5], NULL, NULL, {900, 0}) = 1 (in [5], left {900, 0}) recvfrom(5, "\r\n", 371, MSG_PEEK, NULL, NULL) = 2 read(5, "\r\n", 2) = 2 select(6, [5], NULL, NULL, {0, 1}) = 1 (in [5], left {0, 1}) close(5) = 0 """ And here a 404 is treated incorrectly: """ socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 5 connect(5, {sa_family=AF_INET, sin_port=htons(4242), sin_addr=inet_addr("127.0.0.1")}, 16) = 0 select(6, NULL, [5], NULL, {900, 0}) = 1 (out [5], left {900, 0}) write(5, "GET /MISSING-8153 HTTP/1.1\r\nUser"..., 126) = 126 select(6, [5], NULL, NULL, {900, 0}) = 1 (in [5], left {899, 998000}) recvfrom(5, "HTTP/1.0 404 Not Found\r\n", 511, MSG_PEEK, NULL, NULL) = 24 read(5, "HTTP/1.0 404 Not Found\r\n", 24) = 24 select(6, [5], NULL, NULL, {900, 0}) = 1 (in [5], left {900, 0}) recvfrom(5, "Server: BaseHTTP/0.3 Python/2.4."..., 487, MSG_PEEK, NULL, NULL) = 35 read(5, "Server: BaseHTTP/0.3 Python/2.4."..., 35) = 35 select(6, [5], NULL, NULL, {900, 0}) = 1 (in [5], left {900, 0}) recvfrom(5, "Date: Tue, 09 Oct 2012 22:10:01 "..., 452, MSG_PEEK, NULL, NULL) = 37 read(5, "Date: Tue, 09 Oct 2012 22:10:01 "..., 37) = 37 select(6, [5], NULL, NULL, {900, 0}) = 1 (in [5], left {900, 0}) recvfrom(5, "Content-Type: text/html\r\n", 415, MSG_PEEK, NULL, NULL) = 25 read(5, "Content-Type: text/html\r\n", 25) = 25 select(6, [5], NULL, NULL, {900, 0}) = 1 (in [5], left {900, 0}) recvfrom(5, "Connection: close\r\n", 390, MSG_PEEK, NULL, NULL) = 19 read(5, "Connection: close\r\n", 19) = 19 select(6, [5], NULL, NULL, {900, 0}) = 1 (in [5], left {900, 0}) recvfrom(5, "\r\n", 371, MSG_PEEK, NULL, NULL) = 2 read(5, "\r\n", 2) = 2 HERE=> select(6, [5], NULL, NULL, {0, 1}) = 0 (Timeout) select(6, NULL, [5], NULL, {900, 0}) = 1 (out [5], left {900, 0}) write(5, "GET /FOUND-8154 HTTP/1.1\r\nUser-A"..., 124) = 124 select(6, [5], NULL, NULL, {900, 0}) = 1 (in [5], left {900, 0}) recvfrom(5, "<head>\n<title>Error response</ti"..., 511, MSG_PEEK, NULL, NULL) = 190 stat("FOUND-8154", 0x7fffcc530d60) = -1 ENOENT (No such file or directory) open("FOUND-8154", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 7 clock_gettime(CLOCK_MONOTONIC, {130079, 24215187}) = 0 select(6, [5], NULL, NULL, {900, 0}) = 1 (in [5], left {900, 0}) read(5, "<head>\n<title>Error response</ti"..., 8192) = 190 clock_gettime(CLOCK_MONOTONIC, {130079, 24258187}) = 0 fstat(7, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b0004bbc000 write(7, "<head>\n<title>Error response</ti"..., 190) = 190 write(4, "\n 0K "..., 777) = 777 select(6, [5], NULL, NULL, {900, 0}) = 1 (in [5], left {900, 0}) read(5, 0x1becc8f0, 8192) = -1 ECONNRESET (Connection reset by peer) close(5) = 0 """ You can se that after 404 and "Connection: close" is returned, wget calls select() with a 1us timeout, to check if the socket has been closed by the other end. That's /src/connect.c:test_socket_open() in wget source code. This is used by persistent_available_p() in src/http.c, with this comment: """ Finally, check whether the connection is still open. This is important because most servers implement liberal (short) timeout on persistent connections. Wget can of course always reconnect if the connection doesn't work out, but it's nicer to know in advance. """ So it deliberately reuses the same connection (so-called persistent connection) if the http server doesn't close it right away. This explains probably why you see this mainly when many threads are in use. I'm not an http expert, so I'm not sure whether it's legal... |
|||
msg172553 - (view) | Author: Charles-François Natali (neologix) * | Date: 2012-10-10 07:12 | |
According to http://tools.ietf.org/html/rfc2616#section-8.1, persistent connections are default in HTTP 1.1. However, here's what's said about the persistence negotiation: """ An HTTP/1.1 server MAY assume that a HTTP/1.1 client intends to maintain a persistent connection unless a Connection header including the connection-token "close" was sent in the request. If the server chooses to close the connection immediately after sending the response, it SHOULD send a Connection header including the connection-token close. An HTTP/1.1 client MAY expect a connection to remain open, but would decide to keep it open based on whether the response from a server contains a Connection header with the connection-token close. In case the client does not want to maintain a connection for more than that request, it SHOULD send a Connection header including the connection-token close. If either the client or the server sends the close token in the Connection header, that request becomes the last one for the connection. """ And if you have a look at the strace output (or a tcpdump capture), you see that httpserver indeeds sends a "Connection: close" header, which wget ignores: """ select(6, [5], NULL, NULL, {900, 0}) = 1 (in [5], left {900, 0}) recvfrom(5, "Connection: close\r\n", 390, MSG_PEEK, NULL, NULL) = 19 read(5, "Connection: close\r\n", 19) = 19 select(6, [5], NULL, NULL, {900, 0}) = 1 (in [5], left {900, 0}) recvfrom(5, "\r\n", 371, MSG_PEEK, NULL, NULL) = 2 read(5, "\r\n", 2) = 2 select(6, [5], NULL, NULL, {0, 1}) = 0 (Timeout) select(6, NULL, [5], NULL, {900, 0}) = 1 (out [5], left {900, 0}) write(5, "GET /FOUND-8154 HTTP/1.1\r\nUser-A"..., 124) = 124 select(6, [5], NULL, NULL, {900, 0}) = 1 (in [5], left {900, 0}) recvfrom(5, "<head>\n<title>Error response</ti"..., 511, MSG_PEEK, NULL, NULL) = 190 stat("FOUND-8154", 0x7fffcc530d60) = -1 ENOENT (No such file or directory) open("FOUND-8154", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 7 clock_gettime(CLOCK_MONOTONIC, {130079, 24215187}) = 0 select(6, [5], NULL, NULL, {900, 0}) = 1 (in [5], left {900, 0}) read(5, "<head>\n<title>Error response</ti"..., 8192) = 190 clock_gettime(CLOCK_MONOTONIC, {130079, 24258187}) = 0 fstat(7, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b0004bbc000 write(7, "<head>\n<title>Error response</ti"..., 190) = 190 write(4, "\n 0K "..., 777) = 777 select(6, [5], NULL, NULL, {900, 0}) = 1 (in [5], left {900, 0}) read(5, 0x1becc8f0, 8192) = -1 ECONNRESET (Connection reset by peer) close(5) = 0 """ So it really looks like a wget bug. I'd suggest closing as invalid, and reporting this to wget developers. |
|||
msg172554 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2012-10-10 08:47 | |
Furthermore, BaseHTTPServer responds with a HTTP/1.0 status line (not HTTP/1.1), meaning persistent connections are not the default. Therefore, I'm closing as invalid. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:57:36 | admin | set | github: 60195 |
2012-10-10 08:47:02 | pitrou | set | status: open -> closed resolution: not a bug messages: + msg172554 |
2012-10-10 07:12:07 | neologix | set | messages: + msg172553 |
2012-10-09 22:54:40 | neologix | set | messages: + msg172542 |
2012-10-08 20:58:34 | neologix | set | messages: + msg172415 |
2012-10-08 20:58:07 | neologix | set | messages: + msg172414 |
2012-10-08 19:04:31 | pitrou | set | messages: + msg172404 |
2012-10-08 19:00:04 | theamk | set | messages: + msg172402 |
2012-09-24 00:13:42 | pitrou | set | messages: + msg171084 |
2012-09-24 00:06:25 | pitrou | set | nosy:
+ pitrou, neologix messages: + msg171083 |
2012-09-23 20:07:30 | thomaslee | set | nosy:
+ thomaslee messages: + msg171079 |
2012-09-21 20:42:24 | catalin.iacob | set | nosy:
+ catalin.iacob messages: + msg170914 |
2012-09-20 20:03:00 | theamk | create |