classification
Title: situation where urllib3 works, but urllib does not work
Type: behavior Stage: resolved
Components: Versions: Python 3.7
process
Status: closed Resolution:
Dependencies: Superseder: Cannot override 'connection: close' in urllib2 headers
View: 12849
Assigned To: Nosy List: deivid, martin.panter
Priority: normal Keywords:

Created on 2018-08-08 12:17 by deivid, last changed 2018-08-14 05:42 by martin.panter. This issue is now closed.

Messages (4)
msg323275 - (view) Author: David (deivid) Date: 2018-08-08 12:17
Hello!

Newbie to Python here. I ran into an issue with the Cinnamon desktop, specifically this one: https://github.com/linuxmint/Cinnamon/issues/5926#issuecomment-411232144. It uses urllib from the standard library to download some JSON, but for some reason it does not work for me. If, however, I use [urllib3](https://github.com/urllib3/urllib3), it just works. It sounds like something the standard library could do better, so I'm reporting it here in case it's helpful.

A minimal example would be:


```python
from urllib.request import urlopen

data = urlopen("https://cinnamon-spices.linuxmint.com/json/applets.json").read()

print(data)
```

which just hangs for me. If I pass a specific number of bytes (less than ~65000) to read(), it works, but only downloads part of the file.
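For reference, the same request can also be read in chunks rather than with one blocking read(); this can help narrow down whether only the final read blocks. (A sketch: `fetch_in_chunks` is a hypothetical helper, not part of urllib, and calling it will hit the network.)

```python
from urllib.request import urlopen

URL = "https://cinnamon-spices.linuxmint.com/json/applets.json"

def fetch_in_chunks(url=URL, chunk_size=16384):
    """Read the response incrementally instead of with one large read()."""
    chunks = []
    with urlopen(url) as resp:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:  # empty bytes means the server finished the body
                break
            chunks.append(chunk)
    return b"".join(chunks)
```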

Using the equivalent code in urllib3 works just fine:

```python
import urllib3

http = urllib3.PoolManager()
response = http.request('GET', 'https://cinnamon-spices.linuxmint.com/json/applets.json')
print(response.data)
```

This is on

```
Python 3.7.0 (default, Aug  7 2018, 23:24:26) 
[GCC 5.5.0 20171010] on linux
```

Any help troubleshooting this would be appreciated!
msg323467 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2018-08-13 08:10
I can’t get it to hang. Does your computer or Internet provider have a proxy or firewall that may be interfering?

Perhaps it is worth comparing the HTTP header fields being sent and received. You can enable debug messages to see the request sent, and print the response fields directly. Most important things to look for are the Content-Length and Transfer-Encoding (if any) fields in the response.

>>> import urllib.request
>>> url = "https://cinnamon-spices.linuxmint.com/json/applets.json"
>>> handler = urllib.request.HTTPSHandler(debuglevel=1)
>>> opener = urllib.request.build_opener(handler)
>>> resp = opener.open(url)
send: b'GET /json/applets.json HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: cinnamon-spices.linuxmint.com\r\nUser-Agent: Python-urllib/3.6\r\nConnection: close\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Server header: Date header: Content-Type header: Content-Length header: Connection header: Last-Modified header: ETag header: X-Sucuri-Cache header: X-XSS-Protection header: X-Frame-Options header: X-Content-Type-Options header: X-Sucuri-ID header: Accept-Ranges
>>> print(resp.info())
Server: Sucuri/Cloudproxy
Date: Mon, 13 Aug 2018 07:18:11 GMT
Content-Type: application/json
Content-Length: 70576
Connection: close
Last-Modified: Mon, 13 Aug 2018 07:25:14 GMT
ETag: "113b0-5734bfe97145e"
X-Sucuri-Cache: HIT
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
X-Content-Type-Options: nosniff
X-Sucuri-ID: 11014
Accept-Ranges: bytes


>>> data = resp.read()
>>> len(data)
70576

Another experiment would be to try “http.client” directly, which I understand is used by both the built-in “urllib.request” module and “urllib3”:

```python
from http.client import HTTPSConnection

conn = HTTPSConnection("cinnamon-spices.linuxmint.com")
headers = {  # Same header fields sent by “urllib.request”
    "Accept-Encoding": "identity",
    "Host": "cinnamon-spices.linuxmint.com",
    "User-Agent": "Python-urllib/3.6",
    "Connection": "close",
}
conn.request("GET", "/json/applets.json", headers=headers)
resp = conn.getresponse()
print(resp.msg)
data = resp.read()
```
Try removing the “Connection: close” field from the request. Occasionally this triggers bad server behaviour (see Issue 12849); maybe your server or proxy is affected.
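A variant of that http.client snippet with the “Connection: close” field removed might look like this. (A sketch: the `fetch` helper name is mine, and calling it will hit the network.)

```python
from http.client import HTTPSConnection

HOST = "cinnamon-spices.linuxmint.com"

# Same fields “urllib.request” would send, minus "Connection: close"
headers = {
    "Accept-Encoding": "identity",
    "Host": HOST,
    "User-Agent": "Python-urllib/3.6",
}

def fetch(path="/json/applets.json"):
    """Issue the GET without "Connection: close" and return the body."""
    conn = HTTPSConnection(HOST)
    try:
        conn.request("GET", path, headers=headers)
        resp = conn.getresponse()
        return resp.read()
    finally:
        conn.close()
```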
msg323478 - (view) Author: David (deivid) Date: 2018-08-13 12:26
Hi Martin.

It's definitely something with my internet connection. Yesterday I temporarily changed the way I connect to the internet to use the mobile connection from my cell phone instead of my WiFi connection, and things started working.

I also debugged the headers being received, and I did notice the "Connection: close" header was the only relevant difference when comparing the request to the one sent by my browser when accessing that page directly. My next task was to investigate how to do what you just suggested... With my current knowledge of Python it would've taken me ages to figure out, so thanks so much!

Let me try your suggestions and report back! Thanks so much for your help! :)
msg323480 - (view) Author: David (deivid) Date: 2018-08-13 12:39
martin.panter, it worked! Thanks so much, I was going nuts!!!! I also read the issue you pointed to, very interesting. Even if all servers should handle this correctly, it does not seem to be the case in real life (I guess it's something easy to misconfigure), so I agree that not setting "Connection: close" by default would make the standard library more user friendly.

I guess I'll now talk to the maintainers of the upstream library and suggest the following:

* Reading this issue and the one you pointed to.
* Reviewing their server configuration.
* Migrating to http.client, especially if they don't manage to fix the server configuration.

This can now be closed, thanks so much again <3
History
Date  User  Action  Args
2018-08-14 05:42:56  martin.panter  set  superseder: Cannot override 'connection: close' in urllib2 headers
2018-08-13 12:39:51  deivid  set  status: open -> closed
    messages: + msg323480
    stage: test needed -> resolved
2018-08-13 12:26:39  deivid  set  messages: + msg323478
2018-08-13 08:10:58  martin.panter  set  versions: + Python 3.7
    nosy: + martin.panter
    messages: + msg323467
    type: behavior
    stage: test needed
2018-08-08 12:17:07  deivid  create