Author ezio.melotti
Recipients ezio.melotti
Date 2015-10-19.09:31:22
Message-id <1445247083.91.0.927080887762.issue25439@psf.upfronthosting.co.za>
In-reply-to
Content
Currently urllib.request.Request seems to accept invalid types silently, only to fail later on with unhelpful errors when the request is passed to urlopen.

This might lead users through a debugging session like the following:

>>> r = Request(b'https://www.python.org')
>>> # so far, so good
>>> urlopen(r)
Traceback (most recent call last):
  ...
urllib.error.URLError: <urlopen error unknown url type: b'https>

This unhelpful error might lead users to think https is not supported, whereas the problem is that the url should have been str, not bytes.

The same problem applies to post data:

>>> r = Request('https://www.python.org', {'post': 'data'})
>>> # so far, so good
>>> urlopen(r)
Traceback (most recent call last):
  ...
TypeError: memoryview: dict object does not have the buffer interface
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  ...
ValueError: Content-Length should be specified for iterable data of type <class 'dict'> {'post': 'data'}

This error seems to indicate that Content-Length should be specified somewhere, but AFAICT from the docs, the data should be bytes or None -- so let's try to urlencode them:

>>> r = Request('https://www.python.org', urlencode({'post': 'data'}))
>>> # so far, so good
>>> urlopen(r)
Traceback (most recent call last):
  ...
TypeError: POST data should be bytes or an iterable of bytes. It cannot be of type str.

OK, maybe I should use bytes in the dict:

>>> r = Request('https://www.python.org', urlencode({b'post': b'data'}))
>>> # so far, so good
>>> urlopen(r)
Traceback (most recent call last):
  ...
TypeError: POST data should be bytes or an iterable of bytes. It cannot be of type str.

That didn't work either: I also needed to encode the output of urlencode() to bytes.
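
For reference, here is a minimal sketch of the form that does work, assuming ASCII is an acceptable encoding for the form data. urlencode() returns a str, so its result has to be encoded before it is passed as POST data. Building the Request needs no network access:

```python
from urllib.parse import urlencode
from urllib.request import Request

# urlencode() returns str, so encode the result to bytes for use as POST data.
data = urlencode({'post': 'data'}).encode('ascii')

# The URL is a str and the data is bytes, which is what urlopen() expects.
r = Request('https://www.python.org', data)
```

Passing r to urlopen() would then send a POST request instead of failing with a TypeError.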


Most of these problems could be prevented if Request() raised for non-str URLs and for POST data that is neither bytes nor None.  Unless there is some valid reason to accept invalid types, I think they should be rejected early.
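
To illustrate the kind of early check I have in mind, here is a rough sketch written as a wrapper function. The name checked_request and the error messages are hypothetical and not part of urllib. The sketch is also stricter than a real fix would probably need to be, since it rejects the iterables of bytes that the error message above says are allowed:

```python
from urllib.request import Request


def checked_request(url, data=None, **kwargs):
    """Hypothetical helper: validate argument types before building a Request."""
    # Reject bytes (or anything else that is not str) as the URL up front.
    if not isinstance(url, str):
        raise TypeError('url should be str, not %s' % type(url).__name__)
    # Assumption: only bytes-like data or None is accepted in this sketch.
    if data is not None and not isinstance(data, (bytes, bytearray)):
        raise TypeError('data should be bytes or None, not %s'
                        % type(data).__name__)
    return Request(url, data, **kwargs)
```

With a check like this, checked_request(b'https://www.python.org') and checked_request('https://www.python.org', {'post': 'data'}) would both raise a TypeError that names the offending type as soon as they are called, rather than failing later inside urlopen().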
History
Date                 User          Action  Args
2015-10-19 09:31:23  ezio.melotti  set     recipients: + ezio.melotti
2015-10-19 09:31:23  ezio.melotti  set     messageid: <1445247083.91.0.927080887762.issue25439@psf.upfronthosting.co.za>
2015-10-19 09:31:23  ezio.melotti  link    issue25439 messages
2015-10-19 09:31:22  ezio.melotti  create