Message 253173 - Python tracker

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

Author	ezio.melotti
Recipients	ezio.melotti
Date	2015-10-19.09:31:22
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1445247083.91.0.927080887762.issue25439@psf.upfronthosting.co.za>
In-reply-to

Content

Currently urllib.request.Request seems to accept invalid types silently, only to fail later on with unhelpful errors when the request is passed to urlopen.

Users might end up going through something like this:

>>> from urllib.request import Request, urlopen
>>> r = Request(b'https://www.python.org')
>>> # so far, so good
>>> urlopen(r)
Traceback (most recent call last):
  ...
urllib.error.URLError: <urlopen error unknown url type: b'https>

This unhelpful error might lead users to think https is not supported, whereas the real problem is that the URL should have been a str, not bytes.

The same problem applies to post data:

>>> r = Request('https://www.python.org', {'post': 'data'})
>>> # so far, so good
>>> urlopen(r)
Traceback (most recent call last):
  ...
TypeError: memoryview: dict object does not have the buffer interface
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  ...
ValueError: Content-Length should be specified for iterable data of type <class 'dict'> {'post': 'data'}

This error seems to indicate that Content-Length should be specified somewhere, but AFAICT from the docs, the data should be bytes or None -- so let's try to urlencode the dict:

>>> from urllib.parse import urlencode
>>> r = Request('https://www.python.org', urlencode({'post': 'data'}))
>>> # so far, so good
>>> urlopen(r)
Traceback (most recent call last):
  ...
TypeError: POST data should be bytes or an iterable of bytes. It cannot be of type str.

OK, maybe I should use bytes in the dict:

>>> r = Request('https://www.python.org', urlencode({b'post': b'data'}))
>>> # so far, so good
>>> urlopen(r)
Traceback (most recent call last):
  ...
TypeError: POST data should be bytes or an iterable of bytes. It cannot be of type str.

That didn't work either: I also needed to encode the output of urlencode() to bytes.
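
For reference, a minimal sketch of the form that does get past the type checks: urlencode() returns a str, so it is encoded to bytes before being handed to Request. The helper name build_post_request is made up for illustration, and no network call is made here (urlopen is deliberately left out).

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_post_request(url, fields):
    """Hypothetical helper: build a POST Request from a dict of form fields."""
    # urlencode() returns str; Request expects the body as bytes.
    body = urlencode(fields).encode("ascii")
    return Request(url, data=body)


req = build_post_request("https://www.python.org", {"post": "data"})
print(req.data)          # the encoded body, as bytes
print(req.get_method())  # a Request with data defaults to POST
```

Encoding with "ascii" should be safe here because urlencode() percent-escapes anything outside the ASCII range.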


Most of these problems could be prevented if Request() raised for non-str URLs and for POST data that is neither bytes nor None.  Unless there is some valid reason to accept invalid types, I think they should be rejected early.
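
To illustrate what "rejected early" could look like, here is a rough sketch of a wrapper that does the checks up front. This is not the actual urllib implementation or a proposed patch: checked_request is a hypothetical name, and the rule it enforces (str URL, bytes-or-None data) is simply the one suggested above, which is stricter than what the docs allow (they also mention iterables of bytes).

```python
from urllib.request import Request


def checked_request(url, data=None, **kwargs):
    """Hypothetical wrapper: validate argument types before building a Request."""
    if not isinstance(url, str):
        raise TypeError(f"url must be str, not {type(url).__name__}")
    # Assumption: only bytes or None are accepted, as suggested in this message.
    if data is not None and not isinstance(data, bytes):
        raise TypeError(f"data must be bytes or None, not {type(data).__name__}")
    return Request(url, data=data, **kwargs)


# The mistakes from the sessions above now fail at construction time.
for bad_args in [(b"https://www.python.org",),
                 ("https://www.python.org", {"post": "data"}),
                 ("https://www.python.org", "post=data")]:
    try:
        checked_request(*bad_args)
    except TypeError as exc:
        print("rejected:", exc)
```

With a check like this, the error points at the offending argument instead of surfacing later inside urlopen().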

History
Date	User	Action	Args
2015-10-19 09:31:23	ezio.melotti	set	recipients: + ezio.melotti
2015-10-19 09:31:23	ezio.melotti	set	messageid: <1445247083.91.0.927080887762.issue25439@psf.upfronthosting.co.za>
2015-10-19 09:31:23	ezio.melotti	link	issue25439 messages
2015-10-19 09:31:22	ezio.melotti	create