classification
Title: urllib2 doesn't escape spaces in http requests
Type: behavior Stage: test needed
Components: Library (Lib) Versions: Python 3.2, Python 3.3, Python 2.7
process
Status: open Resolution: duplicate
Dependencies: Superseder: urlopen URL with unescaped space
View: 14826
Assigned To: Nosy List: Ramchandra Apte, davide.rizzo, ezio.melotti, karlcow, kiilerix, krisys, maker, martin.panter, orsenthil, sandro.tosi, senko
Priority: normal Keywords: patch

Created on 2011-11-06 20:13 by davide.rizzo, last changed 2017-06-03 05:54 by martin.panter.

Files
File name Uploaded Description
issue13359.patch krisys, 2011-11-09 11:26 percent encoding of urls to fix the issue reported.
issue13359.patch maker, 2012-01-12 15:04 review
issue13359_py2.patch maker, 2012-01-12 15:30 review
urllib-request-space-encode.diff senko, 2013-07-06 10:08 review
Messages (10)
msg147180 - (view) Author: Davide Rizzo (davide.rizzo) * Date: 2011-11-06 20:13
urllib2.urlopen('http://foo/url and spaces') will send an HTTP request line like this to the server:

GET /url and spaces HTTP/1.1

which the server obviously does not understand. This contrasts with urllib's behaviour which replaces the spaces (' ') in the url with '%20'.

Related: #918368 #1153027
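
For callers hitting this today, one workaround is to percent-encode the path before handing the URL to urlopen. The following is a minimal sketch in Python 3 (urllib.parse), not the fix proposed in any attached patch; the helper name escape_url_path is hypothetical, and keeping "%" in the safe set is an assumption that existing escapes should be left alone.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def escape_url_path(url):
    """Percent-encode unsafe characters in the path component only.

    Hypothetical helper for illustration; not part of urllib.
    """
    parts = urlsplit(url)
    # safe="/%" leaves path separators and existing percent-escapes untouched,
    # so an already-encoded path passes through unchanged.
    escaped_path = quote(parts.path, safe="/%")
    return urlunsplit(
        (parts.scheme, parts.netloc, escaped_path, parts.query, parts.fragment)
    )


print(escape_url_path("http://foo/url and spaces"))
```

With the result passed to urlopen, the request line would carry /url%20and%20spaces instead of a path containing raw spaces.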
msg147349 - (view) Author: Krishna Bharadwaj (krisys) Date: 2011-11-09 11:26
I have used the quote method to percent encode the url for spaces and similar characters. This is my first patch. Please let me know if there is anything wrong. I will correct and re-submit it. I ran the test_urllib2.py which gave an OK for 34 tests.

Changes are made in two instances:
1. in the open method.
2. in the __init__ of Request class to ensure that the same issue is addressed at the time of creating Request objects.
msg149441 - (view) Author: Ramchandra Apte (Ramchandra Apte) * Date: 2011-12-14 12:08
Seems good.
msg151126 - (view) Author: Michele Orrù (maker) * Date: 2012-01-12 15:04
Patch attached for python3, with unit tests.
msg151127 - (view) Author: Mads Kiilerich (kiilerix) * Date: 2012-01-12 15:10
FWIW, I don't think it is a good idea to escape automatically. It will change the behaviour in a non-backward compatible way for existing applications that pass encoded urls to this function.

I think the existing behaviour is better. The documentation and the failure mode for passing URLs with spaces could however be improved.
msg151129 - (view) Author: Michele Orrù (maker) * Date: 2012-01-12 15:30
Here is the patch for Python 2.


kiilerix, RFC 1738 explicitly says that the space character shall not be used.
msg151131 - (view) Author: Mads Kiilerich (kiilerix) * Date: 2012-01-12 15:35
Yes, the url sent by urllib2 must not contain spaces. In my opinion the only way to handle that correctly is to not pass urls with spaces to urlopen. Escaping the urls is not a good solution - even if the API was to be designed from scratch. It would be better to raise an exception if it is passed an invalid url.

Note for example that '/' and the %-encoding of '/' are different, and it must thus be possible to pass a URL containing both to urlopen. That is not possible if it automatically escapes.
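
The distinction between '/' and its %-encoding can be illustrated with a short Python 3 sketch. The path below is a made-up example, and the code only shows how urllib.parse.quote and unquote behave with their default arguments; it does not reproduce any of the attached patches.

```python
from urllib.parse import quote, unquote

# Hypothetical path: the first segment is the literal name "a/b", the second is "c".
already_escaped = "/a%2Fb/c"

# Splitting on "/" before decoding preserves the segment boundary.
segments = [unquote(part) for part in already_escaped.split("/")]
print(segments)

# Quoting the escaped path again also escapes "%", so the server
# would receive "%252F" and decode it to the text "%2F", not "/".
double_escaped = quote(already_escaped)
print(double_escaped)

# Decoding the whole path first merges the two kinds of slash.
print(unquote(already_escaped))
```

A function that escapes its input unconditionally cannot tell which of these forms the caller intended, which is the backward-compatibility concern raised above.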
msg183576 - (view) Author: karl (karlcow) * Date: 2013-03-06 03:20
The issue with the current patch is that it escapes more than just the spaces, with possible indirect side effects.
Anne van Kesteren is in the process of creating a parsing/writing specification for URL. Not finished but putting it here for future reference.
http://url.spec.whatwg.org/
msg192400 - (view) Author: Senko Rasic (senko) * Date: 2013-07-06 10:08
I vote for the parse method converting the spaces (and only the spaces) explicitly, for the following reasons:

* the spaces must be encoded for the server to accept them
* no user-encoded url will ever have spaces in them
* space quoting is idempotent: quote(quote(' ')) == quote(' ')
* if the user did get an exception from Request in case of invalid url containing the spaces, the only thing he or she can do is to quote the url string

Here's a patch implementing this. The change allows for any whitespace character in the selector part of the url (and in particular, '\n'), not only ' '.
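
The idempotence argument can be checked with a small Python 3 sketch. The helper encode_spaces is hypothetical and handles only ' ', not the other whitespace characters the patch covers. Note that the property holds for a space-only replacement; urllib.parse.quote with default arguments also escapes '%', so applying quote twice does not return the same string.

```python
from urllib.parse import quote


def encode_spaces(url):
    """Replace only literal spaces with %20 (hypothetical helper)."""
    return url.replace(" ", "%20")


once = encode_spaces("http://foo/url and spaces")
twice = encode_spaces(once)
print(once)
print(twice == once)  # space-only encoding leaves an encoded URL unchanged

# By contrast, quote() escapes the "%" it produced on the first pass.
print(quote(" "))
print(quote(quote(" ")))
```

This is why a space-only conversion is safe for URLs the user has already encoded, while a general quote() call is not.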
msg295066 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2017-06-03 05:54
I think this could be merged with Issue 14826. Maybe it is sensible to handle all control characters the same way.
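
The alternative raised earlier in the thread, rejecting such URLs instead of escaping them, could treat spaces and control characters uniformly. A minimal sketch in Python 3 follows; the function name reject_unescaped and the exact character range (ASCII 0x00 to 0x20 plus 0x7F) are assumptions for illustration, not the behaviour of urllib.

```python
import re

# Assumed definition of "unsafe": ASCII control characters, space, and DEL.
_UNSAFE_CHARS = re.compile(r"[\x00-\x20\x7f]")


def reject_unescaped(url):
    """Return url unchanged, or raise ValueError if it contains raw
    spaces or control characters (hypothetical helper)."""
    match = _UNSAFE_CHARS.search(url)
    if match:
        raise ValueError(
            f"URL contains an unescaped space or control character "
            f"at index {match.start()}: {url!r}"
        )
    return url


print(reject_unescaped("http://foo/url%20and%20spaces"))
```

Failing early like this leaves the choice of how to encode the URL with the caller.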
History
Date User Action Args
2017-06-03 05:54:20 martin.panter set nosy: + martin.panter; messages: + msg295066; resolution: duplicate; superseder: urlopen URL with unescaped space
2013-07-06 10:08:12 senko set files: + urllib-request-space-encode.diff; nosy: + senko; messages: + msg192400
2013-03-06 03:20:50 karlcow set nosy: + karlcow; messages: + msg183576
2012-01-12 15:35:57 kiilerix set messages: + msg151131
2012-01-12 15:30:04 maker set files: + issue13359_py2.patch; messages: + msg151129
2012-01-12 15:10:58 kiilerix set nosy: + kiilerix; messages: + msg151127
2012-01-12 15:04:33 maker set files: + issue13359.patch; nosy: + maker; messages: + msg151126
2011-12-14 12:08:23 Ramchandra Apte set nosy: + Ramchandra Apte; messages: + msg149441
2011-12-14 10:55:00 sandro.tosi set nosy: + sandro.tosi
2011-11-09 11:26:12 krisys set files: + issue13359.patch; nosy: + krisys; messages: + msg147349; keywords: + patch
2011-11-06 20:14:48 ezio.melotti set nosy: + ezio.melotti; stage: test needed
2011-11-06 20:13:46 davide.rizzo create