
classification
Title: urllib2 doesn't always supply / where URI path component is empty
Type: behavior
Stage: resolved
Components:
Versions: Python 3.1, Python 3.2, Python 2.7
process
Status: closed
Resolution: fixed
Dependencies: 2464
Superseder:
Assigned To: orsenthil
Nosy List: dstanek, flox, jjlee, orsenthil, weschow
Priority: normal
Keywords: easy, patch

Created on 2008-12-02 20:46 by jjlee, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name | Uploaded | Description
empty-path-4493.patch | weschow, 2010-11-20 22:21 |
Messages (5)
msg76777 - Author: John J Lee (jjlee) Date: 2008-12-02 20:46
As required by RFC 2616 section 3.2.2, for all HTTP requests sent by
urllib2, the path component of the URI should be normalized to "/"
before the Request-URI derived from it gets passed to httplib (or
something functionally equivalent to that).  This was fixed in one case
in #2464, but the fix is in the wrong place, since it's a general
problem not specific to redirects.  See the longer discussion here:

http://bugs.python.org/msg76736

(hmm, let's see if I can just say msg76736 and get a hyperlink)

Example:

import urllib2
urllib2.urlopen("http://python.org?spam")

Expect: sends "/?spam" in request line.

Got: sends "?spam" in request line.

Probably should be fixed by making Request.get_selector() return the
normalized URI reference (with the slash always present).  When fixing,
remember that the Request-URI of RFC 2616 (returned by .get_selector())
is sometimes a relative reference, and sometimes a URI (in RFC 3986's
terminology).
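
The two forms of Request-URI mentioned above can be illustrated with a minimal sketch. It uses Python 3's urllib.parse rather than the Python 2 modules discussed in this issue, and request_uri and its via_proxy flag are hypothetical names invented for the illustration, not part of urllib2 or of any committed fix.

```python
from urllib.parse import urlsplit, urlunsplit


def request_uri(url, via_proxy=False):
    """Hypothetical helper: build a Request-URI whose path is never empty."""
    parts = urlsplit(url)
    path = parts.path or "/"  # an empty path is normalized to "/"
    if via_proxy:
        # Absolute form (a URI): keep the scheme and authority.
        return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))
    # Relative form (a relative reference): path plus optional query only.
    return urlunsplit(("", "", path, parts.query, ""))


print(request_uri("http://python.org?spam"))
print(request_uri("http://python.org?spam", via_proxy=True))
```

Under these assumptions the first call yields the relative reference with the slash present, and the second keeps the scheme and host while still inserting the slash before the query.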
msg121797 - Author: Wes Chow (weschow) Date: 2010-11-20 22:21
Attached is a patch against 3.2 that replaces empty paths with '/' in HTTPConnection. I do not fully understand the ';' syntax in URIs, so this implementation may break it, since it splits URLs and unsplits them when needed. The Python docs seem to indicate there might be some obscure cases where this is problematic.

And yes, I do realize that this patch fixes the problem in yet another place. Hopefully HTTPConnection is the lowest common denominator.
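
A rough sketch of the "split, and unsplit only if needed" idea follows. It is not the attached patch: ensure_path is a hypothetical name, and the code uses Python 3's urllib.parse.urlsplit, which leaves any ';' parameters inside the path component. Returning the input unchanged whenever the path is non-empty sidesteps round-trip concerns for those URLs.

```python
from urllib.parse import urlsplit, urlunsplit


def ensure_path(url):
    """Hypothetical helper: add "/" only when the path component is empty."""
    parts = urlsplit(url)
    if parts.path:
        # Non-empty path: return the input untouched, so ";" parameters
        # and similar details are never re-serialized.
        return url
    # Empty path: rebuild the URL with "/" as the path.
    return urlunsplit(parts._replace(path="/"))


print(ensure_path("?spam"))
print(ensure_path("/dir/file;type=a?x=1"))
```

Only URLs with an empty path go through urlunsplit here, which limits the exposure to the obscure round-trip cases the docs hint at.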
msg122094 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-11-22 05:06
Fixed it in r86676 (py3k), r86677 (release31-maint) and r86678 (release27-maint).

Wes: I fixed it at a much higher level, in urlparse itself, so that the corrected URL is what gets sent to httplib.


In issue2464, John pointed out that, according to STD 66, the path component can legally be empty, so when it is empty the '/' is not added.

Also added tests and NEWS.
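
For readers on current Python 3, where urllib2 lives on as urllib.request, the result can be checked without any network access by inspecting the selector attribute of a Request. This is a sketch of how to observe the behaviour, not a copy of the tests added in the commits above.

```python
from urllib.request import Request

# Constructing a Request only parses the URL; no connection is opened.
req = Request("http://python.org?spam")

# The selector is what ends up in the request line.
print(req.selector)
```

On the Python 3 versions this was written against, the printed selector should be "/?spam", matching the expectation in the original report.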
msg122121 - Author: Wes Chow (weschow) Date: 2010-11-22 13:18
This same bug also exists in HTTPClient, and my patch addresses that. Addressing it in HTTPClient has a side effect of taking care of it for urllib2 as well (and all future libraries that use HTTPClient).

Even if the urllib2 patch is preferable, shouldn't we fix the problem in HTTPClient as well?
msg124185 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-12-17 05:32
Wes, I forgot to address your last comment.

HTTPClient follows the HTTP spec for requests and responses. When it is used, the request is made on the path, and the code there checks whether a path was given; if not, it makes the request on '/'. It is not appropriate to pass invalid URLs to httpclient; handling and correcting invalid URLs belongs at a much higher level. That's why I made those changes in urllib.
History
Date | User | Action | Args
2022-04-11 14:56:42 | admin | set | github: 48743
2010-12-17 05:32:46 | orsenthil | set | nosy: jjlee, orsenthil, dstanek, flox, weschow; messages: + msg124185
2010-11-22 13:18:42 | weschow | set | messages: + msg122121
2010-11-22 05:06:55 | orsenthil | set | status: open -> closed; resolution: fixed; messages: + msg122094; stage: test needed -> resolved
2010-11-20 22:21:38 | weschow | set | files: + empty-path-4493.patch; nosy: + weschow; messages: + msg121797; keywords: + patch
2010-08-04 07:49:23 | flox | set | nosy: + flox
2010-08-01 19:05:27 | dstanek | set | nosy: + dstanek
2010-07-11 05:37:05 | orsenthil | set | assignee: orsenthil
2010-07-10 16:55:02 | BreamoreBoy | set | versions: + Python 3.1, Python 2.7, Python 3.2, - Python 2.6
2009-04-22 18:47:49 | ajaksu2 | set | priority: normal; keywords: + easy
2009-02-12 19:14:38 | ajaksu2 | set | nosy: + orsenthil; dependencies: + urllib2 can't handle http://www.wikispaces.com; type: behavior; stage: test needed; versions: + Python 2.6
2008-12-02 20:46:11 | jjlee | create |