Title: urlparse.urljoin() cuts off last base character with semicolon at url start
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.1, Python 3.2, Python 2.7
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: orsenthil Nosy List: eric.araujo, michalp, orsenthil, weschow
Priority: high Keywords: patch

Created on 2010-08-31 05:20 by calvin, last changed 2010-12-17 04:57 by orsenthil. This issue is now closed.

File name Uploaded Description
urlparse.patch michalp, 2010-08-31 16:48
urlparse-9721.patch weschow, 2010-11-20 19:18
urlparse-9721-3.2.patch weschow, 2010-11-20 20:21
urlparse-9721-2.7.patch weschow, 2010-11-20 20:21
Messages (9)
msg115252 - (view) Author: Bastian Kleineidam (calvin) Date: 2010-08-31 05:20
The urljoin() implementation cuts off the last base URL
character if the URL to join starts with a semicolon.
The expected output has no characters cut off.

$ python2.6
Python 2.6.6 (r266:84292, Aug 29 2010, 12:36:23) 
[GCC 4.4.5 20100824 (prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import urlparse
>>> print urlparse.urljoin('http://localhost:8080/feedback', ';jsessionid=XXX')

... same in Python 3.1.2:

$ python3.1
Python 3.1.2 (release31-maint, Aug 29 2010, 18:45:17) 
[GCC 4.4.5 20100824 (prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib.parse
>>> urllib.parse.urljoin('http://localhost:8080/feedback', ';jsessionid=XXX')

... in Python 2.5 the last path segment is cut off.
$ python2.5
Python 2.5.5 (r255:77872, Aug 23 2010, 02:55:15) 
[GCC 4.4.5 20100816 (prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import urlparse
>>> print urlparse.urljoin('http://localhost:8080/feedback', ';jsessionid=XXX')
msg115253 - (view) Author: Bastian Kleineidam (calvin) Date: 2010-08-31 06:51
Update: the Python 2.5 behaviour is the expected one and, I think, the correct output.
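
For readers who want to check an interpreter against the behaviour described as correct above, here is a minimal sketch. It assumes a Python 3 release that already contains the fix from this issue; the second call uses the base URI and the ";x" reference from the reference-resolution examples in RFC 3986, which are not part of the original report.

```python
from urllib.parse import urljoin

# The URL pair from the original report.
base = 'http://localhost:8080/feedback'
joined = urljoin(base, ';jsessionid=XXX')
print(joined)

# Base URI and ";x" reference taken from the RFC 3986 resolution examples.
rfc_joined = urljoin('http://a/b/c/d;p?q', ';x')
print(rfc_joined)

# On an affected interpreter the first result would instead end in
# "feedbac;jsessionid=XXX", i.e. with one base character missing.
```

On a fixed interpreter the whole last path segment of the base is replaced, which matches the Python 2.5 behaviour reported in this issue.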
msg115268 - (view) Author: Michał Powaga (michalp) Date: 2010-08-31 16:48
The problem is here:

path = path[:-1] # This is not needed and cuts last character
return urlunparse((scheme, netloc, path,
                  params, query, fragment))

I sent a patch.

PS. Sorry if I'm doing something wrong, but this is my first patch and my first activity in the Python project.
msg115464 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-09-03 16:44
The versions getting bug fixes are 2.7, 3.1 (stable versions) and 3.2 (active). 2.6 is in security mode now.

Can someone write a test (as a standalone script or a diff against Lib/test/) and tell if 3.1 and 3.2 have the bug too?

More importantly, can someone quote the latest URI RFC about that? The semicolon is related to the mysterious param component IIRC.
msg115736 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-09-07 03:01
Simply applying the patch would break the test suite of urlparse (the test_RFC3986 case of urljoin for URLs to join that start with ';'). The expected behavior is to trim off all trailing characters of the base path back to the last '/' and then join the URL starting with the semicolon. The 2.5 behavior is the correct one.
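
The trimming rule described in this message can be sketched as follows. This is an illustration only, not the code in urlparse: `merge_paths` is a hypothetical helper, and the sketch deliberately ignores the other parts of RFC 3986 resolution (an empty base path with an authority, dot-segment removal, query and fragment handling).

```python
def merge_paths(base_path, ref_path):
    """Hypothetical helper: drop everything after the right-most '/' in
    the base path, then append the relative reference."""
    cut = base_path.rfind('/')
    # If there is no '/', rfind returns -1 and the whole base path is dropped.
    return base_path[:cut + 1] + ref_path


print(merge_paths('/feedback', ';jsessionid=XXX'))
print(merge_paths('/b/c/d', ';x'))
```

Under these assumptions the last segment of the base path is always removed as a whole, no matter how many characters it has.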
msg121729 - (view) Author: Wes Chow (weschow) Date: 2010-11-20 19:18
Here's a patch for 3.2 which fixes this problem I believe. There does exist a test case that should have produced an error, except that the last path segment in RFC3986_BASE is only one character long. Had it been more than one character long, RFC3986 checks would have failed.

Also in the supplied patch is a test to catch this specific bug.
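
To illustrate why the existing test could not catch the bug, the sketch below compares the faulty step (dropping one character) with dropping the whole last segment. Both function names are hypothetical, and '/b/c/de' is used here only as an example of a base path whose last segment is longer than one character.

```python
def drop_last_char(path):
    # Mimics the faulty step discussed in this issue: path[:-1].
    return path[:-1]


def drop_last_segment(path):
    # Removes everything after the right-most '/'.
    return path[:path.rfind('/') + 1]


for base_path in ('/b/c/d', '/b/c/de'):
    print(base_path, drop_last_char(base_path), drop_last_segment(base_path))
```

With a one-character last segment the two functions agree, so a test built on such a base passes either way; they diverge as soon as the segment is longer.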
msg121756 - (view) Author: Wes Chow (weschow) Date: 2010-11-20 20:21
New patch (urlparse-9721-3.2.patch) against 3.2 which fixes the erroneous test.
msg121758 - (view) Author: Wes Chow (weschow) Date: 2010-11-20 20:21
Patch against 2.7 (urlparse-9721-2.7.patch).
msg124183 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-12-17 04:57
Fixed in r87329, r87330 and r87331. Thanks for the patch Wes, that was proper.
Date User Action Args
2010-12-17 04:57:17 orsenthil set status: open -> closed
nosy: orsenthil, eric.araujo, michalp, weschow
messages: + msg124183
resolution: accepted -> fixed
stage: patch review -> resolved
2010-11-20 20:21:58 weschow set files: + urlparse-9721-2.7.patch
messages: + msg121758
2010-11-20 20:21:10 weschow set files: + urlparse-9721-3.2.patch
messages: + msg121756
2010-11-20 19:31:55 r.david.murray set versions: + Python 2.7, Python 3.2
2010-11-20 19:27:44 r.david.murray set stage: needs patch -> patch review
2010-11-20 19:18:30 weschow set files: + urlparse-9721.patch
nosy: + weschow
messages: + msg121729
2010-09-07 03:02:01 orsenthil set priority: normal -> high
messages: + msg115736
assignee: orsenthil
resolution: accepted
stage: needs patch
2010-09-03 16:44:52 eric.araujo set nosy: + eric.araujo
messages: + msg115464
versions: - Python 2.6
2010-08-31 17:48:55 r.david.murray set nosy: + orsenthil
2010-08-31 16:48:32 michalp set files: + urlparse.patch
messages: + msg115268
keywords: + patch
nosy: + michalp, - calvin
2010-08-31 06:51:04 calvin set messages: + msg115253
versions: - Python 2.5
2010-08-31 05:20:56 calvin create