classification
Title: urljoin duplicate slashes
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 3.5
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: orsenthil
Nosy List: demian.brecht, martin.panter, orsenthil, pitrou, python-dev, scoder
Priority: normal
Keywords: patch

Created on 2014-08-26 17:58 by demian.brecht, last changed 2015-04-15 23:11 by berker.peksag. This issue is now closed.

Files
File name Uploaded Description Edit
issue22278.patch demian.brecht, 2014-08-26 18:00
issue22278_2.patch demian.brecht, 2014-09-18 14:50 review
Messages (11)
msg225923 - (view) Author: Demian Brecht (demian.brecht) * Date: 2014-08-26 17:58
Reported by Stefan Behnel in issue22118:

I'm now getting duplicated slashes in URLs, e.g.:

https://new//foo.html
http://my.little.server/url//logo.gif

In both cases, the base URL that gets joined with the postfix had a trailing slash, e.g.

"http://my.little.server/url/" + "logo.gif" -> "http://my.little.server/url//logo.gif"
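
For reference, a minimal sketch of the join reported above, assuming an interpreter that includes the fix for this issue (Python 3.5 or later). The variable names are illustrative only. It also shows the contrasting case of a base without a trailing slash, where the last path segment is replaced rather than extended:

```python
from urllib.parse import urljoin

# Base URL from the report, with and without the trailing slash.
base_with_slash = "http://my.little.server/url/"
base_without_slash = "http://my.little.server/url"

# With a trailing slash, the relative reference is appended to the directory.
joined_with_slash = urljoin(base_with_slash, "logo.gif")

# Without it, "url" is treated as the last segment and is replaced.
joined_without_slash = urljoin(base_without_slash, "logo.gif")

print(joined_with_slash)     # http://my.little.server/url/logo.gif
print(joined_without_slash)  # http://my.little.server/logo.gif
```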
msg226164 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-08-31 10:17
This patch seems to fix issue #22311 as well (*). However, it would be good to add more tests for base URLs with trailing slashes.

(*) without patch:

>>> import urllib.parse
>>> base = """https://pypi.python.org/simple/werkzeug/"""
>>> rel = """../../packages/2.3/W/Werkzeug/Werkzeug-0.3.1-py2.3.egg#md5=5f669acf04af135ad8577d99a4387504"""
>>> urllib.parse.urljoin(base, rel)
'https://pypi.python.org/simple/packages/2.3/W/Werkzeug/Werkzeug-0.3.1-py2.3.egg#md5=5f669acf04af135ad8577d99a4387504'

with patch:

>>> urllib.parse.urljoin(base, rel)
'https://pypi.python.org/packages/2.3/W/Werkzeug/Werkzeug-0.3.1-py2.3.egg#md5=5f669acf04af135ad8577d99a4387504'
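
The patched result matches what the reference resolution rules in RFC 3986 (section 5.2) give: merge the relative path onto the base path, then remove dot segments. The following is a simplified sketch of that idea, not the code from the patch; `remove_dot_segments` is a hypothetical helper written for this illustration, it only handles absolute paths, and the final comparison assumes an interpreter that includes the fix (Python 3.5 or later):

```python
from urllib.parse import urljoin, urlsplit


def remove_dot_segments(path: str) -> str:
    """Simplified take on RFC 3986 section 5.2.4 for absolute paths."""
    segments = path.split("/")
    output = []
    for segment in segments:
        if segment == ".":
            # "." refers to the current directory: drop it.
            continue
        if segment == "..":
            # Step up one level, but never above the root (the leading "").
            if len(output) > 1:
                output.pop()
            continue
        output.append(segment)
    # A trailing "." or ".." leaves the result pointing at a directory.
    if segments[-1] in (".", ".."):
        output.append("")
    return "/".join(output)


base_path = "/simple/werkzeug/"
rel_path = "../../packages/2.3/W/Werkzeug/Werkzeug-0.3.1-py2.3.egg"

# Merge step: keep the base path up to and including its last slash.
merged = base_path[: base_path.rfind("/") + 1] + rel_path
resolved = remove_dot_segments(merged)
print(resolved)  # /packages/2.3/W/Werkzeug/Werkzeug-0.3.1-py2.3.egg

# Path computed by urljoin for the same base and relative reference.
joined = urljoin("https://pypi.python.org" + base_path, rel_path)
joined_path = urlsplit(joined).path
```

Both "../" segments consume one directory each ("werkzeug", then "simple"), which is why the resolved path starts at "/packages".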
msg226168 - (view) Author: Stefan Behnel (scoder) * Date: 2014-08-31 10:25
Have the tests in

http://bugs.python.org/file32591/urischemes.py

which Nick Coghlan mentioned in http://bugs.python.org/issue22118#msg225662, been merged yet?
msg226240 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-09-01 17:52
Those tests don't seem to bring much. Some of them are straight from the RFC (and therefore already in the current test suite, I assume), and some are for non-HTTP protocols such as "fred" (!). A couple of them seem to be genuine, although only one fails and it's a corner case.
msg226249 - (view) Author: Demian Brecht (demian.brecht) * Date: 2014-09-01 22:41
I'll try to get some time this week to extend the various test cases. Thanks for pointing that out, Antoine.

I also found that, apart from the few RFC-specific blocks, the tests in the link that Nick added in the other ticket were not only questionable (non-HTTP, as Antoine pointed out) but also just plain wrong in some cases given the new semantics.
msg227048 - (view) Author: Demian Brecht (demian.brecht) * Date: 2014-09-18 14:50
Antoine: On (finally) getting back to this and re-reading your test case, the current behaviour is incorrect and is corrected by the patch. I've added a few more test cases to ensure trailing slashes are handled correctly.
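
The test cases in the patch are not reproduced here. As a rough sketch of what table-driven checks for trailing-slash bases could look like, assuming an interpreter that includes the fix (Python 3.5 or later); the case list and the `failing_cases` helper are hypothetical and chosen for illustration:

```python
from urllib.parse import urljoin

# (base, relative reference, expected result); every base ends with a slash.
TRAILING_SLASH_CASES = [
    ("http://my.little.server/url/", "logo.gif",
     "http://my.little.server/url/logo.gif"),
    ("http://a/b/c/", "d", "http://a/b/c/d"),
    ("http://a/b/c/", "./d", "http://a/b/c/d"),
    ("http://a/b/c/", "../d", "http://a/b/d"),
    ("http://a/b/c/", "/d", "http://a/d"),
    ("http://a/b/c/", "d/", "http://a/b/c/d/"),
]


def failing_cases(cases):
    """Return (base, rel, expected, actual) for every case that mismatches."""
    failures = []
    for base, rel, expected in cases:
        actual = urljoin(base, rel)
        if actual != expected:
            failures.append((base, rel, expected, actual))
    return failures


failures = failing_cases(TRAILING_SLASH_CASES)
print(failures)  # [] when every join matches its expected value
```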
msg227086 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2014-09-19 09:14
Except for the minor comments made by Antoine in the review, the patch looks good to go.
msg227254 - (view) Author: Roundup Robot (python-dev) Date: 2014-09-22 07:49
New changeset 901e4e52b20a by Senthil Kumaran in branch 'default':
Issue #22278: Fix urljoin problem with relative urls, a regression observed
https://hg.python.org/cpython/rev/901e4e52b20a
msg227255 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2014-09-22 07:50
I addressed Antoine's comments with the patch and committed it. Thank you!
msg227273 - (view) Author: Demian Brecht (demian.brecht) * Date: 2014-09-22 14:13
Heh, I'd finally gotten a few minutes to address the comments... And it's already taken care of ;) Thanks Senthil.
msg238494 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-03-19 07:11
I opened Issue 23703 for a regression caused by this commit.
History
Date User Action Args
2015-04-15 23:11:15 berker.peksag set stage: patch review -> resolved
2015-03-19 07:11:13 martin.panter set nosy: + martin.panter
messages: + msg238494
2014-09-22 14:13:21 demian.brecht set messages: + msg227273
2014-09-22 07:50:02 orsenthil set status: open -> closed
assignee: orsenthil
resolution: fixed
messages: + msg227255
2014-09-22 07:49:30 python-dev set nosy: + python-dev
messages: + msg227254
2014-09-19 09:14:56 orsenthil set messages: + msg227086
2014-09-18 14:50:52 demian.brecht set files: + issue22278_2.patch
messages: + msg227048
2014-09-01 22:41:40 demian.brecht set messages: + msg226249
2014-09-01 17:52:44 pitrou set messages: + msg226240
2014-08-31 10:25:44 scoder set messages: + msg226168
2014-08-31 10:18:31 pitrou set nosy: + scoder
2014-08-31 10:18:17 pitrou link issue22311 superseder
2014-08-31 10:17:24 pitrou set messages: + msg226164
2014-08-26 21:14:45 ned.deily set nosy: + orsenthil, pitrou
stage: patch review
2014-08-26 18:00:21 demian.brecht set files: + issue22278.patch
keywords: + patch
2014-08-26 17:58:07 demian.brecht create