classification
Title: Disallow relative files paths in urllib*.open()
Type: behavior Stage: commit review
Components: Library (Lib) Versions: Python 3.3, Python 3.2, Python 2.7
process
Status: open Resolution: fixed
Dependencies: Superseder:
Assigned To: orsenthil Nosy List: albert, amaury.forgeotdarc, orsenthil, python-dev
Priority: normal Keywords:

Created on 2009-08-03 13:33 by albert, last changed 2012-01-21 09:19 by orsenthil.

Files
File name Uploaded Description
unsplit.py albert, 2009-08-03 13:33 patched function + some doc/test
Messages (7)
msg91222 - Author: albert Mietus (albert) Date: 2009-08-03 13:33
The functions urlparse.url{,un}split() and urllib{,2}.urlopen() do not work
together for relative, local files, due to a bug in urlunsplit.

Given a file f='./rel/path/to/file.html', it can be opened directly by
urllib.urlopen(f), but not by urllib2, as the latter needs a scheme.
We can create a sound URL with split/unsplit and a default scheme:
f2=urlparse.urlunsplit(urlparse.urlsplit(f,'file')), which works in most
cases. HOWEVER, a bogus netloc is added for relative file paths.

I have isolated this "buggy" function, added some local test code, and
made a patch/workaround in my file 'unsplit.py', which is included. I hope
this will contribute to a real patch.
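
A minimal sketch of the round trip described above, assuming Python 3, where the urlparse module lives in urllib.parse. The exact string produced for f2 depends on the interpreter version, so no output is shown; the point is only that the split step yields an empty netloc and a relative path, and that the unsplit step has to decide what to put between the scheme and that path.

```python
from urllib.parse import urlsplit, urlunsplit

f = './rel/path/to/file.html'

# Split with 'file' as the default scheme; the string has no scheme of its own,
# so the whole string ends up in the path component and netloc stays empty.
parts = urlsplit(f, 'file')
print(parts)

# Reassemble the components. Inspect the result on your own interpreter:
# the report is about an empty netloc ('//') being placed before the relative path.
f2 = urlunsplit(parts)
print(f2)
```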


--Greetings, Albert

ALbert Mietus
Don't send spam mail!
My mission: http://SoftwareBeterMaken.nl (product, process & image).
My life in short: http://albert.mietus.nl/Doc/CV_ALbert.html
msg91402 - Author: albert Mietus (albert) Date: 2009-08-07 12:41
There was a bug in the workaround:

    if not ( scheme == 'file' and not netloc and url[0] != '/'):
---------------------------------------------=================---

The {{{and url[0] != '/'}}} part was missing (the line above is corrected).

The effect: split/unsplit of file:///path resulted in file:/path
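
The attached unsplit.py is not reproduced in this thread, so the following is only a sketch of how the corrected condition could sit inside an unsplit function. The name unsplit_keep_relative and the surrounding logic are assumptions made for illustration; only the guard itself comes from the line quoted above. It uses url[:1] rather than url[0] so that an empty path does not raise IndexError.

```python
from urllib.parse import urlsplit, uses_netloc

def unsplit_keep_relative(components):
    """Hypothetical urlunsplit variant that adds no netloc to relative file paths."""
    scheme, netloc, url, query, fragment = components
    # The guard from the workaround: a 'file' URL with no netloc and a
    # path that does not start with '/' is a relative local file.
    relative_file = scheme == 'file' and not netloc and url[:1] != '/'
    if not relative_file:
        if netloc or (scheme and scheme in uses_netloc and url[:2] != '//'):
            if url and url[:1] != '/':
                url = '/' + url
            url = '//' + (netloc or '') + url
    if scheme:
        url = scheme + ':' + url
    if query:
        url = url + '?' + query
    if fragment:
        url = url + '#' + fragment
    return url

# Relative path: no '//' is inserted.
print(unsplit_keep_relative(urlsplit('./rel/path/to/file.html', 'file')))
# Absolute path: the empty netloc is kept, so file:///path survives the round trip.
print(unsplit_keep_relative(urlsplit('file:///path')))
```

Without the url[:1] != '/' part of the guard, the second call would skip the netloc branch as well and return file:/path, which is the effect described above.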
msg100175 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-02-26 21:17
The bug here, it seems to me, is that urllib.urlopen() should not allow a relative file path like the one specified, f='./rel/path/to/file.html'.
urllib2's behavior of raising an exception seems proper.

According to the RFCs, local files are to be accessed by:
file://localhost/path/to/file
file:///path/to/file

Both are absolute paths to the file; in the second one, localhost is omitted.

Let me see if urllib's urlopen can be made a little stricter.
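
A small sketch of the stricter behaviour, assuming Python 3's urllib.request.urlopen. The helper names is_rejected and absolute_file_url are made up for this illustration. The first checks that a string without a scheme is refused with ValueError; the second builds an absolute file: URL of the second form listed above using pathlib.

```python
import urllib.request
from pathlib import Path

def is_rejected(target):
    """Return True if urlopen refuses target with ValueError (no usable URL scheme)."""
    try:
        urllib.request.urlopen(target).close()
    except ValueError:
        return True
    return False

def absolute_file_url(path):
    """Turn a local path, relative or not, into an absolute file:///... URL."""
    return Path(path).resolve().as_uri()

print(is_rejected('./rel/path/to/file.html'))
# The printed URL depends on the current working directory.
print(absolute_file_url('./rel/path/to/file.html'))
```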
msg151715 - Author: Roundup Robot (python-dev) Date: 2012-01-21 03:43
New changeset f6008e936fbc by Senthil Kumaran in branch '2.7':
Fix Issue6631 - Disallow relative files paths in urllib*.open()
http://hg.python.org/cpython/rev/f6008e936fbc
msg151716 - Author: Roundup Robot (python-dev) Date: 2012-01-21 03:55
New changeset 4366c0df2c73 by Senthil Kumaran in branch '2.7':
NEWS entry for Issue6631
http://hg.python.org/cpython/rev/4366c0df2c73

New changeset 514994d7a9f2 by Senthil Kumaran in branch '3.2':
Fix  Issue6631 - Disallow relative file paths in urllib urlopen
http://hg.python.org/cpython/rev/514994d7a9f2
msg151726 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2012-01-21 08:35
Sorry, why was this change backported?
Does this fix a specific issue in 2.7 or 3.2?
On the contrary, it seems to me that code which (incorrectly) used urllib.urlopen() to accept both URLs and local files will suddenly break.
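
For code of that kind, one possible adaptation is to dispatch on the scheme explicitly. This is only a sketch in Python 3; open_url_or_path is a hypothetical helper, not something the library provides, and it assumes POSIX-style paths (a Windows drive letter such as C: would be parsed as a scheme).

```python
import tempfile
import urllib.request
from pathlib import Path
from urllib.parse import urlsplit

def open_url_or_path(target):
    """Hypothetical helper: use urlopen for URLs, plain open() for local paths."""
    if urlsplit(target).scheme:
        return urllib.request.urlopen(target)
    # No scheme: treat the string as a local file path, relative or absolute.
    return open(target, 'rb')

# Usage: read the same temporary file once by path and once by file: URL.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / 'file.html'
    sample.write_bytes(b'<p>hello</p>')
    with open_url_or_path(str(sample)) as fh:
        via_path = fh.read()
    with open_url_or_path(sample.resolve().as_uri()) as fh:
        via_url = fh.read()
```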
msg151727 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2012-01-21 09:19
Actually, I saw this as a bug in urllib.urlopen; urllib2 had
exhibited the proper behaviour previously. Now the behaviour of both
will be consistent.

But you are right that an *incorrect* usage of urllib.urlopen would
break in 2.7.2.

If we need to be lenient on that incorrect usage, then this change can
stay in the 3.x series, because urllib.request.urlopen is the
interface users will be using there, and it can be reverted from 2.7.

Personally, I am +/- 0 on reverting this in 2.7. Initially, I saw this
as a bug, but later, when I added tests for ValueError and checked in,
I realized that it can break some incorrect usages, as you say.
History
Date User Action Args
2012-01-21 09:19:51 orsenthil set status: pending -> open
    messages: + msg151727
2012-01-21 08:35:31 amaury.forgeotdarc set status: closed -> pending
    nosy: + amaury.forgeotdarc
    messages: + msg151726
    stage: committed/rejected -> commit review
2012-01-21 03:57:05 orsenthil set status: open -> closed
    type: performance -> behavior
    stage: committed/rejected
    resolution: fixed
    versions: + Python 2.7, Python 3.2, Python 3.3
2012-01-21 03:55:58 python-dev set messages: + msg151716
2012-01-21 03:43:26 python-dev set nosy: + python-dev
    messages: + msg151715
2012-01-21 03:42:49 orsenthil set title: urlparse.urlunsplit() can't handle relative files (for urllib*.open() -> Disallow relative files paths in urllib*.open()
2010-02-26 21:17:27 orsenthil set assignee: orsenthil
    messages: + msg100175
    nosy: + orsenthil
2009-08-07 12:41:03 albert set messages: + msg91402
2009-08-03 13:33:02 albert create