classification
Title: Broken code for handling file://host in urllib.request.FileHandler.file_open
Type: behavior Stage: patch review
Components: Library (Lib) Versions: Python 3.7, Python 3.6, Python 3.4, Python 3.5
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: martin.panter, mthuurne, orsenthil, python-dev, r.david.murray
Priority: normal Keywords: patch

Created on 2014-07-13 01:49 by martin.panter, last changed 2019-10-07 14:21 by mthuurne.

Files
File name Uploaded Description Edit
file-handler.patch martin.panter, 2015-05-27 05:31 review
Messages (6)
msg222901 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2014-07-13 01:49
This isn’t a particularly important problem for me but when reading the code I noticed some bit rot in this function, where a host name in a “file:” URL would be handled differently than intended.

* The url[:2] == '//' check is probably wrong because it is comparing the URL’s path component (selector), not the prefix for a host name. Compare urlopen("file://host//") and urlopen("file://host/") error messages.

* The req.host is self.get_names() should probably use “in”, not “is”. The code author presumably expected urlopen("file://127.0.0.1//dev/null") to work.

* Also it seems odd that urlopen("file://remote/missing") immediately reports “No such file”, while urlopen("file://remote/") blocks for a host name lookup and then reports “not on local host”.
msg223095 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2014-07-15 08:43
Thanks for the report. Point 2 is definitely a bug (and an overlook by me), I will fix it.

I think, the url[:2] == '//' check was present for ftp case which supported file:// protocol. I can't see a clear requirement to change here.

The scenarios you encounter when giving a two different paths are interesting because, in one the path stat('/invalid/path') and in the other it is the path '/' that is stat and only late is the hostname verified if it coming from localhost (http://hg.python.org/cpython/file/ddfcaeb1c56b/Lib/urllib/request.py#l1353). 
Since both presence of file locally and then ensuring the host is localhost needs to be done, the current order gives faster failures for most common use-cases.
msg223634 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2014-07-22 07:16
New changeset 4b98961748f1 by Senthil Kumaran in branch '3.4':
Fix localhost checking in FileHandler. Raised in #21970.
http://hg.python.org/cpython/rev/4b98961748f1

New changeset 2c660948bb41 by Senthil Kumaran in branch 'default':
Merge 3.4
http://hg.python.org/cpython/rev/2c660948bb41
msg223636 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2014-07-22 07:19
I have addressed the mistake where req.host is self.get_names() was done instead of req.host in self.get_names() in the first commit as it was an obvious problem.

I will come up with patch/solution addressing the other behavior mentioned in this report.
msg244144 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-05-27 05:31
I do not believe the change fixes anything on its own. It essentially just changed the error message to something even worse, and the added test case already passes without the change.

I am posting a patch which cleans up the code and related tests. It also fixes three minor theoretical bugs:

>>> urlopen("file://127.0.0.1//dev/null")  # Should succeed
urllib.error.URLError: <urlopen error unknown url type: file>
>>> urlopen("file://127.1//dev/null")  # 127.1 is localhost; should succeed
urllib.error.URLError: <urlopen error file:// scheme is supported only on localhost>
>>> urlopen("file://remote/missing")  # Should not try to access local file
urllib.error.URLError: <urlopen error [Errno 2] No such file or directory: '/missing'>

The localhost check in file_open() was added by revision 982c8ec73ae3 for Issue 10063. But notice how all the test cases involve strange URLs like file://ftp.example.com//foo.txt, with second extra double slash. I removed the check, because it is redundant with the more thorough and correct check that has existed since 1994 (revision 020d8c2d9d3c) in open_local_file().

For cases like file://remote/nonlocal-file, I think it is better to raise the correct error, even if that requires a DNS check, rather than raising a faster error with a misleading message. So I moved the check before the stat() call inside open_local_file().
msg354100 - (view) Author: Maarten ter Huurne (mthuurne) * Date: 2019-10-07 14:21
Another problem with the current code is that when passed a URL with a host name that is not empty or "localhost", but is one of the alternative names for localhost returned by get_names(), file_open() returns None implicitly instead of opening the file.

I think this issue would be fixed by the proposed patch. So it would be good if the patch could be reviewed and adopted.
History
Date User Action Args
2019-10-07 14:21:55mthuurnesetnosy: + mthuurne

messages: + msg354100
versions: + Python 3.7
2015-05-27 05:31:43martin.pantersetfiles: + file-handler.patch
versions: + Python 3.5, Python 3.6
messages: + msg244144

keywords: + patch
type: behavior
stage: patch review
2014-07-22 07:19:37orsenthilsetmessages: + msg223636
2014-07-22 07:16:28python-devsetnosy: + python-dev
messages: + msg223634
2014-07-18 16:14:49r.david.murraysetnosy: + r.david.murray
2014-07-15 08:43:46orsenthilsetnosy: + orsenthil
messages: + msg223095
2014-07-13 01:49:21martin.pantercreate