This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: packaging.pypi.simple.Crawler assumes external download links are ok to follow
Type: behavior
Stage: resolved
Components: Distutils2
Versions: Python 3.3
process
Status: closed
Resolution: out of date
Dependencies:
Superseder:
Assigned To: tarek
Nosy List: alexis, eric.araujo, michael.mulich, tarek
Priority: normal
Keywords:

Created on 2011-06-19 21:08 by michael.mulich, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (2)
msg138663 - Author: Michael Mulich (michael.mulich) Date: 2011-06-19 21:08
The packaging.pypi.simple.Crawler blindly follows external download URLs. The crawler should honor a list of allowed hosts (see also the hosts parameter) before attempting to download from an external source.

Éric Araujo has also pointed out that established tools like easy_install and pip provide ways of allowing/restricting by host.
msg138666 - Author: Éric Araujo (eric.araujo) (Python committer) Date: 2011-06-19 21:12
Extract from IRC:
<pumazi> hmm... I'm thinking Crawler's follow_externals flag isn't working as expected
[...]
<pumazi> I'm not sure, my assumption of [its] function could be off
[...]
<merwok> “hosts is a list of hosts allowed to be processed if follow_externals is true (default behavior is to follow all hosts), follow_externals enables or disables following external links (default is false, meaning disabled).”
<pumazi> Well, I was assuming it would disable external downloads
<merwok> I think “external links” are external links to be scraped, not download links
<merwok> But I see your misunderstanding
<pumazi> I see, but wouldn't we want the same restrictions on download links?
[...]
<merwok> IIUC, follow_externals can be disabled because it’s guesswork
<merwok> The info obtained from XML-RPC or the simple interface is not guesswork
<merwok> So I think you could want to disable guessing from external links, but I don’t see why you should care about the origin of the download
<pumazi> trust issues I suppose
<merwok> But the same person can upload a malicious file to PyPI as well as on their site
<merwok> Without reading the code, I think this is the rationale.  OTOH, if easy_install and pip can restrict downloads and your user expectations show that it can be needed to restrict downloads, let’s file a bug
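The restriction requested in this issue could be sketched roughly as follows. This is a standalone illustration, not the packaging.pypi.simple.Crawler code: the function names host_allowed and filter_download_urls, the index_host argument, and the use of shell-style wildcard patterns for hosts are all assumptions made for the example. The idea is that download URLs on the index itself are always kept, while external ones are kept only if their host matches the allowed list.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def host_allowed(url, allowed_hosts=("*",)):
    """Return True if the URL's host matches any pattern in allowed_hosts.

    Hypothetical helper: patterns are shell-style wildcards such as
    "*.example.org". The default ("*",) allows every host.
    """
    # urlparse lowercases the hostname; it is None when no host is present.
    host = urlparse(url).hostname or ""
    return any(fnmatchcase(host, pattern.lower()) for pattern in allowed_hosts)


def filter_download_urls(urls, index_host, allowed_hosts=("*",)):
    """Keep URLs hosted on the index plus external ones on allowed hosts.

    Hypothetical helper: index_host is the host name of the package index.
    """
    kept = []
    for url in urls:
        host = urlparse(url).hostname or ""
        if host == index_host or host_allowed(url, allowed_hosts):
            kept.append(url)
    return kept
```

With allowed_hosts=["*.example.org"] and index_host="pypi.python.org", a link to http://downloads.example.org/foo-1.0.tar.gz would be kept and a link to http://example.com/foo-1.0.tar.gz would be dropped before any download is attempted.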
History
Date                 User            Action  Args
2022-04-11 14:57:18  admin           set     github: 56577
2014-03-13 03:56:05  eric.araujo     set     status: open -> closed; resolution: out of date; stage: resolved
2011-06-19 21:12:19  eric.araujo     set     messages: + msg138666
2011-06-19 21:08:19  michael.mulich  create