classification
Title: Possible duplicate entries in sys.path if .pth files are used with zip's
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.6, Python 3.5, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: brett.cannon Nosy List: SilentGhost, brett.cannon, mgrang, python-dev, tds333
Priority: normal Keywords: patch

Created on 2016-03-18 10:23 by tds333, last changed 2016-04-08 22:10 by brett.cannon. This issue is now closed.

Files
File name Uploaded Description Edit
issue26587.diff SilentGhost, 2016-03-20 11:53 review
site.patch tds333, 2016-03-20 14:14 Patch for site with test review
site2.patch tds333, 2016-03-26 17:33 review
Messages (13)
msg261958 - (view) Author: Wolfgang Langner (tds333) * Date: 2016-03-18 10:23
In site.py there is the internal function _init_pathinfo() This function builds a set of path entries from sys.path. This is used to avoid duplicate entries in sys.path.
But this function has a check if it is a directory with os.path.isdir(...). All this is fine as long as someone has a .zip file in sys.path or a zipfile subpath. Then the path entry is not part of the set. With this duplicate detection with none directories does not work.

The fix is as simple as removing the os.path.isdir(...) line and fixing the indent. Also the docstring should be modified.

Detected by using this function in a project reusing addsitedir(...) functionality to add another path with .pth processing.
msg261974 - (view) Author: SilentGhost (SilentGhost) * Date: 2016-03-18 16:26
Could you provide a code example of your using addsitedir that results in duplicates?
msg262059 - (view) Author: Mandeep Singh Grang (mgrang) Date: 2016-03-19 23:37
Here is a testcase to reproduce the issue:

> cat test.py
import site, sys
site.addsitedir('/foo/bar')
print (sys.path)

This prints just a single instance of '/foo/bar':

['/local/mnt/workspace/python/tst', '/foo/bar', '/usr/local/lib/python36.zip', '/local/mnt/workspace/python/src/Lib', '/local/mnt/workspace/python/src/Lib/plat-linux', '/local/mnt/workspace/python/build/build/lib.linux-x86_64-3.6-pydebug']

Now if we explicitly set PYTHONPATH to include '/foo/bar'
> export PYTHONPATH=/foo/bar

and then run test.py here is the output:

['/local/mnt/workspace/python/tst', '/foo/bar', '/usr/local/lib/python36.zip', '/local/mnt/workspace/python/src/Lib', '/local/mnt/workspace/python/src/Lib/plat-linux', '/local/mnt/workspace/python/build/build/lib.linux-x86_64-3.6-pydebug', '/foo/bar']

We see that there are duplicate entries for '/foo/bar'.

As Wolfgang rightly said the issue comes from the check for
"if os.path.isdir(dir)" inside _init_pathinfo() in site.py.
On removing this check I no longer see the duplicate entry for '/foo/bar'.

But since this is the first bug I am looking at I am not sure of the implications of removing this check. Can someone please confirm that what I see is indeed a failing test case, or is this the intended behavior?

Thanks,
Mandeep
msg262070 - (view) Author: SilentGhost (SilentGhost) * Date: 2016-03-20 11:53
Thanks for this, Mandeep. I don't think it is entirely the same issue that Wolfgang is describing. He's particularly concerned about the clash of .zip files from the sys.path with ones coming from .pth files for example. And while the proposed fix would resolve the issue, I don't feel it would be entirely correct. For the record, I'm not even sure sys.path is guaranteed to be contain no duplicates, at least I wasn't able to find a statement to that effect in the docs. In any case, I'm adding Brett here who was seemingly the last one to touch that bit of code.

I'm also attaching a path with what I think is a more appropriate and smaller fix.
msg262078 - (view) Author: Wolfgang Langner (tds333) * Date: 2016-03-20 14:14
Extended unit test for the issue and patch for site.py.
msg262193 - (view) Author: Wolfgang Langner (tds333) * Date: 2016-03-22 16:01
I think a fix for 3.6 only is ok, because it changes behaviour.
But this is only an internal function with a "_".

Should I add a test with a temporary created pth file?
msg262195 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2016-03-22 16:23
Unfortunately you can't simply remove that directory check because you don't want to blindly normalize case. If someone put some token value on sys.path for their use that was case-sensitive even on a case-insensitive OS then the proposed change would break those semantics (e.g. if someone put a URL in sys.path for a REST-based importer).

The possibilities I see are:

1. Don't change anything; duplicate entries don't really hurt anything
2. Remove duplicate entries, but only normalize case for directories
3. Remove duplicate entries, but normalize case for anything that points to something on the filesystem (i.e. both directories and files)
msg262196 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2016-03-22 16:23
And the code under discussion can be found at https://hg.python.org/cpython/file/default/Lib/site.py#l133
msg262200 - (view) Author: SilentGhost (SilentGhost) * Date: 2016-03-22 16:45
I still think my fix is more appropriate as it ensures that known_paths and sys.path stay connected somehow.
msg262493 - (view) Author: Wolfgang Langner (tds333) * Date: 2016-03-26 17:33
Ok, I implemented point 3.
Check if it is a dir or file and makepath only in this case.
All other entries are added unmodified to the set.

Added a test case also for an URL path.

I think duplicate detection is now improved and it should break nothing.
msg263045 - (view) Author: Roundup Robot (python-dev) Date: 2016-04-08 22:04
New changeset bd1af1a97c2e by Brett Cannon in branch 'default':
Issue #26587: Allow .pth files to specify file paths as well as
https://hg.python.org/cpython/rev/bd1af1a97c2e
msg263046 - (view) Author: Roundup Robot (python-dev) Date: 2016-04-08 22:08
New changeset 09dc97edf454 by Brett Cannon in branch '3.5':
Issue #26587: Remove an incorrect statement from the docs
https://hg.python.org/cpython/rev/09dc97edf454

New changeset 94d5c57ee835 by Brett Cannon in branch 'default':
Merge w/ 3.5 for issue #26587
https://hg.python.org/cpython/rev/94d5c57ee835
msg263047 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2016-04-08 22:10
I simplified Wolfgang's patch by simply using os.path.exists(). That eliminated the one place where os.path.isdir() was in use that was too specific to directories where files were reasonable to expect.

I also quickly tweaked the docs for the site  module in 3.5 to not promise that files would work.

If you think there is still an issue with keeping things tied together, SilentGhost, feel free to open another issue to track it.
History
Date User Action Args
2016-04-08 22:10:20brett.cannonsetstatus: open -> closed
resolution: fixed
messages: + msg263047

stage: patch review -> resolved
2016-04-08 22:08:09python-devsetmessages: + msg263046
2016-04-08 22:04:36python-devsetnosy: + python-dev
messages: + msg263045
2016-03-26 18:26:11brett.cannonsetassignee: brett.cannon
2016-03-26 17:33:45tds333setfiles: + site2.patch

messages: + msg262493
2016-03-22 16:45:37SilentGhostsetmessages: + msg262200
2016-03-22 16:23:59brett.cannonsetmessages: + msg262196
2016-03-22 16:23:34brett.cannonsetmessages: + msg262195
2016-03-22 16:01:53tds333setmessages: + msg262193
2016-03-20 14:14:26tds333setfiles: + site.patch

messages: + msg262078
2016-03-20 11:53:07SilentGhostsetfiles: + issue26587.diff

nosy: + brett.cannon
messages: + msg262070

keywords: + patch
stage: test needed -> patch review
2016-03-19 23:37:28mgrangsetnosy: + mgrang
messages: + msg262059
2016-03-18 16:26:20SilentGhostsetversions: - Python 3.4
nosy: + SilentGhost

messages: + msg261974

stage: test needed
2016-03-18 10:23:00tds333create