
classification
Title: symlinking .py files creates unexpected sys.path
Type: behavior
Stage:
Components: Interpreter Core, Library (Lib)
Versions: Python 3.3, Python 3.4, Python 2.7

process
Status: closed
Resolution: wont fix
Dependencies:
Superseder:
Assigned To:
Nosy List: Socob, kristjan.jonsson, ncoghlan, ned.deily, neologix, schmir
Priority: normal
Keywords:

Created on 2013-04-05 09:29 by kristjan.jonsson, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (14)
msg186069 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013-04-05 09:29
When .py files are assembled into a directory structure using direct symbolic links to the files, something odd happens to sys.path[0].

Consider this file structure:
/pystuff/
  foo.py -> /scripts/foo.py
  bar.py -> /libs/bar.py

foo.py contains the line: "import bar"
"python /pystuff/foo.py" will now fail, because when foo.py is run, sys.path[0] will contain "/scripts", rather than the expected "/pystuff".

It would appear that the algorithm for finding sys.path[0] is:
sys.path[0] = os.path.dirname(os.path.realpath(filename))
IMO, it should be:
sys.path[0] = os.path.realpath(os.path.dirname(filename))
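The difference between the two orderings can be checked outside the interpreter. The following is a minimal sketch, assuming a POSIX system with symlink support; the function names and the temporary directory layout are made up for the illustration, and the code only mimics the computation described above with `os.path.dirname` and `os.path.realpath` rather than calling into the interpreter's actual startup code.

```python
import os
import tempfile


def script_dir_current(filename):
    """Resolve symlinks first, then drop the file name (the observed behaviour)."""
    return os.path.dirname(os.path.realpath(filename))


def script_dir_proposed(filename):
    """Drop the file name first, then resolve symlinks in the directory part only."""
    return os.path.realpath(os.path.dirname(filename))


def build_example_tree(root):
    """Create root/scripts/foo.py plus a symlink root/pystuff/foo.py pointing at it."""
    scripts = os.path.join(root, "scripts")
    pystuff = os.path.join(root, "pystuff")
    os.mkdir(scripts)
    os.mkdir(pystuff)
    target = os.path.join(scripts, "foo.py")
    with open(target, "w") as f:
        f.write("import bar\n")
    link = os.path.join(pystuff, "foo.py")
    os.symlink(target, link)  # may need extra privileges on Windows
    return link


# Hypothetical demo: print both candidate directories for the symlinked script.
with tempfile.TemporaryDirectory() as root:
    link = build_example_tree(root)
    print("current :", script_dir_current(link))
    print("proposed:", script_dir_proposed(link))
```

With this layout the first function returns the `scripts` directory and the second returns the `pystuff` directory, which is the discrepancy reported here.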

I say that this behaviour is unexpected, because symlinking to individual files normally has the semantics of "pulling that file in" rather than "hopping to that file's real dir".

As an example, the following works in C, and in other languages too, I should imagine:
/code/
  myfile.c -> /sources/myfile.c
  mylib.h  -> /libs/mylib.h
  libmylib.so -> /libs/libmylib.so

An #include "mylib.h" in myfile.c would look for the file in /code and find it.
A "cc myfile.c -lmylib" would find libmylib.so in /code.

This problem was observed on Linux, when running Hadoop script jobs.  The Hadoop code (Cloudera CDH4) creates a symlink copy of your file structure, where each file is individually symlinked to a place in a file cache, and each file may sit in a different physical dir, like this:

tmp1/
 a.py -> /secret/filecache/0001/a.py
 b.py -> /secret/filecache/0002/b.py
 c.py -> /secret/filecache/0003/c.py

Suddenly, importing b and c from a.py won't work.
If a, b, and c were .h files, then an #include "b.h" from a.h would work.
msg186070 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013-04-05 09:29
btw, this is the opposite issue to issue #1387483
msg186077 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-04-05 12:14
Adding Guido & Ned, as my recollection is that some of the weirdness in the sys.path[0] symlink resolution was to placate the test suite on Mac OS X (at least, that was a cause of failures in the initial runpy module implementation until Guido tracked down the discrepancy in symlink resolution between direct script execution and runpy).

How does the test suite react if you change the order of application to resolve symlinks only after dropping the file name from the path?
msg186081 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-04-05 13:55
> How does the test suite react if you change the order of application to resolve symlinks only after dropping the file name from the path?

Note that this will break things, see e.g.
http://bugs.python.org/issue1387483#msg186063

The only backward compatible way to handle this would be to add both
directories to sys.path, hoping that there's no module with the same
name in both directories.
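
A rough sketch of what computing both directories might look like, under stated assumptions: `candidate_script_dirs` is a hypothetical helper, not an existing interpreter or stdlib function, and putting the real directory first (to keep today's lookup order) is a choice made for this example, not something settled in this discussion.

```python
import os


def candidate_script_dirs(filename):
    """Return both directories for a script path, real location first, no duplicates."""
    real_dir = os.path.dirname(os.path.realpath(filename))      # follows a symlinked file
    apparent_dir = os.path.realpath(os.path.dirname(filename))  # keeps the link's directory
    dirs = [real_dir]
    if apparent_dir != real_dir:
        dirs.append(apparent_dir)
    return dirs
```

For a script that is not a symlink the two values coincide and only one entry is returned. The name-clash risk mentioned above remains: a module present in both directories would be taken from whichever entry comes first.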
msg186085 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-04-05 15:07
Do not "fix" this. It is an intentional feature.

There is a common pattern where one or more Python scripts are collected in some "bin" directory (presumably on the user's $PATH) as symlinks into the directory where they really live (not on $PATH, nor on sys.path). The other files needed by the script(s) are in the latter directory, and so it needs to be on sys.path[0]. If you change the symlink resolution, sys.path[0] will point to the "bin" directory and the scripts won't be able to find the rest of their modules.
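
For illustration, the pattern can be reproduced in a scratch directory. This is only a sketch: the directory names `app` and `bin`, the file names, and the helper `run_linked_script` are invented for the example, and it assumes a POSIX system where the interpreter resolves the script symlink as described in this issue.

```python
import os
import subprocess
import sys
import tempfile


def run_linked_script(root):
    """Build app/ (real files) and bin/ (symlink only), run the link, return its stdout."""
    app = os.path.join(root, "app")
    bindir = os.path.join(root, "bin")
    os.mkdir(app)
    os.mkdir(bindir)
    # The helper module lives next to the real script, not next to the symlink.
    with open(os.path.join(app, "helper.py"), "w") as f:
        f.write("MESSAGE = 'helper found'\n")
    with open(os.path.join(app, "tool.py"), "w") as f:
        f.write("import helper\nprint(helper.MESSAGE)\n")
    link = os.path.join(bindir, "tool")
    os.symlink(os.path.join(app, "tool.py"), link)
    # Run the script through the symlink in bin/, as a user with bin/ on $PATH would.
    result = subprocess.run([sys.executable, link], capture_output=True, text=True)
    return result.stdout.strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        print(run_linked_script(root))
```

The import of `helper` succeeds only because the directory of the real `tool.py` ends up in sys.path[0]; if the directory of the link were used instead, the script would fail with an ImportError.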

While there are probably better patterns to solve the problem that this intends to solve, the pattern is commonly used and I do not want it to be broken.

If you are using symlinks for other purposes, well, too bad.
msg186086 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-04-05 15:30
I'll add it to the list of docs updates for post-PEP 432 (similar to the import system in general finally getting reference docs in 3.3 following the migration to importlib, I hope to have improved import state initialisation docs for 3.4 if I successfully tame the interpreter initialisation code)
msg186087 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013-04-05 15:30
1) _I_ am not using symlinks this way.  The Hadoop scheduling processor is.  This means that we cannot use Python for it without hacking the scripts for the special case.  Presumably applications do not generally break when run in an artificial file tree populated with files symlinked into arbitrary real locations, but Python does.  Only Python seems to care about the _real_ location of the file, as opposed to the apparent location.
2) This particular use case is quite unobvious, and goes against the spirit of symbolic links. They are meant to be transparent to applications.  Consider, e.g., the analogue of C header files. Only Python seems to care about the _real_ location of the file, as opposed to the apparent location. Effectively, Python is actively using the knowledge of these links as a sort of dynamic sys.path modifying tool.

I agree that it is not good to break existing usage, however misguided it may be.  But in that case, isn't it possible to disable this symlink dereference via e.g. an option?
msg186089 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-04-05 15:49
Not currently, because interpreter startup is a mess already. Overriding sys.path[0] initialisation is on the list for 3.4 already, I'm just advising strongly against piling any more complexity on top of the current rickety structure until we do something about the foundation.
msg186090 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-04-05 15:52
I'm sure there's some change that can be made to the scripts that
solves this locally, without requiring any changes to Python.
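
One possible shape for such a local change, as a hedged sketch: the helper names below are hypothetical, and the approach assumes that sys.argv[0] still holds the path the script was invoked with, so that `os.path.abspath` (which normalises a path without following symlinks) yields the directory containing the link.

```python
import os
import sys


def apparent_script_dir(script_path):
    """Directory of the script as invoked; abspath() does not follow symlinks."""
    return os.path.dirname(os.path.abspath(script_path))


def prepend_apparent_dir(script_path, path_list=None):
    """Put the script's apparent directory at the front of path_list (default: sys.path)."""
    if path_list is None:
        path_list = sys.path
    directory = apparent_script_dir(script_path)
    if directory not in path_list:
        path_list.insert(0, directory)
    return directory
```

A script in a symlinked tree would have to carry these few lines itself and call `prepend_apparent_dir(sys.argv[0])` before importing its sibling modules, since the helper cannot be imported from the link directory until that directory is on sys.path.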
msg186091 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013-04-05 16:35
Yes, of course.  But I still maintain that the failure of Python to work with a linktree of .py files, where the destination of said links is arbitrary, is rather unusual, and IMHO violates the principle of least surprise.  In this case, the existence of the virtual linktree is apparently an implementation detail of Hadoop, not something that we as Hadoop users were supposed to know or care about.

Exploiting the OS file system implementation detail of a symbolic link as a language import feature is an example of an unusual coupling indeed, in my opinion.

Even import guru Nick didn't seem to be aware of this feature.  It's great that we plan at least to document this Unix-only feature at some point.

Cheers!
msg186117 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-04-06 02:23
The reason I haven't documented sys.path[0] initialisation is because I
know I don't fully understand it. Path initialisation in general has a lot
of historical quirks, particularly once symlinks are involved.
msg357282 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2019-11-22 13:56
It is quite intentional that symlinks are followed for the purpose of
computing sys.argv[0] and sys.path.
-- 
--Guido (mobile)
msg357285 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2019-11-22 14:14
So you have already stated, and this issue is six years old now.

While I no longer have a stake in this, I'd just like to reiterate that IMHO it breaks several good practices of architecture, particularly that of separation of roles.

The abstraction called symbolic links is the domain of the filesystem.  An application should accept the image that the filesystem offers, not try to second-guess the intent of an operator by arbitrarily, and unexpectedly, unrolling that abstraction.

While you present a use case, I argue that it isn't, and shouldn't be, the domain of the application to intervene in an essentially shell-specific and operator-specific process of collecting one's favorite shortcuts in a folder.  For that particular use case, a more sensible way would be for the user to simply create shell shortcuts, even aliases, for his favorite Python scripts.  This behaviour is basically taking over what should be the role of the shell.  I'm unable to think of another program doing this sort of thing.

I suppose that now, with the reworked startup process, it would be simpler to actually document this rather unexpected behaviour, and possibly provide a flag to override it.  I know that I spent some time on this and came away rather stumped.
msg357302 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2019-11-22 17:24
You have a point — I was just responding to Nick’s last message without
noticing how old it was. I’ll remove myself from the nosy list.

History
Date                 User              Action  Args
2022-04-11 14:57:43  admin             set     github: 61839
2019-11-22 22:23:02  gvanrossum        set     nosy: - gvanrossum
2019-11-22 17:24:52  gvanrossum        set     messages: + msg357302
2019-11-22 14:14:19  kristjan.jonsson  set     messages: + msg357285
2019-11-22 13:56:11  gvanrossum        set     messages: + msg357282
2019-11-22 11:12:11  Socob             set     nosy: + Socob
2013-04-06 02:23:13  ncoghlan          set     messages: + msg186117
2013-04-05 16:35:12  kristjan.jonsson  set     messages: + msg186091
2013-04-05 15:52:11  gvanrossum        set     messages: + msg186090
2013-04-05 15:49:33  ncoghlan          set     messages: + msg186089
2013-04-05 15:30:49  kristjan.jonsson  set     messages: + msg186087
2013-04-05 15:30:36  ncoghlan          set     messages: + msg186086
2013-04-05 15:07:33  gvanrossum        set     status: open -> closed; resolution: wont fix; messages: + msg186085
2013-04-05 13:55:00  neologix          set     messages: + msg186081; title: symlinking .py files creates unexpected sys.path[0] -> symlinking .py files creates unexpected sys.path
2013-04-05 12:22:10  schmir            set     nosy: + schmir
2013-04-05 12:14:13  ncoghlan          set     nosy: + gvanrossum, ned.deily; messages: + msg186077
2013-04-05 09:29:45  kristjan.jonsson  set     messages: + msg186070
2013-04-05 09:29:06  kristjan.jonsson  create