Issue17639
This issue tracker has been migrated to GitHub and is currently read-only. For more information, see the GitHub FAQs in the Python Developer's Guide.
Created on 2013-04-05 09:29 by kristjan.jonsson, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Messages (14)
msg186069 | Author: Kristján Valur Jónsson (kristjan.jonsson)
Date: 2013-04-05 09:29
When .py files are assembled into a directory structure using direct symbolic links to the files, something odd happens to sys.path[0]. Consider this file structure:

    /pystuff/
        foo.py -> /scripts/foo.py
        bar.py -> /libs/bar.py

foo.py contains the line "import bar". Running "python /pystuff/foo.py" will now fail, because when foo.py is run, sys.path[0] will contain "/scripts" rather than the expected "/pystuff".

It would appear that the algorithm for finding sys.path[0] is:

    sys.path[0] = os.path.dirname(os.path.realpath(filename))

IMO, it should be:

    sys.path[0] = os.path.realpath(os.path.dirname(filename))

I say that this behaviour is unexpected, because symlinking to individual files normally has the semantics of "pulling that file in" rather than "hopping to that file's real dir". As an example, the following works in C, and in other languages too, I should imagine:

    /code/
        myfile.c -> /sources/myfile.c
        mylib.h -> /libs/mylib.h
        libmylib.so -> /libs/libmylib.so

An '#include "mylib.h"' in myfile.c would look for the file in /code and find it. A "cc myfile.c -L. -lmylib" run from /code would find libmylib.so in /code.

This problem was observed on Linux when running Hadoop script jobs. The Hadoop code (Cloudera CDH4) creates a symlink copy of your file structure, where each file is individually symlinked to a place in a file cache, and each file may sit in a different physical directory, like this:

    tmp1/
        a.py -> /secret/filecache/0001/a.py
        b.py -> /secret/filecache/0002/b.py
        c.py -> /secret/filecache/0003/c.py

Suddenly, importing b and c from a.py won't work. If a, b, and c were .h files, then '#include "b.h"' from a.h would work.
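The two candidate formulas can be compared directly with os.path.dirname and os.path.realpath. The following is a minimal sketch that models the formulas only, not CPython's actual startup code (which is written in C); the helper names current_path0, proposed_path0 and build_layout are hypothetical, and the /pystuff, /scripts and /libs layout is recreated under a temporary directory, assuming a POSIX system with symlink support.

```python
import os
import tempfile


def current_path0(filename: str) -> str:
    """Observed behaviour: resolve symlinks first, then take the directory."""
    return os.path.dirname(os.path.realpath(filename))


def proposed_path0(filename: str) -> str:
    """Proposed behaviour: take the directory first, then resolve symlinks."""
    return os.path.realpath(os.path.dirname(filename))


def build_layout(root: str) -> str:
    """Create pystuff/foo.py -> scripts/foo.py and pystuff/bar.py -> libs/bar.py."""
    for name in ("pystuff", "scripts", "libs"):
        os.mkdir(os.path.join(root, name))
    with open(os.path.join(root, "scripts", "foo.py"), "w") as f:
        f.write("import bar\n")
    with open(os.path.join(root, "libs", "bar.py"), "w") as f:
        f.write("VALUE = 42\n")
    os.symlink(os.path.join(root, "scripts", "foo.py"),
               os.path.join(root, "pystuff", "foo.py"))
    os.symlink(os.path.join(root, "libs", "bar.py"),
               os.path.join(root, "pystuff", "bar.py"))
    return os.path.join(root, "pystuff", "foo.py")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Resolve the temp dir itself (e.g. /var -> /private/var on macOS).
        root = os.path.realpath(tmp)
        script = build_layout(root)
        print(current_path0(script))   # the scripts dir: bar.py is not there
        print(proposed_path0(script))  # the pystuff dir: the bar.py symlink is there
```

The only difference is the order in which the two calls are applied, which is exactly what decides whether the symlink's own directory or its target's directory ends up in sys.path[0].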
msg186070 | Author: Kristján Valur Jónsson (kristjan.jonsson)
Date: 2013-04-05 09:29
btw, this is the opposite of issue #1387483.
msg186077 | Author: Nick Coghlan (ncoghlan)
Date: 2013-04-05 12:14
Adding Guido & Ned, as my recollection is that some of the weirdness in the sys.path[0] symlink resolution was to placate the test suite on Mac OS X (at least, that was a cause of failures in the initial runpy module implementation until Guido tracked down the discrepancy in symlink resolution between direct script execution and runpy).

How does the test suite react if you change the order of application to resolve symlinks only after dropping the file name from the path?
msg186081 | Author: Charles-François Natali (neologix)
Date: 2013-04-05 13:55
> How does the test suite react if you change the order of application to resolve symlinks only after dropping the file name from the path?

Note that this will break things; see e.g. http://bugs.python.org/issue1387483#msg186063

The only backward-compatible way to handle this would be to add both directories to sys.path, hoping that there's no module with the same name in both directories.
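A minimal sketch of the "add both directories" idea, assuming a POSIX system with symlink support. Nothing like this exists in CPython; the helper names path0_candidates and prepend_path0 are hypothetical, and putting the resolved directory ahead of the apparent one (so existing setups keep finding the same modules first) is an assumption, since the message does not specify an order.

```python
import os
from typing import List


def path0_candidates(script: str) -> List[str]:
    """Return [real_dir, apparent_dir], dropping the duplicate when they coincide.

    The real (symlink-resolved) directory comes first so that setups relying on
    today's behaviour keep resolving the same modules first. This ordering is an
    assumption, not something specified in the thread.
    """
    real_dir = os.path.dirname(os.path.realpath(script))
    apparent_dir = os.path.realpath(os.path.dirname(os.path.abspath(script)))
    candidates = [real_dir]
    if apparent_dir != real_dir:
        candidates.append(apparent_dir)
    return candidates


def prepend_path0(search_path: List[str], script: str) -> None:
    """Hypothetical helper: insert both candidates at the front of search_path in place."""
    search_path[0:0] = path0_candidates(script)
```

As the message warns, a module name present in both directories would still resolve to whichever directory comes first, so the apparent directory can only fill gaps, never override.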
msg186085 | Author: Guido van Rossum (gvanrossum)
Date: 2013-04-05 15:07
Do not "fix" this. It is an intentional feature.

There is a common pattern where one or more Python scripts are collected in some "bin" directory (presumably on the user's $PATH) as symlinks into the directory where they really live (not on $PATH, nor on sys.path). The other files needed by the script(s) are in the latter directory, so that directory needs to be sys.path[0]. If you change the symlink resolution, sys.path[0] will point to the "bin" directory and the scripts won't be able to find the rest of their modules.

While there are probably better patterns to solve the problem that this intends to solve, the pattern is commonly used and I do not want it to be broken. If you are using symlinks for other purposes, well, too bad.
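The "bin" pattern can be reproduced in a temporary directory: bin/tool is a symlink to app/tool.py, which imports its sibling app/helper.py. This is a sketch assuming a POSIX system where CPython resolves the script's symlink when computing sys.path[0]; the directory, module and function names are hypothetical. Under that assumption, running bin/tool should succeed because sys.path[0] ends up as the real app/ directory.

```python
import os
import subprocess
import sys
import tempfile


def make_bin_layout(root: str) -> str:
    """Build root/app/{tool.py,helper.py} and root/bin/tool -> root/app/tool.py."""
    app_dir = os.path.join(root, "app")
    bin_dir = os.path.join(root, "bin")
    os.mkdir(app_dir)
    os.mkdir(bin_dir)
    with open(os.path.join(app_dir, "helper.py"), "w") as f:
        f.write("GREETING = 'found helper'\n")
    with open(os.path.join(app_dir, "tool.py"), "w") as f:
        # The real script imports a sibling module and reports sys.path[0].
        f.write("import sys\nimport helper\nprint(helper.GREETING)\nprint(sys.path[0])\n")
    link = os.path.join(bin_dir, "tool")
    os.symlink(os.path.join(app_dir, "tool.py"), link)
    return link


def run(script: str) -> subprocess.CompletedProcess:
    """Run the script with the current interpreter, as `python bin/tool` would."""
    return subprocess.run([sys.executable, script], capture_output=True, text=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = os.path.realpath(tmp)  # so the printed sys.path[0] is comparable
        result = run(make_bin_layout(root))
        print(result.stdout, end="")
```

Only the script itself is linked into bin/; helper.py never appears there, which is why this layout depends on the symlink being followed.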
msg186086 | Author: Nick Coghlan (ncoghlan)
Date: 2013-04-05 15:30
I'll add it to the list of docs updates for post-PEP 432. Similar to the import system in general finally getting reference docs in 3.3 following the migration to importlib, I hope to have improved import state initialisation docs for 3.4, if I successfully tame the interpreter initialisation code.
msg186087 | Author: Kristján Valur Jónsson (kristjan.jonsson)
Date: 2013-04-05 15:30
1) _I_ am not using symlinks this way. The Hadoop scheduling processor is. This means that we cannot use Python for it without hacking the scripts for the special case. Presumably, applications do not generally break when run in an artificial file tree populated with files symlinked to arbitrary real locations, but Python does. Only Python seems to care about the _real_ location of the file, as opposed to the apparent location.

2) This particular use case is quite unobvious, and goes against the spirit of symbolic links, which are meant to be transparent to applications. Consider, e.g., the analogy with C header files. Effectively, Python is actively using the knowledge of these links as a sort of dynamic sys.path-modifying tool.

I agree that it is not good to break existing usage, however misguided it may be. But in that case, isn't it possible to disable this symlink dereferencing via, e.g., an option?
msg186089 | Author: Nick Coghlan (ncoghlan)
Date: 2013-04-05 15:49
Not currently, because interpreter startup is a mess already. Overriding sys.path[0] initialisation is already on the list for 3.4; I'm just advising strongly against piling any more complexity on top of the current rickety structure until we do something about the foundation.
msg186090 | Author: Guido van Rossum (gvanrossum)
Date: 2013-04-05 15:52
I'm sure there's some change that can be made to the scripts that solves this locally, without requiring any changes to Python.
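One possible script-local workaround, not taken from the thread: have a.py put the directory it was invoked from (where its symlink lives) at the front of sys.path before importing its siblings. os.path.abspath, unlike os.path.realpath, does not resolve symlinks, and the main script's __file__ holds the unresolved path it was launched with. The sketch rebuilds the Hadoop-style tmp1/ linktree under a temporary directory; PREAMBLE, IMPORTS, build_linktree and run are hypothetical names, and a POSIX system with symlink support is assumed.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical preamble for a.py: prepend the directory the script was invoked
# from (where its symlink lives). os.path.abspath does not resolve symlinks,
# unlike the realpath-based computation CPython uses for sys.path[0].
PREAMBLE = (
    "import os, sys\n"
    "_here = os.path.dirname(os.path.abspath(__file__))\n"
    "if _here not in sys.path:\n"
    "    sys.path.insert(0, _here)\n"
)

# What a.py actually wants to do: import its symlinked siblings.
IMPORTS = "import b, c\nprint(b.NAME + c.NAME)\n"


def build_linktree(root: str, a_source: str) -> str:
    """Mimic the Hadoop-style layout: root/tmp1/x.py -> root/filecache/000N/x.py."""
    tree = os.path.join(root, "tmp1")
    os.mkdir(tree)
    sources = {"a.py": a_source, "b.py": "NAME = 'b'\n", "c.py": "NAME = 'c'\n"}
    for n, name in enumerate(sorted(sources), start=1):
        cache_dir = os.path.join(root, "filecache", f"{n:04d}")
        os.makedirs(cache_dir)
        real_file = os.path.join(cache_dir, name)
        with open(real_file, "w") as f:
            f.write(sources[name])
        # Each file is symlinked individually; the real files live in different dirs.
        os.symlink(real_file, os.path.join(tree, name))
    return os.path.join(tree, "a.py")


def run(script: str) -> subprocess.CompletedProcess:
    """Run the script with the current interpreter, as `python tmp1/a.py` would."""
    return subprocess.run([sys.executable, script], capture_output=True, text=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        plain = run(build_linktree(tmp, IMPORTS))
        print(plain.returncode)  # non-zero: b cannot be found from filecache/0001
    with tempfile.TemporaryDirectory() as tmp:
        patched = run(build_linktree(tmp, PREAMBLE + IMPORTS))
        print(patched.stdout.strip())
```

The preamble only adds a directory and leaves the realpath-based sys.path[0] in place, so it does not disturb anything else the script may rely on.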
msg186091 | Author: Kristján Valur Jónsson (kristjan.jonsson)
Date: 2013-04-05 16:35
Yes, of course. But I still maintain that the failure of Python to work with a linktree of .py files, where the destinations of said links are arbitrary, is rather unusual and, IMHO, violates the principle of least surprise. In this case, the existence of the virtual linktree is apparently an implementation detail of Hadoop, not something that we as Hadoop users were supposed to know or care about.

Exploiting the OS file system implementation detail of a symbolic link as a language import feature is an example of an unusual coupling indeed, in my opinion. Even import guru Nick didn't seem to be aware of this feature. It's great that we at least plan to document this Unix-only feature at some point.

Cheers!
msg186117 | Author: Nick Coghlan (ncoghlan)
Date: 2013-04-06 02:23
The reason I haven't documented sys.path[0] initialisation is because I know I don't fully understand it. Path initialisation in general has a lot of historical quirks, particularly once symlinks are involved.
msg357282 | Author: Guido van Rossum (gvanrossum)
Date: 2019-11-22 13:56
It is quite intentional that symlinks are followed for the purpose of computing sys.argv[0] and sys.path.
--Guido (mobile)
msg357285 | Author: Kristján Valur Jónsson (kristjan.jonsson)
Date: 2019-11-22 14:14
So you have already stated, and this issue is six years old now. While I no longer have a stake in this, I'd just like to reiterate that IMHO it breaks several good architectural practices, particularly separation of roles.

The abstraction called symbolic links is the domain of the filesystem. An application should accept the image that the filesystem offers, not try to second-guess the intent of an operator by arbitrarily, and unexpectedly, unrolling that abstraction.

While you present a use case, I argue that it isn't, and shouldn't be, the domain of the application to intervene in an essentially shell-specific and operator-specific process of collecting his favorite shortcuts in a folder. For that particular use case, a more sensible way would be for the user to simply create shell shortcuts, or even aliases, for his favorite Python scripts. This behaviour is basically taking over what should be the role of the shell. I'm unable to think of another program doing this sort of thing.

I suppose that now, with the reworked startup process, it would be simpler to actually document this rather unexpected behaviour, and possibly provide a flag to override it. I know that I spent some time on this and came away rather stumped.
msg357302 | Author: Guido van Rossum (gvanrossum)
Date: 2019-11-22 17:24
You have a point; I was just responding to Nick's last message without noticing how old it was. I'll remove myself from the nosy list.
--Guido (mobile)
History

| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:57:43 | admin | set | github: 61839 |
| 2019-11-22 22:23:02 | gvanrossum | set | nosy: - gvanrossum |
| 2019-11-22 17:24:52 | gvanrossum | set | messages: + msg357302 |
| 2019-11-22 14:14:19 | kristjan.jonsson | set | messages: + msg357285 |
| 2019-11-22 13:56:11 | gvanrossum | set | messages: + msg357282 |
| 2019-11-22 11:12:11 | Socob | set | nosy: + Socob |
| 2013-04-06 02:23:13 | ncoghlan | set | messages: + msg186117 |
| 2013-04-05 16:35:12 | kristjan.jonsson | set | messages: + msg186091 |
| 2013-04-05 15:52:11 | gvanrossum | set | messages: + msg186090 |
| 2013-04-05 15:49:33 | ncoghlan | set | messages: + msg186089 |
| 2013-04-05 15:30:49 | kristjan.jonsson | set | messages: + msg186087 |
| 2013-04-05 15:30:36 | ncoghlan | set | messages: + msg186086 |
| 2013-04-05 15:07:33 | gvanrossum | set | status: open -> closed; resolution: wont fix; messages: + msg186085 |
| 2013-04-05 13:55:00 | neologix | set | messages: + msg186081; title: symlinking .py files creates unexpected sys.path[0] -> symlinking .py files creates unexpected sys.path |
| 2013-04-05 12:22:10 | schmir | set | nosy: + schmir |
| 2013-04-05 12:14:13 | ncoghlan | set | nosy: + gvanrossum, ned.deily; messages: + msg186077 |
| 2013-04-05 09:29:45 | kristjan.jonsson | set | messages: + msg186070 |
| 2013-04-05 09:29:06 | kristjan.jonsson | create | |