Title: symlinking .py files creates unexpected sys.path
Type: behavior Stage:
Components: Interpreter Core, Library (Lib) Versions: Python 3.3, Python 3.4, Python 2.7
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: Nosy List: gvanrossum, kristjan.jonsson, ncoghlan, ned.deily, neologix, schmir
Priority: normal Keywords:

Created on 2013-04-05 09:29 by kristjan.jonsson, last changed 2013-04-06 02:23 by ncoghlan. This issue is now closed.

Messages (11)
msg186069 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013-04-05 09:29
When .py files are assembled into a directory structure using direct symbolic links to the files, something odd happens to sys.path[0].

Consider this file structure:
/pystuff/ -> /scripts/ -> /libs/ contains the line: "import bar"
"python /pystuff/" will now fail, because when is run, sys.path[0] will contain "/scripts", rather than the expected "/pystuff".

It would appear that the algorithm for finding sys.path[0] is:
sys.path[0] = os.path.dirname(os.path.realpath(filename)).
IMO, it should be:
sys.path[0] = os.path.realpath(os.path.dirname(filename)).
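To make the difference between the two orderings concrete, here is a minimal sketch. It only mimics the two formulas above with os.path; it is not the interpreter's actual start-up code. The helper names and the file name main.py are hypothetical, and it assumes a platform where os.symlink is permitted.

```python
import os
import tempfile


def dir_resolving_file_first(filename):
    # Ordering described in the report: follow the symlink on the file
    # itself, then take the directory of the real file.
    return os.path.dirname(os.path.realpath(filename))


def dir_resolving_directory_only(filename):
    # Proposed ordering: take the directory the link lives in, then
    # resolve symlinks in that directory path only.
    return os.path.realpath(os.path.dirname(filename))


def demo():
    # Builds <root>/scripts/main.py and a symlink <root>/pystuff/main.py
    # pointing at it, then compares the two orderings.
    with tempfile.TemporaryDirectory() as root:
        root = os.path.realpath(root)
        real_dir = os.path.join(root, "scripts")
        link_dir = os.path.join(root, "pystuff")
        os.mkdir(real_dir)
        os.mkdir(link_dir)
        target = os.path.join(real_dir, "main.py")  # hypothetical name
        with open(target, "w") as f:
            f.write("import bar\n")
        link = os.path.join(link_dir, "main.py")
        os.symlink(target, link)
        return (
            dir_resolving_file_first(link) == real_dir,
            dir_resolving_directory_only(link) == link_dir,
        )
```

For the symlinked file, the first helper yields the directory of the link target, while the second yields the directory that contains the link.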

I say that this behaviour is unexpected, because symlinking to individual files normally has the semantics of "pulling that file in" rather than "hopping to that file's real dir".

As an example, the following works in C, and in other languages too, I should imagine:
  myfile.c -> /sources/myfile.c
  mylib.h  -> /libs/mylib.h -> /libs/

An '#include "mylib.h"' in myfile.c would look for the file in /code and find it.
a "cc myfile.c -lmylib" would find the in /code

This problem was observed on Linux, when running Hadoop script jobs.  The Hadoop code (Cloudera CDH4) creates a symlink copy of your file structure, where each file is individually symlinked to a place in a file cache, and each file may sit in a different physical directory, like this:

tmp1/ -> /secret/filecache/0001/ -> /secret/filecache/0002/ -> /secret/filecache/0003/

Suddenly, importing b and c from won't work.
If a, b, and c were .h files, then '#include "b.h"' from a.h would work.
msg186070 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013-04-05 09:29
btw, this is the opposite issue to issue #1387483
msg186077 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-04-05 12:14
Adding Guido & Ned, as my recollection is that some of the weirdness in the sys.path[0] symlink resolution was to placate the test suite on Mac OS X (at least, that was a cause of failures in the initial runpy module implementation until Guido tracked down the discrepancy in symlink resolution between direct script execution and runpy).

How does the test suite react if you change the order of application to resolve symlinks only after dropping the file name from the path?
msg186081 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-04-05 13:55
> How does the test suite react if you change the order of application to resolve symlinks only after dropping the file name from the path?

Note that this will break things, see e.g.

The only backward compatible way to handle this would be to add both
directories to sys.path, hoping that there's no module with the same
name in both directories.
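
A rough sketch of what "add both directories" could look like, written with os.path only. The function name candidate_script_dirs is hypothetical, and the ordering (real directory first, so that existing set-ups keep their current lookup order) is an assumption rather than something stated in this thread.

```python
import os


def candidate_script_dirs(filename):
    # Directory of the fully resolved file (the behaviour under discussion).
    real = os.path.dirname(os.path.realpath(filename))
    # Directory in which the (possibly symlinked) file appears to live.
    apparent = os.path.realpath(os.path.dirname(filename))
    dirs = [real]
    if apparent != real:
        # Only add the second entry when the file is reached via a symlink.
        dirs.append(apparent)
    return dirs
```

A module with the same name in both directories would still be shadowed by whichever directory comes first.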
msg186085 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-04-05 15:07
Do not "fix" this. It is an intentional feature.

There is a common pattern where one or more Python scripts are collected in some "bin" directory (presumably on the user's $PATH) as symlinks into the directory where they really live (not on $PATH, nor on sys.path). The other files needed by the script(s) are in the latter directory, and so it needs to be on sys.path[0]. If you change the symlink resolution, sys.path[0] will point to the "bin" directory and the scripts won't be able to find the rest of their modules.

While there are probably better patterns to solve the problem that this intends to solve, the pattern is commonly used and I do not want it to be broken.

If you are using symlinks for other purposes, well, too bad.
msg186086 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-04-05 15:30
I'll add it to the list of docs updates for post-PEP 432 (similar to the import system in general finally getting reference docs in 3.3 following the migration to importlib, I hope to have improved import state initialisation docs for 3.4 if I successfully tame the interpreter initialisation code)
msg186087 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013-04-05 15:30
1) _I_ am not using symlinks this way.  The Hadoop scheduling processor is.  This means that we cannot use Python for it without hacking the scripts for the special case.  Presumably applications are not generally breaking when run in an artificial file tree populated with symlinked files into arbitrary real locations, but Python is.  Only Python seems to care about the _real_ location of the file, as opposed to the apparent location.
2) This particular use case is quite unobvious, and goes against the spirit of symbolic links. They are meant to be transparent to applications.  Consider the analogy with C header files. Only Python seems to care about the _real_ location of the file, as opposed to the apparent location. Effectively, Python is actively using the knowledge of these links as a sort of dynamic sys.path modifying tool.

I agree that it is not good to break existing usage, however misguided it may be.  But in that case, isn't it possible to disable this symlink dereference via e.g. an option?
msg186089 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-04-05 15:49
Not currently, because interpreter startup is a mess already. Overriding sys.path[0] initialisation is on the list for 3.4 already, I'm just advising strongly against piling any more complexity on top of the current rickety structure until we do something about the foundation.
msg186090 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-04-05 15:52
I'm sure there's some change that can be made to the scripts that
solves this locally, without requiring any changes to Python.
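
One possible local change of this kind, as a sketch only (nobody in this thread proposed this exact code): have the script put the directory it was invoked from at the front of sys.path before importing its siblings. The helper names are hypothetical. It relies on sys.argv[0] holding the path as given on the command line and on os.path.abspath not following symlinks.

```python
import os
import sys


def apparent_script_dir(argv0):
    # abspath() makes the path absolute and normalises it, but does not
    # resolve symlinks, so this is the directory the link lives in.
    return os.path.dirname(os.path.abspath(argv0))


def prepend_apparent_dir(path_list, argv0):
    # Put the apparent directory first so sibling modules that are also
    # symlinked into it can be imported; avoid adding a duplicate entry.
    directory = apparent_script_dir(argv0)
    if directory not in path_list:
        path_list.insert(0, directory)
    return path_list


if __name__ == "__main__":
    # Would go at the top of the script, before importing sibling modules.
    prepend_apparent_dir(sys.path, sys.argv[0])
```

This has to be repeated in every entry-point script, which is the "hacking the scripts for the special case" that the reporter wanted to avoid.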
msg186091 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013-04-05 16:35
Yes, of course.  But I still maintain that the failure of Python to work with a linktree of .py files, where the destination position of said links is arbitrary, is rather unusual, and IMHO violates the principle of least surprise.  In this case, the existence of the virtual linktree is apparently an implementation detail of the Hadoop implementation, not something that we as Hadoop users were supposed to know or care about.

Exploiting the OS file system implementation detail of a symbolic link as a language import feature is an example of an unusual coupling indeed, in my opinion.

Even import-guru Nick didn't seem to be aware of this feature.  It's great that we plan at least to document this unix-only feature at some point.

msg186117 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-04-06 02:23
The reason I haven't documented sys.path[0] initialisation is because I
know I don't fully understand it. Path initialisation in general has a lot
of historical quirks, particularly once symlinks are involved.
Date | User | Action | Args
2013-04-06 02:23:13 | ncoghlan | set | messages: + msg186117
2013-04-05 16:35:12 | kristjan.jonsson | set | messages: + msg186091
2013-04-05 15:52:11 | gvanrossum | set | messages: + msg186090
2013-04-05 15:49:33 | ncoghlan | set | messages: + msg186089
2013-04-05 15:30:49 | kristjan.jonsson | set | messages: + msg186087
2013-04-05 15:30:36 | ncoghlan | set | messages: + msg186086
2013-04-05 15:07:33 | gvanrossum | set | status: open -> closed; resolution: wont fix; messages: + msg186085
2013-04-05 13:55:00 | neologix | set | messages: + msg186081; title: symlinking .py files creates unexpected sys.path[0] -> symlinking .py files creates unexpected sys.path
2013-04-05 12:22:10 | schmir | set | nosy: + schmir
2013-04-05 12:14:13 | ncoghlan | set | nosy: + gvanrossum, ned.deily; messages: + msg186077
2013-04-05 09:29:45 | kristjan.jonsson | set | messages: + msg186070
2013-04-05 09:29:06 | kristjan.jonsson | create |