classification
Title: __code__. co_filename should always be an absolute path
Type: behavior Stage: patch review
Components: Versions: Python 3.9
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Michel Desmoulin, ammar2, gregory.p.smith, ncoghlan, r.david.murray, vstinner, xtreak
Priority: release blocker Keywords: patch

Created on 2014-01-30 02:15 by yselivanov, last changed 2019-10-24 16:54 by yselivanov.

Pull Requests
URL Status Linked Edit
PR 13527 closed BTaskaya, 2019-05-23 19:20
PR 14053 merged vstinner, 2019-06-13 11:42
PR 14446 merged vstinner, 2019-06-28 14:17
Messages (25)
msg209699 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2014-01-30 02:15
There are many issues on tracker related to the relative paths in co_filename. Most of them are about introspection functions in inspect module--which are usually broken after 'os.chdir'. 

Test case: create a file t.py:

   def foo(): pass
   print(foo.__code__.co_filename)

Execute it with '$ python t.py' -> it will print 't.py'.

Ideally, when executing a python file, interpreter should expand all relative paths for __file__ and __code__.co_filename attributes.
msg345501 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-13 11:43
PR 14053 is a different approach than PR 13527: compute the absolute path to the script filename in PyConfig_Read() just after parsing the command line.
msg345521 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-13 14:30
One of the side effect of my PR 14053 is that tracebacks are more verbose: the filename is now absolute rather than relative.

Currently:

$ python3 x.py 
Traceback (most recent call last):
  File "x.py", line 4, in <module>
    func()
  File "x.py", line 2, in func
    bug
NameError: name 'bug' is not defined

With my PR:

$ ./python x.py 
Traceback (most recent call last):
  File "/home/vstinner/prog/python/master/x.py", line 4, in <module>
    func()
  File "/home/vstinner/prog/python/master/x.py", line 2, in func
    bug
NameError: name 'bug' is not defined
msg345585 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-14 11:13
The site module tries to compute the absolute path of __file__ and __cached__ attributes of all modules in sys.modules:

def abs_paths():
    """Set all module __file__ and __cached__ attributes to an absolute path"""
    for m in set(sys.modules.values()):
        if (getattr(getattr(m, '__loader__', None), '__module__', None) not in
                ('_frozen_importlib', '_frozen_importlib_external')):
            continue   # don't mess with a PEP 302-supplied __file__
        try:
            m.__file__ = os.path.abspath(m.__file__)
        except (AttributeError, OSError, TypeError):
            pass
        try:
            m.__cached__ = os.path.abspath(m.__cached__)
        except (AttributeError, OSError, TypeError):
            pass

The __path__ attribute isn't updated.

Another approach would be to hack importlib to compute the absolute path before loading a module, rather than trying to fix it *afterwards*.

One pratical problem: posixpath and ntpath are not available when importlib is setup, these modules are implemented in pure Python and so must be imported.

Maybe importlib could use a naive implementation of os.path.abspath(). Maybe the C function _Py_abspath() that I implemented in PR 14053 should be exposed somehow to importlib as a private function using a builtin module like _imp, so it can be used directly.
msg346207 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2019-06-21 11:56
Perhaps `os._abspath_for_import`? If importlib finds it, then it can handle making early paths absolute itself, otherwise it will defer doing that until it has the full external import system bootstrapped. (importlib does try to make all paths absolute as it loads modules - it's just that some early modules aren't loaded that way)
msg346262 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-21 22:35
Effect of my PR 14053 using script.py:

import sys
print(f"{__file__=}")
print(f"{sys.argv=}")
print(f"{sys.path[0]=}")

Before:

$ ./python script.py 
__file__=script.py
sys.argv=['script.py']
sys.path[0]=/home/vstinner/prog/python/master

With the change:

$ ./python script.py 
__file__=/home/vstinner/prog/python/master/script.py
sys.argv=['/home/vstinner/prog/python/master/script.py']
sys.path[0]=/home/vstinner/prog/python/master
msg346524 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-25 13:03
New changeset 3939c321c90283b49eddde762656e4b1940e7150 by Victor Stinner in branch 'master':
bpo-20443: _PyConfig_Read() gets the absolute path of run_filename (GH-14053)
https://github.com/python/cpython/commit/3939c321c90283b49eddde762656e4b1940e7150
msg346789 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-28 00:18
Example of case where a module path is still relative:
---
import sys
import os
modname = 'relpath'
filename = modname + '.py'
sys.path.insert(0, os.curdir)
with open(filename, "w") as fp:
    print("import sys", file=fp)
    print("mod = sys.modules[__name__]", file=fp)
    print("print(f'{__file__=}')", file=fp)
    print("print(f'{mod.__file__=}')", file=fp)
    print("print(f'{mod.__cached__=}')", file=fp)
__import__(modname)
os.unlink(filename)
---

Output:
---
__file__='./relpath.py'
mod.__file__='./relpath.py'
mod.__cached__='./__pycache__/relpath.cpython-39.pyc'
---

__file__ and mod.__file__ are relative paths: not absolute paths.
msg346800 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python triager) Date: 2019-06-28 04:46
This change seems to produce a new warning on my Mac.

./Modules/getpath.c:764:23: warning: incompatible pointer types passing 'char [1025]' to parameter of type 'const wchar_t *' (aka 'const int *') [-Wincompatible-pointer-types]
            _Py_isabs(execpath))
                      ^~~~~~~~
./Include/fileutils.h:158:42: note: passing argument to parameter 'path' here
PyAPI_FUNC(int) _Py_isabs(const wchar_t *path);
                                         ^
1 warning generated.
msg346823 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-28 14:49
New changeset 3029035ef34c9bae0c8d965290cd9b273c8de1ea by Victor Stinner in branch 'master':
bpo-20443: Fix calculate_program_full_path() warning (GH-14446)
https://github.com/python/cpython/commit/3029035ef34c9bae0c8d965290cd9b273c8de1ea
msg354925 - (view) Author: Michel Desmoulin (Michel Desmoulin) Date: 2019-10-18 22:29
Isn't that change breaking compat ?

I'm assuming many scripts in the wild sys.argv[0] and play with it assuming it's relative. 

I would expect such a change to be behind a flag or a __future__ import.
msg354999 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2019-10-20 13:09
I think that's a valid point regarding sys.argv[0] - it's the import system and code introspection that wants(/needs) absolute paths, whereas sys.argv[0] gets used in situations (e.g. usage messages) where we should retain whatever the OS gave us, since that best reflects what the user entered.

That's straightforward to selectively revert, though: remove the "config->run_filename != NULL" clause at https://github.com/python/cpython/blob/24dc2f8c56697f9ee51a4887cf0814b6600c1815/Python/initconfig.c#L2201 and add a comment with the reason for the deliberate omission.


That way the OS-provided argv entry will continue to be passed through to sys.argv[0], while everywhere else will get the absolute path.
msg355055 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-21 11:26
What is the use case for having a relative sys.argv[0]?

> Isn't that change breaking compat ?

Right. It has been made on purpose. The rationale can be found in the first message of this issue.
msg355085 - (view) Author: Michel Desmoulin (Michel Desmoulin) Date: 2019-10-21 17:04
The relevance of the use case isn't the problem. Even if people had been using it wrong for all this time, the update is still going to break their code if they did use it this way. And if it was possible, of course many people did.

3.7 already broke a few packages by choosing to not do the True/False dance again and make async/await keywords. I took a lot of people by surprise, not because of the issue itself, but because of the change of philosophy compared of what Python had been in the past.

Do we really want to make a habit of breaking a few things at every minor release ?

It's fair to expect, as a user, that any 3.x will be aiming to be forward compatible. This was the case for Python 2.X and was part of the popularity of the language: a reliable tech.

I'm advocating that any breaking change,, even the tiniest, should be behind a __future__ import or a flag, with warnings, and then switched on for good on Python 4.0.

Python is a language, not a simple library. Stability is key. Like Linus Torvald use to say: "do not break user space"

The fact this change is easy to fix and migrate is not a proper reason to ignore this important software rule, IMO. So many people and so much code rely on Python that the tinest glitch may have massive impact down the line.

Not to mention that after a decade of healing from the P2/3 transition, the community needs a rest.

Le 21/10/2019 à 13:26, STINNER Victor a écrit :
>
> STINNER Victor <vstinner@python.org> added the comment:
>
> What is the use case for having a relative sys.argv[0]?
>
> > Isn't that change breaking compat ?
>
> Right. It has been made on purpose. The rationale can be found in the first message of this issue.
>
> ----------
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <https://bugs.python.org/issue20443>
> _______________________________________
>
msg355088 - (view) Author: Ammar Askar (ammar2) * (Python triager) Date: 2019-10-21 17:44
I did a quick search to see what code would break from sys.argv[0] going relative

  intext:"sys.argv[0]" ext:py site:github.com
  https://www.google.com/search?q=intext:"sys.argv%5B0%5D"+ext:py+site:github.com

and while most uses of it are to print usage messages. There is potentially code relying on it being a relative path that will break from this change:

* https://github.com/google/python-gflags/blob/4f06c3d0d6cbe9b1fb90ee9fb1c082b3bf9285f6/gflags/flagvalues.py#L868-L869
* https://github.com/HcashOrg/hcOmniEngine/blob/f1acc2ba3640a8e1c651ddc90a86d569d00704fe/msc-cli.py#L12
* https://github.com/vmtk/vmtk/blob/64675f598e31bc6be3d4fba903fb59bf1394f492/PypeS/pyperun.py#L35
* https://github.com/apache/lucene-solr/blob/cfa49401671b5f9958d46c04120df8c7e3f358be/dev-tools/scripts/svnBranchToGit.py#L783

I think Michel and Nick are correct here and the status-quo should be maintained for sys.argv[0]. Michel should not have to enumerate the use cases for a relative sys.argv[0].
msg355104 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-21 21:29
Nick Coghlan:
> I think that's a valid point regarding sys.argv[0] - it's the import system and code introspection that wants(/needs) absolute paths, whereas sys.argv[0] gets used in situations (e.g. usage messages) where we should retain whatever the OS gave us, since that best reflects what the user entered.

Even before this cases, there were different cases where Python does modify sys.argv:

* "python3 -c code ..." command: Python removes code from sys.argv
* runpy.run_module() and runpy.run_path() replace sys.argv[0] with a filename

At the C level, the Py_GetArgcArgv() function provides the "original" argc and argv: before they are modified.

See open issues:

* bpo-14208 (closed): No way to recover original argv with python -m
* bpo-29857: Provide `sys.executable_argv` for host application's command line arguments
* bpo-26388: Disabling changing sys.argv[0] with runpy.run_module(...alter_sys=True)
msg355105 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-21 21:44
I understand that the discussion is about my commit 3939c321c90283b49eddde762656e4b1940e7150 which changed multiple things:
---
  Python now gets the absolute path of the script filename specified on
  the command line (ex: ``python3 script.py``): the ``__file__`` attribute of
  the ``__main__`` module, ``sys.argv[0]`` and ``sys.path[0]`` become an
  absolute path, rather than a relative path.
---

I understand that the complain is not about sys.modules['__main__'].__file__ nor sys.path[0], but only sys.argv[0].

The sys.argv documentation says:

"The list of command line arguments passed to a Python script. argv[0] is the script name (it is operating system dependent whether this is a full pathname or not)."

https://docs.python.org/dev/library/sys.html#sys.argv

I understand that before my change, sys.argv[0] was sometimes relative, and sometimes absolute. With my change, sys.argv[0] should now always be asolute.

There are cases where an absolute path is preferred. There are cases where a relative path is preferred. I'm trying to understand the scope of the issue.

> https://github.com/google/python-gflags/blob/4f06c3d0d6cbe9b1fb90ee9fb1c082b3bf9285f6/gflags/flagvalues.py#L868-L869

I don't know this code. It seems like sys.argv[0] is used to lookup in a dict, and that dict keys are supposed to be "module names". I don't understand in which cases sys.argv[0] could be a module *name*, since sys.argv[0] seems like a relative or absolute path.

> https://github.com/HcashOrg/hcOmniEngine/blob/f1acc2ba3640a8e1c651ddc90a86d569d00704fe/msc-cli.py#L12

The code starts by sys.argv.pop(0): remove sys.argv[0].

> https://github.com/vmtk/vmtk/blob/64675f598e31bc6be3d4fba903fb59bf1394f492/PypeS/pyperun.py#L35

if sys.argv[0].startswith('pyperun'): ...

I understand that my change broke such test. Did this test work before my change when specifying the absolute path to run the script? Like path/to/pyperun.py rather than pyperun.py? Such code looks fragile: using os.path.basename() would be safer here.

> https://github.com/apache/lucene-solr/blob/cfa49401671b5f9958d46c04120df8c7e3f358be/dev-tools/scripts/svnBranchToGit.py#L783

  home = os.path.expanduser("~")
  svnWorkingCopiesPath = os.path.join(home, "svnwork")
  gitReposPath = os.path.join(home, "gitrepos")
  if sys.argv[0].startswith(svnWorkingCopiesPath) 
     or sys.argv[0].startswith(gitReposPath): ...

On my Linux, os.path.expanduser("~") is an absolute path and so svnWorkingCopiesPath and gitReposPath should be absolute paths, no?

My change made this code more reliable, rather than breaking it, no?
msg355106 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-21 21:54
Michel Desmoulin: "The relevance of the use case isn't the problem. Even if people had been using it wrong for all this time, the update is still going to break their code if they did use it this way. And if it was possible, of course many people did."

You have to understand that the change has be done to fix some use cases and that Python 3.8.0 has been related with the change.

If we change again the behavior, we will get Python < 3.8.0, Python 3.8.0 and Python > 3.8.0 behaving differently. I would prefer to fully understand all use cases before starting to discuss changing the behavior again.


> 3.7 already broke a few packages by choosing to not do the True/False dance again and make async/await keywords. I took a lot of people by surprise, not because of the issue itself, but because of the change of philosophy compared of what Python had been in the past.

You have to understand that it was known since the start and it was a deliberate choice after balancing advantages and disadvantages.

https://www.python.org/dev/peps/pep-0492/#transition-plan

I'm only aware of a single function in a single project, Twisted. Are you aware of more projects broken by this change? Did it take long to fix them? Balance it with the ability to use async and await in cases which were not possible in Python 3.6 because they were not real keywords. The Python 3.6 grammar was fully of hack to emulate keywords.


> Do we really want to make a habit of breaking a few things at every minor release ?

Usually, incompatible changes are discussed to balance if it's worth it or not.

There is no plan to break all user code just for your pleasure.

For the specific case of argv[0], the discussion occurred here on associated pull requests, and nobody mentioned any risk of backward compatibility.

FYI I'm working on this topic and I just proposed a the PEP 606 "Python Compatibility Version" to propose one possible solution to make incompatible changes less painful to users.


> It's fair to expect, as a user, that any 3.x will be aiming to be forward compatible. This was the case for Python 2.X and was part of the popularity of the language: a reliable tech.

My PEP 606 tries to explain incompatible changes are needed:
https://www.python.org/dev/peps/pep-0606/#the-need-to-evolve-frequently


> Not to mention that after a decade of healing from the P2/3 transition, the community needs a rest.

I understand that but such comment doesn't help the discussion. See my previous comments and please try to help to fill the missing information to help us to take better decisions.

Since Python 3.8.0 has been release, a revert can make things even worse that the current state.
msg355218 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2019-10-23 13:35
This change isn't in Python 3.8 though, it's only in Python 3.9 (the PR merge date is after the beta 1 branch date).

Unless it was backported in a Python 3.8 PR that didn't link back to this issue or the original Python 3.9 PR?
msg355219 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-23 14:14
> This change isn't in Python 3.8 though, it's only in Python 3.9 (the PR merge date is after the beta 1 branch date).

Oh, you're right. Sorry, I thought that I made the change before 3.8 branch was created. In that case, we can do whatever we want :-) (We are free to revert.)
msg355278 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2019-10-24 05:46
Please revert.  An absolute path changes semantics in many real world situations (altering symlink traversals, etc).  People expect the current sys.argv[0] behavior which is "similar to C argv" and matches what was passed on the interpreter command line.

A getcwd() call doesn't even have to succeed.  A single file python program should still be able to run in that environment rather than fail to start.

To help address the original report filing issue, we could add a notion of .co_cwd to code objects for use in resolving co_filename.  Allow it to be '' if getcwd failed at source load time.  Code could check if co_cwd exists and join it with the co_filename.  Also let co_cwd remain empty when there is no valid co_filename for the code.

Preaching: Code that calls os.chdir() is always a potential problem as it alters process global state.  That call is best avoided as modifying globals is impolite.
msg355334 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-24 14:34
> Please revert.

Ok ok, I will revert my change, but only on sys.argv[0].


> A getcwd() call doesn't even have to succeed.  A single file python program should still be able to run in that environment rather than fail to start.

The current implementation leaves a path unchanged if it fails to make it absolute:

config_run_filename_abspath:

    wchar_t *abs_filename;
    if (_Py_abspath(config->run_filename, &abs_filename) < 0) {
        /* failed to get the absolute path of the command line filename:
           ignore the error, keep the relative path */
        return _PyStatus_OK();
    }
    if (abs_filename == NULL) {
        return _PyStatus_NO_MEMORY();
    }

    PyMem_RawFree(config->run_filename);
    config->run_filename = abs_filename;

with:

/* Get an absolute path.
   On error (ex: fail to get the current directory), return -1.
   On memory allocation failure, set *abspath_p to NULL and return 0.
   On success, return a newly allocated to *abspath_p to and return 0.
   The string must be freed by PyMem_RawFree(). */
int _Py_abspath(const wchar_t *path, wchar_t **abspath_p);
msg355335 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-24 14:38
> To help address the original report filing issue, we could add a notion of .co_cwd to code objects for use in resolving co_filename.  Allow it to be '' if getcwd failed at source load time.  Code could check if co_cwd exists and join it with the co_filename.  Also let co_cwd remain empty when there is no valid co_filename for the code.

Do you mean that co_filename attribute of code objects must remain relative?


> Preaching: Code that calls os.chdir() is always a potential problem as it alters process global state.  That call is best avoided as modifying globals is impolite.

I thought that calling os.chdir("/") is a good practice when launching a program as a daemon. And thne call os.fork() twice to detach the process from its parent and run it in background.

I commonly use os.chdir() in my programs for various reasons. A classical use case is to mimick a shell script doing something like (cd $DIRECTORY; command): temporary change the current directory.

cwd parameter of subprocess.Popen helps to avoid changing the currently directory, but sometimes subprocess is not used at all.

Maybe avoiding os.chdir() is a good practice, but it's hard to prevent users to call it. I don't see how os.chdir() is bad by design. I'm not sure that it's really useful to debate this neither :-)
msg355339 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2019-10-24 16:51
I think sys.argv[0] is the important one as program logic often tends to use that as an input.

I'm honestly unsure of if this will be as much of a problem for .co_filename or sys.path[0].
msg355340 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2019-10-24 16:53
(read that as: feel free to keep the behavior for co_filename and path[0] and lets see what happens :)
History
Date User Action Args
2019-10-24 16:54:14yselivanovsetnosy: - yselivanov
2019-10-24 16:53:02gregory.p.smithsetmessages: + msg355340
2019-10-24 16:51:34gregory.p.smithsetmessages: + msg355339
2019-10-24 14:38:54vstinnersetmessages: + msg355335
2019-10-24 14:34:07vstinnersetmessages: + msg355334
2019-10-24 05:46:47gregory.p.smithsetpriority: normal -> release blocker
nosy: + gregory.p.smith
messages: + msg355278

2019-10-23 14:14:55vstinnersetmessages: + msg355219
2019-10-23 13:35:16ncoghlansetmessages: + msg355218
2019-10-21 21:54:32vstinnersetmessages: + msg355106
2019-10-21 21:44:31vstinnersetmessages: + msg355105
2019-10-21 21:29:31vstinnersetmessages: + msg355104
2019-10-21 17:44:21ammar2setnosy: + ammar2
messages: + msg355088
2019-10-21 17:04:40Michel Desmoulinsetmessages: + msg355085
2019-10-21 11:26:11vstinnersetmessages: + msg355055
2019-10-20 13:09:00ncoghlansetmessages: + msg354999
versions: + Python 3.9, - Python 3.5
2019-10-18 22:29:43Michel Desmoulinsetnosy: + Michel Desmoulin
messages: + msg354925
2019-06-28 14:49:41vstinnersetmessages: + msg346823
2019-06-28 14:17:51vstinnersetpull_requests: + pull_request14262
2019-06-28 04:46:45xtreaksetnosy: + xtreak
messages: + msg346800
2019-06-28 00:18:23vstinnersetmessages: + msg346789
2019-06-25 13:03:08vstinnersetmessages: + msg346524
2019-06-21 22:35:17vstinnersetmessages: + msg346262
2019-06-21 11:56:15ncoghlansetmessages: + msg346207
2019-06-14 11:13:33vstinnersetmessages: + msg345585
2019-06-13 14:30:45vstinnersetmessages: + msg345521
2019-06-13 11:43:55vstinnersetnosy: + vstinner
messages: + msg345501
2019-06-13 11:42:55vstinnersetpull_requests: + pull_request13915
2019-05-23 19:20:53BTaskayasetkeywords: + patch
stage: needs patch -> patch review
pull_requests: + pull_request13441
2014-01-30 02:15:56yselivanovsetnosy: + ncoghlan
2014-01-30 02:15:15yselivanovcreate