Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

__code__. co_filename should always be an absolute path #64642

Closed
1st1 opened this issue Jan 30, 2014 · 34 comments
Closed

__code__. co_filename should always be an absolute path #64642

1st1 opened this issue Jan 30, 2014 · 34 comments
Labels
3.9 only security fixes type-bug An unexpected behavior, bug, or error

Comments

@1st1
Copy link
Member

1st1 commented Jan 30, 2014

BPO 20443
Nosy @gpshead, @ncoghlan, @vstinner, @bitdancer, @ambv, @ammaraskar, @tirkarthi, @isidentical
PRs
  • bpo-20443: make code objects filename an absolute path #13527
  • bpo-20443: _PyConfig_Read() gets the absolute path of run_filename #14053
  • bpo-20443: Fix calculate_program_full_path() warning #14446
  • bpo-20443: No longer make sys.argv[0] absolute for script #17534
  • bpo-20443: Update What's New In Python 3.9 #17986
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2019-12-31.04:56:03.820>
    created_at = <Date 2014-01-30.02:15:15.725>
    labels = ['type-bug', '3.9']
    title = '__code__. co_filename should always be an absolute path'
    updated_at = <Date 2020-01-13.13:57:20.518>
    user = 'https://github.com/1st1'

    bugs.python.org fields:

    activity = <Date 2020-01-13.13:57:20.518>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-12-31.04:56:03.820>
    closer = 'ncoghlan'
    components = []
    creation = <Date 2014-01-30.02:15:15.725>
    creator = 'yselivanov'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 20443
    keywords = ['patch']
    message_count = 32.0
    messages = ['209699', '345501', '345521', '345585', '346207', '346262', '346524', '346789', '346800', '346823', '354925', '354999', '355055', '355085', '355088', '355104', '355105', '355106', '355218', '355219', '355278', '355334', '355335', '355339', '355340', '358094', '358114', '358116', '358128', '358661', '359098', '359902']
    nosy_count = 9.0
    nosy_names = ['gregory.p.smith', 'ncoghlan', 'vstinner', 'r.david.murray', 'lukasz.langa', 'Michel Desmoulin', 'ammar2', 'xtreak', 'BTaskaya']
    pr_nums = ['13527', '14053', '14446', '17534', '17986']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue20443'
    versions = ['Python 3.9']

    @1st1
    Copy link
    Member Author

    1st1 commented Jan 30, 2014

    There are many issues on tracker related to the relative paths in co_filename. Most of them are about introspection functions in inspect module--which are usually broken after 'os.chdir'.

    Test case: create a file t.py:

       def foo(): pass
       print(foo.__code__.co_filename)

    Execute it with '$ python t.py' -> it will print 't.py'.

    Ideally, when executing a python file, interpreter should expand all relative paths for __file__ and __code__.co_filename attributes.

    @1st1 1st1 added the type-bug An unexpected behavior, bug, or error label Jan 30, 2014
    @vstinner
    Copy link
    Member

    PR 14053 is a different approach than PR 13527: compute the absolute path to the script filename in PyConfig_Read() just after parsing the command line.

    @vstinner
    Copy link
    Member

    One of the side effect of my PR 14053 is that tracebacks are more verbose: the filename is now absolute rather than relative.

    Currently:

    $ python3 x.py 
    Traceback (most recent call last):
      File "x.py", line 4, in <module>
        func()
      File "x.py", line 2, in func
        bug
    NameError: name 'bug' is not defined

    With my PR:

    $ ./python x.py 
    Traceback (most recent call last):
      File "/home/vstinner/prog/python/master/x.py", line 4, in <module>
        func()
      File "/home/vstinner/prog/python/master/x.py", line 2, in func
        bug
    NameError: name 'bug' is not defined

    @vstinner
    Copy link
    Member

    The site module tries to compute the absolute path of __file__ and __cached__ attributes of all modules in sys.modules:

    def abs_paths():
        """Set all module __file__ and __cached__ attributes to an absolute path"""
        for m in set(sys.modules.values()):
            if (getattr(getattr(m, '__loader__', None), '__module__', None) not in
                    ('_frozen_importlib', '_frozen_importlib_external')):
                continue   # don't mess with a PEP 302-supplied __file__
            try:
                m.__file__ = os.path.abspath(m.__file__)
            except (AttributeError, OSError, TypeError):
                pass
            try:
                m.__cached__ = os.path.abspath(m.__cached__)
            except (AttributeError, OSError, TypeError):
                pass

    The __path__ attribute isn't updated.

    Another approach would be to hack importlib to compute the absolute path before loading a module, rather than trying to fix it *afterwards*.

    One pratical problem: posixpath and ntpath are not available when importlib is setup, these modules are implemented in pure Python and so must be imported.

    Maybe importlib could use a naive implementation of os.path.abspath(). Maybe the C function _Py_abspath() that I implemented in PR 14053 should be exposed somehow to importlib as a private function using a builtin module like _imp, so it can be used directly.

    @ncoghlan
    Copy link
    Contributor

    Perhaps os._abspath_for_import? If importlib finds it, then it can handle making early paths absolute itself, otherwise it will defer doing that until it has the full external import system bootstrapped. (importlib does try to make all paths absolute as it loads modules - it's just that some early modules aren't loaded that way)

    @vstinner
    Copy link
    Member

    Effect of my PR 14053 using script.py:

    import sys
    print(f"{__file__=}")
    print(f"{sys.argv=}")
    print(f"{sys.path[0]=}")

    Before:

    $ ./python script.py 
    __file__=script.py
    sys.argv=['script.py']
    sys.path[0]=/home/vstinner/prog/python/master

    With the change:

    $ ./python script.py 
    __file__=/home/vstinner/prog/python/master/script.py
    sys.argv=['/home/vstinner/prog/python/master/script.py']
    sys.path[0]=/home/vstinner/prog/python/master

    @vstinner
    Copy link
    Member

    New changeset 3939c32 by Victor Stinner in branch 'master':
    bpo-20443: _PyConfig_Read() gets the absolute path of run_filename (GH-14053)
    3939c32

    @vstinner
    Copy link
    Member

    Example of case where a module path is still relative:
    ---

    import sys
    import os
    modname = 'relpath'
    filename = modname + '.py'
    sys.path.insert(0, os.curdir)
    with open(filename, "w") as fp:
        print("import sys", file=fp)
        print("mod = sys.modules[__name__]", file=fp)
        print("print(f'{__file__=}')", file=fp)
        print("print(f'{mod.__file__=}')", file=fp)
        print("print(f'{mod.__cached__=}')", file=fp)
    __import__(modname)
    os.unlink(filename)

    Output:
    ---
    __file__='./relpath.py'
    mod.__file__='./relpath.py'
    mod.__cached__='./pycache/relpath.cpython-39.pyc'
    ---

    __file__ and mod.__file__ are relative paths: not absolute paths.

    @tirkarthi
    Copy link
    Member

    This change seems to produce a new warning on my Mac.

    ./Modules/getpath.c:764:23: warning: incompatible pointer types passing 'char [1025]' to parameter of type 'const wchar_t *' (aka 'const int *') [-Wincompatible-pointer-types]
                _Py_isabs(execpath))
                          ^~~~~~~~
    ./Include/fileutils.h:158:42: note: passing argument to parameter 'path' here
    PyAPI_FUNC(int) _Py_isabs(const wchar_t *path);
                                             ^
    1 warning generated.

    @vstinner
    Copy link
    Member

    New changeset 3029035 by Victor Stinner in branch 'master':
    bpo-20443: Fix calculate_program_full_path() warning (GH-14446)
    3029035

    @MichelDesmoulin
    Copy link
    Mannequin

    MichelDesmoulin mannequin commented Oct 18, 2019

    Isn't that change breaking compat ?

    I'm assuming many scripts in the wild sys.argv[0] and play with it assuming it's relative.

    I would expect such a change to be behind a flag or a __future__ import.

    @ncoghlan
    Copy link
    Contributor

    I think that's a valid point regarding sys.argv[0] - it's the import system and code introspection that wants(/needs) absolute paths, whereas sys.argv[0] gets used in situations (e.g. usage messages) where we should retain whatever the OS gave us, since that best reflects what the user entered.

    That's straightforward to selectively revert, though: remove the "config->run_filename != NULL" clause at

    else if (config->run_filename != NULL) {
    and add a comment with the reason for the deliberate omission.

    That way the OS-provided argv entry will continue to be passed through to sys.argv[0], while everywhere else will get the absolute path.

    @ncoghlan ncoghlan added the 3.9 only security fixes label Oct 20, 2019
    @vstinner
    Copy link
    Member

    What is the use case for having a relative sys.argv[0]?

    Isn't that change breaking compat ?

    Right. It has been made on purpose. The rationale can be found in the first message of this issue.

    @MichelDesmoulin
    Copy link
    Mannequin

    MichelDesmoulin mannequin commented Oct 21, 2019

    The relevance of the use case isn't the problem. Even if people had been using it wrong for all this time, the update is still going to break their code if they did use it this way. And if it was possible, of course many people did.

    3.7 already broke a few packages by choosing to not do the True/False dance again and make async/await keywords. I took a lot of people by surprise, not because of the issue itself, but because of the change of philosophy compared of what Python had been in the past.

    Do we really want to make a habit of breaking a few things at every minor release ?

    It's fair to expect, as a user, that any 3.x will be aiming to be forward compatible. This was the case for Python 2.X and was part of the popularity of the language: a reliable tech.

    I'm advocating that any breaking change,, even the tiniest, should be behind a __future__ import or a flag, with warnings, and then switched on for good on Python 4.0.

    Python is a language, not a simple library. Stability is key. Like Linus Torvald use to say: "do not break user space"

    The fact this change is easy to fix and migrate is not a proper reason to ignore this important software rule, IMO. So many people and so much code rely on Python that the tinest glitch may have massive impact down the line.

    Not to mention that after a decade of healing from the P2/3 transition, the community needs a rest.

    Le 21/10/2019 à 13:26, STINNER Victor a écrit :

    STINNER Victor <vstinner@python.org> added the comment:

    What is the use case for having a relative sys.argv[0]?

    > Isn't that change breaking compat ?

    Right. It has been made on purpose. The rationale can be found in the first message of this issue.

    ----------


    Python tracker <report@bugs.python.org>
    <https://bugs.python.org/issue20443\>


    @ammaraskar
    Copy link
    Member

    I did a quick search to see what code would break from sys.argv[0] going relative

    intext:"sys.argv[0]" ext:py site:github.com
    https://www.google.com/search?q=intext:"sys.argv%5B0%5D"+ext:py+site:github.com

    and while most uses of it are to print usage messages. There is potentially code relying on it being a relative path that will break from this change:

    I think Michel and Nick are correct here and the status-quo should be maintained for sys.argv[0]. Michel should not have to enumerate the use cases for a relative sys.argv[0].

    @vstinner
    Copy link
    Member

    Nick Coghlan:

    I think that's a valid point regarding sys.argv[0] - it's the import system and code introspection that wants(/needs) absolute paths, whereas sys.argv[0] gets used in situations (e.g. usage messages) where we should retain whatever the OS gave us, since that best reflects what the user entered.

    Even before this cases, there were different cases where Python does modify sys.argv:

    • "python3 -c code ..." command: Python removes code from sys.argv
    • runpy.run_module() and runpy.run_path() replace sys.argv[0] with a filename

    At the C level, the Py_GetArgcArgv() function provides the "original" argc and argv: before they are modified.

    See open issues:

    • bpo-14208 (closed): No way to recover original argv with python -m
    • bpo-29857: Provide sys.executable_argv for host application's command line arguments
    • bpo-26388: Disabling changing sys.argv[0] with runpy.run_module(...alter_sys=True)

    @vstinner
    Copy link
    Member

    I understand that the discussion is about my commit 3939c32 which changed multiple things:
    ---
    Python now gets the absolute path of the script filename specified on
    the command line (ex: python3 script.py): the __file__ attribute of
    the __main__ module, sys.argv[0] and sys.path[0] become an
    absolute path, rather than a relative path.
    ---

    I understand that the complain is not about sys.modules['__main__'].__file__ nor sys.path[0], but only sys.argv[0].

    The sys.argv documentation says:

    "The list of command line arguments passed to a Python script. argv[0] is the script name (it is operating system dependent whether this is a full pathname or not)."

    https://docs.python.org/dev/library/sys.html#sys.argv

    I understand that before my change, sys.argv[0] was sometimes relative, and sometimes absolute. With my change, sys.argv[0] should now always be asolute.

    There are cases where an absolute path is preferred. There are cases where a relative path is preferred. I'm trying to understand the scope of the issue.

    https://github.com/google/python-gflags/blob/4f06c3d0d6cbe9b1fb90ee9fb1c082b3bf9285f6/gflags/flagvalues.py#L868-L869

    I don't know this code. It seems like sys.argv[0] is used to lookup in a dict, and that dict keys are supposed to be "module names". I don't understand in which cases sys.argv[0] could be a module *name*, since sys.argv[0] seems like a relative or absolute path.

    https://github.com/HcashOrg/hcOmniEngine/blob/f1acc2ba3640a8e1c651ddc90a86d569d00704fe/msc-cli.py#L12

    The code starts by sys.argv.pop(0): remove sys.argv[0].

    https://github.com/vmtk/vmtk/blob/64675f598e31bc6be3d4fba903fb59bf1394f492/PypeS/pyperun.py#L35

    if sys.argv[0].startswith('pyperun'): ...

    I understand that my change broke such test. Did this test work before my change when specifying the absolute path to run the script? Like path/to/pyperun.py rather than pyperun.py? Such code looks fragile: using os.path.basename() would be safer here.

    https://github.com/apache/lucene-solr/blob/cfa49401671b5f9958d46c04120df8c7e3f358be/dev-tools/scripts/svnBranchToGit.py#L783

      home = os.path.expanduser("~")
      svnWorkingCopiesPath = os.path.join(home, "svnwork")
      gitReposPath = os.path.join(home, "gitrepos")
      if sys.argv[0].startswith(svnWorkingCopiesPath) 
         or sys.argv[0].startswith(gitReposPath): ...

    On my Linux, os.path.expanduser("~") is an absolute path and so svnWorkingCopiesPath and gitReposPath should be absolute paths, no?

    My change made this code more reliable, rather than breaking it, no?

    @vstinner
    Copy link
    Member

    Michel Desmoulin: "The relevance of the use case isn't the problem. Even if people had been using it wrong for all this time, the update is still going to break their code if they did use it this way. And if it was possible, of course many people did."

    You have to understand that the change has be done to fix some use cases and that Python 3.8.0 has been related with the change.

    If we change again the behavior, we will get Python < 3.8.0, Python 3.8.0 and Python > 3.8.0 behaving differently. I would prefer to fully understand all use cases before starting to discuss changing the behavior again.

    3.7 already broke a few packages by choosing to not do the True/False dance again and make async/await keywords. I took a lot of people by surprise, not because of the issue itself, but because of the change of philosophy compared of what Python had been in the past.

    You have to understand that it was known since the start and it was a deliberate choice after balancing advantages and disadvantages.

    https://www.python.org/dev/peps/pep-0492/#transition-plan

    I'm only aware of a single function in a single project, Twisted. Are you aware of more projects broken by this change? Did it take long to fix them? Balance it with the ability to use async and await in cases which were not possible in Python 3.6 because they were not real keywords. The Python 3.6 grammar was fully of hack to emulate keywords.

    Do we really want to make a habit of breaking a few things at every minor release ?

    Usually, incompatible changes are discussed to balance if it's worth it or not.

    There is no plan to break all user code just for your pleasure.

    For the specific case of argv[0], the discussion occurred here on associated pull requests, and nobody mentioned any risk of backward compatibility.

    FYI I'm working on this topic and I just proposed a the PEP-606 "Python Compatibility Version" to propose one possible solution to make incompatible changes less painful to users.

    It's fair to expect, as a user, that any 3.x will be aiming to be forward compatible. This was the case for Python 2.X and was part of the popularity of the language: a reliable tech.

    My PEP-606 tries to explain incompatible changes are needed:
    https://www.python.org/dev/peps/pep-0606/#the-need-to-evolve-frequently

    Not to mention that after a decade of healing from the P2/3 transition, the community needs a rest.

    I understand that but such comment doesn't help the discussion. See my previous comments and please try to help to fill the missing information to help us to take better decisions.

    Since Python 3.8.0 has been release, a revert can make things even worse that the current state.

    @ncoghlan
    Copy link
    Contributor

    This change isn't in Python 3.8 though, it's only in Python 3.9 (the PR merge date is after the beta 1 branch date).

    Unless it was backported in a Python 3.8 PR that didn't link back to this issue or the original Python 3.9 PR?

    @vstinner
    Copy link
    Member

    This change isn't in Python 3.8 though, it's only in Python 3.9 (the PR merge date is after the beta 1 branch date).

    Oh, you're right. Sorry, I thought that I made the change before 3.8 branch was created. In that case, we can do whatever we want :-) (We are free to revert.)

    @gpshead
    Copy link
    Member

    gpshead commented Oct 24, 2019

    Please revert. An absolute path changes semantics in many real world situations (altering symlink traversals, etc). People expect the current sys.argv[0] behavior which is "similar to C argv" and matches what was passed on the interpreter command line.

    A getcwd() call doesn't even have to succeed. A single file python program should still be able to run in that environment rather than fail to start.

    To help address the original report filing issue, we could add a notion of .co_cwd to code objects for use in resolving co_filename. Allow it to be '' if getcwd failed at source load time. Code could check if co_cwd exists and join it with the co_filename. Also let co_cwd remain empty when there is no valid co_filename for the code.

    Preaching: Code that calls os.chdir() is always a potential problem as it alters process global state. That call is best avoided as modifying globals is impolite.

    @vstinner
    Copy link
    Member

    Please revert.

    Ok ok, I will revert my change, but only on sys.argv[0].

    A getcwd() call doesn't even have to succeed. A single file python program should still be able to run in that environment rather than fail to start.

    The current implementation leaves a path unchanged if it fails to make it absolute:

    config_run_filename_abspath:

        wchar_t *abs_filename;
        if (_Py_abspath(config->run_filename, &abs_filename) < 0) {
            /* failed to get the absolute path of the command line filename:
               ignore the error, keep the relative path */
            return _PyStatus_OK();
        }
        if (abs_filename == NULL) {
            return _PyStatus_NO_MEMORY();
        }
    
        PyMem_RawFree(config->run_filename);
        config->run_filename = abs_filename;

    with:

    /* Get an absolute path.
       On error (ex: fail to get the current directory), return -1.
       On memory allocation failure, set *abspath_p to NULL and return 0.
       On success, return a newly allocated to *abspath_p to and return 0.
       The string must be freed by PyMem_RawFree(). */
    int _Py_abspath(const wchar_t *path, wchar_t **abspath_p);

    @vstinner
    Copy link
    Member

    To help address the original report filing issue, we could add a notion of .co_cwd to code objects for use in resolving co_filename. Allow it to be '' if getcwd failed at source load time. Code could check if co_cwd exists and join it with the co_filename. Also let co_cwd remain empty when there is no valid co_filename for the code.

    Do you mean that co_filename attribute of code objects must remain relative?

    Preaching: Code that calls os.chdir() is always a potential problem as it alters process global state. That call is best avoided as modifying globals is impolite.

    I thought that calling os.chdir("/") is a good practice when launching a program as a daemon. And thne call os.fork() twice to detach the process from its parent and run it in background.

    I commonly use os.chdir() in my programs for various reasons. A classical use case is to mimick a shell script doing something like (cd $DIRECTORY; command): temporary change the current directory.

    cwd parameter of subprocess.Popen helps to avoid changing the currently directory, but sometimes subprocess is not used at all.

    Maybe avoiding os.chdir() is a good practice, but it's hard to prevent users to call it. I don't see how os.chdir() is bad by design. I'm not sure that it's really useful to debate this neither :-)

    @gpshead
    Copy link
    Member

    gpshead commented Oct 24, 2019

    I think sys.argv[0] is the important one as program logic often tends to use that as an input.

    I'm honestly unsure of if this will be as much of a problem for .co_filename or sys.path[0].

    @gpshead
    Copy link
    Member

    gpshead commented Oct 24, 2019

    (read that as: feel free to keep the behavior for co_filename and path[0] and lets see what happens :)

    @ambv
    Copy link
    Contributor

    ambv commented Dec 9, 2019

    Anything new here? 3.9.0a2 is due next week.

    @vstinner
    Copy link
    Member

    vstinner commented Dec 9, 2019

    Anything new here? 3.9.0a2 is due next week.

    I had it in my TODO list but I lost the bpo number. Thanks for the reminder :-) I wrote PR 17534 that I plan to merge as soon as the CI tests pass.

    @vstinner
    Copy link
    Member

    vstinner commented Dec 9, 2019

    New changeset a1a99b4 by Victor Stinner in branch 'master':
    bpo-20443: No longer make sys.argv[0] absolute for script (GH-17534)
    a1a99b4

    @isidentical
    Copy link
    Sponsor Member

    I am not sure that if co_filename can break backwards compability or not (because until CodeType.replace; it was usual to break code object construction, people who are using code objects may take this too) but yes PR 17534 was necessary.

    My initial patch (PR 13527) proposed a future directive to do this (and poorly implemented the concept) which can be helpful (yet it needs a PEP).

    @vstinner
    Copy link
    Member

    I reverted my change, so I remove "release blocker" priority.

    @ncoghlan
    Copy link
    Contributor

    With the sys.argv[0] change reverted, I think this overall issue is fixed now - code objects will get absolute paths, while sys.argv[0] will continue to reflect how __main__ was identified.

    @vstinner
    Copy link
    Member

    New changeset c1ee6e5 by Victor Stinner in branch 'master':
    bpo-20443: Update What's New In Python 3.9 (GH-17986)
    c1ee6e5

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @bvd0
    Copy link

    bvd0 commented Sep 6, 2023

    This says that Module __file__ attributes (and related values) should now always contain absolute paths by default, with the sole exception of __main__.__file__, and this says that the __file__ attribute of the __main__ module became an absolute path; does this mean that __file__ is always an absolute path?

    Is there anything in the official Python documentation that specifies whether __file__ is absolute or relative?

    @vstinner
    Copy link
    Member

    vstinner commented Sep 6, 2023

    This issue is closed. I suggest you to open a new issue.

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.9 only security fixes type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    9 participants