Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate and remove code execution in pth files #78125

Open
warsaw opened this issue Jun 22, 2018 · 121 comments
Open

Deprecate and remove code execution in pth files #78125

warsaw opened this issue Jun 22, 2018 · 121 comments
Labels
3.8 only security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@warsaw
Copy link
Member

warsaw commented Jun 22, 2018

BPO 33944
Nosy @mhammond, @warsaw, @brettcannon, @terryjreedy, @jaraco, @ncoghlan, @pitrou, @ericvsmith, @tiran, @nedbat, @aroberge, @methane, @ericsnowcurrently, @takluyver, @zooba, @matrixise, @vedgar, @native-api, @yan12125, @asottile, @ethanhs, @csabella, @miss-islington, @chrisjbillington, @Qix-
PRs
  • bpo-33944: note about the intended use of code in .pth files #10131
  • bpo-33944: Add site.py debugging via PYTHONSITEDEBUG #12107
  • bpo-33944: Add site.py site-packages tracing in verbose mode #12110
  • [3.8] bpo-33944: note about the intended use of code in .pth files (GH-10131) #15942
  • Dependencies
  • bpo-14803: Add feature to allow code execution prior to main invocation
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2018-06-22.17:22:20.335>
    labels = ['3.8', 'type-feature', 'library']
    title = 'Deprecate and remove code execution in pth files'
    updated_at = <Date 2021-01-05.10:12:30.573>
    user = 'https://github.com/warsaw'

    bugs.python.org fields:

    activity = <Date 2021-01-05.10:12:30.573>
    actor = 'lkollar'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2018-06-22.17:22:20.335>
    creator = 'barry'
    dependencies = ['14803']
    files = []
    hgrepos = []
    issue_num = 33944
    keywords = ['patch']
    message_count = 120.0
    messages = ['320246', '320249', '320253', '320266', '320277', '320279', '320283', '320284', '320286', '320287', '320292', '320293', '320342', '320386', '320393', '320724', '320754', '320850', '320997', '321005', '321026', '321125', '321134', '321340', '328488', '328564', '329607', '329764', '329802', '330115', '333235', '333536', '333567', '333568', '333569', '333572', '333591', '333592', '333613', '333637', '333638', '333639', '333640', '333642', '333644', '333645', '333698', '333699', '333705', '333706', '333716', '333997', '334199', '335774', '335926', '336351', '336662', '336705', '336709', '336710', '336711', '336714', '336716', '336721', '336722', '336725', '336726', '336809', '336853', '336856', '336860', '336863', '336875', '336882', '336939', '336944', '336961', '336970', '336983', '336984', '336992', '337064', '337351', '337353', '337354', '337365', '337368', '337370', '337396', '337399', '337406', '337408', '337409', '337410', '337414', '337417', '337418', '337421', '337422', '337424', '337426', '337427', '337430', '337434', '337437', '337438', '337439', '337446', '337920', '337954', '350625', '351861', '351872', '358909', '358915', '358953', '368712', '368732', '371334', '384148']
    nosy_count = 31.0
    nosy_names = ['mhammond', 'barry', 'brett.cannon', 'terry.reedy', 'jaraco', 'ncoghlan', 'pitrou', 'eric.smith', 'christian.heimes', 'nedbat', 'aroberge', 'ionelmc', 'methane', 'SilentGhost', '__Vano', 'eric.snow', 'takluyver', 'steve.dower', 'matrixise', 'veky', 'Ivan.Pozdeev', 'yan12125', 'Anthony Sottile', 'Michel Desmoulin', 'ethan smith', 'cheryl.sabella', 'lkollar', 'miss-islington', 'Chris Billington', 'Peter L3', 'qix-']
    pr_nums = ['10131', '12107', '12110', '15942']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue33944'
    versions = ['Python 3.8']

    @warsaw
    Copy link
    Member Author

    warsaw commented Jun 22, 2018

    pth files are evil. They are very difficult to debug because they're processed too early. They usually contain globs of inscrutable code. Exceptions in pth files can get swallowed in some cases. They are loaded in indeterminate order.

    They are also unnecessary to support namespace packages in Python 3 (ignoring straddling code).

    Let's start the process for removing them.

    1. Deprecate pth files in Python 3.8 and turn them off with the -3 option.

    2. Kill off pth file support once Python 2 is EOL'd.

    @warsaw warsaw added 3.8 only security fixes stdlib Python modules in the Lib dir labels Jun 22, 2018
    @tiran
    Copy link
    Member

    tiran commented Jun 22, 2018

    +1

    @ericvsmith
    Copy link
    Member

    Also +1.

    @takluyver
    Copy link
    Mannequin

    takluyver mannequin commented Jun 22, 2018

    I'm generally in favour of getting rid of .pth files. But I did accept a PR adding support for them in Flit to act as a substitute for symlinks on Windows, to achieve something like a 'development install'. I'm not sure what the alternative is if they go away.

    @brettcannon
    Copy link
    Member

    Windows has symlinks now I believe, you just have to turn them on.

    And I would say there is no need for alternative. If a package needs to do something funky they can do it in their __init__.py file. Otherwise if I don't import a package it shouldn't get to do anything crazy through a .pth file.

    @takluyver
    Copy link
    Mannequin

    takluyver mannequin commented Jun 22, 2018

    I don't want to use the execution features of .pth files, just their original functionality of adding extra directories to sys.path. I'd be very happy to see the arbitrary code execution 'feature' of .pth files go away.

    Windows supports symlinks, but the last I heard was that creating them requires some obscure permission bit. It seems to be awkward enough that Windows users aren't happy with the "just use symlinks" approach, which was what I was originally trying.

    @ericvsmith
    Copy link
    Member

    My understanding about symlinks on Windows is that they require a permission ("Create symbolic links"), that normal users by default do not have. I'm not sure if this has changed recently.

    @ethanhs
    Copy link
    Mannequin

    ethanhs mannequin commented Jun 22, 2018

    I am in favor of symlinks no longer being able to execute arbitrary code, however, I do think having them add to the path cannot be killed in two releases. Here is why:

    1. Windows support for symlinks is still not automatic. In the creators update of Windows 10 (released March 2017), CreateSymbolicLink added a dwflag SYMBOLIC_LINK_FLAG_ALLOW_UNPRIVILEGED_CREATE. This requires the user to be in developer mode to work. CPython currently doesn't use this flag. (I will open an issue to add that in a moment). I worry that giving people little time to update will be troublesome.

    2. All editable installs everywhere (AFAIK) and setuptools eggs (still somewhat common) use easy-install.pth to list where they are. I think breaking editable installs is a bad idea, as there is no clear solution for this. Also setuptools has a fair amount of work to do before it can replace egg installs.

    So I think removing adding to the path will require much more thought and break a lot more code than removing arbitrary code execution.

    @brettcannon
    Copy link
    Member

    My only answer to Ethan is "don't use eggs". :)

    @warsaw
    Copy link
    Member Author

    warsaw commented Jun 22, 2018

    There are lots of problems with pth files, although arbitrary code execution is probably the most egregious. They are also notoriously difficult to debug, and happen before any control is given to user code. They certainly are unnecessary for namespace packages, which I think they currently get used for often in Py 2/3 straddling code.

    Maybe it will be okay to just fall back to sys.path extension, but I'd like to have a better understanding of exactly what the use cases are (in a pure Python 3 world), and we have to address the other problems about discovery and debuggability.

    @ncoghlan
    Copy link
    Contributor

    Strong -1 without a functional replacement that provides comparable LD_PRELOAD capabilities (it also needs a full PEP that analyses all of the ways that setuptools and other packaging utilities use these files, such as for the implementation of "develop" mode, and the processing of ".lnk" mode).

    This change also needs to account for the Windows-only "._pth" files that override the path completely.

    The main discussion list for such a PEP should be distutils-sig, *not* python-ideas or import-sig (since distutils-sig is where we're more likely to find folks that are actually relying on the feature, and hence have a clearer idea of what will need to change to maintain a comparable level of ecosystem level capability).

    https://bugs.python.org/issue14803 is also related, as pth file processing should at least be delayed to run later than it does currently, and because "run code at startup" is one of the capabilities that would need replacing.

    @ncoghlan
    Copy link
    Contributor

    Concrete use case for the original path extension capability: "pew add", which chains virtual environments together (allowing shared environments with a common default dependency set, and then additional per-application dependencies)

    @ncoghlan
    Copy link
    Contributor

    Brett pointed out that may initial reaction above came across as quite blunt and demanding, so attempting to phrase that more clearly as a user experience consideration:

    It may be tempting to view this as purely a clean-up of the import system implementation, removing a quirky and error prone construct for the sake of improved maintainability of both the import system itself, and the maintainability of end user installations.

    My request (wearing my "BDFL-delegate for packaging interoperability standards" hat) is that proponents of the change resist the temptation to view the problem that way :)

    Path files are used extensively across the Python packaging ecosystem to implement additional environment management features beyond those provided natively by interpreter implementations, and while we've added native equivalents for some of them (namespace packages, virtual environments), we're far from having added support for all of them (dynamic package version selection, virtual environment chaining, editable package installs that still publish correct PEP-376 package metadata, etc).

    This means that any changes in this area pose significant backwards compatibility risks, and need to be approached carefully, and cautiously, with a strong emphasis on surveying real world code and seeing how the feature is currently being used.

    Or, alternatively, the idea can be broken up into smaller, lower impact changes that still help to address the import system and end user environment maintainability issues, but don't involve breaking backwards compatibility.

    (For an example of the latter: if "python -m site --list-pth-files" printed a list of all of the pth files and "python -m site --dump-pth-files" listed both the files and their contents, then environment debuggability would improve significantly without any compatibility impacts whatsoever)

    @pitrou
    Copy link
    Member

    pitrou commented Jun 24, 2018

    I would also add that editable installs should not break in the process. They are important.

    @warsaw
    Copy link
    Member Author

    warsaw commented Jun 25, 2018

    On Jun 23, 2018, at 18:56, Nick Coghlan <report@bugs.python.org> wrote:

    My request (wearing my "BDFL-delegate for packaging interoperability standards" hat) is that proponents of the change resist the temptation to view the problem that way :)

    Path files are used extensively across the Python packaging ecosystem to implement additional environment management features beyond those provided natively by interpreter implementations, and while we've added native equivalents for some of them (namespace packages, virtual environments), we're far from having added support for all of them (dynamic package version selection, virtual environment chaining, editable package installs that still publish correct PEP-376 package metadata, etc).

    Still, I firmly believe they’re a wart being abused for purposes they weren’t really intended for. It’s a trick of implementation that lines beginning with import are exec’d. That being said…

    Or, alternatively, the idea can be broken up into smaller, lower impact changes that still help to address the import system and end user environment maintainability issues, but don't involve breaking backwards compatibility.

    +1 on working on *much* better debuggability and discoverability for .pth files first, and then consider their eventual deprecation, replacement, and/or removal.

    @native-api
    Copy link
    Mannequin

    native-api mannequin commented Jun 29, 2018

    I *think* we need to ask maintainers of packages who use .pth -- at least, Mark Hammond (pywin32) -- to find out the impact and if everything can be done with other means.

    AFAICS it at least allows pywin32 to have many top-level modules without cluttering `site-packages'.

    pywin32 e.g. also copies some files to %windir%\system32 for some reason. And last time I checked, distutils had no functionality that involved symlinks, regardless of the OS.

    @native-api native-api mannequin added the type-feature A feature request or enhancement label Jun 29, 2018
    @ncoghlan
    Copy link
    Contributor

    I think we also need to clearly separate two distinct aspects of .pth files:

    1. "import <module>; <arbitrary code execution goes here>" lines <--- Kill it with fire
    2. "<add this directory to sys.path>" lines <--- This is fine and good and perfectly sensible

    It's point 2 that powers things like "pew add", and I don't see any particularly compelling reason to get rid of it.

    The "arbitrary code invocation for every single Python execution using that environment" aspect, on the other hand, is mostly a PITA, and used as a workaround for other features being missing (e.g. the PYTHONRUNFIRST proposal in https://bugs.python.org/issue14803).

    @mhammond
    Copy link
    Contributor

    mhammond commented Jul 2, 2018

    pywin32, up until recently, just listed 3 directories in its .pth file - these were for directories which pre-dated packages and were never converted. Eg, "import win32api" actually loads win32api.pyd from the "site-packages/win32" directory.

    Earlier this year, via mhammond/pywin32#1151, I also added the line:

    import os;os.environ["PATH"]+=(';'+os.path.join(sitedir,"pywin32_system32"))

    which is to support pywin32 being installed from wheels - this is due to pywin32 shipping with various shared DLLs which implement many pywin32 types - eg, pywintypesXX.dll is used by (almost) every single .pyd shipped with pywin32, and disutils doesn't offer any way of copying files as part of a post-install script or any other way of ensuring these .dll files are on the PATH or otherwise next to pythonXX.dll/.exe

    I'm happy to replace both of these with alternatives when they exist.

    @warsaw
    Copy link
    Member Author

    warsaw commented Jul 3, 2018

    I think we'll clearly need a PEP for this clean up. I'd like to see a separate "preload" feature as well, especially one that is deterministic and happens before site.py. Not sure if that should be one PEP or two.

    @ericsnowcurrently
    Copy link
    Member

    @barry, make sure you take a look at https://bugs.python.org/issue14803.

    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Jul 4, 2018

    To avoid confusing the discussions, two PEPs is likely a better option:

    1. Designing and implementing a dedicated preload mechanism
    2. Adjusting the way pth file handling works, including deprecating and removing the "pth arbitrary file execution" trick (depends on the first one as the forward compatible migration path for legitimate code preloading use cases)

    @terryjreedy
    Copy link
    Member

    This issue, as stated, looks like a severe regression to me.

    In each of my python installs, Lib/site-packages has a file called 'python.pth' containing 'F:/Python'. This is not a glob of inscrutable code. It is not even Python code. Just a path. Is this issue about something else also called a 'pth file'?

    F:/Python latter is a package development directory on my supplementary hard drive. When I first install a new version of Python (early alpha), I copy this tiny file. Voila! The packages within /Python are 'installed' for the new version without making copies. Editing a file edits it for all 'installs'. Deleting the directory for an old and no longer needed version does not delete any of my files.

    Import in files within F:/Python/pack act as if pack were installed in the site package for the version of python running the file. I can easily run anything in Command Prompt with 'py -x.y -m pack.file'. I can easily rerun with a different version by hitting up arrow and changing x.y. Command Prompt's current working directory does not matter.

    I think this is one of Python's most under-appreciated features. I am rather sure there is no way to so easily get the same effect. Abuse of a great feature is not a good reason to delete it completely.

    @native-api
    Copy link
    Mannequin

    native-api mannequin commented Jul 5, 2018

    They are very difficult to debug because they're processed too early.

    .pth's are processed by site.py, so no more difficult than site/sitecustomize.
    You can e.g. run `site.addpackage(<dir>,<file>,None)' to debug the logic.

    They usually contain globs of inscrutable code.

    An ability to contain code is there for a reason: to allow a module do something more intelligent than adding hardcoded paths if needed (e.g. pywin32 adds a subdir with .dll dependencies to PATH).

    A chunk of code is limited to a single line -- a conscious limitation to deter misuse 'cuz search path setup code is presumed to be small.

    If someone needs something other than path setup, they should do it upon the module's import instead.
    If they insist on misusing the feature, Python's design does what it's supposed to do in such cases: "make right things easy, make wrong things hard".

    If there's a valid reason to allow larger code chunks, we can introduce a different syntax -- e.g. something like bash here-documents.

    Exceptions in pth files can get swallowed in some cases.

    If this happens, it's a bug. A line from .pth is executed with "exec line", any exceptions should propagate up the stack as usual.

    They are loaded in indeterminate order.

    Present a use case justifying a specific order.
    I can see a probable use case: a package needs to do something using its dependencies, so any .pth for the dependencies should run before the one for the package.
    But I can't see why that package can't do this upon its import instead (saves unnecessary work if the user won't be using that package in that session, too).
    The only valid case I can see is if the package is using some 3rd-party import system (e.g. a .7z archive or some module repository) that needs to be loaded first for its search path to make sense.

    @warsaw
    Copy link
    Member Author

    warsaw commented Jul 9, 2018

    On Jul 5, 2018, at 14:23, Ivan Pozdeev <report@bugs.python.org> wrote:

    Ivan Pozdeev <ivan_pozdeev@mail.ru> added the comment:

    > They are very difficult to debug because they're processed too early.

    .pth's are processed by site.py, so no more difficult than site/sitecustomize.
    You can e.g. run `site.addpackage(<dir>,<file>,None)' to debug the logic.

    Not really. By the time you have access to a REPL to run that, site.py has already run, so you already have an unclean environment. Running with -S really isn’t feasible either since that’s often impossible (e.g. in a zip app like shiv or pex), or that leaves you with a broken environment so you can’t get to a usable REPL. What you often have to do is actually modify Python to put a breakpoint in site.py to see what’s actually happening. Yuck.

    > They usually contain globs of inscrutable code.

    An ability to contain code is there for a reason: to allow a module do something more intelligent than adding hardcoded paths if needed (e.g. pywin32 adds a subdir with .dll dependencies to PATH).

    A chunk of code is limited to a single line -- a conscious limitation to deter misuse 'cuz search path setup code is presumed to be small.

    Trust me, once you can execute arbitrary code in .pth files, you’re lost. And packages *do* execute arbitrary code that is very difficult to debug. And yes, those complex lines are both inscrutable and non-standard.

    If someone needs something other than path setup, they should do it upon the module's import instead.

    Except they often don’t.

    If they insist on misusing the feature, Python's design does what it's supposed to do in such cases: "make right things easy, make wrong things hard”.

    The problem comes when some random module you are including in your application does something weird in their .pth files that breaks assumptions *other* libraries or code is making. It’s not as uncommon as it might seem.

    If there's a valid reason to allow larger code chunks, we can introduce a different syntax -- e.g. something like bash here-documents.

    The size of the code chunks isn’t the only issue. Running arbitrary code in a .pth file has all kinds of negative consequences. It’s basically code that happens at import time, with all the problems that happen with that anti-pattern.

    > Exceptions in pth files can get swallowed in some cases.

    If this happens, it's a bug. A line from .pth is executed with "exec line", any exceptions should propagate up the stack as usual.

    > They are loaded in indeterminate order.

    Present a use case justifying a specific order.

    Interdependent namespace packages. If they get loaded in the wrong order, they can mess up __path__ settings, causing other namespace package portions to be un-importable. Yes, this does happen!

    @anntzer
    Copy link
    Mannequin

    anntzer mannequin commented Oct 25, 2018

    There are a number of packages that can "self-import" into any Python process depending on the presence of an environment variable, by installing a pth file that contains something like import os; __import__("thepkg") if os.environ.get("THEENVVAR") else None. Examples include colorization of logging output (https://coloredlogs.readthedocs.io/en/latest/api.html#environment-variables) or installation of a trace function (https://pypi.org/project/hunter/#environment-variable-activation).

    If the pth mechanism goes away, a preload system should definitely be present to provide a replacement; it should again support multiple packages each installing their own hook.

    @native-api
    Copy link
    Mannequin

    native-api mannequin commented Oct 26, 2018

    The primary motivation behind the suggestion seems to be the fact that the feature is abused.

    However, the documentation has no info whatsoever on what is the intended use -- thus what constitutes abuse. Without that, the accusations are kind of baseless -- how can we blame package authors for having to figure it out for themselves?

    I've made a PR with the corresponding note.
    Since the discussion has revealed a number of valid use cases for the feature for which there are no adequate alternatives currently, I hope it will diminish the discontent and be grounds to incite package authors to remove unnecessary logic from there.

    @native-api
    Copy link
    Mannequin

    native-api mannequin commented Nov 10, 2018

    @barry

    Interdependent namespace packages. If they get loaded in the wrong order, they can mess up __path__ settings

    Actually, when writing the PR, I had a revelation how this could be implemented. Via an import hook that would work like a union FS!

    In its .pth file, each such package will import the hook's module (which will cause the hook to be installed on the first import) and "register" its namespaces and/or dependencies with it. The hook will then calculate the required load order and enforce it upon import of any of the registered namespaces.

    @warsaw
    Copy link
    Member Author

    warsaw commented Mar 7, 2019

    On Mar 7, 2019, at 09:32, Anthony Sottile report@bugs.python.org wrote:

    I should have to start that package somehow.

    pip install is a pretty good opt-in already imo

    Except that it conflates responsibilities. I may not want to opt into coverage even being loaded in my application because I’m not going to use it and it has a negative impact on my application’s start up time. Yet because you’re on the same machine and you pip installed it, I have no choice but to pay those costs, which I haven’t explicitly opted in to.

    @asottile
    Copy link
    Mannequin

    asottile mannequin commented Mar 7, 2019

    I should have to start that package somehow.

    pip install is a pretty good opt-in already imo

    Except that it conflates responsibilities. I may not want to opt into coverage even being loaded in my application because I’m not going to use it and it has a negative impact on my application’s start up time. Yet because you’re on the same machine and you pip installed it, I have no choice but to pay those costs, which I haven’t explicitly opted in to.

    At least for the coverage plugins there is a required opt in from environment variable (as shown above). Though the startup cost is a good point. Perhaps I'm of the minority but I use virtualenvs for everything so I haven't even been considering the system python.

    @brettcannon
    Copy link
    Member

    RE: " So basically you'd remove the whole feature just cause one package no
    one installs abuses it. Doesn't make sense."

    No, the point being made is *at least* one package that was found on PyPI
    is abusing it, so it exists and we need to consider the possibility others
    are also abusing the feature.

    On Thu, Mar 7, 2019 at 10:22 AM Anthony Sottile <report@bugs.python.org>
    wrote:

    Anthony Sottile asottile@umich.edu added the comment:

    I should have to start that package somehow.

    pip install is a pretty good opt-in already imo

    Except that it conflates responsibilities. I may not want to opt into
    coverage even being loaded in my application because I’m not going to use
    it and it has a negative impact on my application’s start up time. Yet
    because you’re on the same machine and you pip installed it, I have no
    choice but to pay those costs, which I haven’t explicitly opted in to.

    At least for the coverage plugins there is a required opt in from
    environment variable (as shown above).

    For the ones you know about. Dealing with abuse of functionality isn't
    about what common practice is, but what a bad actor may do.

    Though the startup cost is a good point. Perhaps I'm of the minority but
    I use virtualenvs for everything so I haven't even been considering the
    system python.

    Trust me, from my perspective of the Python extension for VS Code you
    cannot ignore system installs.

    @ionelmc
    Copy link
    Mannequin

    ionelmc mannequin commented Mar 7, 2019

    because you’re on the same machine and you pip installed it, I have no
    choice but to pay those costs, which I haven’t explicitly opted in to.

    At least for the coverage plugins there is a required opt in from
    environment variable (as shown above).

    There's a simple if 'COVSOMETHING' in os.environ check to activate it.
    That can't cost a significant amount of time. This is rather another bad
    actor argument.

    @warsaw
    Copy link
    Member Author

    warsaw commented Mar 7, 2019

    On Mar 7, 2019, at 10:46, Ionel Cristian Mărieș report@bugs.python.org wrote:

    There's a simple if 'COVSOMETHING' in os.environ check to activate it.
    That can't cost a significant amount of time. This is rather another bad
    actor argument.

    You’re overlooking the significant cost of loading the module that does the check in the first place.

    @ionelmc
    Copy link
    Mannequin

    ionelmc mannequin commented Mar 7, 2019

    What module? That check should be done directly in the pth file.

    @zooba
    Copy link
    Member

    zooba commented Mar 7, 2019

    Nonetheless, it's still something that we could support better. If telling someone to set PYTHONSTARTUP is too hard, then we can design another way that can work well without relying on a barely documented (mis)feature.

    As one idea, we could add a way to register new -X options that would translate into an import/function call after doing site, which then means you could do "python -X coverage ..." and if you don't then you don't get code injected at all, regardless of bugs in any libraries you've installed.

    @ericvsmith
    Copy link
    Member

    I should have to start that package somehow.

    pip install is a pretty good opt-in already imo

    I think that’s where we disagree. Like others, I don’t want this to affect every python script in a given installation.

    > Instead of just shipping "my_module.foo", you ship "my_module.py" and "_my_module.foo", where "my_module.py" looks like:

    but that's exactly my point, now you have to ship extra junk python files when it's a way better experience to have the hooks _just work_

    You mean extra junk like .pth files? I don’t see the difference between a .py file and a .pth file, except I can’t opt out of .pth files.

    We’re just looking for some way to control the behavior, without giving the .pth file unlimited capabilities before the user script starts. If it’s “just” some extra .py files, then maybe that’s great. If we need some other new mechanism, then I’d be okay with that, too.

    @zooba
    Copy link
    Member

    zooba commented Mar 7, 2019

    You mean extra junk like .pth files? I don’t see the difference between a .py file and a .pth file, except I can’t opt out of .pth files.

    And you get multiple lines of code, and syntax highlighting, and linting, and all the other goodness in a genuine source file :)

    1 similar comment
    @zooba
    Copy link
    Member

    zooba commented Mar 7, 2019

    You mean extra junk like .pth files? I don’t see the difference between a .py file and a .pth file, except I can’t opt out of .pth files.

    And you get multiple lines of code, and syntax highlighting, and linting, and all the other goodness in a genuine source file :)

    @zooba
    Copy link
    Member

    zooba commented Mar 7, 2019

    You mean extra junk like .pth files? I don't see the difference between a .py file and a .pth file, except I can’t opt out of .pth files.

    And you get multiple lines of code, and syntax highlighting, and linting, and all the other goodness in a genuine source file :)

    @takluyver
    Copy link
    Mannequin

    takluyver mannequin commented Mar 7, 2019

    As a lurker on this issue: I think a lot of energy is being expended arguing about what is and isn't legitimate use cases, when there's actually more stuff that people agree about than not.

    I think this issue should be broken down into two, neither of which will actually result in removing pth files:

    1. Better ways to inspect and control the sys.path extension feature (as per Barry's message https://bugs.python.org/issue33944#msg337417 ).
    2. Designing a replacement for the arbitrary-code-at-startup feature (or even several replacements to meet different needs), leading to its eventual deprecation.

    If you like the ability for packages to install interpreter-startup hooks, then pth files are an ugly way to do it. If you don't, then you want better ways to control it. So let's see what we can come up with.

    At least several important use cases (coverage and debugging) would probably work with an environment variable to specify startup code. The coverage hooks already check an environment variable themselves, so it's clearly a control mechanism that works. It's also familiar from things like LD_PRELOAD that environment variables can affect code in powerful ways.

    But the PYTHONSTARTUP variable is not suitable for this, because it only affects interactive shell sessions. So maybe one useful step would be to specify a new environment variable, maybe PYTHONPRELOAD, and figure out how it will interact with all the other options.

    Then we can re-evaluate the use cases Anthony described (https://bugs.python.org/issue33944#msg337406 ) and debate the need for other startup-code mechanisms to go along with that.

    @ncoghlan
    Copy link
    Contributor

    Just noting that https://bugs.python.org/issue14803 is probably our most comprehensive discussion of the coverage use case for arbitrary pre-main code execution.

    Steve also made a comment above about potentially turning encodings into a namespace package: that's difficult due to the non-empty __init__.py file that registers a couple of codec search functions as a side effect of import: https://github.com/python/cpython/blob/master/Lib/encodings/__init__.py

    However, it would be possible to define a *new* namespace package for codec discovery that was searched after the standard search locations (so you could use it to add extra codecs, but not hijack existing ones).

    @brettcannon
    Copy link
    Member

    We could also have a new namespace package which is *just* for startup injection so it wasn't such a hack to tie into the codecs startup code.

    @qix
    Copy link
    Mannequin

    qix mannequin commented Aug 27, 2019

    -1

    This would make better_exceptions irreparably un-ergonomic.

    https://github.com/qix-/better-exceptions

    .PTH files are commonly used to install development middleware in order to enhance the development and debugging experience.

    I recognize the need for security, but could we instead focus on improving the security of the existing .PTH system instead of throwing out the baby with the bathwater?

    The search "pth files python virus|malicious" on Google returns this issue. Is .PTH a previously exploited vector? This is like saying NPM's install scripts are a vector. I'm not going to be running code that I don't at least trust a little.

    This issue reads like someone had a bad time with some poorly written Python code that was stuck inside a .PTH file, had to debug why it was causing a problem, and came here to cry about it (no offense, Barry).

    Instead of improving it, the first inclination was to remove it altogether without any regard to its use-cases or the effects it would have on some packages that rely on it.

    Let's improve it, not kill it.

    @miss-islington
    Copy link
    Contributor

    New changeset f9b5840 by Miss Islington (bot) (native-api) in branch 'master':
    bpo-33944: note about the intended use of code in .pth files (GH-10131)
    f9b5840

    @matrixise
    Copy link
    Member

    New changeset d1d968d by Stéphane Wirtel (Miss Islington (bot)) in branch '3.8':
    bpo-33944: note about the intended use of code in .pth files (GH-10131) (GH-15942)
    d1d968d

    @MichelDesmoulin
    Copy link
    Mannequin

    MichelDesmoulin mannequin commented Dec 27, 2019

    Can usercustomize.py be used as an alternative to "*.pth" execution magic ?

    @ionelmc
    Copy link
    Mannequin

    ionelmc mannequin commented Dec 27, 2019

    No. You can't put an usercustomize in a wheel.

    @MichelDesmoulin
    Copy link
    Mannequin

    MichelDesmoulin mannequin commented Dec 28, 2019

    Just realized it didn't work in venv anyway.

    @ncoghlan ncoghlan changed the title Deprecate and remove pth files Deprecate and remove code execution in pth files May 12, 2020
    @ncoghlan
    Copy link
    Contributor

    While it's still not entirely accurate, I've tweaked the title on the issue to refer to the arbitrary code execution behavior.

    Getting "Make pth file sys.path modifications easier to debug" in there as well would be rather tricky :)

    @asottile
    Copy link
    Mannequin

    asottile mannequin commented May 12, 2020

    fwiw virtualenv 20.x uses .pth now as well to fix some issues with venv-based environments

    @methane
    Copy link
    Member

    methane commented Jun 12, 2020

    New changeset 2145c8c by native-api in branch 'master':
    bpo-33944: site: Add site-packages tracing in verbose mode (GH-12110)
    2145c8c

    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Jan 1, 2021

    PEP-648 has been posted with a proposal to migrate sitecustomize.py, usersitecustomize.py and arbitrary code execution in pth files to a directory based __sitecustomize__ structure: https://www.python.org/dev/peps/pep-0648/

    @Metallicow
    Copy link

    It seems reasonable for all it's aspects of discussion. Tho all the other aspects seem lacking, as they are looking for actions, not words

    Words can be quite impolite at times, but they was put there for a reason. This has been a good discussion though.

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests