-
-
Notifications
You must be signed in to change notification settings - Fork 29.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deprecate and remove code execution in pth files #78125
Comments
pth files are evil. They are very difficult to debug because they're processed too early. They usually contain globs of inscrutable code. Exceptions in pth files can get swallowed in some cases. They are loaded in indeterminate order. They are also unnecessary to support namespace packages in Python 3 (ignoring straddling code). Let's start the process for removing them.
|
+1 |
Also +1. |
I'm generally in favour of getting rid of .pth files. But I did accept a PR adding support for them in Flit to act as a substitute for symlinks on Windows, to achieve something like a 'development install'. I'm not sure what the alternative is if they go away. |
Windows has symlinks now I believe, you just have to turn them on. And I would say there is no need for alternative. If a package needs to do something funky they can do it in their __init__.py file. Otherwise if I don't import a package it shouldn't get to do anything crazy through a .pth file. |
I don't want to use the execution features of .pth files, just their original functionality of adding extra directories to sys.path. I'd be very happy to see the arbitrary code execution 'feature' of .pth files go away. Windows supports symlinks, but the last I heard was that creating them requires some obscure permission bit. It seems to be awkward enough that Windows users aren't happy with the "just use symlinks" approach, which was what I was originally trying. |
My understanding about symlinks on Windows is that they require a permission ("Create symbolic links"), that normal users by default do not have. I'm not sure if this has changed recently. |
I am in favor of symlinks no longer being able to execute arbitrary code, however, I do think having them add to the path cannot be killed in two releases. Here is why:
So I think removing adding to the path will require much more thought and break a lot more code than removing arbitrary code execution. |
My only answer to Ethan is "don't use eggs". :) |
There are lots of problems with pth files, although arbitrary code execution is probably the most egregious. They are also notoriously difficult to debug, and happen before any control is given to user code. They certainly are unnecessary for namespace packages, which I think they currently get used for often in Py 2/3 straddling code. Maybe it will be okay to just fall back to sys.path extension, but I'd like to have a better understanding of exactly what the use cases are (in a pure Python 3 world), and we have to address the other problems about discovery and debuggability. |
Strong -1 without a functional replacement that provides comparable LD_PRELOAD capabilities (it also needs a full PEP that analyses all of the ways that setuptools and other packaging utilities use these files, such as for the implementation of "develop" mode, and the processing of ".lnk" mode). This change also needs to account for the Windows-only "._pth" files that override the path completely. The main discussion list for such a PEP should be distutils-sig, *not* python-ideas or import-sig (since distutils-sig is where we're more likely to find folks that are actually relying on the feature, and hence have a clearer idea of what will need to change to maintain a comparable level of ecosystem level capability). https://bugs.python.org/issue14803 is also related, as pth file processing should at least be delayed to run later than it does currently, and because "run code at startup" is one of the capabilities that would need replacing. |
Concrete use case for the original path extension capability: "pew add", which chains virtual environments together (allowing shared environments with a common default dependency set, and then additional per-application dependencies) |
Brett pointed out that may initial reaction above came across as quite blunt and demanding, so attempting to phrase that more clearly as a user experience consideration: It may be tempting to view this as purely a clean-up of the import system implementation, removing a quirky and error prone construct for the sake of improved maintainability of both the import system itself, and the maintainability of end user installations. My request (wearing my "BDFL-delegate for packaging interoperability standards" hat) is that proponents of the change resist the temptation to view the problem that way :) Path files are used extensively across the Python packaging ecosystem to implement additional environment management features beyond those provided natively by interpreter implementations, and while we've added native equivalents for some of them (namespace packages, virtual environments), we're far from having added support for all of them (dynamic package version selection, virtual environment chaining, editable package installs that still publish correct PEP-376 package metadata, etc). This means that any changes in this area pose significant backwards compatibility risks, and need to be approached carefully, and cautiously, with a strong emphasis on surveying real world code and seeing how the feature is currently being used. Or, alternatively, the idea can be broken up into smaller, lower impact changes that still help to address the import system and end user environment maintainability issues, but don't involve breaking backwards compatibility. (For an example of the latter: if "python -m site --list-pth-files" printed a list of all of the pth files and "python -m site --dump-pth-files" listed both the files and their contents, then environment debuggability would improve significantly without any compatibility impacts whatsoever) |
I would also add that editable installs should not break in the process. They are important. |
On Jun 23, 2018, at 18:56, Nick Coghlan <report@bugs.python.org> wrote:
Still, I firmly believe they’re a wart being abused for purposes they weren’t really intended for. It’s a trick of implementation that lines beginning with
+1 on working on *much* better debuggability and discoverability for .pth files first, and then consider their eventual deprecation, replacement, and/or removal. |
I *think* we need to ask maintainers of packages who use .pth -- at least, Mark Hammond (pywin32) -- to find out the impact and if everything can be done with other means. AFAICS it at least allows pywin32 to have many top-level modules without cluttering `site-packages'. pywin32 e.g. also copies some files to %windir%\system32 for some reason. And last time I checked, distutils had no functionality that involved symlinks, regardless of the OS. |
I think we also need to clearly separate two distinct aspects of .pth files:
It's point 2 that powers things like "pew add", and I don't see any particularly compelling reason to get rid of it. The "arbitrary code invocation for every single Python execution using that environment" aspect, on the other hand, is mostly a PITA, and used as a workaround for other features being missing (e.g. the PYTHONRUNFIRST proposal in https://bugs.python.org/issue14803). |
pywin32, up until recently, just listed 3 directories in its .pth file - these were for directories which pre-dated packages and were never converted. Eg, "import win32api" actually loads win32api.pyd from the "site-packages/win32" directory. Earlier this year, via mhammond/pywin32#1151, I also added the line: import os;os.environ["PATH"]+=(';'+os.path.join(sitedir,"pywin32_system32")) which is to support pywin32 being installed from wheels - this is due to pywin32 shipping with various shared DLLs which implement many pywin32 types - eg, pywintypesXX.dll is used by (almost) every single .pyd shipped with pywin32, and disutils doesn't offer any way of copying files as part of a post-install script or any other way of ensuring these .dll files are on the PATH or otherwise next to pythonXX.dll/.exe I'm happy to replace both of these with alternatives when they exist. |
I think we'll clearly need a PEP for this clean up. I'd like to see a separate "preload" feature as well, especially one that is deterministic and happens before site.py. Not sure if that should be one PEP or two. |
@barry, make sure you take a look at https://bugs.python.org/issue14803. |
To avoid confusing the discussions, two PEPs is likely a better option:
|
This issue, as stated, looks like a severe regression to me. In each of my python installs, Lib/site-packages has a file called 'python.pth' containing 'F:/Python'. This is not a glob of inscrutable code. It is not even Python code. Just a path. Is this issue about something else also called a 'pth file'? F:/Python latter is a package development directory on my supplementary hard drive. When I first install a new version of Python (early alpha), I copy this tiny file. Voila! The packages within /Python are 'installed' for the new version without making copies. Editing a file edits it for all 'installs'. Deleting the directory for an old and no longer needed version does not delete any of my files. Import in files within F:/Python/pack act as if pack were installed in the site package for the version of python running the file. I can easily run anything in Command Prompt with 'py -x.y -m pack.file'. I can easily rerun with a different version by hitting up arrow and changing x.y. Command Prompt's current working directory does not matter. I think this is one of Python's most under-appreciated features. I am rather sure there is no way to so easily get the same effect. Abuse of a great feature is not a good reason to delete it completely. |
.pth's are processed by site.py, so no more difficult than site/sitecustomize.
An ability to contain code is there for a reason: to allow a module do something more intelligent than adding hardcoded paths if needed (e.g. pywin32 adds a subdir with .dll dependencies to PATH). A chunk of code is limited to a single line -- a conscious limitation to deter misuse 'cuz search path setup code is presumed to be small. If someone needs something other than path setup, they should do it upon the module's import instead. If there's a valid reason to allow larger code chunks, we can introduce a different syntax -- e.g. something like bash here-documents.
If this happens, it's a bug. A line from .pth is executed with "exec line", any exceptions should propagate up the stack as usual.
Present a use case justifying a specific order. |
On Jul 5, 2018, at 14:23, Ivan Pozdeev <report@bugs.python.org> wrote:
Not really. By the time you have access to a REPL to run that, site.py has already run, so you already have an unclean environment. Running with -S really isn’t feasible either since that’s often impossible (e.g. in a zip app like shiv or pex), or that leaves you with a broken environment so you can’t get to a usable REPL. What you often have to do is actually modify Python to put a breakpoint in site.py to see what’s actually happening. Yuck.
Trust me, once you can execute arbitrary code in .pth files, you’re lost. And packages *do* execute arbitrary code that is very difficult to debug. And yes, those complex lines are both inscrutable and non-standard.
Except they often don’t.
The problem comes when some random module you are including in your application does something weird in their .pth files that breaks assumptions *other* libraries or code is making. It’s not as uncommon as it might seem.
The size of the code chunks isn’t the only issue. Running arbitrary code in a .pth file has all kinds of negative consequences. It’s basically code that happens at import time, with all the problems that happen with that anti-pattern.
Interdependent namespace packages. If they get loaded in the wrong order, they can mess up __path__ settings, causing other namespace package portions to be un-importable. Yes, this does happen! |
There are a number of packages that can "self-import" into any Python process depending on the presence of an environment variable, by installing a pth file that contains something like If the pth mechanism goes away, a preload system should definitely be present to provide a replacement; it should again support multiple packages each installing their own hook. |
The primary motivation behind the suggestion seems to be the fact that the feature is abused. However, the documentation has no info whatsoever on what is the intended use -- thus what constitutes abuse. Without that, the accusations are kind of baseless -- how can we blame package authors for having to figure it out for themselves? I've made a PR with the corresponding note. |
Actually, when writing the PR, I had a revelation how this could be implemented. Via an import hook that would work like a union FS! In its .pth file, each such package will import the hook's module (which will cause the hook to be installed on the first import) and "register" its namespaces and/or dependencies with it. The hook will then calculate the required load order and enforce it upon import of any of the registered namespaces. |
RE: " So basically you'd remove the whole feature just cause one package no No, the point being made is *at least* one package that was found on PyPI On Thu, Mar 7, 2019 at 10:22 AM Anthony Sottile <report@bugs.python.org>
For the ones you know about. Dealing with abuse of functionality isn't
Trust me, from my perspective of the Python extension for VS Code you |
There's a simple |
On Mar 7, 2019, at 10:46, Ionel Cristian Mărieș report@bugs.python.org wrote:
You’re overlooking the significant cost of loading the module that does the check in the first place. |
What module? That check should be done directly in the pth file. |
Nonetheless, it's still something that we could support better. If telling someone to set PYTHONSTARTUP is too hard, then we can design another way that can work well without relying on a barely documented (mis)feature. As one idea, we could add a way to register new -X options that would translate into an import/function call after doing site, which then means you could do "python -X coverage ..." and if you don't then you don't get code injected at all, regardless of bugs in any libraries you've installed. |
I think that’s where we disagree. Like others, I don’t want this to affect every python script in a given installation.
You mean extra junk like .pth files? I don’t see the difference between a .py file and a .pth file, except I can’t opt out of .pth files. We’re just looking for some way to control the behavior, without giving the .pth file unlimited capabilities before the user script starts. If it’s “just” some extra .py files, then maybe that’s great. If we need some other new mechanism, then I’d be okay with that, too. |
And you get multiple lines of code, and syntax highlighting, and linting, and all the other goodness in a genuine source file :) |
1 similar comment
And you get multiple lines of code, and syntax highlighting, and linting, and all the other goodness in a genuine source file :) |
And you get multiple lines of code, and syntax highlighting, and linting, and all the other goodness in a genuine source file :) |
As a lurker on this issue: I think a lot of energy is being expended arguing about what is and isn't legitimate use cases, when there's actually more stuff that people agree about than not. I think this issue should be broken down into two, neither of which will actually result in removing pth files:
If you like the ability for packages to install interpreter-startup hooks, then pth files are an ugly way to do it. If you don't, then you want better ways to control it. So let's see what we can come up with. At least several important use cases (coverage and debugging) would probably work with an environment variable to specify startup code. The coverage hooks already check an environment variable themselves, so it's clearly a control mechanism that works. It's also familiar from things like LD_PRELOAD that environment variables can affect code in powerful ways. But the PYTHONSTARTUP variable is not suitable for this, because it only affects interactive shell sessions. So maybe one useful step would be to specify a new environment variable, maybe PYTHONPRELOAD, and figure out how it will interact with all the other options. Then we can re-evaluate the use cases Anthony described (https://bugs.python.org/issue33944#msg337406 ) and debate the need for other startup-code mechanisms to go along with that. |
Just noting that https://bugs.python.org/issue14803 is probably our most comprehensive discussion of the coverage use case for arbitrary pre-main code execution. Steve also made a comment above about potentially turning encodings into a namespace package: that's difficult due to the non-empty However, it would be possible to define a *new* namespace package for codec discovery that was searched after the standard search locations (so you could use it to add extra codecs, but not hijack existing ones). |
We could also have a new namespace package which is *just* for startup injection so it wasn't such a hack to tie into the codecs startup code. |
-1 This would make https://github.com/qix-/better-exceptions .PTH files are commonly used to install development middleware in order to enhance the development and debugging experience. I recognize the need for security, but could we instead focus on improving the security of the existing .PTH system instead of throwing out the baby with the bathwater? The search "pth files python virus|malicious" on Google returns this issue. Is .PTH a previously exploited vector? This is like saying NPM's This issue reads like someone had a bad time with some poorly written Python code that was stuck inside a .PTH file, had to debug why it was causing a problem, and came here to cry about it (no offense, Barry). Instead of improving it, the first inclination was to remove it altogether without any regard to its use-cases or the effects it would have on some packages that rely on it. Let's improve it, not kill it. |
Can usercustomize.py be used as an alternative to "*.pth" execution magic ? |
No. You can't put an usercustomize in a wheel. |
Just realized it didn't work in venv anyway. |
While it's still not entirely accurate, I've tweaked the title on the issue to refer to the arbitrary code execution behavior. Getting "Make pth file sys.path modifications easier to debug" in there as well would be rather tricky :) |
fwiw virtualenv 20.x uses |
PEP-648 has been posted with a proposal to migrate sitecustomize.py, usersitecustomize.py and arbitrary code execution in pth files to a directory based |
It seems reasonable for all it's aspects of discussion. Tho all the other aspects seem lacking, as they are looking for actions, not words Words can be quite impolite at times, but they was put there for a reason. This has been a good discussion though. |
Just noting here that PEP 648 was rejected. A key part of the reason for rejection is that the proposal as written didn't generally solve the problems reported here, so it would have added complexity to startup without allowing us to deprecate including executable code in What I actually came here to post though was a recent case I encountered where the code execution support proved useful: chaining virtual environments together in a way that let binary extension modules still find the DLLs they needed. Specifically, whereas just adding
Note: using UTF-8 as the file encoding instead of ASCII only works on Windows in 3.12.4 and later versions due to the way .pth files are processed in older versions (the locale encoding is often UTF-8 on non-Windows systems, but never UTF-8 on Windows) Even if I do end up moving this particular use case over to |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: