classification
Title: Deprecate and remove pth files
Type: enhancement Stage: patch review
Components: Library (Lib) Versions: Python 3.8
process
Status: open Resolution:
Dependencies: 14803 Superseder:
Assigned To: Nosy List: Anthony Sottile, Chris Billington, Ethan Smith, Ivan.Pozdeev, Peter L3, SilentGhost, __Vano, barry, brett.cannon, cheryl.sabella, christian.heimes, eric.smith, eric.snow, ionelmc, jaraco, matrixise, mhammond, miss-islington, ncoghlan, nedbat, pitrou, qix-, steve.dower, takluyver, terry.reedy, veky, yan12125
Priority: normal Keywords: patch

Created on 2018-06-22 17:22 by barry, last changed 2019-09-11 13:36 by matrixise.

Pull Requests
URL Status Linked
PR 10131 merged Ivan.Pozdeev, 2018-10-26 15:33
PR 12107 open Ivan.Pozdeev, 2019-02-28 22:24
PR 12110 open Ivan.Pozdeev, 2019-03-01 00:32
PR 15942 merged miss-islington, 2019-09-11 13:21
Messages (113)
msg320246 - Author: Barry A. Warsaw (barry) * (Python committer) Date: 2018-06-22 17:22
pth files are evil.  They are very difficult to debug because they're processed too early.  They usually contain globs of inscrutable code.  Exceptions in pth files can get swallowed in some cases.  They are loaded in indeterminate order.

They are also unnecessary to support namespace packages in Python 3 (ignoring straddling code).

Let's start the process for removing them.

1. Deprecate pth files in Python 3.8 and turn them off with the -3 option.

2. Kill off pth file support once Python 2 is EOL'd.
msg320249 - Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2018-06-22 18:05
+1
msg320253 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2018-06-22 18:29
Also +1.
msg320266 - Author: Thomas Kluyver (takluyver) * Date: 2018-06-22 20:53
I'm generally in favour of getting rid of .pth files. But I did accept a PR adding support for them in Flit to act as a substitute for symlinks on Windows, to achieve something like a 'development install'. I'm not sure what the alternative is if they go away.
msg320277 - Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2018-06-22 21:40
Windows has symlinks now I believe, you just have to turn them on.

And I would say there is no need for alternative. If a package needs to do something funky they can do it in their __init__.py file. Otherwise if I don't import a package it shouldn't get to do anything crazy through a .pth file.
msg320279 - Author: Thomas Kluyver (takluyver) * Date: 2018-06-22 21:46
I don't want to use the execution features of .pth files, just their original functionality of adding extra directories to sys.path. I'd be very happy to see the arbitrary code execution 'feature' of .pth files go away.

Windows supports symlinks, but the last I heard was that creating them requires some obscure permission bit. It seems to be awkward enough that Windows users aren't happy with the "just use symlinks" approach, which was what I was originally trying.
msg320283 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2018-06-22 21:57
My understanding about symlinks on Windows is that they require a permission ("Create symbolic links"), that normal users by default do not have. I'm not sure if this has changed recently.
msg320284 - Author: Ethan Smith (Ethan Smith) * Date: 2018-06-22 22:19
I am in favor of .pth files no longer being able to execute arbitrary code; however, I do think their ability to add to the path cannot be killed in two releases. Here is why:

1. Windows support for symlinks is still not automatic. In the Creators Update of Windows 10 (released March 2017), CreateSymbolicLink gained a new dwFlags value, SYMBOLIC_LINK_FLAG_ALLOW_UNPRIVILEGED_CREATE. It only works when Developer Mode is enabled. CPython currently doesn't use this flag (I will open an issue to add that in a moment). I worry that giving people little time to update will be troublesome.

2. All editable installs everywhere (AFAIK) and setuptools eggs (still somewhat common) use easy-install.pth to list where they are. I think breaking editable installs is a bad idea, as there is no clear solution for this. Also setuptools has a fair amount of work to do before it can replace egg installs.

So I think removing adding to the path will require much more thought and break a lot more code than removing arbitrary code execution.
msg320286 - Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2018-06-22 22:23
My only answer to Ethan is "don't use eggs". :)
msg320287 - Author: Barry A. Warsaw (barry) * (Python committer) Date: 2018-06-22 22:25
There are lots of problems with pth files, although arbitrary code execution is probably the most egregious.  They are also notoriously difficult to debug, and happen before any control is given to user code.  They certainly are unnecessary for namespace packages, which I think they currently get used for often in Py 2/3 straddling code.

Maybe it will be okay to just fall back to sys.path extension, but I'd like to have a better understanding of exactly what the use cases are (in a pure Python 3 world), and we have to address the other problems about discovery and debuggability.
msg320292 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2018-06-23 00:20
Strong -1 without a functional replacement that provides comparable LD_PRELOAD capabilities (it also needs a full PEP that analyses all of the ways that setuptools and other packaging utilities use these files, such as for the implementation of "develop" mode, and the processing of ".lnk" mode).

This change also needs to account for the Windows-only "._pth" files that override the path completely.

The main discussion list for such a PEP should be distutils-sig, *not* python-ideas or import-sig (since distutils-sig is where we're more likely to find folks that are actually relying on the feature, and hence have a clearer idea of what will need to change to maintain a comparable level of ecosystem level capability).

https://bugs.python.org/issue14803 is also related, as pth file processing should at least be delayed to run later than it does currently, and because "run code at startup" is one of the capabilities that would need replacing.
msg320293 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2018-06-23 00:24
Concrete use case for the original path extension capability: "pew add", which chains virtual environments together (allowing shared environments with a common default dependency set, and then additional per-application dependencies)
msg320342 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2018-06-24 01:56
Brett pointed out that my initial reaction above came across as quite blunt and demanding, so here is an attempt to phrase it more clearly as a user experience consideration:

It may be tempting to view this as purely a clean-up of the import system implementation, removing a quirky and error prone construct for the sake of improved maintainability of both the import system itself, and the maintainability of end user installations.

My request (wearing my "BDFL-delegate for packaging interoperability standards" hat) is that proponents of the change resist the temptation to view the problem that way :)

Path files are used extensively across the Python packaging ecosystem to implement additional environment management features beyond those provided natively by interpreter implementations, and while we've added native equivalents for some of them (namespace packages, virtual environments), we're far from having added support for all of them (dynamic package version selection, virtual environment chaining, editable package installs that still publish correct PEP 376 package metadata, etc).

This means that any changes in this area pose significant backwards compatibility risks, and need to be approached carefully, and cautiously, with a strong emphasis on surveying real world code and seeing how the feature is currently being used.

Or, alternatively, the idea can be broken up into smaller, lower impact changes that still help to address the import system and end user environment maintainability issues, but don't involve breaking backwards compatibility.

(For an example of the latter: if "python -m site --list-pth-files" printed a list of all of the pth files and "python -m site --dump-pth-files" listed both the files and their contents, then environment debuggability would improve significantly without any compatibility impacts whatsoever)
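Neither `--list-pth-files` nor `--dump-pth-files` exists in `site` as far as I know; they are a suggestion. A rough, read-only approximation of both can be built on `site.getsitepackages()` and `site.getusersitepackages()`, which are real functions. The helper names are hypothetical, and the sketch assumes a standard CPython or `venv` layout:

```python
import os
import site
import sys


def candidate_site_dirs():
    """Site directories whose .pth files site.py would normally process."""
    dirs = list(site.getsitepackages())
    if site.ENABLE_USER_SITE:
        dirs.append(site.getusersitepackages())
    return dirs


def list_pth_files(dirs):
    """Return the .pth files in *dirs*, sorted by name within each directory
    (the same order site.addsitedir() iterates them)."""
    found = []
    for d in dirs:
        try:
            names = os.listdir(d)
        except OSError:
            continue  # directory missing or unreadable: skip it
        found.extend(os.path.join(d, n) for n in sorted(names) if n.endswith(".pth"))
    return found


def dump_pth_files(dirs, out=None):
    """Print each .pth file path followed by its contents, indented."""
    out = out if out is not None else sys.stdout
    for path in list_pth_files(dirs):
        print(path, file=out)
        with open(path, encoding="utf-8", errors="replace") as f:
            for line in f.read().splitlines():
                print("    " + line, file=out)
```

For example, `dump_pth_files(candidate_site_dirs())` prints every .pth file in the current environment's site directories together with its contents.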
msg320386 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2018-06-24 20:40
I would also add that editable installs should not break in the process. They are important.
msg320393 - Author: Barry A. Warsaw (barry) * (Python committer) Date: 2018-06-25 01:28
On Jun 23, 2018, at 18:56, Nick Coghlan <report@bugs.python.org> wrote:
> 
> My request (wearing my "BDFL-delegate for packaging interoperability standards" hat) is that proponents of the change resist the temptation to view the problem that way :)
> 
> Path files are used extensively across the Python packaging ecosystem to implement additional environment management features beyond those provided natively by interpreter implementations, and while we've added native equivalents for some of them (namespace packages, virtual environments), we're far from having added support for all of them (dynamic package version selection, virtual environment chaining, editable package installs that still publish correct PEP 376 package metadata, etc).

Still, I firmly believe they’re a wart being abused for purposes they weren’t really intended for.  It’s a trick of implementation that lines beginning with `import` are exec’d.  That being said…

> Or, alternatively, the idea can be broken up into smaller, lower impact changes that still help to address the import system and end user environment maintainability issues, but don't involve breaking backwards compatibility.

+1 on working on *much* better debuggability and discoverability for .pth files first, and then consider their eventual deprecation, replacement, and/or removal.
msg320724 - Author: Ivan Pozdeev (Ivan.Pozdeev) * Date: 2018-06-29 17:52
I *think* we need to ask maintainers of packages who use .pth -- at least, Mark Hammond (pywin32) -- to find out the impact and if everything can be done with other means.

AFAICS it at least allows pywin32 to have many top-level modules without cluttering `site-packages`.

pywin32 e.g. also copies some files to %windir%\system32 for some reason. And last time I checked, distutils had no functionality that involved symlinks, regardless of the OS.
msg320754 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2018-06-30 06:37
I think we also need to clearly separate two distinct aspects of .pth files:

1. "import <module>; <arbitrary code execution goes here>" lines <--- Kill it with fire
2. "<add this directory to sys.path>" lines <--- This is fine and good and perfectly sensible

It's point 2 that powers things like "pew add", and I don't see any particularly compelling reason to get rid of it.
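The two kinds of line can be told apart with the same tests `site.addpackage()` applies in CPython: lines starting with `#` are skipped, lines starting with `import ` or `import` plus a tab are passed to `exec()`, and anything else is treated as a directory relative to the site directory (blank lines end up as no-ops). A minimal classifier mirroring that dispatch; the function names are hypothetical:

```python
def classify_pth_line(line):
    """Classify one .pth line the way site.addpackage() dispatches it."""
    if line.startswith("#"):
        return "comment"
    if not line.strip():
        return "blank"
    if line.startswith(("import ", "import\t")):
        return "code"  # handed to exec(): runs at every interpreter startup
    return "path"  # joined to the site dir, appended to sys.path if it exists


def summarize_pth(text):
    """Count each kind of line in the text of one .pth file."""
    counts = {"comment": 0, "blank": 0, "code": 0, "path": 0}
    for line in text.splitlines():
        counts[classify_pth_line(line)] += 1
    return counts
```

Note that the test is purely textual: `importlib` on its own line is a "path" line, and an `import` line indented by a space is too.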

The "arbitrary code invocation for every single Python execution using that environment" aspect, on the other hand, is mostly a PITA, and used as a workaround for other features being missing (e.g. the PYTHONRUNFIRST proposal in https://bugs.python.org/issue14803).
msg320850 - Author: Mark Hammond (mhammond) * (Python committer) Date: 2018-07-02 01:33
pywin32, up until recently, just listed 3 directories in its .pth file - these were for directories which pre-dated packages and were never converted. Eg, "import win32api" actually loads win32api.pyd from the "site-packages/win32" directory.

Earlier this year, via https://github.com/mhammond/pywin32/issues/1151, I also added the line:

import os;os.environ["PATH"]+=(';'+os.path.join(sitedir,"pywin32_system32"))

which is to support pywin32 being installed from wheels. This is needed because pywin32 ships with various shared DLLs which implement many pywin32 types (e.g. pywintypesXX.dll is used by almost every single .pyd shipped with pywin32), and distutils doesn't offer any way of copying files as part of a post-install script, or any other way of ensuring these .dll files are on the PATH or otherwise next to pythonXX.dll/.exe.
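A note on `sitedir` in that line: the .pth file never defines it. It resolves because CPython's `site.addpackage()` runs each `import` line with a bare `exec(line)` from inside its own body, so the line can see that function's locals, including its `sitedir` parameter. As far as I know, the documentation does not promise this; it is an implementation detail. A small experiment against a temporary directory, using a made-up `PTH_DEMO_PATH` variable in place of `PATH` and `os.pathsep` in place of the hard-coded `';'`:

```python
import os
import site
import tempfile

# Same shape as the pywin32 line, but appending to a made-up variable
# (PTH_DEMO_PATH) instead of PATH, with os.pathsep instead of ';'.
PTH_LINE = (
    'import os; os.environ["PTH_DEMO_PATH"] = '
    'os.environ.get("PTH_DEMO_PATH", "") + os.pathsep + '
    'os.path.join(sitedir, "pywin32_system32")\n'
)


def run_demo():
    """Process a one-line .pth file and return (sitedir, resulting value)."""
    with tempfile.TemporaryDirectory() as sitedir:
        with open(os.path.join(sitedir, "demo.pth"), "w", encoding="utf-8") as f:
            f.write(PTH_LINE)
        # addpackage() exec()s the line inside its own frame, so `sitedir`
        # in the line refers to the directory argument passed here.
        site.addpackage(sitedir, "demo.pth", set())
        return sitedir, os.environ.pop("PTH_DEMO_PATH")
```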

I'm happy to replace both of these with alternatives when they exist.
msg320997 - Author: Barry A. Warsaw (barry) * (Python committer) Date: 2018-07-03 18:58
I think we'll clearly need a PEP for this clean up.  I'd like to see a separate "preload" feature as well, especially one that is deterministic and happens before site.py.  Not sure if that should be one PEP or two.
msg321005 - Author: Eric Snow (eric.snow) * (Python committer) Date: 2018-07-03 21:01
@barry, make sure you take a look at https://bugs.python.org/issue14803.
msg321026 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2018-07-04 09:59
To avoid confusing the discussions, two PEPs is likely a better option:

1. Designing and implementing a dedicated preload mechanism
2. Adjusting the way pth file handling works, including deprecating and removing the "pth arbitrary file execution" trick (depends on the first one as the forward compatible migration path for legitimate code preloading use cases)
msg321125 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2018-07-05 18:09
This issue, as stated, looks like a severe regression to me.

In each of my python installs, Lib/site-packages has a file called 'python.pth' containing 'F:/Python'.  This is not a glob of inscrutable code.  It is not even Python code.  Just a path.  Is this issue about something else also called a 'pth file'?

F:/Python is a package development directory on my supplementary hard drive.  When I first install a new version of Python (early alpha), I copy this tiny file.  Voila!  The packages within /Python are 'installed' for the new version without making copies.  Editing a file edits it for all 'installs'.  Deleting the directory for an old and no longer needed version does not delete any of my files.

Imports in files within F:/Python/pack act as if pack were installed in the site-packages directory for the version of python running the file.  I can easily run anything in Command Prompt with 'py -x.y -m pack.file'.  I can easily rerun with a different version by hitting up arrow and changing x.y.  Command Prompt's current working directory does not matter.

I think this is one of Python's most under-appreciated features.  I am rather sure there is no way to so easily get the same effect.  Abuse of a great feature is not a good reason to delete it completely.
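Terry's setup relies only on `site.addsitedir()`, which `site.py` calls for each site-packages directory at startup: every `*.pth` file in the directory contributes its path lines to `sys.path`, provided the listed directory exists. A self-contained reproduction, with temporary directories standing in for `Lib/site-packages` and `F:/Python` (the helper name is hypothetical):

```python
import os
import site
import sys
import tempfile


def path_added_by_pth():
    """Return True if a one-line .pth file makes its directory importable."""
    saved = list(sys.path)
    with tempfile.TemporaryDirectory() as sitedir, tempfile.TemporaryDirectory() as devdir:
        # Equivalent of Terry's 'python.pth' containing a single absolute path.
        with open(os.path.join(sitedir, "python.pth"), "w", encoding="utf-8") as f:
            f.write(devdir + "\n")
        try:
            # Appends sitedir itself, then each existing path line from its .pth files.
            site.addsitedir(sitedir)
            return os.path.abspath(devdir) in sys.path
        finally:
            sys.path[:] = saved  # undo the demo's changes
```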
msg321134 - Author: Ivan Pozdeev (Ivan.Pozdeev) * Date: 2018-07-05 21:23
> They are very difficult to debug because they're processed too early.  

.pth's are processed by site.py, so no more difficult than site/sitecustomize.
You can e.g. run `site.addpackage(<dir>, <file>, None)` to debug the logic.

> They usually contain globs of inscrutable code.

An ability to contain code is there for a reason: to allow a module to do something more intelligent than adding hardcoded paths if needed (e.g. pywin32 adds a subdir with .dll dependencies to PATH).

A chunk of code is limited to a single line -- a conscious limitation to deter misuse 'cuz search path setup code is presumed to be small.

If someone needs something other than path setup, they should do it upon the module's import instead.
If they insist on misusing the feature, Python's design does what it's supposed to do in such cases: "make right things easy, make wrong things hard".

If there's a valid reason to allow larger code chunks, we can introduce a different syntax -- e.g. something like bash here-documents.

> Exceptions in pth files can get swallowed in some cases.

If this happens, it's a bug. A line from .pth is executed with "exec line", any exceptions should propagate up the stack as usual.
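Whether an exception escapes is easy to check. As far as I can tell from CPython's `site.py`, `site.addpackage()` wraps each line in `try`/`except Exception`, prints the traceback to `sys.stderr` followed by "Remainder of file ignored", and stops reading that file without re-raising. A small probe against a throwaway file (the module name is deliberately nonexistent; the helper is hypothetical):

```python
import contextlib
import io
import os
import site
import sys
import tempfile


def probe_failing_line():
    """Feed addpackage() a .pth whose first line raises; return (stderr, added)."""
    saved = list(sys.path)
    with tempfile.TemporaryDirectory() as sitedir, tempfile.TemporaryDirectory() as later_dir:
        with open(os.path.join(sitedir, "broken.pth"), "w", encoding="utf-8") as f:
            f.write("import module_that_does_not_exist_42\n")  # raises ImportError
            f.write(later_dir + "\n")  # a valid path line after the failing one
        err = io.StringIO()
        try:
            with contextlib.redirect_stderr(err):
                site.addpackage(sitedir, "broken.pth", set())  # returns normally
            added = os.path.abspath(later_dir) in sys.path
        finally:
            sys.path[:] = saved
    return err.getvalue(), added
```

If that holds for your Python version, `probe_failing_line()` returns the captured traceback text and `False`: the valid path line after the failing one is never processed.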

> They are loaded in indeterminate order.

Present a use case justifying a specific order.
I can see a probable use case: a package needs to do something using its dependencies, so any .pth for the dependencies should run before the one for the package.
But I can't see why that package can't do this upon its import instead (saves unnecessary work if the user won't be using that package in that session, too).
The only valid case I can see is if the package is using some 3rd-party import system (e.g. a .7z archive or some module repository) that needs to be loaded first for its search path to make sense.
msg321340 - Author: Barry A. Warsaw (barry) * (Python committer) Date: 2018-07-09 18:45
On Jul 5, 2018, at 14:23, Ivan Pozdeev <report@bugs.python.org> wrote:
> 
>> They are very difficult to debug because they're processed too early.
> 
> .pth's are processed by site.py, so no more difficult than site/sitecustomize.
> You can e.g. run `site.addpackage(<dir>, <file>, None)` to debug the logic.

Not really.  By the time you have access to a REPL to run that, site.py has already run, so you already have an unclean environment.  Running with -S really isn’t feasible either since that’s often impossible (e.g. in a zip app like shiv or pex), or that leaves you with a broken environment so you can’t get to a usable REPL.  What you often have to do is actually modify Python to put a breakpoint in site.py to see what’s actually happening.  Yuck.

> 
>> They usually contain globs of inscrutable code.
> 
> An ability to contain code is there for a reason: to allow a module to do something more intelligent than adding hardcoded paths if needed (e.g. pywin32 adds a subdir with .dll dependencies to PATH).
> 
> A chunk of code is limited to a single line -- a conscious limitation to deter misuse 'cuz search path setup code is presumed to be small.

Trust me, once you can execute arbitrary code in .pth files, you’re lost.  And packages *do* execute arbitrary code that is very difficult to debug.  And yes, those complex lines are both inscrutable and non-standard.

> If someone needs something other than path setup, they should do it upon the module's import instead.

Except they often don’t.

> If they insist on misusing the feature, Python's design does what it's supposed to do in such cases: "make right things easy, make wrong things hard”.

The problem comes when some random module you are including in your application does something weird in their .pth files that breaks assumptions *other* libraries or code is making.  It’s not as uncommon as it might seem.

> If there's a valid reason to allow larger code chunks, we can introduce a different syntax -- e.g. something like bash here-documents.

The size of the code chunks isn’t the only issue.  Running arbitrary code in a .pth file has all kinds of negative consequences.  It’s basically code that happens at import time, with all the problems that happen with that anti-pattern.

> 
>> Exceptions in pth files can get swallowed in some cases.
> 
> If this happens, it's a bug. A line from .pth is executed with "exec line", any exceptions should propagate up the stack as usual.
> 
>> They are loaded in indeterminate order.
> 
> Present a use case justifying a specific order.

Interdependent namespace packages.  If they get loaded in the wrong order, they can mess up __path__ settings, causing other namespace package portions to be un-importable.  Yes, this does happen!
msg328488 - Author: Antony Lee (Antony.Lee) * Date: 2018-10-25 20:47
There are a number of packages that can "self-import" into any Python process depending on the presence of an environment variable, by installing a pth file that contains something like `import os; __import__("thepkg") if os.environ.get("THEENVVAR") else None`.  Examples include colorization of logging output (https://coloredlogs.readthedocs.io/en/latest/api.html#environment-variables) or installation of a trace function (https://pypi.org/project/hunter/#environment-variable-activation).

If the pth mechanism goes away, a preload system should definitely be present to provide a replacement; it should again support multiple packages each installing their own hook.
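The env-var-gated pattern above can be reproduced end to end. `pth_demo_hook` and `PTH_DEMO_ENABLE` are made-up names; the generated .pth line has the same shape as the one quoted, and `site.addpackage()` is called directly instead of restarting the interpreter:

```python
import importlib
import os
import site
import sys
import tempfile

MODULE = "pth_demo_hook"     # hypothetical package that "self-imports"
ENV_VAR = "PTH_DEMO_ENABLE"  # hypothetical activation variable
PTH_LINE = f"import os; __import__({MODULE!r}) if os.environ.get({ENV_VAR!r}) else None\n"


def activated(env_value):
    """Process the .pth once with ENV_VAR set to env_value (None = unset)
    and report whether MODULE was imported as a side effect."""
    saved_path = list(sys.path)
    saved_env = os.environ.pop(ENV_VAR, None)
    with tempfile.TemporaryDirectory() as sitedir, tempfile.TemporaryDirectory() as moddir:
        with open(os.path.join(moddir, MODULE + ".py"), "w", encoding="utf-8") as f:
            f.write("ACTIVE = True\n")
        with open(os.path.join(sitedir, "demo-hook.pth"), "w", encoding="utf-8") as f:
            f.write(PTH_LINE)
        sys.path.insert(0, moddir)
        importlib.invalidate_caches()
        if env_value is not None:
            os.environ[ENV_VAR] = env_value
        try:
            site.addpackage(sitedir, "demo-hook.pth", set())
            return MODULE in sys.modules
        finally:
            # Restore interpreter state so repeated calls are independent.
            sys.modules.pop(MODULE, None)
            sys.path[:] = saved_path
            os.environ.pop(ENV_VAR, None)
            if saved_env is not None:
                os.environ[ENV_VAR] = saved_env
```

With the variable unset or empty the line is a no-op; with any non-empty value the package is imported during .pth processing, i.e. before any user code runs.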
msg328564 - Author: Ivan Pozdeev (Ivan.Pozdeev) * Date: 2018-10-26 15:48
The primary motivation behind the suggestion seems to be the fact that the feature is abused.

However, the documentation has no info whatsoever on what the intended use is -- and thus on what constitutes abuse. Without that, the accusations are kind of baseless -- how can we blame package authors for having to figure it out for themselves?

I've made a PR with the corresponding note.
Since the discussion has revealed a number of valid use cases for the feature for which there are currently no adequate alternatives, I hope the note will reduce the discontent and encourage package authors to remove unnecessary logic from their .pth files.
msg329607 - Author: Ivan Pozdeev (Ivan.Pozdeev) * Date: 2018-11-10 12:50
@barry 

> Interdependent namespace packages.  If they get loaded in the wrong order, they can mess up __path__ settings

Actually, when writing the PR, I had a revelation about how this could be implemented: via an import hook that would work like a union FS!

In its .pth file, each such package will import the hook's module (which will cause the hook to be installed on the first import) and "register" its namespaces and/or dependencies with it. The hook will then calculate the required load order and enforce it upon import of any of the registered namespaces.
msg329764 - Author: Barry A. Warsaw (barry) * (Python committer) Date: 2018-11-12 22:04
On Nov 10, 2018, at 04:50, Ivan Pozdeev <report@bugs.python.org> wrote:

> In its .pth file, each such package will import the hook's module (which will cause the hook to be installed on the first import) and "register" its namespaces and/or dependencies with it. The hook will then calculate the required load order and enforce it upon import of any of the registered namespaces.

I’m a little concerned about this approach because it means random third party modules can affect the global environment for your application, without knowing it.  Since the hook installation happens at import time, and just depending on a library that has such a .pth file will install it, the end application will not have control over its global state.  It’s not possible to know whether this is a serious problem, but in the past, global state changes are problematic when applications do not have control over it.
msg329802 - Author: Ivan Pozdeev (__Vano) Date: 2018-11-13 03:30
> I’m a little concerned about this approach because it means random third party modules can affect the global environment for your application, without knowing it.  Since the hook installation happens at import time, and just depending on a library that has such a .pth file will install it, the end application will not have control over its global state.

But "affecting the global environment for your application" is exactly what is intended here. You want multiple packages to all load their code into the same namespaces (aka module objects), thus of course potentially affecting/overriding each other's functionality. That's what you get when you have plugins -- a badly-written/incompatible plugin can and will break your app.

It doesn't have to "just depend on a library that has such a .pth file", it's up to the import hook's implementation. I just gave as an example the simplest solution that requires zero effort on the main package maintainer's part.

E.g. you can only allow adding a new submodule by default, or require the "parent" package to "allow" insertions into itself, or move registration into the parent's configuration file (so the user needs to enable the plugin manually), or provide some more granular code injection techniques, e.g. event handler lists that certain plugins' functions will be added into. All that matters here is that the hook is going to automagically assemble the resulting namespaces from parts upon import.

Finally, Python applications don't have full control over their global state anyway. Any module can monkey-patch or override any other module via a variety of means. So, this risk is not something new or unexpected.
msg330115 - Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2018-11-19 21:11
Regarding other uses of .pth files, the project [future-fstrings](https://github.com/asottile/future-fstrings) relies on .pth files to enable its at-startup behavior.

I'm also +1 to remove .pth files, but I also believe it's not viable today due to development installs of pkg_resources-style namespace packages.

I haven't read the full history of this issue, but plan to get caught up on it soon.
msg333235 - Author: Chris Billington (Chris Billington) Date: 2019-01-08 16:55
I develop analysis software for physics research, in which the user analyses their data using Python that they write themselves (my application functions as a kind of scheduler for when the analysis scripts should run and with what input). This software has a concept of 'the user's modules', which the user can import from anywhere. When the application is installed, it installs a .pth file to add this 'userlib' folder to the Python path. This way the user can maintain importable modules that they re-use in their analysis without having to put them on PyPI or anything like that (which would be impractical since they are often being hacked on and don't have anything resembling a release cycle). It is important that these modules aren't just available from within the environment my application provides, as that is a bit too rigid - the user should be able to use the normal Python REPL or IPython or whatever to develop and test their code when the 'scheduler' is not in control of running it. 

I'm not sure what I would do instead if .pth files went away. Modifying PYTHONPATH is messy since it applies to all python versions, whereas .pth files are nicely specific only to the one Python installation. sitecustomize.py is messy because if it already exists I need to programmatically modify it to add or remove my changes (and contend with the fact that other packages may be doing the same), whereas a .pth file is nicely separate.

I didn't even know about the arbitrary code execution capabilities of .pth files and don't really care, but keeping the ability to add directories to the Python path would be nice, as the alternatives for doing this are unappealing (and for my application, putting the code the user is hacking on daily deep inside a Conda environment folder hierarchy is unappealing too).
msg333536 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2019-01-13 02:40
To make a potentially viable concrete proposal here, I think a reasonable first step would be to change the ".pth" file processing code in site.py to emit PendingDeprecationWarning for the 'if line.startswith(("import ", "import\t")):' branch.

In addition to helping to determine the scope of the compatibility break being discussed here, such a warning would also be usable as a debugging tool.

I'd also suggest updating "python -m site" to list any pth files that it finds, and categorise them as simple sys.path additions (which are generally fine), and arbitrary code (which can be problematic).
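One way to picture the proposed warning: the same line tests `site.addpackage()` uses, with a `PendingDeprecationWarning` raised on the `import` branch and attributed to the .pth file and line number via `warnings.warn_explicit()`. This is a hypothetical stand-alone scanner, not a patch to `site.py`; it never calls `exec()`, so it is also safe to point at untrusted files:

```python
import warnings


def scan_pth(text, filename="<pth>"):
    """Return the path entries of a .pth file, warning about code lines.

    Hypothetical sketch: mirrors site.addpackage()'s line tests but never
    executes anything.
    """
    paths = []
    for lineno, line in enumerate(text.splitlines(), 1):
        if line.startswith("#") or not line.strip():
            continue  # comments and blank lines have no effect
        if line.startswith(("import ", "import\t")):
            # Point the warning at the .pth file itself, which doubles as
            # the debugging aid: it names the file and line that runs code.
            warnings.warn_explicit(
                f"executable line in .pth file: {line.strip()!r}",
                PendingDeprecationWarning,
                filename,
                lineno,
            )
            continue
        paths.append(line.rstrip())
    return paths
```

Like any `PendingDeprecationWarning`, it is hidden by default; running with `-W default::PendingDeprecationWarning` would make it visible.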
msg333567 - Author: Barry A. Warsaw (barry) * (Python committer) Date: 2019-01-13 20:02
> To make a potentially viable concrete proposal here, I think a reasonable first step would be to change the ".pth" file processing code in site.py to emit PendingDeprecationWarning for the 'if line.startswith(("import ", "import\t")):' branch.

PendingDeprecationWarning because you don’t think we can remove this functionality in 3.9?

> In addition to helping to determine the scope of the compatibility break being discussed here, such a warning would also be usable as a debugging tool.
> 
> I'd also suggest updating "python -m site" to list any pth files that it finds, and categorise them as simple sys.path additions (which are generally fine), and arbitrary code (which can be problematic).

Great idea, +1
msg333568 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2019-01-13 20:42
I'm suggesting PendingDeprecationWarning because we can't *actually* deprecate anything until we provide a more transparent alternative that offers comparable functionality, and I haven't seen a credible proposal for a replacement yet.

So using PDW would truthfully indicate "We don't like this feature, and want to get rid of it as causing more problems than it solves, but also acknowledge that it is currently handling legitimate use cases that need to be addressed before we can remove it".

https://coverage.readthedocs.io/en/coverage-4.4.2/subprocess.html is one example I'm aware of that describes a legitimate use case for being able to run arbitrary code at software startup.
msg333569 - Author: Chris Billington (Chris Billington) Date: 2019-01-13 20:49
coverage.py's documentation mentions:

> The sitecustomize.py technique is cleaner, but may involve modifying an existing sitecustomize.py, since there can be only one. If there is no sitecustomize.py already, you can create it in any directory on the Python path.

> The .pth technique seems like a hack, but works, and is documented behavior. On the plus side, you can create the file with any name you like so you don’t have to coordinate with other .pth files. On the minus side, you have to create the file in a system-defined directory, so you may need privileges to write it.

This brings to mind the transition of many programs from using a single config file or startup script to using a directory of config/startup files parsed/executed in alphabetical order. Would a sitecustomize.d/ directory (with files within it executed in alphabetical order) as a replacement for executable code in .pth files be an improvement on the status quo?
msg333572 - Author: Ivan Pozdeev (__Vano) Date: 2019-01-13 21:53
> This brings to mind the transition of many programs from using a single config file or startup script to using a directory of config/startup files parsed/executed in alphabetical order. Would a sitecustomize.d/ directory (with files within it executed in alphabetical order) as a replacement for executable code in .pth files be an improvement on the status quo?

No, because the required execution order is governed by package interdependencies rather than names. SysVInit got around this by hand-picking number prefixes for files in rcN.d/, but this proved unmaintainable in the long run.
msg333591 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-01-14 09:01
I really hate .pth files because they slow down Python startup time for *all* applications, whereas .pth files are usually specific to a very few applications using one or two specific modules.

They can also modify the behavior of Python for all applications, with no way to opt-out.

I would prefer to have an opt-in option, disabled by default.

I'm in favor of deprecating the feature in Python 3.8 and removing it in Python 3.9.

Python 3 already supports namespace packages, which cover the most common use case of .pth files, no?

Another use case is to run code if a specific command line option is used or if an environment variable is set. For example, my faulthandler backport uses a .pth file to enable faulthandler if PYTHONFAULTHANDLER environment variable is set. I dislike this .pth file (I didn't write it ;-)). I'm fine with dropping this feature as a whole.

We can add a pending deprecation warning in Python 3.7 right now.
msg333592 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2019-01-14 09:14
As I said: editable installs (`pip install -e`) are an important use case of .pth files.

I don't see how namespace packages have anything to do with this, sorry.
msg333613 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2019-01-14 12:17
Namespace packages in general didn't rely on pth files - only the setuptools/pkg_resources implementation of them did.

I'll also reiterate that I am *completely* opposed to deprecating the "append entries to sys.path" usage model, as there is absolutely nothing wrong with that (if distros are ending up with an overly cluttered system that's making the standard path too long, then review the individual packages creating the clutter, don't remove the interpreter feature).

That "append to sys.path" aspect of the feature is all that's needed to make editable installs and virtual environment chaining work.
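A small sketch of the "append to sys.path" behaviour, using the real site.addsitedir() on throwaway temporary directories (the demo function name is made up). A plain line in a .pth file that names an existing directory ends up on sys.path. The function restores sys.path before returning.

```python
import os
import site
import sys
import tempfile


def demo_pth_path_entry():
    """Process a throwaway .pth file with site.addsitedir() and report the result."""
    with tempfile.TemporaryDirectory() as sitedir, tempfile.TemporaryDirectory() as extra:
        # A .pth file: comment lines are skipped, a plain line names a directory.
        with open(os.path.join(sitedir, "demo.pth"), "w") as f:
            f.write("# comment lines are ignored\n")
            f.write(extra + "\n")
        saved = list(sys.path)
        try:
            site.addsitedir(sitedir)
            # Both the site dir itself and the directory named in demo.pth
            # are appended to sys.path.
            return sitedir in sys.path, extra in sys.path
        finally:
            sys.path[:] = saved  # undo the demo's changes
```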

That means the aspect I'm in agreement with deprecating is the "arbitrary code execution on startup" case, but even for that, I don't think we should deprecate it until we have a comparable replacement that's more self-evidently a way of allowing arbitrary code execution, and also more obviously has the potential to make every interpreter startup in that Python installation slower.

I'm not really concerned about execution order issues between interdependent sitecustomize hooks, as there's already no ordering guarantee with .pth files, and if folks do need more control over the interdependencies for some reason then they can just rely on the regular import system rather than something sitecustomize specific.

So I think Chris Billington's proposed replacement is actually a reasonable idea:

1. In site.addsitedir, check for a __sitecustomize__ subdirectory after checking for .pth files
2. If any Python files are found in that directory, execute them
3. If "python -X importtime" has been specified, report the execution time of each of those files (this would allow both easy identification of any hooks that are being executed, as well as which ones are taking up a lot of time)

There could then be a "-Z" option that offered a more limited form of "-S": it would allow site.py itself to run, but disable the processing of `sitecustomize.py` and `__sitecustomize__` entries.
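A rough sketch of steps 1 to 3 plus the proposed "-Z" switch. Everything here is hypothetical: run_sitecustomize_dir() is an invented name, and the keyword arguments merely stand in for "-X importtime" and "-Z". Files run in sorted order only to make the sketch reproducible; the proposal itself offers no ordering guarantee.

```python
import os
import sys
import time


def run_sitecustomize_dir(sitedir, *, report_times=False, disabled=False):
    """Hypothetical sketch of processing <sitedir>/__sitecustomize__.

    Executes every *.py file found there and returns (filename, seconds) pairs.
    """
    if disabled:  # stands in for the proposed "-Z" switch
        return []
    hook_dir = os.path.join(sitedir, "__sitecustomize__")
    if not os.path.isdir(hook_dir):
        return []
    timings = []
    for name in sorted(os.listdir(hook_dir)):
        if not name.endswith(".py"):
            continue  # only Python files are executed
        path = os.path.join(hook_dir, name)
        with open(path, "rb") as f:
            code = compile(f.read(), path, "exec")
        start = time.perf_counter()
        exec(code, {"__name__": "__sitecustomize__", "__file__": path})
        elapsed = time.perf_counter() - start
        timings.append((name, elapsed))
        if report_times:  # stands in for "-X importtime"
            print(f"sitecustomize hook {path}: {elapsed * 1e6:.0f} us", file=sys.stderr)
    return timings
```

Unlike a one-line .pth entry, each hook is an ordinary file with a real filename, so tracebacks and timing reports point at something a user can find.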
msg333637 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2019-01-14 19:20
I like Nick's proposal. I believe it has the features that satisfy the use cases I'm currently aware of, with one edge case you may not have considered: support for multiple `__sitecustomize__` locations.

Consider, for example, the case where `__sitecustomize__` is in some system space unwritable by the user, but the package being installed is being installed in `--user` space.

Or consider the case where permissions aren't at play, but where you have a package installed in a different part of the PYTHONPATH. For example, [pip-run installs a sitecustomize module](https://github.com/jaraco/pip-run/blob/6203b1aa8cb52b5c181457054cf6ddaa40361437/pip_run/launch.py#L33-L44) in a temporary directory that it adds to sys.path. Ignoring for a moment the reason why it does this, I'd like to focus on the general need - that multiple paths on PYTHONPATH might expect `__sitecustomize__` support. You wouldn't want to have all of the `__sitecustomize__` hooks in one directory, because then they'll be decoupled from components that may or may not be in PYTHONPATH.

For these reasons, I think you'd want `__sitecustomize__` to be supported in multiple locations on PYTHONPATH, honoring all of the files in all such directories, somewhat similar to how namespace packages are supported.
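A hypothetical helper showing what honouring `__sitecustomize__` in every sys.path entry could look like: it collects each existing `<entry>/__sitecustomize__` directory in sys.path order, skipping duplicates. It only handles plain directories; hooks shipped inside zip or wheel files placed on PYTHONPATH would need a different lookup that this sketch does not attempt.

```python
import os
import sys


def find_sitecustomize_dirs(path_entries=None):
    """Hypothetical: every existing <entry>/__sitecustomize__ dir, in path order."""
    if path_entries is None:
        path_entries = sys.path
    found = []
    seen = set()
    for entry in path_entries:
        # An empty sys.path entry means the current directory.
        candidate = os.path.join(entry or os.curdir, "__sitecustomize__")
        key = os.path.normcase(os.path.abspath(candidate))
        if key in seen:
            continue  # the same directory listed twice is processed once
        if os.path.isdir(candidate):
            seen.add(key)
            found.append(candidate)
    return found
```

Because the search follows sys.path, a hook travels with whichever path entry ships it, which is the decoupling described above.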
msg333638 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2019-01-14 19:55
On Jan 14, 2019, at 04:02, STINNER Victor <report@bugs.python.org> wrote:
> 
> I really hate .pth files because they slow down Python startup time for *all* applications whereas .pth files are usually specific to a very few applications using one or two specific modules.
> 
> They can also modify the behavior of Python for all applications, with no way to opt-out.
> 
> I would prefer to have an opt-in option, disabled by default.

I completely agree.  The other problem is that .pth-caused problems are very difficult to diagnose and debug.  Essentially you have to hack site.py to break into the loading machinery.  I have to believe that we can come up with a better mechanism that doesn’t suffer from these problems.

Do we have a single place to capture a list of .pth use cases?
msg333639 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2019-01-14 19:56
On Jan 14, 2019, at 04:14, Antoine Pitrou <report@bugs.python.org> wrote:
> 
> As I said: editable installs (`pip install -e`) are an important use case of .pth files.

Is that true outside of virtual environments?  I care less about .pth files inside venvs, since those are typically isolated to a single development environment, and don’t affect Python applications or libraries globally.
msg333640 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2019-01-14 20:02
On Jan 14, 2019, at 07:17, Nick Coghlan <report@bugs.python.org> wrote:
> 
> I'll also reiterate that I am *completely* opposed to deprecating the "append entries to sys.path" usage model, as there is absolutely nothing wrong with that (if distros are ending up with an overly cluttered system that's making the standard path too long, then review the individual packages creating the clutter, don't remove the interpreter feature).

Yes, there is, as Victor and others have pointed out.  They do magical things that are difficult to debug and diagnose, and have global effects on the entire Python operating environment.

I’d be less opposed to a mechanism that is isolated to just those Python applications that need them.  I’d like to know about use cases outside of Python applications that can’t be done any other way.

> That "append to sys.path" aspect of the feature is all that's needed to make editable installs and virtual environment chaining work.
> 
> That means the aspect I'm in agreement with deprecating is the "arbitrary code execution on startup" case, but even for that, I don't think we should deprecate it until we have a comparable replacement that's more self-evidently a way of allowing arbitrary code execution, and also more obviously has the potential to make every interpreter startup in that Python installation slower.

I think we’re all in agreement about deprecating arbitrary code execution, so maybe this issue can concentrate on that, while we figure out what, if anything, to do about the path extension use case.

I don’t care about slow start up of the interactive interpreter, but I do strongly care about the start up times for Python applications in general.  That’s why an opt-in mechanism is important.

> 1. In site.addsitedir, check for a __sitecustomize__ subdirectory after checking for .pth files
> 2. If any Python files are found in that directory, execute them
> 3. If "python -X importtime" has been specified, report the execution time of each of those files (this would allow both easy identification of any hooks that are being executed, as well as which ones are taking up a lot of time)
> 
> There could then be a "-Z" option that offered a more limited form of "-S": it would allow site.py itself to run, but disable the processing of `sitecustomize.py` and `__sitecustomize__` entries.

Is that a global __sitecustomize__ directory you’re talking about, or something specific to a Python application (or library?).
msg333642 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2019-01-14 21:04
> Is that true outside of virtual environments?

Not in my experience.  But I'm not sure special-casing virtual environments will make the situation easier to understand ;-)
msg333644 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-01-14 22:30
I don't think that you will like it, but I feel that a PEP will be needed
here to list use cases and explain what replaces .pth files for each use
case. Maybe no replacement for some use cases is fine. The PEP doesn't have
to be long.

I also expect that it's going to be a large backward incompatible change. A
PEP can summarize the rationale, schedule deprecation, etc.

Any volunteer around? Barry, Nick, someone else?
msg333645 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2019-01-14 22:42
On Jan 14, 2019, at 17:30, STINNER Victor <report@bugs.python.org> wrote:
> 
> I don't think that you will like it, but I feel that a PEP will be needed
> here to list use cases and explain what replaces .pth files for each use
> case. Maybe no replacement for some use cases is fine. The PEP doesn't have
> to be long.
> 
> I also expect that it's going to be a large backward incompatible change. A
> PEP can summarize the rationale, schedule deprecation, etc.

+1

> Any volunteer around? Barry, Nick, someone else?

I will volunteer to co-author.  I would definitely like at least Nick and/or Jason to help.
msg333698 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2019-01-15 12:53
`site.addsitedir` is called for every site-packages directory (whether global, within a venv, or at the user level), so my proposal above covers appending multiple segments.

Linux distros' approach to handling this is terrible because they dump all their system packages into a single global site-packages, leading to the ever-growing sys.path problem that Barry is concerned about.

However, that's entirely the fault of distro packaging policies, and can be remedied in a far superior way by switching distros to a model where they create a venv per application, and then use .pth files to link in the system packages that they actually want visible to that application.

"Some users don't want to use virtual environments appropriately" is an incredibly poor reason for breaking a perfectly valid feature.
msg333699 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2019-01-15 12:55
Note that any PEP I contributed to writing would need to be restricted to eliminating arbitrary code execution, as I don't think there's anything wrong with the path extension feature.
msg333705 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2019-01-15 14:01
> `site.addsitedir` is called for every site-packages directory (whether global, within a venv, or at the user level), so my proposal above covers appending multiple segments.

Good point. I think you're assuming that only site dirs are appropriate for packages that require arbitrary code execution. I think I'd like to break that assumption and allow any location where packages can be installed (PYTHONPATH) to install hooks. Consider this use-case:

draft $ mkdir pkgs
draft $ python3.5 -m pip download -d pkgs future_fstrings
Collecting future_fstrings
  Using cached https://files.pythonhosted.org/packages/36/25/070c2dc1fe1e51901df5875c495d6efbbf945a93a2ca40f47e5225302fb8/future_fstrings-0.4.5-py2.py3-none-any.whl
  Saved ./pkgs/future_fstrings-0.4.5-py2.py3-none-any.whl
Collecting tokenize-rt; python_version < "3.6" (from future_fstrings)
  Using cached https://files.pythonhosted.org/packages/76/82/0e6a9dda45dd76be22d74211443e199a330ac7e428b8dbbc5d116651be03/tokenize_rt-2.1.0-py2.py3-none-any.whl
  Saved ./pkgs/tokenize_rt-2.1.0-py2.py3-none-any.whl
Successfully downloaded future-fstrings tokenize-rt
draft $ cat > hello-fstrings.py
# coding: future_fstrings
print(f'hello world')
draft $ PYTHONPATH=pkgs/future_fstrings-0.4.5-py2.py3-none-any.whl:pkgs/tokenize_rt-2.1.0-py2.py3-none-any.whl python3.5 hello-fstrings.py
xonsh: subprocess mode: command not found: PYTHONPATH=pkgs/future_fstrings-0.4.5-py2.py3-none-any.whl:pkgs/tokenize_rt-2.1.0-py2.py3-none-any.whl
draft $ env PYTHONPATH=pkgs/future_fstrings-0.4.5-py2.py3-none-any.whl:pkgs/tokenize_rt-2.1.0-py2.py3-none-any.whl python3.5 hello-fstrings.py
  File "hello-fstrings.py", line 1
SyntaxError: encoding problem: future_fstrings


If future-fstrings is properly installed, its runtime hook is called and the script runs:

draft $ python3.5 -m pip-run -q future-fstrings -- hello-fstrings.py
hello world


I'd like a package like future-fstrings to be able to supply a startup hook that is honored even if the package isn't installed in one of the site paths.
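The `# coding: future_fstrings` line only works if a codec with that name has been registered before the script's source is decoded, which is why the package relies on startup code. Below is a minimal sketch of that registration using the real codecs.register() and codecs.CodecInfo, assuming a hypothetical codec name. Unlike future-fstrings, this codec just reuses UTF-8 and performs no source rewriting.

```python
import codecs

CODEC_NAME = "demo_fstrings_codec"  # hypothetical name, not the real package's


def _search(name):
    """Codec search function: answer only for our own codec name."""
    if name != CODEC_NAME:
        return None  # let other search functions handle everything else
    utf8 = codecs.lookup("utf-8")
    # Reuse UTF-8's machinery under a new name; a real source-rewriting codec
    # would transform the decoded text instead.
    return codecs.CodecInfo(
        encode=utf8.encode,
        decode=utf8.decode,
        incrementalencoder=utf8.incrementalencoder,
        incrementaldecoder=utf8.incrementaldecoder,
        streamreader=utf8.streamreader,
        streamwriter=utf8.streamwriter,
        name=CODEC_NAME,
    )


def register():
    """What a startup hook (for example a .pth 'import' line) would call."""
    codecs.register(_search)
```

Whether register() runs from a .pth line or is called explicitly before the script is loaded, the ordering requirement is the same: until it runs, looking up the codec name fails.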

> Let's make a PEP.

I'd be delighted to help with the PEP.
msg333706 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-01-15 14:04
> SyntaxError: encoding problem: future_fstrings

IMHO that's the expected behavior. I would prefer to have to explicitly install this special encoding *before* loading a script using it.
msg333716 - (view) Author: Chris Billington (Chris Billington) Date: 2019-01-15 17:35
> Linux distros' approach to handling this is terrible because they dump all their system packages into a single global site-packages, leading to the ever-growing sys.path problem that Barry is concerned about.

> However, that's entirely the fault of distro packaging policies, and can be remedied in a far superior way by switching distros to a model where they create a venv per application, and then use .pth files to link in the system packages that they actually want visible to that application.

I'm curious about this since it doesn't make sense to me. Dumping all packages at the top level in /usr/lib/pythonX.Y/site-packages means exactly zero .pth files. Wouldn't putting each module in its own directory, with all the directories necessary for a given app added to the path of a venv for that app mean strictly more .pth files, and a sys.path as long as the list of dependencies for that app? Whilst this would certainly be more flexible for keeping multiple versions of packages around as required by different apps, I don't see that it would decrease startup time at all - more folders need to be searched for each import, not less, and a recursive hierarchy of .pth files would need to be parsed at startup as each package pulled in the directories of its own dependencies. A flat structure like most linux distros use would seem like it would be as efficient as you could get, unless you think that searching through a larger list of strings for the right one is slower than opening a tree of .pth files.
msg333997 - (view) Author: Vedran Čačić (veky) * Date: 2019-01-18 17:40
I have a directory inside my home directory, and inside it I have files with various utilities I have written over the years. So far, whenever I have installed a new version of Python, I have simply put a util.pth into site-packages. If you remove that possibility, what am I supposed to do? Every other solution is either much more complicated, or doesn't enable me to evolve my utilities in place, or both. What am I missing? (My OS is Windows, and shortcuts don't work, I've tried.)
msg334199 - (view) Author: Ionel Cristian Mărieș (ionelmc) Date: 2019-01-22 04:42
FYI I have 3 projects that use pth files to activate various features (an env var is usually the trigger):

https://pypi.org/project/pytest-cov - enables coverage measurement in any subprocess
https://pypi.org/project/manhole - installs a debug service
https://pypi.org/project/hunter - installs a tracer

I wouldn't like them being rendered almost or completely useless by such a hasty change. 

Running stuff during startup can be problematic and tricky, for example I have painfully found out that on Python 2.7 you can completely hose up your codecs registry if you try to decode things during startup (before the registry is fully built) but I think it's a fair price for such a powerful feature.
msg335774 - (view) Author: Cheryl Sabella (cheryl.sabella) * (Python committer) Date: 2019-02-17 13:32
Hello all,

There was a lot of traction on this discussion a month ago and I was wondering if any updates/expectations should be set?  Specifically:

1.  There is a PR for a doc change that Terry approved, but wanted another core dev to look at.  If there is agreement on the doc change, then perhaps it can be merged for 3.8?  If not, then perhaps it can be closed?

2.  There was discussion about creating a PEP and I believe Barry, Jason, and possibly Nick said they wanted to work on it.  Has more work been done on that?  I'm not trying to push anyone, but I saw in other threads that a virtual whiteboard group was being created to get some traction on ideas before PyCon, so I just wanted to put this back on the radar in case you wanted it to generate discussion at the language summit.

3.  I realize that PEPs are needed for any change and even to define what that change might look like, but is there any value in adding PendingDeprecationWarnings for 3.8 if that's a possible action that will happen?  As I understand it, it would be easier to remove that warning later instead of delaying any actions from it.

Thanks!
msg335926 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-02-19 11:30
> 3.  I realize that PEPs are needed for any change and even to define what that change might look like, but is there any value in adding PendingDeprecationWarnings for 3.8 if that's a possible action that will happen?  As I understand it, it would be easier to remove that warning later instead of delaying any actions from it.

We cannot modify Python before a PEP is approved. It's too early to know whether a PEP removing support for .pth files would be approved. There are too many constraints and use cases.
msg336351 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-02-23 00:51
I took a look at the docs PR, and honestly I don't even get what the "intended" uses of executable code are supposed to be.

The examples are "load 3rd-party import hooks, adjust PATH variable", but the only cases I can think of where you'd need to do these in a .pth file is where your module is a single file. As soon as you have a package with __init__.py, you have a file that can do exactly the same modifications before the module that needs it is imported.
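A sketch of that alternative: because a package's __init__.py runs before any of its submodules are imported, it can install a meta-path finder itself. RecordingFinder and install_hook() are hypothetical names. The finder only records requested module names and returns None, so normal import resolution is unchanged.

```python
import importlib.abc
import sys


class RecordingFinder(importlib.abc.MetaPathFinder):
    """Hypothetical finder that only records which modules are requested."""

    def __init__(self):
        self.requested = []

    def find_spec(self, fullname, path, target=None):
        self.requested.append(fullname)
        return None  # defer to the remaining finders on sys.meta_path


def install_hook():
    """What a package's __init__.py could run before importing its submodules."""
    finder = RecordingFinder()
    sys.meta_path.insert(0, finder)
    return finder
```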

I'd be inclined to limit the doc change to not provide any "valid" uses for this, and just discourage doing anything that takes a long time (most of the text in the PR is fine, IMHO).

And yeah, I'd like to see the arbitrary code execution "feature" removed too.

As for .pth files in general, I'm interested in the scenarios that caused Barry to have to do difficult debugging where "python -m site" wasn't able to help. If they all involved arbitrary code execution, then let's take out the right tumor. But if they somehow manipulated sys.path in a way that looking at sys.path doesn't reveal, then I'd like to know how.
msg336662 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2019-02-26 13:19
Yep, I completely understand (and agree with) the desire to eliminate the code injection exploit that was introduced decades ago by using exec() to run lines starting with "import " (i.e. "import sys; <arbitrary code goes here>").

I just don't want to lose the "add this location to sys.path" behaviour that exists for lines in pth files that *don't* start with "import ", since that has plenty of legitimate use cases, and the only downside of overusing it is an excessively long default sys.path (which has far more consistent and obvious symptoms than the arbitrary code execution case can lead to).
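A simplified model of how site.py classifies each .pth line, written as a hypothetical helper (site.py inlines this logic rather than exposing such a function). It also shows why a bare `print(...)` line does nothing: it is treated as a path, and because no such directory exists it is silently skipped.

```python
def classify_pth_line(line):
    """Simplified model of site.py's handling of one .pth line.

    Returns "comment", "blank", "exec" or "path".
    """
    if line.startswith("#"):
        return "comment"
    if not line.strip():
        return "blank"
    if line.startswith(("import ", "import\t")):
        return "exec"  # site.py passes the whole line to exec()
    return "path"      # site.py appends it to sys.path if the directory exists
```

Deprecating only the "exec" branch, as Nick suggests, would leave the "path" branch that editable installs rely on untouched.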
msg336705 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2019-02-26 18:09
On Feb 26, 2019, at 05:19, Nick Coghlan <report@bugs.python.org> wrote:
> 
> I just don't want to lose the "add this location to sys.path" behaviour that exists for lines in pth files that *don't* start with "import ", since that has plenty of legitimate use cases, and the only downside of overusing it is an excessively long default sys.path (which has far more consistent and obvious symptoms than the arbitrary code execution case can lead to).

It’s also very difficult to debug because pth loading usually happens before the user has a chance to intervene with a debugger.  This means mysterious things can happen, like different versions of a package getting imported than you expect.

Extending sys.path is a useful use case, but doing so in pth files is problematic.
msg336709 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-02-26 20:32
> Extending sys.path is a useful use case, but doing so in pth files is problematic.

There are 100 other ways to end up in this situation though. Why is *this* one so much worse?

Can you offer an issue you hit that was caused by a .pth file that *wasn't* debuggable by listing sys.path?
msg336710 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2019-02-26 20:37
On Feb 26, 2019, at 12:32, Steve Dower <report@bugs.python.org> wrote:
> 
> There are 100 other ways to end up in this situation though. Why is *this* one so much worse?

Because there’s no good place to stick a pdb/breakpoint to debug such issues other than site.py, and that usually requires sudo.

> Can you offer an issue you hit that was caused by a .pth file that *wasn't* debuggable by listing sys.path?

I don’t remember the details, but yes I have been caught in this trap.  The thing is, by the time user code gets called, the damage is already done, so debugging is quite difficult.

This will be alleviated at least partially by deprecating the executing of random code.  Maybe just allowing sys.path hacking will be enough to make it not so terrible, especially if e.g. (and I haven’t checked to see whether this is the case today), `python -v` shows you exactly which .pth file is extending sys.path.

The issue is discoverability.  Since pth files happen before you get an interpreter prompt, it’s too difficult to debug unexpected, wrong, or broken behavior.  My opposition would lessen if there were clear ways to debug, and preferably also prevent, pth interpretation.
msg336711 - (view) Author: Ionel Cristian Mărieș (ionelmc) Date: 2019-02-26 20:52
> Because there’s no good place to stick a pdb/breakpoint to debug such issues other than site.py, and that usually requires sudo.

Something bad was installed with sudo but suddenly sudo is not acceptable for debugging? This seems crazy.

How exactly are pth files hard to debug? Are those files hard to edit? They sure are, but the problem ain't the point where they are run, it's the fact that a big lump of code is stuffed on a single line. Let's fix that instead!

I've written pth files with lots of stuff in them, and my experience is quite the opposite - they help with debugging. A lot. It's an incredibly useful python feature.

> I don’t remember the details, but yes I have been caught in this trap. 

Maybe if you remember the details we can discuss what are the debugging options, and what can be improved.
msg336714 - (view) Author: Ivan Pozdeev (__Vano) Date: 2019-02-26 21:23
On 26.02.2019 23:37, Barry A. Warsaw wrote:

> My opposition would lessen if there were clear ways to debug, and preferably also prevent, pth interpretation.

Easy. Insert a chunk into site.py that would call pdb.set_trace() if an envvar (e.g. `PYSITEDEBUG') or a command line switch is set.

Actually, why can't whoever has this problem add such a chunk themselves? Is this really such a frequent and ubiquitous problem that this needs to be in the stock codebase? I suspect we're dealing with a vocal minority here.
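A sketch of the suggested chunk, assuming the PYSITEDEBUG name floated here (it is not an existing CPython variable). The hook argument defaults to pdb.set_trace but can be swapped out for testing. Where exactly site.py would call it, for instance just before it processes .pth files, is left open.

```python
import os
import pdb


def maybe_break_for_site_debugging(environ=os.environ, hook=pdb.set_trace):
    """Hypothetical site.py chunk: stop in the debugger if PYSITEDEBUG is set.

    Returns True if the debugger hook was invoked, False otherwise.
    """
    if environ.get("PYSITEDEBUG"):
        hook()
        return True
    return False
```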
msg336716 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-02-26 22:23
Barry is a steering council member now, so by definition he's 1/5th of the loudest possible minority ;)

I am totally okay with adding more diagnostics here. Frankly, if "-v" doesn't currently log info about .pth files (or other things that the site module does when it's active) then we should just do that.
msg336721 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2019-02-26 23:31
On Feb 26, 2019, at 12:52, Ionel Cristian Mărieș <report@bugs.python.org> wrote:
> 
> Something bad was installed with sudo but suddenly sudo is not acceptable for debugging? This seems crazy.

Your sudo may not be my sudo. :)  Let’s say I update my Ubuntu desktop and a new version of a package with a pth file breaks.  Maybe I didn’t even know I was doing that, via automated updates, or management portal, etc.  Now a poor user who depends on this has their code break.  How do *they* debug the problem?

FWIW, `sudo pip install` should just be banned IMHO :).

> How exactly are pth files hard to debug? Are those files hard to edit? They sure are, but the problem ain't the point where they are run, it's the fact that a big lump of code is stuffed on a single line. Let's fix that instead!

For sure.  But here’s the thing: you need to know *which* pth file is problematic.  Which means you have to debug the entire startup process where pth files are loaded.  That means you’re not really debugging pth files themselves (often), but site.py.  Debugging site.py for an installed Python is not trivial.  Hopefully you are at least not squeamish about editing a system file and breaking Python worse than the original bug. <wink>
msg336722 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2019-02-26 23:41
On Feb 26, 2019, at 13:23, Ivan Pozdeev <report@bugs.python.org> wrote:
> 
> Easy. Insert a chunk into site.py that would call pdb.set_trace() if an envvar (e.g. `PYSITEDEBUG') or a command line switch is set.
> 
> Actually, why can't whoever has this problem add such a chunk themselves? Is this really such a frequent and ubiquitous problem
> that this needs to be in the stock codebase? I suspect we're dealing with a vocal minority here.

Basically yes, I’ve done this.  But think of the poor user who doesn’t have that expertise or ability to hack on an installed Python’s site.py file.  When their application breaks because some faulty pth was installed behind their back, how do they debug their application when the breakage has already occurred before Python even gets to their code?  How do they answer questions like “where did that magical sys.path entry come from?” or “how did that module get in sys.modules already?”
msg336725 - (view) Author: Ionel Cristian Mărieș (ionelmc) Date: 2019-02-27 00:30
On Wed, Feb 27, 2019 at 1:31 AM Barry A. Warsaw <report@bugs.python.org> wrote:
> Your sudo may not be my sudo. :)  Let’s say I update my Ubuntu desktop and a new version of package with a pth breaks.
> Maybe I didn’t even know I was doing that, via automated updates, or management portal, etc.
> Now a poor user who depends on this has their code break.  How do *they* debug the problem?

Well that's easy:

* update my Ubuntu desktop -> stuff breaks -> rollback/downgrade
* automated updates -> stuff breaks -> stop using them, and learn lesson ;)
* management portal -> stuff breaks -> complain to sysadmin

Desktop users don't need to debug problems, devs/sysadmins do. They have sudo.

> FWIW, `sudo pip install` should just be banned IMHO :).

Let's also ban ctypes and threads, right? :)

> For sure.  But here’s the thing: you need to know *which* pth file is problematic.  Which means you have to debug the entire startup process where pth files are loaded.

How many pth files could one have? 2-3 ... 5 at most. Just `locate .pth` and rename the biggest one till the problem goes away.
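For the "find the .pth files" step, a small sketch that lists .pth files and their sizes across this interpreter's site directories, using site.getsitepackages() and site.getusersitepackages(); list_pth_files() is a hypothetical name. It only looks at site directories, which is where site.py picks up .pth files at startup.

```python
import glob
import os
import site


def list_pth_files(dirs=None):
    """Hypothetical helper: (path, size_in_bytes) for each .pth file, sorted by path.

    Defaults to this interpreter's site-packages dirs plus the user site dir.
    """
    if dirs is None:
        dirs = list(site.getsitepackages())
        if site.ENABLE_USER_SITE:
            dirs.append(site.getusersitepackages())
    results = []
    for d in dirs:
        # glob.escape keeps special characters in directory names literal.
        for path in glob.glob(os.path.join(glob.escape(d), "*.pth")):
            results.append((path, os.path.getsize(path)))
    return sorted(results)
```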
msg336726 - (view) Author: Ionel Cristian Mărieș (ionelmc) Date: 2019-02-27 00:34
On Wed, Feb 27, 2019 at 1:41 AM Barry A. Warsaw <report@bugs.python.org> wrote:
> Basically yes, I’ve done this.  But think of the poor user who doesn’t have that expertise or ability to hack on an installed Python’s site.py file.  When their application breaks because some faulty pth was installed behind their back, how do they debug their application when the breakage has already occurred before Python even gets to their code?  How do they answer questions like “where did that magical sys.path entry come from?” or “how did that module get in sys.modules already?”

Aren't these sorts of questions answered by using `strace python -v` or similar? What information is missing, exactly?
msg336809 - (view) Author: Peter L (Peter L3) Date: 2019-02-28 07:10
+1 for python -v listing .pth files found and loaded.

For debugging, I just add a:
    import sys; print('Loading mypth.pth')
to the start of the pth file.
A plain print doesn't work(?).
breakpoint() doesn't work(?).
It would be nice to be able to get the filename (__file__ is site.py)
msg336853 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-02-28 17:27
> But think of the poor user who doesn’t have that expertise or ability to hack on an installed Python’s site.py file.

This is actually part of the thinking behind the reportabug tool I started (and why when you format it as raw text you get the listing of everything in any directory on sys.path - mostly because I haven't added a Markdown rendering of that). If the answer is to enhance that and tell users "run `reportabug mybrokenmodule` and send me the output", well, that's why I put it on GitHub :) https://github.com/zooba/reportabug

I see no reason to hold up adding pth logging to -v, so anyone interested please feel free to do a PR.

The only reason I see to hold up PR 10131 (docs update) is that it documents the rationale for using arbitrary code execution in a pth file. Since we clearly want to get rid of it, I don't think we should in any way rationalize it in the docs.

Once these are done, I think we'll have to reevaluate whether .pth files are actually a problem in their normal behavior, and whether the benefit outweighs the cost. But since we're all agreed that they aren't easy to debug and contain features we all want to get rid of, there's not much point using the current state to do the cost/benefit analysis. Let's fix the bits we can fix first and then see where we stand.
msg336856 - (view) Author: Anthony Sottile (Anthony Sottile) * Date: 2019-02-28 17:40
> contain features we all want to get rid of

I don't think even this is unanimous.  Things like registering codecs, instrumenting coverage in subprocesses, etc. all seem like legitimate uses of the arbitrary code execution feature
msg336860 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2019-02-28 18:04
On Feb 28, 2019, at 09:40, Anthony Sottile <report@bugs.python.org> wrote:
> 
> I don't think even this is unanimous.  Things like registering codecs, instrumenting coverage in subprocesses, etc. all seem like legitimate uses of the arbitrary code execution feature

Except pth files are a terrible interface for that, given all the other problems, including weird wall-of-code inducing restrictions on what actually gets executed.

I’m in agreement with Steve Dower in principle here.  I would like to see a solution that deprecates and eventually removes arbitrary code execution in pth files, leaves sys.path extension alone (for now <wink>), and improves the discoverability and debuggability of magical pth files.

What I think Anthony is looking for are ways to register “start up functions” that get executed automatically when the Python interpreter starts up.  Perhaps somewhat analogous to atexit functions?  But if we’re going to officially support a feature like that, I think a PEP would be the right vehicle to suss out all the gory details, like, should these things be global across all invocations of the interpreter, how a user or application would disable that, how would bugs in start up functions get discovered, reported, and debugged, what if any execution order guarantees should be made, etc.
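A hypothetical API sketch of "start-up functions analogous to atexit"; no such standard-library module exists, and register_startup() and run_startup_hooks() are invented names. It touches on a few of the questions above: hooks can be disabled wholesale, failures are reported with a traceback instead of being swallowed, timings are recorded, and hooks run in registration order.

```python
import sys
import time
import traceback

_startup_hooks = []  # hypothetical registry, in the spirit of atexit's


def register_startup(func):
    """Register func to run at start-up; returns func so it works as a decorator."""
    _startup_hooks.append(func)
    return func


def run_startup_hooks(disabled=False, report=None):
    """Run registered hooks in registration order.

    A failing hook is reported with its traceback instead of being swallowed,
    and does not stop later hooks. Returns (name, seconds, succeeded) records.
    """
    if disabled:  # stands in for a user- or application-level opt-out
        return []
    if report is None:
        report = sys.stderr
    records = []
    for func in list(_startup_hooks):
        name = getattr(func, "__name__", repr(func))
        start = time.perf_counter()
        try:
            func()
            succeeded = True
        except Exception:
            succeeded = False
            print(f"startup hook {name} failed:", file=report)
            traceback.print_exc(file=report)
        records.append((name, time.perf_counter() - start, succeeded))
    return records
```

Questions such as whether hooks are global to every interpreter invocation are deliberately left out; those are the details Barry suggests a PEP should settle.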
msg336863 - (view) Author: Anthony Sottile (Anthony Sottile) * Date: 2019-02-28 18:25
> What I think Anthony is looking for are ways to register “start up functions” that get executed automatically when the Python interpreter starts up

yes, this is what I want to still exist :)


My hope is that there's a clear standards-track replacement *before* .pth is deprecated, since .pth currently satisfies my use cases for startup functions.
msg336875 - (view) Author: Ivan Pozdeev (Ivan.Pozdeev) * Date: 2019-02-28 22:27
On second thought, the inability to debug code that runs at startup, before user code ever gets control, is a fundamental issue (this problem arises for any software that has startup code), so having such a facility in the stock codebase has merit.
msg336882 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-03-01 00:58
The sitecustomize.py file is totally available, and the only limitation there is packages can't inject themselves into it on installation. And if you want to trigger it on a package import then you totally can (though there's *another* discussion about that being a bad idea).

.pth files really only satisfy the "run at startup because I'm a dependency of something that my user wants and don't make them opt-in to my changed behaviour", which I don't like :)

If encodings need to be available without an explicit import, sure, we can add a registration point for those. Import hooks can always be injected by a package __init__.py before the importer will try and resolve the module, so nothing is needed there. But having a PEP with specific use cases to argue about is the way to create new mechanisms here. I don't agree that we need a solution before declaring that the old way should be avoided and will eventually be removed, provided we don't add noisy warnings until there's an alternative.
msg336939 - (view) Author: Ivan Pozdeev (__Vano) Date: 2019-03-01 17:27
On 01.03.2019 3:58, Steve Dower wrote
> Import hooks can always be injected by a package __init__.py before the importer will try and resolve the module, so nothing is needed there.

I thought the flaw in this reasoning in 
https://bugs.python.org/issue33944#msg320277 was obvious and didn't want 
to bother people refuting it. Apparently not.

To do anything in __init__.py, that __init__.py itself needs to be 
already importable. This very well may not be the case -- in fact, 
import hooks were designed specifically for the scenarios where this is 
not the case.

Imagine e.g. loading modules from a cloud storage (why not?) -- so 
nothing on the system at all except the hook. Or, suggested earlier in 
this ticket, a union namespace where the code to import needs to be 
constructed on the fly.

> .pth files really only satisfy the "run at startup because I'm a dependency of something that my user wants and don't make them opt-in to my changed behaviour"

Startup code (custom or not) is not a dependency of anything. It rather customizes the environment in which the program specified by the user would run, _before_ any user code could be allowed to get control. It is not a part of the program to be run but rather of the environment that the user wants, and it needs to be implicit so the user can use the same commands and code (compare venv). This is a required feature because the stock Python startup logic cannot possibly provide all the customizations that a user may need (compare initrd).

.pth's are equivalent to sitecustomize but allow the user to manage the set of code chunks automatically using the packaging infrastructure (compare .d directories in UNIX). The fact that this feature is mixed up with and often supplements "real packages" that a program would explicitly use is actually incidental: a package with a .pth does not need to have any functionality intended for explicit use.

> which I don't like 

If you don't like something, there's always a specific reason, though you may not be consciously aware of it. So the way to go is to dig into it and find out what specific speck is putting you off; only then can you be sure that you are concentrating on the right thing and won't throw the baby out with the bathwater. Try to change one trait in your mind's eye, leaving all else intact: will the feeling go away? If it does, you are on the right track; can the trait you chose be split further? You know you have found it when changing any smaller part no longer changes the feeling, and you can say with confidence exactly how what it's doing misaligns with your moral compass.

We already identified a few real reasons: hard to see, hard to debug, encourages unreadable code, run in arbitrary order when the order matters (and IIRC I provided fixes for all). What else?
msg336944 - (view) Author: Ivan Pozdeev (__Vano) Date: 2019-03-01 17:55
On 01.03.2019 20:27, Ivan Pozdeev wrote:
> The fact that this feature is mixed up with and often supplements 
> "real packages" that a program would explicitly use is actually 
> incidental: a package with a .pth does not need to have any 
> functionality intended for explicit use.
>
Eureka! So, there are actually two kinds of packages: "functional 
packages" to be used explicitly and "environment packages" to customize 
the execution environment. The infrastructure just doesn't distinguish 
between them and allows a package to combine both types of functionality 
for convenience.

By this logic, pywin32's .pth is effectively a private import hook to 
allow for its nonstandard structure. It could be in a separate 
"environment package" that would be a dependency but that would 
complicate things for no real gain.

The caveat with "environment packages" is that there are no predefined 
dependencies among them, or between them and "functional packages". 
Their required execution order rather depends on the user's needs. E.g. the 
order of import hooks' registration would matter if more than one can 
serve a specific name, and the user may prefer any of the options; 
whether some import hook is required to import some installed packages 
depends on the way they are installed.

This is the same with any other plugin functionality, too. And I'm not 
aware of any general solution because a solution is very situational. 
The best we can do here that I see is to allow the user (or, you guessed 
it, yet another "environment package" for manageability) to specify load 
order dependencies between .pth's.
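As a rough illustration of that last suggestion, assume each .pth file could declare which other .pth files it must be loaded after. That declaration mechanism is hypothetical (site.py has nothing like it; within a directory it simply processes the .pth files in sorted filename order), and so are the file names below. Given such declarations, the stdlib's `graphlib` (Python 3.9+) could compute a compatible load order:

```python
from graphlib import TopologicalSorter


def pth_load_order(declared_deps):
    """Return .pth names in an order that respects declared 'load after' edges.

    declared_deps maps a .pth name to the names it must be loaded after.
    This declaration format is hypothetical. graphlib raises CycleError if
    the declarations are circular.
    """
    return list(TopologicalSorter(declared_deps).static_order())


# Hypothetical environment packages: a cloud importer that relies on a
# union-namespace hook, which in turn relies on some base path setup.
deps = {
    "cloud_import_hook.pth": ["union_ns_hook.pth"],
    "union_ns_hook.pth": ["base_paths.pth"],
    "base_paths.pth": [],
}
print(pth_load_order(deps))
# ['base_paths.pth', 'union_ns_hook.pth', 'cloud_import_hook.pth']
```

Only the relative order of dependent files is constrained; files with no declared relationship could still be loaded in any order, which is where the user's own preferences would have to come in.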
msg336961 - (view) Author: Anthony Sottile (Anthony Sottile) * Date: 2019-03-01 22:26
I don't have time to look through the data today but I wrote a script to collect the usages of `.pth` from pypi.  I realized after I ran it that I skipped source distributions with `.zip` extension but otherwise it's pretty complete:

https://github.com/asottile/pth-file-investigation

There are ~132 packages using `.pth` features (not including setuptools namespace packages which I had to exclude since there were so many of them).  I was planning to classify these but didn't have time to do so.

Some "highlights" from scrolling through the list: two of them are mine (future-breakpoint, future-fstrings), at least one is Guido's (pyxl3), and ruamel's namespace packaging appears to use .pth (ruamel.*, 12 packages).
msg336970 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2019-03-01 23:25
On Mar 1, 2019, at 09:27, Ivan Pozdeev <report@bugs.python.org> wrote:

> Startup code (custom or not) is not a dependency of anything. It rather customizes the environment in which the program specified by the user would run, _before_ any user code could be allowed to get control. It is not a part of the program to be run but rather of the environment that the user wants, and it needs to be implicit so the user can use the same commands and code (compare venv). This is a required feature because the stock Python startup logic cannot possibly provide all the customizations that a user may need (compare initrd).
> 
> .pth's are equivalent to sitecustomize but allow the user to manage the set of code chunks automatically using the packaging infrastructure (compare .d directories in UNIX). The fact that this feature is mixed up with and often supplements "real packages" that a program would explicitly use is actually incidental: a package with a .pth does not need to have any functionality intended for explicit use.
> 
> We already identified a few real reasons: hard to see, hard to debug, encourages unreadable code, run in arbitrary order when the order matters (and IIRC I provided fixes for all). What else?

The fact that .pth files are global and affect the entire Python installation.  That’s not so bad in venvs where we have environmental isolation, but it’s really bad (IMHO) for the global Python interpreter.  Right now, there’s no control over the scope of such environmental customizations; it’s all or nothing.  Applications should be able to opt in or out of them, just like they can with individual packages (which must be imported in order to affect the interpreter state).  The trick then is how to define opt-in for applications *before* the interpreter gets to user code.
msg336983 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-03-02 03:58
Barry's response in https://bugs.python.org/issue33944#msg336970 is exactly what my response to that point was going to be.

Just because I want to use package spam and it wants to use package eggs doesn't mean that eggs gets to enable cloud imports (or anything else similarly magical) automatically. If I want that, it can provide it and tell me to call it in my code, or it can do it when needed. Neither of those options require arbitrary code execution in a .pth file.
msg336984 - (view) Author: Ivan Pozdeev (__Vano) Date: 2019-03-02 03:59
On 02.03.2019 2:25, Barry A. Warsaw wrote:
> The fact that .pth files are global and affect the entire Python installation. <...> Right now, there’s no control over the scope of such environmental customizations; it’s all or nothing.

That's the entire purpose of "customizing the environment in which the 
program specified by the user would run". A customization can very well 
be implemented to be application-specific but it doesn't have to. Python 
was never designed to isolate modules from each other (an "application" 
as you say it is just another module) -- on the contrary, the amount of 
power it gives the user over the code that they don't control is one of 
its key appeals. A Python installation acts as a unit where anything can 
affect anything else, and the order is maintained with 
https://en.wikipedia.org/wiki/Soft_security .

So, if you need a compartmentalized application, a regular Python 
installation is a wrong tool for the job.
Compartmentalization comes at the price of:

  * rampant code duplication ('cuz if you actively distrust external
    code, you have to bring all the code you need with you) and all its
    corollaries (no automatic security fixes and modernized semantics;
    no memory and disk space economy from shared library reuse)
      o so compartmentalization is absolutely impossible within a shared
        environment: anything that you use can be changed by the user
        (e.g. to satisfy the requirements of something else, too)
  * lack of interoperability (how many Android apps do you know that can
    use each other's functionality?).

Venv does a pretty good job of providing you with a private copy of any 
3rd-party modules you require but not the envvars, the interpreter and 
the standard library (and any OS facilities they depend on). If you 
require a harder barrier between your app and the rest of the system 
and/or wish to actively prevent users from altering your application, 
you'll have to use a private Python installation (e.g. in /opt), or hide 
it from everyone with the likes of Pyinstaller, or an OS-level 
container, or a VM... or just drop the pretense and go SaaS(S) (that'll 
teach those sneaky bastards to mess with my code!).

> Applications should be able to opt in or out of them, just like they can with individual packages (which must be imported in order to affect the interpreter state).

Quite the contrary. To decide what environment an application shall 
be run in is the user's prerogative. The application itself has 
absolutely no business meddling in this. All it can do is declare some 
requirements for the environment (either explicitly or implicitly by 
making assumptions) and refuse to work or malfunction if they are not 
met (and the user is still fully within their right to say: "screw you, 
I know what I am doing" -- and fool the app into thinking they are met 
and assume responsibility for any breakages).

With "individual packages", it's actually completely the same: the app 
can decide which ones it wants to use, but it's the user who decides 
which ones are available for use!
msg336992 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2019-03-02 06:01
On Mar 1, 2019, at 19:59, Ivan Pozdeev <report@bugs.python.org> wrote:
> 
> On 02.03.2019 2:25, Barry A. Warsaw wrote:
>> The fact that .pth files are global and affect the entire Python installation. <...> Right now, there’s no control over the scope of such environmental customizations; it’s all or nothing.
> 
> That's the entire purpose of "customizing the environment in which the
> program specified by the user would run". A customization can very well
> be implemented to be application-specific but it doesn't have to. Python
> was never designed to isolate modules from each other (an "application"
> as you say it is just another module) -- on the contrary, the amount of
> power it gives the user over the code that they don't control is one of
> its key appeals. A Python installation acts as a unit where anything can
> affect anything else, and the order is maintained with
> https://en.wikipedia.org/wiki/Soft_security .

So I just come at it from a different angle (I think Steve and I are aligned).

Here’s a very real use case about the dangers.  I use my Linux package manager to install a bunch of applications (I don’t totally agree with the “an application is just another package”).  I don’t even know that they are Python applications, they’re just tools that do something I like.  Now I have an idea for some cool Python thing to hack on and I just install a few development libraries with my package manager.   Maybe those libraries come from a secondary repo that has a different level of scrutiny.  Or maybe I think, hey what’s the harm to just `sudo pip install` a few things (yes, people do this all the time ;).

Subtly, under the hood, one of those transitive dependencies down the stack installs some .pth file that executes some arbitrary code and breaks some of those distro-provided applications.  And let’s say I don’t notice weird things happening for a week.  Now I think “whoa! how did that application break? I didn’t change it at all”.  Not only did I mysteriously break things I relied on, but unless I’m an expert Pythonista and I know how to debug site.py, I’ve got almost no hope of fixing the problem by myself (SO to the rescue?).  If I do manage to diagnose the problem, I’ll have to go and uninstall the bad package, and I *should* report things to my distro or upstream.  Of course, upstream may say that it’s critical functionality to their library so too bad for you.

I’m not even making that up. :)

Sure, maybe the very concept of a distro-wide Python application is a mistake, but it’s what we have now, and it’s not going away.

>> Applications should be able to opt in or out of them, just like they can with individual packages (which must be imported in order to affect the interpreter state).
> Right on the contrary. To decide what environment an application shall
> be run in is the user's prerogative. The application itself has
> absolutely no business meddling in this.

Again, I just look at this from a different perspective.  The user probably doesn’t even know all the environmental factors that affect the operation of their applications, and changes in that environment can happen without the user’s knowledge.  All they’re going to know is that application X which is critical to their work has just broken.  Sadly, the engineer looking into the bug they filed on it will not be able to reproduce the problem.  And now nobody is happy. :)

> With "individual packages", it's actually completely the same: the app
> can decide which ones it wants to use, but it's the user who decides
> which ones are available for use!

It’s actually not the same, and that’s the point.  An application won’t ever import a library that actively harms it.  But it has no such guards against changes to the environment — any environment, including magical Python code.
msg337064 - (view) Author: Ivan Pozdeev (__Vano) Date: 2019-03-04 02:50
On 02.03.2019 9:01, Barry A. Warsaw wrote:

In all the cases you've described, Python is no different from any other 
Linux software. E.g. I can install something into /etc/profile.d that 
would break the shell or set an envvar that would change the behavior of 
standard utilities.
This is by design: Linux is designed for maximum interoperability, so 
there's only one of each component in the system and everything uses it 
whenever it needs that kind of functionality. It does support multiple 
versions of the same software, but it's a compromise that significantly 
complicates maintenance (primarily how to disambiguate them when 
something requests just "component X"), so they strive to avoid it 
whenever possible.
Likewise, complete freedom for root to wreak havoc in the system is also 
by design: distro maintainers only test and support official packages; 
anything else you use is either your responsibility or an app supplier's 
if they provide official support (and are within their right to deny 
support if you tweak the environment beyond their support promise) -- 
same as for any other software as well.

This is not even specific to .pth files, either, so you won't really 
eliminate the problem by removing them. You can break any other part of 
Python in subtle ways just as well -- e.g. overwrite or override binary 
files with incompatible ones, causing segfaults in random places 
(https://stackoverflow.com/q/51816639/648265 ).

Now, Linux does have "lower tier environments" that don't automatically 
affect "higher tiers". 1) Software installed into /usr/local doesn't 
hijack system scripts thanks to absolute paths in their shebangs; 
software in /opt is not on PATH at all; 2) /etc/profile* and bashrc are 
only executed by login shells and interactive shells, not by scripts, 
limiting their effect to processes created within a user session; 3) 
anything within a user's profile or run as a regular user (including 
~/.bash*) doesn't affect system-wide settings and processes run as root.

Blindly replicating 2) won't do for Python, however. Unlike Bash which 
has all the functionality compiled in, Python has an external standard 
library and arbitrary additional packages. They both are essential for 
its operation as a system component that other software can use without 
additional manipulations, AND Python gives the user freedom on how to 
arrange them in the system. So there must be a way to provide any 
"additional manipulations" that may be needed that the built-in startup 
logic doesn't have. From administration POV, any such startup logic is a 
part of the core offer to the system: core files+libraries+connecting 
logic = Python system component, so it must be invoked whenever Python 
is invoked.
And we do already have ways to apply startup code only to a "lower-tier 
environment" if such a need arises: user-specific -- user site; 
interactive-specific -- PYTHONSTARTUP.  There's no such thing as a 
"login shell" for Python but there's Python run in a user session; 
/etc/profile* can set envvars that would apply only there.

So it seems to me that what you are asking for is "/etc/profile.d for 
Python". When designing such a feature, note, however, that the concept 
of login sessions is completely alien to Python. I believe a way to 
provide an additional site-packages directory will do (I can't readily 
see an already available way to do so in 
https://docs.python.org/3/using/cmdline.html ).
msg337351 - (view) Author: Anthony Sottile (Anthony Sottile) * Date: 2019-03-07 01:41
I did my best to classify those on PyPI that were using `.pth` files.  My initial search had quite a few false positives (and now that I look at it, it completely missed `.zip`-based source distributions, so there are likely some false negatives as well).

Here's the summary of the categorizations:

$ cut -d, -f2 < data.csv | sort | uniq -c
      2 backport
      4 coverage
      4 debugging
      2 demo
      9 encoding
      7 except-hook
     58 false-positive
      6 import-hook
     20 module-layout
     20 monkeypatch


I realized about halfway through that "monkeypatch" was probably too broad a category, but I kept using it for all of them.  The monkeypatch category contains a few classes of things: fixing third party libraries, disabling ssl (yikes!), adding some "features" to builtins / stdlib modules -- which unfortunately I didn't really classify properly.

There was a single .pth file that I deemed "malicious" since it completely breaks the `subprocess` module (`subprocess-run`) but other than that they all seemed ~mostly not the worst.

A lot of the `module-layout` ones could be solved with things provided directly by `setuptools`, or just by rearranging their distribution's files.

The raw data is available in csv: https://github.com/asottile/pth-file-investigation/blob/master/data.csv
msg337353 - (view) Author: Ionel Cristian Mărieș (ionelmc) Date: 2019-03-07 02:51
> There was a single .pth file that I deemed "malicious" since it completely breaks the `subprocess` module (`subprocess-run`)

It only seems to set an attribute. What's wrong with that? Does the early
import of subprocess cause problems?
msg337354 - (view) Author: Anthony Sottile (Anthony Sottile) * Date: 2019-03-07 03:04
> > There was a single .pth file that I deemed "malicious" since it completely breaks the `subprocess` module (`subprocess-run`)
>
> It only seems to set an attribute. What's wrong with that? Does the early import of subprocess cause problems?

It assigns `subprocess.run`, which is an API in Python 3.5+.  In those versions, `subprocess.check_*` is implemented in terms of `subprocess.run`.   The `subprocess.run` provided by that package has a different API than the stdlib one, so any use of the subprocess module is broken just by having that package installed.
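The failure mode is easy to reproduce without any .pth file. In CPython 3.5+, `subprocess.check_output()` calls `run()` through the `subprocess` module's globals at call time, so replacing the module attribute changes `check_output()` too. The replacement below is a hypothetical stand-in with an incompatible signature, and the sketch restores the original afterwards:

```python
import subprocess


def incompatible_run(cmd):
    """Hypothetical replacement: accepts no stdout/check/timeout keywords."""
    return "not a CompletedProcess"


def check_output_with_patched_run():
    original = subprocess.run
    subprocess.run = incompatible_run  # what the .pth line effectively did at startup
    try:
        # check_output() forwards stdout=, timeout= and check= to run(),
        # so the patched function rejects the call before any process starts.
        subprocess.check_output(["echo", "hi"])
    except TypeError as exc:
        return f"check_output broke: {exc}"
    finally:
        subprocess.run = original  # undo the patch
    return "check_output unaffected"


print(check_output_with_patched_run())
```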
msg337365 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2019-03-07 06:56
On Mar 6, 2019, at 19:04, Anthony Sottile <report@bugs.python.org> wrote:

> It assigns `subprocess.run`, which is an api in python3.5+.  In those versions, `subprocess.check_*` is implemented in terms of `subprocess.run`.   The `subprocess.run` provided by that package has a different api than the stdlib one so any use of the subprocess module is broken just by having that package installed

Doesn’t that kind of prove my point? :)
msg337368 - (view) Author: Anthony Sottile (Anthony Sottile) * Date: 2019-03-07 07:07
> Doesn’t that kind of prove my point? :)

It's not any worse than gevent ~breaking~ monkeypatching almost the entire standard library.  And to be fair to the author, it was created well before (2013-06-21) python3.5's `run` api existed (2015-04-14)

It's also the only problematic package that I could find -- if anything it's an indication that this feature is used (almost entirely) for good and without issue.
msg337370 - (view) Author: Ionel Cristian Mărieș (ionelmc) Date: 2019-03-07 08:11
> Doesn’t that kind of prove my point? :)

So basically you'd remove the whole feature just cause one package no one
installs abuses it. Doesn't make sense.
msg337396 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-03-07 15:38
There are two features here, let's be clear about what we're removing.

* extending sys.path with static (perhaps relative) directories
* arbitrary code execution (following "import " statements)
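Both features can be observed with the stdlib's `site.addsitedir()`, which is what startup processing of site-packages goes through: a line naming an existing directory (relative paths are joined to the site directory) is appended to `sys.path`, and a line starting with `import` is passed to `exec()`. A minimal sketch using a throwaway directory; the `demo.pth`, `extra` and `sys.demo_pth_ran` names are made up:

```python
import os
import site
import sys
import tempfile


def demo_pth_features():
    """Process a throwaway .pth file and report what its lines did."""
    saved_path = list(sys.path)
    try:
        with tempfile.TemporaryDirectory() as sitedir:
            extra = os.path.join(sitedir, "extra")
            os.mkdir(extra)
            with open(os.path.join(sitedir, "demo.pth"), "w") as f:
                f.write("# comment lines are skipped\n")
                f.write("extra\n")  # feature 1: a directory, joined to sitedir
                f.write("import sys; sys.demo_pth_ran = True\n")  # feature 2: exec()'d
            site.addsitedir(sitedir)  # processes every *.pth file in sitedir
            path_added = os.path.abspath(extra) in sys.path
        code_ran = getattr(sys, "demo_pth_ran", False)
    finally:
        sys.path[:] = saved_path  # undo the sys.path changes
        if hasattr(sys, "demo_pth_ran"):
            del sys.demo_pth_ran
    return path_added, code_ran


print(demo_pth_features())  # (True, True)
```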

Only Barry wants to remove the first one, and the rest of us will push back hard enough to keep him in check ;)

Basically everyone wants to remove the second one, but we can't do that until there is replacement functionality for its legitimate use cases.

Looking at Anthony's list (and making some assumptions about what the titles mean), I'd propose that only encodings require a way to register them from an installed package. And maybe this is as simple as making "encodings" a namespace package?
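For reference, registering a codec ultimately means calling `codecs.register()` with a search function, which is the kind of call these startup hooks make. The sketch below registers a made-up `demoupper` codec explicitly; the same call could just as well be made from a package's own `__init__.py` or from code the user opts into:

```python
import codecs


def _encode(text, errors="strict"):
    # Encoder contract: return (bytes_output, number_of_characters_consumed).
    return text.upper().encode("ascii", errors), len(text)


def _decode(data, errors="strict"):
    # Decoder contract: return (str_output, number_of_bytes_consumed).
    # bytes.decode() may hand us a memoryview, so copy it to bytes first.
    return bytes(data).decode("ascii", errors).lower(), len(data)


def _search(name):
    # codecs calls each registered search function with a normalized,
    # lower-case encoding name; return None for names we don't handle.
    if name == "demoupper":
        return codecs.CodecInfo(_encode, _decode, name="demoupper")
    return None


codecs.register(_search)  # the single call a startup hook would otherwise make

print("hello".encode("demoupper"))   # b'HELLO'
print(b"HELLO".decode("demoupper"))  # hello
```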

For the others:

* backport, demo - no idea what these look like
* coverage, debugging, demo, except-hook - application/user responsibility, not a package's
* monkey-patching - kill it with fire
* import-hook, module-layout - easy enough to work around

(For those who are confused about the last, using a package __init__.py is how to modify these *when your package is actually loaded* and not on startup.)
msg337399 - (view) Author: Ionel Cristian Mărieș (ionelmc) Date: 2019-03-07 16:08
> * coverage, debugging, demo, except-hook - application/user responsibility, not a package's

Elaborate please, as it sounds like you're simply dismissing my usecase.
msg337406 - (view) Author: Anthony Sottile (Anthony Sottile) * Date: 2019-03-07 16:57
I think nearly all of the use cases in the packages are valid (except module-layout) -- or at least if this feature were removed without having a startup-time site-packages code execution feature there would be no possible replacement.  I'll elaborate a little more on the titles I've chosen:

* backport: provide features that are not available in that python version but were ratified by PEPs in later versions.  These necessarily must happen at startup so as to affect the application being used
* demo: these are fine to ignore; the two packages that were classified here were merely demoing how `.pth` files can be packaged with setuptools
* coverage: almost all of these were "automatically instrument coverage in subprocesses under test", basically the need to enable coverage tracing in subprocesses triggered by the application under test.  It is not possible to do this in any other way than an initialization hook in the interpreter (or monkeypatching the subprocess module, which I'd argue is significantly worse than what this is doing)
* debugging: these provide additional introspection tools to analyze an application, these also need to be interpreter level as you cannot customize code outside of your control but may need to debug such code.
* except-hook: these also seem necessary.  From the few I looked into in more detail, they seemed to be setting hooks such that $foreign-application could be used within another framework -- looking very similar to Ubuntu's `sitecustomize.py` which sends traces to apport on crash (bug reporting for python-based packages).  If you had ownership of this application, sure, you could add an except hook, but these seem to be for cases where you do not control the application
* monkeypatch: I don't think we should be so swift to banish this category; sure, the name is scary, but there were many legitimate cases here.  Many of these were to patch limitations in packages outside of their control (dead, no longer accepting patches, not willing to support other platforms, etc.).  The patches necessarily happen at startup because there's no other place to influence the code of these third party tools.  Don't get me wrong, monkeypatching is usually bad, but I don't think there would be an alternative to how these tools function if this feature were removed.
* import-hook: I also don't see an easy way to work around these, most of these added alternate filetypes that python could import, but you need *something* to make importing work in the first place
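The except-hook pattern described above looks roughly like the following: startup code wraps whatever `sys.excepthook` is already installed, so that it can report the crash and still fall through to the normal traceback output. `install_reporting_hook` and the `report` callback are hypothetical names, not taken from any of the surveyed packages:

```python
import sys


def install_reporting_hook(report):
    """Wrap the current sys.excepthook so `report` sees every uncaught exception."""
    previous = sys.excepthook

    def hook(exc_type, exc_value, exc_tb):
        try:
            report(exc_type, exc_value, exc_tb)  # e.g. hand off to a crash reporter
        finally:
            # Always fall through to the previous hook, even if reporting fails.
            previous(exc_type, exc_value, exc_tb)

    sys.excepthook = hook
    return previous  # lets the caller restore the old hook later
```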


> Basically everyone wants to remove the second one, but we can't do that until there is replacement functionality for its legitimate use cases.

Without a poll I don't think assuming a majority is fair ;)
msg337408 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-03-07 16:59
> Elaborate please, as it sounds like you're simply dismissing my usecase.

I'm suggesting that to enable this functionality at startup, the user/application should have to do something like executing code or setting PYTHONSTARTUP.

What I'm dismissing is that "pip install some-package" can define a global startup task for your interpreter. I shouldn't get debugging or code coverage enabled every time I run "python" just because I installed some package - I should have to start that package somehow.
msg337409 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-03-07 17:02
> I don't think there would be an alternative to how these tools function if this feature were removed.

Right now, maybe, which is why we haven't just removed it :)

The point of the discussion is to say "this functionality is irreplaceable so we need to design a replacement". If a package can't do monkeypatching when imported for some reason, we should explore what those reasons are and provide a supported way to achieve their goals (or document the existing ways).
msg337410 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-03-07 17:06
Here's a trivial workaround for the import hook problem:

Assume we have "my_module.foo", and the import hook enables importing foo files.

Instead of just shipping "my_module.foo", you ship "my_module.py" and "_my_module.foo", where "my_module.py" looks like:

    import my_import_hook
    my_import_hook.install()

    from _my_module import *

This really isn't hard to do. As a bonus, you don't even need a full import hook anymore - you can use any kind of loader you want. And it should be fully backwards compatible (assuming special tricks weren't part of your public API), so your users won't even notice the upgrade.
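The example above leaves `my_import_hook` undefined. A minimal sketch of it, assuming `.foo` files simply contain Python source, can reuse the stdlib's `importlib.machinery.FileFinder.path_hook()` with a `SourceFileLoader` registered for the extra suffix; `install()` and the `.foo` convention are the hypothetical names from the example:

```python
import importlib.machinery
import sys

# Hypothetical convention from the example: *.foo files hold Python source.
FOO_SUFFIXES = [".foo"]


def install():
    """Make the path-based importer also find modules stored in *.foo files."""
    loader_details = [
        # Keep the stock loaders: this finder takes precedence over the
        # default one for every directory on sys.path.
        (importlib.machinery.ExtensionFileLoader, importlib.machinery.EXTENSION_SUFFIXES),
        (importlib.machinery.SourceFileLoader,
         importlib.machinery.SOURCE_SUFFIXES + FOO_SUFFIXES),
        (importlib.machinery.SourcelessFileLoader, importlib.machinery.BYTECODE_SUFFIXES),
    ]
    # Path hooks are tried in order; ours goes first so it handles directories
    # (non-directories raise ImportError and fall through to the other hooks).
    sys.path_hooks.insert(0, importlib.machinery.FileFinder.path_hook(*loader_details))
    # Forget finders created before the hook existed so they get rebuilt.
    sys.path_importer_cache.clear()
```

With this in place, `my_module.py` calling `my_import_hook.install()` before `from _my_module import *` lets `_my_module.foo` resolve through the ordinary import system.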
msg337414 - (view) Author: Anthony Sottile (Anthony Sottile) * Date: 2019-03-07 17:32
> What I'm dismissing is that "pip install some-package" can define a global startup task for your interpreter. I shouldn't get debugging or code coverage enabled every time I run "python" just because I installed some package

At least for the coverage tools they all play nice and require an environment variable to be set for them to take.  For example, `coverage-enable-subprocess` requires `COVERAGE_PROCESS_START=...` in order to start: https://github.com/bukzor/coverage_enable_subprocess/blob/9a0f4df99f0d008eba305c673dfae4269c6c5642/setup.py#L14
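That opt-in style amounts to a startup function that returns immediately unless an environment variable is set, so the .pth line itself can stay a one-liner that merely calls it. The sketch below mimics the pattern with a hypothetical `DEMO_PROCESS_START` variable and `process_startup()` function; it is not the actual code of coverage or coverage-enable-subprocess:

```python
import os

_started = False  # guards against running the hook twice in one process


def process_startup(env=None):
    """Hypothetical hook that a .pth line such as
    `import demo_hook; demo_hook.process_startup()` would call.

    Does nothing unless DEMO_PROCESS_START is set; returns True only when
    it actually started.
    """
    global _started
    env = os.environ if env is None else env
    config = env.get("DEMO_PROCESS_START")
    if not config or _started:
        return False
    _started = True
    # A real tool would start its tracer/instrumentation here, using `config`.
    return True
```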

> I should have to start that package somehow.

`pip install` is a pretty good opt-in already imo

> Instead of just shipping "my_module.foo", you ship "my_module.py" and "_my_module.foo", where "my_module.py" looks like:

but that's exactly my point, now you have to ship extra junk python files when it's a way better experience to have the hooks _just work_
msg337417 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2019-03-07 18:03
On Mar 7, 2019, at 07:38, Steve Dower <report@bugs.python.org> wrote:
> 
> There are two features here, let's be clear about what we're removing.
> 
> * extending sys.path with static (perhaps relative) directories
> * arbitrary code execution (following "import " statements)
> 
> Only Barry wants to remove the first one, and the rest of us will push back hard enough to keep him in check ;)

Not true!  I’m okay with keeping the path extension feature, albeit with improvements:

* Loading of .pth files and path extension should be reported in verbose (`python -v`) output
* It should be much easier to debug .pth file loading (I believe there is a PR for this, but I haven’t had time to look at it yet)
* It should be possible to prevent .pth file loading, likely via an interpreter switch or environment variable, akin to -s/-S
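For reference, the two features Steve separates come from how site.py treats each line of a .pth file. A simplified sketch modelled on CPython's `site.addpackage`; the function name `classify_pth_lines` is hypothetical, and unlike site.py this version executes nothing, it only sorts lines into the two buckets.

```python
import os


def classify_pth_lines(text, sitedir):
    """Split .pth content into sys.path entries and code lines, following site.py's rules.

    Simplified: site.py also normalises paths, skips directories that don't exist
    or are already on sys.path, and exec()s code lines instead of returning them.
    """
    paths, code = [], []
    for line in text.splitlines():
        if line.startswith("#") or not line.strip():
            continue  # comments and blank lines are ignored
        if line.startswith(("import ", "import\t")):
            code.append(line)  # arbitrary code, run at interpreter startup
            continue
        # static (possibly relative) directory, resolved against the site directory
        paths.append(os.path.join(sitedir, line.rstrip()))
    return paths, code
```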
msg337418 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2019-03-07 18:13
On Mar 7, 2019, at 09:32, Anthony Sottile <report@bugs.python.org> wrote:
> 
>> I should have to start that package somehow.
> 
> `pip install` is a pretty good opt-in already imo

Except that it conflates responsibilities.  I may not want to opt into coverage even being loaded in my application because I’m not going to use it and it has a negative impact on my application’s start up time.  Yet because you’re on the same machine and you pip installed it, I have no choice but to pay those costs, which I haven’t explicitly opted in to.
msg337421 - (view) Author: Anthony Sottile (Anthony Sottile) * Date: 2019-03-07 18:22
>>> I should have to start that package somehow.
>> 
>> `pip install` is a pretty good opt-in already imo
>
> Except that it conflates responsibilities.  I may not want to opt into coverage even being loaded in my application because I’m not going to use it and it has a negative impact on my application’s start up time.  Yet because you’re on the same machine and you pip installed it, I have no choice but to pay those costs, which I haven’t explicitly opted in to.

At least for the coverage plugins there is a required opt in from environment variable (as shown above).  Though the startup cost is a good point.  Perhaps I'm in the minority, but I use virtualenvs for everything, so I haven't even been considering the system python.
msg337422 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2019-03-07 18:37
RE: " So basically you'd remove the whole feature just cause one package no
one installs abuses it. Doesn't make sense."

No, the point being made is that *at least* one package on PyPI was found
abusing it, so abuse does exist and we need to consider the possibility that
others are also abusing the feature.

On Thu, Mar 7, 2019 at 10:22 AM Anthony Sottile <report@bugs.python.org>
wrote:

> >>> I should have to start that package somehow.
> >>
> >> `pip install` is a pretty good opt-in already imo
> >
> > Except that it conflates responsibilities.  I may not want to opt into
> coverage even being loaded in my application because I’m not going to use
> it and it has a negative impact on my application’s start up time.  Yet
> because you’re on the same machine and you pip installed it, I have no
> choice but to pay those costs, which I haven’t explicitly opted in to.
>
> At least for the coverage plugins there is a required opt in from
> environment variable (as shown above).

For the ones you know about. Dealing with abuse of functionality isn't
about what common practice is, but about what a bad actor may do.

> Though the startup cost is a good point.  Perhaps I'm in the minority, but
> I use virtualenvs for everything so I haven't even been considering the
> system python.
>

Trust me: from my perspective working on the Python extension for VS Code,
you cannot ignore system installs.
msg337424 - (view) Author: Ionel Cristian Mărieș (ionelmc) Date: 2019-03-07 18:46
> because you’re on the same machine and you pip installed it, I have no
> choice but to pay those costs, which I haven’t explicitly opted in to.
>
> At least for the coverage plugins there is a required opt in from
> environment variable (as shown above).

There's a simple `if 'COVSOMETHING' in os.environ` check to activate it.
That can't cost a significant amount of time. This is rather another bad
actor argument.
msg337426 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2019-03-07 18:50
On Mar 7, 2019, at 10:46, Ionel Cristian Mărieș <report@bugs.python.org> wrote:
> 
> There's a simple `if 'COVSOMETHING' in os.environ` check to activate it.
> That can't cost a significant amount of time. This is rather another bad
> actor argument.

You’re overlooking the significant cost of loading the module that does the check in the first place.
msg337427 - (view) Author: Ionel Cristian Mărieș (ionelmc) Date: 2019-03-07 18:52
What module? That check should be done directly in the pth file.
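A sketch of the kind of gate Ionel means. Because site.py only executes .pth lines that begin with "import ", the check and the deferred import have to share a single line. `HYPOTHETICAL_PLUGIN_ENABLE` and `hypothetical_plugin.activate()` are made-up names. When the variable is unset, the plugin module is never imported; Barry's cost argument still applies to entries that import a module first and check afterwards.

```python
import os
import sys

# Hypothetical one-line .pth entry: site.py exec()s lines starting with "import ",
# so the opt-in check and the plugin import must fit on this one line.
PTH_LINE = (
    "import os; "
    "__import__('hypothetical_plugin').activate() "
    "if os.environ.get('HYPOTHETICAL_PLUGIN_ENABLE') else None"
)

if __name__ == "__main__":
    os.environ.pop("HYPOTHETICAL_PLUGIN_ENABLE", None)
    exec(PTH_LINE)  # variable unset: only an environment lookup happens
    print("hypothetical_plugin" in sys.modules)  # False: the plugin was never imported
```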
msg337430 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-03-07 19:39
Nonetheless, it's still something that we could support better. If telling someone to set PYTHONSTARTUP is too hard, then we can design another way that can work well without relying on a barely documented (mis)feature.

As one idea, we could add a way to register new -X options that translate into an import/function call after site has run. You could then do "python -X coverage ...", and if you don't pass the option, you don't get any code injected at all, regardless of bugs in any libraries you've installed.
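A rough sketch of that idea, assuming a hypothetical registry that maps an -X option name to a "module:function" target; nothing like `STARTUP_HOOKS` exists in CPython. It reads `sys._xoptions`, the CPython-specific dict of -X options, and imports a hook's module only when the matching option was passed.

```python
import importlib
import sys

# Hypothetical registry: -X option name -> "module:function" to call after site runs.
STARTUP_HOOKS = {"coverage": "hypothetical_coverage_hook:start"}


def run_x_option_hooks(xoptions, registry):
    """Call the hook for every registered -X option that was actually given."""
    called = []
    for option, target in registry.items():
        if option not in xoptions:
            continue  # not opted in: the hook's module is never imported
        module_name, _, func_name = target.partition(":")
        getattr(importlib.import_module(module_name), func_name)()
        called.append(target)
    return called


if __name__ == "__main__":
    # A plain `python script.py` sets no "-X coverage", so no hook runs.
    print(run_x_option_hooks(sys._xoptions, STARTUP_HOOKS))
```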
msg337434 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2019-03-07 20:46
>> I should have to start that package somehow.
>
> `pip install` is a pretty good opt-in already imo

I think that’s where we disagree. Like others, I don’t want this to affect every python script in a given installation. 

>> Instead of just shipping "my_module.foo", you ship "my_module.py" and "_my_module.foo", where "my_module.py" looks like:
>
> but that's exactly my point, now you have to ship extra junk python files when it's a way better experience to have the hooks _just work_


You mean extra junk like .pth files? I don’t see the difference between a .py file and a .pth file, except I can’t opt out of .pth files. 

We’re just looking for some way to control the behavior, without giving the .pth file unlimited capabilities before the user script starts. If it’s “just” some extra .py files, then maybe that’s great. If we need some other new mechanism, then I’d be okay with that, too.
msg337437 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-03-07 21:03
> You mean extra junk like .pth files? I don’t see the difference between a .py file and a .pth file, except I can’t opt out of .pth files.

And you get multiple lines of code, and syntax highlighting, and linting, and all the other goodness in a genuine source file :)
msg337446 - (view) Author: Thomas Kluyver (takluyver) * Date: 2019-03-07 23:02
As a lurker on this issue: I think a lot of energy is being expended arguing about what is and isn't legitimate use cases, when there's actually more stuff that people agree about than not.

I think this issue should be broken down into two, neither of which will actually result in removing pth files:

1. Better ways to inspect and control the sys.path extension feature (as per Barry's message https://bugs.python.org/issue33944#msg337417 ).
2. Designing a replacement for the arbitrary-code-at-startup feature (or even several replacements to meet different needs), leading to its eventual deprecation.

If you like the ability for packages to install interpreter-startup hooks, then pth files are an ugly way to do it. If you don't, then you want better ways to control it. So let's see what we can come up with.

At least several important use cases (coverage and debugging) would probably work with an environment variable to specify startup code. The coverage hooks already check an environment variable themselves, so it's clearly a control mechanism that works. It's also familiar from things like LD_PRELOAD that environment variables can affect code in powerful ways.

But the PYTHONSTARTUP variable is not suitable for this, because it only affects interactive shell sessions. So maybe one useful step would be to specify a new environment variable, maybe PYTHONPRELOAD, and figure out how it will interact with all the other options.

Then we can re-evaluate the use cases Anthony described (https://bugs.python.org/issue33944#msg337406 ) and debate the need for other startup-code mechanisms to go along with that.
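One way a PYTHONPRELOAD variable like the one proposed above might behave, as a sketch only: the variable does not exist in CPython, and the comma-separated format, the import order and the name `preload_from_env` are all assumptions.

```python
import importlib
import os


def preload_from_env(environ=os.environ, var="PYTHONPRELOAD"):
    """Import each module named in a (hypothetical) comma-separated preload variable."""
    names = [n.strip() for n in environ.get(var, "").split(",") if n.strip()]
    # Modules are imported in the order listed; an unset or empty variable imports nothing.
    return [importlib.import_module(n) for n in names]
```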
msg337920 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2019-03-14 14:36
Just noting that https://bugs.python.org/issue14803 is probably our most comprehensive discussion of the coverage use case for arbitrary pre-__main__ code execution.

Steve also made a comment above about potentially turning encodings into a namespace package: that's difficult due to the non-empty `__init__.py` file that registers a couple of codec search functions as a side effect of import: https://github.com/python/cpython/blob/master/Lib/encodings/__init__.py

However, it would be possible to define a *new* namespace package for codec discovery that was searched after the standard search locations (so you could use it to add extra codecs, but not hijack existing ones).
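A sketch of that suggestion, assuming a hypothetical namespace package `codecs_ext` whose modules each expose a `getregentry()` function, as the modules in `encodings` do. Search functions passed to `codecs.register()` are consulted in registration order and `encodings` registers its own at startup, so a function registered later can only supply codecs the standard search did not find.

```python
import codecs
import importlib


def make_namespace_codec_search(package="codecs_ext"):  # hypothetical package name
    """Return a codec search function that looks for `<package>.<codec_name>`."""

    def search(name):
        module_name = name.replace("-", "_")
        if not module_name.isidentifier() or module_name.startswith("_"):
            return None  # refuse dotted or private names rather than import them
        try:
            module = importlib.import_module(f"{package}.{module_name}")
        except ImportError:
            return None  # unknown codec: let other search functions answer
        getregentry = getattr(module, "getregentry", None)
        return getregentry() if getregentry is not None else None

    return search


# Registered after startup, so it is only reached for names `encodings` can't resolve.
codecs.register(make_namespace_codec_search())
```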
msg337954 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2019-03-14 17:26
We could also have a new namespace package which is *just* for startup injection so it wasn't such a hack to tie into the codecs startup code.
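A sketch of how such a dedicated namespace package could be consumed, assuming the hypothetical package name `startup_hooks`. Any distribution could contribute a module to it, nothing runs when no portion is installed, and sorting by module name gives a predictable order. When and where the interpreter would call this is left open.

```python
import importlib
import pkgutil


def run_startup_namespace(package_name="startup_hooks"):  # hypothetical package name
    """Import every module found in the namespace package, in sorted order."""
    try:
        package = importlib.import_module(package_name)
    except ImportError:
        return []  # no portions installed: nothing is injected
    names = sorted(info.name for info in pkgutil.iter_modules(package.__path__))
    for name in names:
        importlib.import_module(f"{package_name}.{name}")
    return names
```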
msg350625 - (view) Author: qix- (qix-) Date: 2019-08-27 10:13
-1

This would make `better_exceptions` irreparably un-ergonomic.

https://github.com/qix-/better-exceptions

.PTH files are commonly used to install development middleware in order to enhance the development and debugging experience.

I recognize the need for security, but could we focus on improving the security of the existing .PTH system rather than throwing out the baby with the bathwater?

The search "pth files python virus|malicious" on Google returns this issue. Is .PTH a previously exploited vector? This is like saying NPM's `install` scripts are a vector. I'm not going to be running code that I don't at least trust a little.

This issue reads like someone had a bad time with some poorly written Python code that was stuck inside a .PTH file, had to debug why it was causing a problem, and came here to cry about it (no offense, Barry).

Instead of improving it, the first inclination was to remove it altogether without any regard to its use-cases or the effects it would have on some packages that rely on it.

Let's improve it, not kill it.
msg351861 - (view) Author: miss-islington (miss-islington) Date: 2019-09-11 13:21
New changeset f9b5840fb4497a9e2ba2c1f01ad0dafba04c8496 by Miss Islington (bot) (native-api) in branch 'master':
bpo-33944: note about the intended use of code in .pth files (GH-10131)
https://github.com/python/cpython/commit/f9b5840fb4497a9e2ba2c1f01ad0dafba04c8496
msg351872 - (view) Author: Stéphane Wirtel (matrixise) * (Python committer) Date: 2019-09-11 13:36
New changeset d1d968d45df1a900b0600c4d296b180aa978336d by Stéphane Wirtel (Miss Islington (bot)) in branch '3.8':
bpo-33944: note about the intended use of code in .pth files (GH-10131) (GH-15942)
https://github.com/python/cpython/commit/d1d968d45df1a900b0600c4d296b180aa978336d
History
Date | User | Action | Args
2019-09-11 13:36:53 | matrixise | set | nosy: + matrixise; messages: + msg351872
2019-09-11 13:21:17 | miss-islington | set | pull_requests: + pull_request15577
2019-09-11 13:21:07 | miss-islington | set | nosy: + miss-islington; messages: + msg351861
2019-08-27 10:13:45 | qix- | set | nosy: + qix-; messages: + msg350625
2019-07-18 09:40:18 | yan12125 | set | nosy: + yan12125
2019-07-12 21:14:25 | nedbat | set | nosy: + nedbat
2019-03-14 17:26:27 | brett.cannon | set | messages: + msg337954
2019-03-14 14:36:07 | ncoghlan | set | dependencies: + Add feature to allow code execution prior to __main__ invocation; messages: + msg337920
2019-03-07 23:02:12 | takluyver | set | messages: + msg337446
2019-03-07 21:04:14 | steve.dower | set | messages: + msg337439
2019-03-07 21:03:55 | steve.dower | set | messages: + msg337438
2019-03-07 21:03:38 | steve.dower | set | messages: + msg337437
2019-03-07 20:46:05 | eric.smith | set | messages: + msg337434
2019-03-07 19:39:53 | steve.dower | set | messages: + msg337430
2019-03-07 18:52:33 | ionelmc | set | messages: + msg337427
2019-03-07 18:50:51 | barry | set | messages: + msg337426
2019-03-07 18:46:16 | ionelmc | set | messages: + msg337424
2019-03-07 18:37:07 | brett.cannon | set | messages: + msg337422
2019-03-07 18:22:29 | Anthony Sottile | set | messages: + msg337421
2019-03-07 18:13:43 | barry | set | messages: + msg337418
2019-03-07 18:03:03 | barry | set | messages: + msg337417
2019-03-07 17:32:09 | Anthony Sottile | set | messages: + msg337414
2019-03-07 17:06:23 | steve.dower | set | messages: + msg337410
2019-03-07 17:02:28 | steve.dower | set | messages: + msg337409
2019-03-07 16:59:53 | steve.dower | set | messages: + msg337408
2019-03-07 16:57:15 | Anthony Sottile | set | messages: + msg337406
2019-03-07 16:08:45 | ionelmc | set | messages: + msg337399
2019-03-07 15:38:56 | steve.dower | set | messages: + msg337396
2019-03-07 08:11:14 | ionelmc | set | messages: + msg337370
2019-03-07 07:07:19 | Anthony Sottile | set | messages: + msg337368
2019-03-07 06:56:21 | barry | set | messages: + msg337365
2019-03-07 03:04:59 | Anthony Sottile | set | messages: + msg337354
2019-03-07 02:51:36 | ionelmc | set | messages: + msg337353
2019-03-07 01:41:51 | Anthony Sottile | set | messages: + msg337351
2019-03-04 02:50:21 | __Vano | set | messages: + msg337064
2019-03-02 06:01:36 | barry | set | messages: + msg336992
2019-03-02 03:59:43 | __Vano | set | messages: + msg336984
2019-03-02 03:58:43 | steve.dower | set | messages: + msg336983
2019-03-01 23:25:40 | barry | set | messages: + msg336970
2019-03-01 22:26:49 | Anthony Sottile | set | messages: + msg336961
2019-03-01 17:55:38 | __Vano | set | messages: + msg336944
2019-03-01 17:27:23 | __Vano | set | messages: + msg336939
2019-03-01 00:58:35 | steve.dower | set | messages: + msg336882
2019-03-01 00:32:40 | Ivan.Pozdeev | set | pull_requests: + pull_request12117
2019-02-28 22:27:55 | Ivan.Pozdeev | set | messages: + msg336875
2019-02-28 22:24:18 | Ivan.Pozdeev | set | pull_requests: + pull_request12114
2019-02-28 18:25:39 | Anthony Sottile | set | messages: + msg336863
2019-02-28 18:04:08 | barry | set | messages: + msg336860
2019-02-28 17:40:16 | Anthony Sottile | set | messages: + msg336856
2019-02-28 17:27:51 | steve.dower | set | messages: + msg336853
2019-02-28 07:10:58 | Peter L3 | set | messages: + msg336809
2019-02-28 06:58:40 | Peter L3 | set | nosy: + Peter L3
2019-02-27 00:34:30 | ionelmc | set | messages: + msg336726
2019-02-27 00:30:18 | ionelmc | set | messages: + msg336725
2019-02-26 23:41:20 | barry | set | messages: + msg336722
2019-02-26 23:31:47 | barry | set | messages: + msg336721
2019-02-26 22:23:02 | steve.dower | set | messages: + msg336716
2019-02-26 21:23:18 | __Vano | set | messages: + msg336714
2019-02-26 20:52:41 | ionelmc | set | messages: + msg336711
2019-02-26 20:37:59 | barry | set | messages: + msg336710
2019-02-26 20:32:24 | steve.dower | set | messages: + msg336709
2019-02-26 18:09:08 | barry | set | messages: + msg336705
2019-02-26 13:19:51 | ncoghlan | set | messages: + msg336662
2019-02-23 00:51:23 | steve.dower | set | messages: + msg336351
2019-02-22 13:47:48 | vstinner | set | nosy: - vstinner
2019-02-21 22:12:50 | steve.dower | set | nosy: + steve.dower
2019-02-19 11:30:14 | vstinner | set | messages: + msg335926
2019-02-17 13:32:12 | cheryl.sabella | set | nosy: + cheryl.sabella; messages: + msg335774
2019-01-22 04:42:30 | ionelmc | set | nosy: + ionelmc; messages: + msg334199
2019-01-18 17:40:00 | veky | set | nosy: + veky; messages: + msg333997
2019-01-15 17:35:32 | Chris Billington | set | messages: + msg333716
2019-01-15 14:04:56 | vstinner | set | messages: + msg333706
2019-01-15 14:01:31 | jaraco | set | messages: + msg333705
2019-01-15 12:55:32 | ncoghlan | set | messages: + msg333699
2019-01-15 12:53:04 | ncoghlan | set | messages: + msg333698
2019-01-14 22:42:18 | barry | set | messages: + msg333645
2019-01-14 22:30:17 | vstinner | set | messages: + msg333644
2019-01-14 21:04:47 | pitrou | set | messages: + msg333642
2019-01-14 20:02:31 | barry | set | messages: + msg333640
2019-01-14 19:56:34 | barry | set | messages: + msg333639
2019-01-14 19:55:22 | barry | set | messages: + msg333638
2019-01-14 19:20:04 | jaraco | set | messages: + msg333637
2019-01-14 12:17:17 | ncoghlan | set | messages: + msg333613
2019-01-14 10:01:00 | SilentGhost | set | nosy: + SilentGhost
2019-01-14 09:14:03 | pitrou | set | messages: + msg333592
2019-01-14 09:01:58 | vstinner | set | messages: + msg333591
2019-01-13 22:04:46 | Antony.Lee | set | nosy: - Antony.Lee
2019-01-13 21:53:24 | __Vano | set | messages: + msg333572
2019-01-13 20:49:58 | Chris Billington | set | messages: + msg333569
2019-01-13 20:42:16 | ncoghlan | set | messages: + msg333568
2019-01-13 20:02:26 | barry | set | messages: + msg333567
2019-01-13 02:40:21 | ncoghlan | set | messages: + msg333536
2019-01-08 16:55:04 | Chris Billington | set | nosy: + Chris Billington; messages: + msg333235
2018-11-29 14:58:35 | vstinner | set | nosy: + vstinner
2018-11-19 21:11:35 | jaraco | set | nosy: + jaraco; messages: + msg330115
2018-11-13 03:30:58 | __Vano | set | nosy: + __Vano; messages: + msg329802
2018-11-12 22:04:54 | barry | set | messages: + msg329764
2018-11-10 12:50:07 | Ivan.Pozdeev | set | messages: + msg329607
2018-10-27 21:44:20 | Anthony Sottile | set | nosy: + Anthony Sottile
2018-10-26 15:48:53 | Ivan.Pozdeev | set | messages: + msg328564
2018-10-26 15:33:58 | Ivan.Pozdeev | set | keywords: + patch; stage: patch review; pull_requests: + pull_request9463
2018-10-25 20:47:07 | Antony.Lee | set | nosy: + Antony.Lee; messages: + msg328488
2018-07-09 18:45:29 | barry | set | messages: + msg321340
2018-07-05 21:23:06 | Ivan.Pozdeev | set | messages: + msg321134
2018-07-05 18:09:27 | terry.reedy | set | nosy: + terry.reedy; messages: + msg321125
2018-07-04 09:59:59 | ncoghlan | set | messages: + msg321026
2018-07-03 21:01:04 | eric.snow | set | messages: + msg321005
2018-07-03 18:58:59 | barry | set | messages: + msg320997
2018-07-02 01:33:06 | mhammond | set | messages: + msg320850
2018-06-30 06:37:48 | ncoghlan | set | messages: + msg320754
2018-06-29 17:52:38 | Ivan.Pozdeev | set | nosy: + mhammond, Ivan.Pozdeev; type: enhancement; messages: + msg320724
2018-06-25 01:28:57 | barry | set | messages: + msg320393
2018-06-24 20:40:09 | pitrou | set | nosy: + pitrou; messages: + msg320386
2018-06-24 01:56:20 | ncoghlan | set | messages: + msg320342
2018-06-23 00:24:23 | ncoghlan | set | messages: + msg320293
2018-06-23 00:20:22 | ncoghlan | set | nosy: + ncoghlan; messages: + msg320292
2018-06-22 22:25:22 | barry | set | messages: + msg320287
2018-06-22 22:23:15 | brett.cannon | set | messages: + msg320286
2018-06-22 22:19:24 | Ethan Smith | set | nosy: + Ethan Smith; messages: + msg320284
2018-06-22 21:57:09 | eric.smith | set | messages: + msg320283
2018-06-22 21:46:39 | takluyver | set | messages: + msg320279
2018-06-22 21:40:41 | brett.cannon | set | messages: + msg320277
2018-06-22 20:53:38 | takluyver | set | nosy: + takluyver; messages: + msg320266
2018-06-22 18:29:20 | eric.smith | set | nosy: + eric.smith; messages: + msg320253
2018-06-22 18:05:27 | christian.heimes | set | nosy: + christian.heimes; messages: + msg320249
2018-06-22 17:22:20 | barry | create