classification
Title: Allow Python distributors to add custom site install schemes
Type: enhancement
Stage:
Components: Library (Lib)
Versions: Python 3.10
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: FFY00, christian.heimes, frenzy, hroncok, jaraco, lemburg, petr.viktorin, steve.dower
Priority: normal
Keywords:

Created on 2021-04-29 16:19 by FFY00, last changed 2021-05-05 21:30 by lemburg.

Pull Requests
URL       Status  Linked
PR 25718  open    FFY00, 2021-04-29 16:19
Messages (23)
msg392326 - (view) Author: Filipe Laíns (FFY00) * Date: 2021-04-29 16:19
As part of the distutils migration we plan to add a mechanism to let Python distributors add site install schemes.

Currently, Python distributors are patching distutils to add custom install schemes for their packages. I think most of the reasoning boils down to them wanting to stop Python installers, such as pip, from modifying or interfering with their packages.

With the distutils deprecation, and it becoming a 3rd party module, Python distributors can no longer patch it. Because of this, we made distutils use the sysconfig module instead, which fixes the issue at the moment -- Python distributors can now patch sysconfig itself -- but is not a long term solution.
To prevent Python distributors from having to patch implementation details, and from having things break unexpectedly, we aim to introduce a system that distributors can use for this purpose.

The idea is that they have a config file, which they can pass to configure, and in that config file they can specify some extra install schemes. These install schemes will get added in sysconfig, and will be loaded in the site module initialization.

In practice, it will look something like this:

config.py
```
EXTRA_SITE_INSTALL_SCHEMES = {
    'posix_prefix': {
        'stdlib': '{installed_base}/{platlibdir}/python{py_version_short}',
        'platstdlib': '{platbase}/{platlibdir}/python{py_version_short}',
        'purelib': '{base}/lib/python{py_version_short}/vendor-packages',
        'platlib': '{platbase}/{platlibdir}/python{py_version_short}/vendor-packages',
        'include':
            '{installed_base}/include/python{py_version_short}{abiflags}',
        'platinclude':
            '{installed_platbase}/include/python{py_version_short}{abiflags}',
        'scripts': '{base}/bin',
        'data': '{base}',
    },
}
```

./configure --with-vendor-config=config.py
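
To illustrate how a scheme like the one above resolves to concrete paths: each entry is a path template whose `{placeholders}` are filled in from configuration variables. The sketch below is a simplified stand-in for that expansion, not sysconfig's actual code; `VENDOR_SCHEME`, `expand_scheme` and the variable values are hypothetical and chosen only for the example.

```python
# Hypothetical, trimmed-down vendor scheme with the same shape as the config above.
VENDOR_SCHEME = {
    "purelib": "{base}/lib/python{py_version_short}/vendor-packages",
    "scripts": "{base}/bin",
}


def expand_scheme(scheme, variables):
    """Fill in the {placeholders} of every path template in a scheme."""
    return {name: template.format(**variables) for name, template in scheme.items()}


# Assumed values; a real interpreter would supply its own configuration variables.
paths = expand_scheme(VENDOR_SCHEME, {"base": "/usr", "py_version_short": "3.10"})
print(paths["purelib"])  # /usr/lib/python3.10/vendor-packages
```

With these assumed values, `purelib` ends up in a `vendor-packages` directory that is separate from the regular `site-packages`.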
msg392350 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2021-04-29 20:23
Any reason this couldn't be in sitecustomize.py? Either by poking values into sysconfig directly (for back-compat) or we train sysconfig to look inside sitecustomize for a well-known name.
msg392367 - (view) Author: Filipe Laíns (FFY00) * Date: 2021-04-30 00:12
Making sysconfig look at sitecustomize seems like the wrong approach. It is behavior I would never expect, and there are use-cases where I still want the schemes to be present when the site module initialization is disabled.

I would also argue that having this mechanism available will be useful for other things.
msg392391 - (view) Author: Miro Hrončok (hroncok) * Date: 2021-04-30 08:24
Cross referencing the discussion: https://discuss.python.org/t/mechanism-for-distributors-to-add-site-install-schemes-to-python-installations/8467
msg392823 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2021-05-03 16:58
> Making sysconfig look at sitecustomize seems like the wrong approach.

I mean, you're literally customizing the site, so having it be done from sitecustomize doesn't seem terribly wrong. But I agree, I'd rather see the code in sitecustomize poke paths into sysconfig, rather than the other way around.

The problem then would be that -S bypasses the path configuration entirely, which is likely going to point at non-existent paths. So yeah, for this case you need an override that isn't tied to the site module. Having a similar-but-different mechanism in sysconfig seems fine. I have a *slight* preference for non-executable code, mostly to avoid the risk of import hijacking, but it's only slight.
msg392828 - (view) Author: Filipe Laíns (FFY00) * Date: 2021-05-03 18:12
FYI, I have changed the implementation to split the extra install schemes from the extra schemes activated on site. This still makes sense over sitecustomize because we want the packages to be included in site.getsitepackages -- we want the vendor packages to essentially be the same as site-packages.

I have also moved sysconfig._get_preferred_schemes to the vendor config, instead of asking distributors to patch sysconfig -- this is why I prefer having it as executable code: it lets us customize using functions, etc.
https://docs.python.org/3.10/library/sysconfig.html#sysconfig._get_preferred_schemes

A config taking advantage of all these mechanisms should look like this:

```
EXTRA_INSTALL_SCHEMES = {
    'vendor': {
        'stdlib': '{installed_base}/{platlibdir}/python{py_version_short}',
        'platstdlib': '{platbase}/{platlibdir}/python{py_version_short}',
        'purelib': '{base}/lib/python{py_version_short}/vendor-packages',
        'platlib': '{platbase}/{platlibdir}/python{py_version_short}/vendor-packages',
        'include':
            '{installed_base}/include/python{py_version_short}{abiflags}',
        'platinclude':
            '{installed_platbase}/include/python{py_version_short}{abiflags}',
        'scripts': '{base}/bin',
        'data': '{base}',
    },
}

EXTRA_SITE_INSTALL_SCHEMES = [
    'vendor',
]

def get_preferred_schemes(...):
    ...
```

Do you have any thoughts on this?
msg392832 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2021-05-03 18:43
Yes, I saw some of the latest changes in the PR.

My biggest concern is with the bare "import _vendor_config", which I'd prefer to have restricted to a fixed location, rather than being influenced by environment variables and other options. We already have an issue with readline being imported from anywhere it can be found.

A native flag to suppress it (i.e. something in sys.flags) could also become important for embedders, though it may matter more at a higher level (i.e. should an embedded CPython *ever* be using sysconfig? Probably not...). I wouldn't add a new flag for it right now, but I feel like sys.flags.isolated should probably imply that this should be ignored.

Though then we hit the issue again that these patches are about changing the "safe default" behaviour, which is what you want to get back when you run with -S or -I. And I'm not totally sure how to resolve this.

So basically, my concerns are:
* don't import arbitrary files
* ensure -S/-I options remain useful (or become even more useful)
msg392884 - (view) Author: Petr Viktorin (petr.viktorin) * (Python committer) Date: 2021-05-04 12:44
Sorry for not getting to this sooner, but 5 days is really tight for such a change.


With -S/-I, it would be great if sys.path only included packages installed as part of the OS, and not those installed by `sudo pip`. (Or `pip --user`, but that's covered).

It seems that with the current patch, pip will install into site-packages and there's no way to disable/change site-packages. Is that the case?
msg392887 - (view) Author: Filipe Laíns (FFY00) * Date: 2021-05-04 12:55
> My biggest concern is with the bare "import _vendor_config", which I'd prefer to have restricted to a fixed location, rather than being influenced by environment variables and other options. We already have an issue with readline being imported from anywhere it can be found.

Oh, I share the same concern! Though users could already mess up Python pretty badly by shadowing/overwriting parts of it, so I didn't think it would be that big of an issue. Is there a way to achieve this while still allowing us to do everything we want?

> Sorry for not getting to this sooner, but 5 days is really tight for such a change.

No worries. It was my fault, I should have been more attentive to the Python release timeline.

> With -S/-I, it would be great if sys.path only included packages installed as part of the OS, and not those installed by `sudo pip`. (Or `pip --user`, but that's covered).

Perhaps we could add an option to enable only vendor site schemes?

> It seems that with the current patch, pip will install into site-packages and there's no way to disable/change site-packages. Is that the case?

I mean, there is, though not as straightforward as -S/-I. I was planning on using it to build the distro entrypoint scripts, so that they only include the distro packages.

$ python -S
>>> import site, sysconfig
>>> site.addsitedir(sysconfig.get_path('purelib', 'vendor'))
>>> site.addsitedir(sysconfig.get_path('platlib', 'vendor'))

As I mentioned above, we could add a cli flag to do essentially the same.
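
A runnable sketch of the `site.addsitedir` step, for illustration only: stock CPython has no 'vendor' scheme, so a temporary directory stands in for the path that `sysconfig.get_path('purelib', 'vendor')` would return under this proposal. The name `vendor_dir` is hypothetical.

```python
import os
import site
import sys
import tempfile

# Hypothetical stand-in for the vendor purelib directory; a stock interpreter
# has no 'vendor' scheme, so a throwaway temp directory is used instead.
vendor_dir = os.path.abspath(tempfile.mkdtemp(prefix="vendor-packages-"))

# addsitedir() appends the directory to sys.path and processes any .pth files in it.
site.addsitedir(vendor_dir)

print(vendor_dir in sys.path)
```

Unlike a plain `sys.path.append`, `site.addsitedir` also honours `.pth` files, which is what makes the directory behave like a site-packages directory.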
msg392941 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2021-05-04 20:02
The best option for restricting the import while still having it be a Python import is to find the file (if it's present in the expected location under sys.whatever), and then use importlib to import it: https://docs.python.org/3/library/importlib.html#importing-a-source-file-directly
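
A minimal sketch of that importlib recipe, assuming the config file lives at a known path. The helper name `load_module_from_path`, the file contents and the temporary location are hypothetical; only the `importlib.util` calls come from the linked documentation.

```python
import importlib.util
import os
import tempfile


def load_module_from_path(name, path):
    """Import a module from an explicit file path, without consulting sys.path."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Demo: write a hypothetical vendor config into a temp directory and load it
# from exactly that location.
with tempfile.TemporaryDirectory() as tmp:
    config_path = os.path.join(tmp, "_vendor_config.py")
    with open(config_path, "w") as f:
        f.write("EXTRA_SITE_INSTALL_SCHEMES = ['vendor']\n")
    vendor_config = load_module_from_path("_vendor_config", config_path)

print(vendor_config.EXTRA_SITE_INSTALL_SCHEMES)  # ['vendor']
```

Because the path is passed explicitly, a same-named file elsewhere on sys.path or in the current directory cannot shadow it.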

I'd rather not have a new option here, I would much prefer "-S" in this context to mean "run Python with only core libraries" and "-s" to mean "run Python with only core and distro libraries" (and neither to mean "run Python with core, distro and user libraries").

That may be a bigger change, but there's enough angst around this issue that we would be better off getting it right this time, even if it changes things, than continuing to preserve the system that people dislike so much.
msg392942 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2021-05-04 20:07
> I'd rather not have a new option here ...

Perhaps what I'm suggesting here is that I don't see any reason for "sudo pip install ..." into a distro-installed Python to ever need to work, and would be quite happy for it to just fail miserably every time (which is already the case for the Windows Store distro of Python).

Admin installed all-user packages is the expert scenario here, and can be as twisted as possible. Pip installed per-user packages and system-tool installed packages are the defaults, and the more easily those can be overridden by a file in the distro, the better.
msg392946 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2021-05-04 20:23
On 04.05.2021 22:07, Steve Dower wrote:
> 
> Perhaps what I'm suggesting here is that I don't see any reason for "sudo pip install ..." into a distro-installed Python to ever need to work, and would be quite happy for it to just fail miserably every time (which is already the case for the Windows Store distro of Python).

The "pip install" into a root environment approach is the standard way
to set up Docker (and similar) containers, so I think trying to break
this on purpose will not do Python a good service.

The pip warning about this kind of setup, which apparently got added
in one of the more recent versions of pip, is already causing a lot
of unnecessary noise when building containers and doesn't make Python
look good in that environment.
msg392948 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2021-05-04 20:29
Would "pip install --user ..." in a Docker container also work, though? Presumably all the filesystem paths are being redirected anyway, so is there a difference?

(My assumption is that "--user" would essentially become the default if you're using the OS provided pip/Python. If you do your own build/install of it then you obviously get "default" behaviour, for better or worse.)
msg392950 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2021-05-04 20:52
On 04.05.2021 22:29, Steve Dower wrote:
> 
> Would "pip install --user ..." in a Docker container also work, though? Presumably all the filesystem paths are being redirected anyway, so is there a difference?
> 
> (My assumption is that "--user" would essentially become the default if you're using the OS provided pip/Python. If you do your own build/install of it then you obviously get "default" behaviour, for better or worse.)

More modern Docker setups run the application itself under a non-root
user, but still install the packages and other dependencies as root.

See e.g. Zammad's Dockerfile:
https://github.com/zammad/zammad-docker/blob/master/Dockerfile

Not sure whether that answers your question, though.

It's rather uncommon to install venvs inside Docker containers: one of the
main reasons for using containers is the added isolation, but it doesn't
make a lot of sense to add another layer of isolation inside the container.

"pip install as root" will need to continue to work and thus distros
need to get a way to make sure that it doesn't corrupt the system
installed packages. And perhaps distros can also patch pip to not
output those silly warnings anymore when using the system pip package :-)

Regarding the proposed solution: I'm not sure whether a new configure
option is the right way to go about this. Distros could simply patch
sysconfig.py, since that's the golden source of this information from
Python 3.10 onward.

setuptools' distutils version (and other packages which ship distutils)
will have to use this information instead of the copy which is/was
baked into distutils/sysconfig.py on Python 3.10+.
msg392951 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2021-05-04 20:58
> "pip install as root" will need to continue to work and thus distros
> need to get a way to make sure that it doesn't corrupt the system
> installed packages

Excuse my ignorance, but does "as root" imply that there's no user site-packages directory at all?

I'm not imagining a solution that doesn't require *users* to change their commands, so if they're currently running "sudo pip install" because they need to, but we change it so they shouldn't, then I'm okay with them having to remove the "sudo". (At least for this discussion - we can evaluate transition plans separately.)

And yeah, patching sysconfig.py seems easier. But then, adding a file to the distro is even easier, and if it's easiest for Linux distros to do that via configure than to add a copy step into their build (which is how I'll do it for Windows distros that need it), then I'll leave that to others to decide/implement.
msg392984 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2021-05-05 07:54
On 04.05.2021 22:58, Steve Dower wrote:
>> "pip install as root" will need to continue to work and thus distros
>> need to get a way to make sure that it doesn't corrupt the system
>> installed packages
> 
> Excuse my ignorance, but does "as root" imply that there's no user site-packages directory at all?

Why should there be no site-packages dir ? All non-core packages get
installed into site-packages (or a similar dir which holds such packages)
by distutils / setuptools.

However, distros usually split this up further into packages which are
managed by the distro packager and ones which are managed by distutils /
setuptools and this is why the install schemes need to be patched.

> I'm not imagining a solution that doesn't require *users* to change their commands, so if they're currently running "sudo pip install" because they need to, but we change it so they shouldn't, then I'm okay with them having to remove the "sudo". (At least for this discussion - we can evaluate transition plans separately.)

I'm not sure I understand what you're suggesting.

For Docker, the instructions from the Dockerfile are run as root, so
there is no sudo involved. Whether you use sudo or not or how pip is
invoked is really not relevant for the discussion. The main point is
that the target of the installation is the system installation, not
a local user installation or a venv. That installation layout is what
sysconfig.py defines in the install schemes.

> And yeah, patching sysconfig.py seems easier. But then, adding a file to the distro is even easier, and if it's easiest for Linux distros to do that via configure than to add a copy step into their build (which is how I'll do it for Windows distros that need it), then I'll leave that to others to decide/implement.

You mean: put something like...

from _sysconfig_site import *
install_sysconfig_site()

at the end of sysconfig.py and then have distros add a
_sysconfig_site module ?

That would work as well, but details will have to be hashed out, since
this can be abused to hijack the system installation of Python (python
-S would have no effect on this). Patching sysconfig.py is definitely
safer.
msg392986 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2021-05-05 08:01
> "as root" imply that there's no user site-packages directory at all
                                 ^^^^^      

Steve is talking about user site-packages, not global site-packages directory.
msg392987 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2021-05-05 08:16
On 05.05.2021 10:01, Christian Heimes wrote:
> 
>> "as root" imply that there's no user site-packages directory at all
>                                  ^^^^^      
> 
> Steve is talking about user site-packages, not global site-packages directory.

You mean "pip install --user" as root ? That's not how you typically
install Python packages as root in a Dockerfile, no, but, of course,
even as root, there is the possibility to install into /root/.local/.

The typical Unix way of installing non-system packages is either
into /usr/local, /opt/local or similar variants, not into /usr.
Python itself also defaults to /usr/local when running
"make install". System provided packages normally live
under /usr (or even directly under / for low level tools).

As a root user, I'd assume that "pip install" also installs into
a /usr/local based site-packages dir -- and that's what happens
at least on Debian based OSes. But it can only happen because
the distros patch the install scheme, since this would normally
install into the /usr based site-packages dir for a python binary
living in /usr/bin.
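
For reference, a small sketch that lists the install schemes a given interpreter knows about, together with the unexpanded `purelib` template of each. It only reads what sysconfig reports; the output differs between upstream builds and distro-patched ones, so none is shown here.

```python
import sysconfig

# Print every scheme known to this interpreter with its raw purelib template.
# expand=False returns the template with {placeholders} left in place.
for scheme in sysconfig.get_scheme_names():
    template = sysconfig.get_path("purelib", scheme, expand=False)
    print(f"{scheme:20} {template}")
```

Running this with a distro-provided python and with a self-built one is a quick way to see which schemes a distro has added or changed.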
msg392989 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2021-05-05 08:29
I mean that Steve and you are talking about different things.

Neither Steve, nor you, nor I are Linux distro packaging experts. I suggest that we listen to the expertise of downstream packagers like Filipe or Miro. They deal with packaging on a daily basis.

By the way, you are assuming that all container solutions work like Docker and that all Docker and non-Docker based container solutions allow you to run code as unrestricted, unconfined root. That's a) incorrect, and b) off-topic for this ticket.
msg392994 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2021-05-05 09:04
On 05.05.2021 10:29, Christian Heimes wrote:
> 
> I mean that Steve and you are talking about different things.

Could be. I was addressing the point Steve made about not allowing
or making it hard to run "pip install" as root user.

> Neither Steve, nor you, nor I are Linux distro packaging experts. I suggest that we listen to the expertise of downstream packagers like Filipe or Miro. They deal with packaging on a daily basis.

Agreed.

> By the way, you are assuming that all container solutions work like Docker and that all Docker and non-Docker based container solutions allow you to run code as unrestricted, unconfined root. That's a) incorrect, and b) off-topic for this ticket.

I gave the Docker example as proof that running "pip install" as
root is a rather common scenario and needs to be supported.

Linux distros have been supporting this for many years and just
because distutils is deprecated should not mean that we no longer
provide ways to support this kind of setup.

BTW: I'm aware that other container solutions work in different ways,
e.g. Podman, LXC, etc. but I have yet to find a solution that doesn't
offer root permissions inside the containers (I'm not talking
about how the container is run in the host system).
msg393022 - (view) Author: Filipe Laíns (FFY00) * Date: 2021-05-05 16:35
We cannot change how `sudo pip install` fundamentally works because there are too many people depending on it, and even if we could, this is not the place :P

I think we went a little off-topic here, so let's get back to the discussion.

> The best option for restricting the import while still having it be a Python import is to find the file (if it's present in the expected location under sys.whatever), and then use importlib to import it: https://docs.python.org/3/library/importlib.html#importing-a-source-file-directly

Right, though that also requires a new import, importlib, which may not be optimal. Considering that this module is meant to be private and basically all other private importable parts of Python suffer from the same issue, I am finding it hard to justify. If there's enough consensus that this approach would be better, I am more than happy to change the implementation.

> I'd rather not have a new option here, I would much prefer "-S" in this context to mean "run Python with only core libraries" and "-s" to mean "run Python with only core and distro libraries" (and neither to mean "run Python with core, distro and user libraries").

I don't think having an option to start Python with only the vendor modules would be *necessary*, though it would certainly be helpful. Among other things, it would be super helpful to be able to tell users to run Python with the -D (made up) option to isolate issues with the vendor modules and the user Python environment.

> That may be a bigger change, but there's enough angst around this issue that we would be better off getting it right this time, even if it changes things, than continuing to preserve the system that people dislike so much.

This may not hold for other people, but it is my understanding: AFAIK these issues come from a lack of separation between the distro, system and user environments, causing a hell of conflicts and silent module shadowing that neither the system package manager nor pip can fix. Almost every time I help people with Python I have to tell them to use a virtual env, which most people aren't expecting, and they would likely run into issues had I not suggested it.
Considering that, I think this approach, including the CLI option, would be a step forward. How big that step would be, I am not sure, but probably not *that* big.

But yeah, this is, of course, my experience, and that can vary for other people, so there may be different perspectives here. So I'd very much like to hear other people on this.
msg393041 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2021-05-05 20:44
>> The best option for restricting the import while still having it be a Python
>> import is to find the file (if it's present in the expected location under
>> sys.whatever), and then use importlib to import it:
>> https://docs.python.org/3/library/importlib.html#importing-a-source-file-directly
>
> Right, though that also requires a new import, importlib, which may not be
> optimal. Considering that this module is meant to be private and basically all
> other private importable parts of Python suffer from the same issue, I am
> finding it hard to justify. If there's enough consensus that this approach would
> be better, I am more than happy to change the implementation.

Another alternative would be to convert sysconfig into a directory and make the vendor patch a submodule. That's _very slightly_ more impactful for the unpatched case, but only really for scenarios where people are trying to do things they shouldn't. Or we can include the file in all distros and import it earlier (before taking environment variables, etc. into account).

In my opinion, the security implications alone suggest we shouldn't be importing this by name without knowing where it is coming from.

>> I'd rather not have a new option here, I would much prefer "-S" in this
>> context to mean "run Python with only core libraries" and "-s" to mean "run
>> Python with only core and distro libraries" (and neither to mean "run Python
>> with core, distro and user libraries").
>
> I don't think having an option to start Python with only the vendor modules
> would be *necessary*, though it would certainly be helpful. Among other things,
> it would be super helpful to be able to tell users to run Python with the -D
> (made up) option to isolate issues with the vendor modules and the user Python
> environment.

But the user can already exclude their user-installed packages with -s, right? It's the site-installed packages that would require -S, but that also excludes vendor modules.
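
The difference between these options can be observed through `sys.flags`. The following is a small sketch that starts child interpreters with each option and reports the two relevant flags; the helper name `flags_for` is hypothetical, and the exact values can also be influenced by environment variables such as PYTHONNOUSERSITE.

```python
import subprocess
import sys


def flags_for(*options):
    """Run a child interpreter with the given options and report its site flags."""
    code = "import sys; print(bool(sys.flags.no_user_site), bool(sys.flags.no_site))"
    result = subprocess.run(
        [sys.executable, *options, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Each line prints "<no_user_site> <no_site>" for the child interpreter.
print("-s:", flags_for("-s"))
print("-S:", flags_for("-S"))
print("-I:", flags_for("-I"))
```

`-s` sets `no_user_site`, `-S` sets `no_site`, and `-I` implies `-s`; none of them currently distinguishes distro-installed packages from other site-wide ones.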

Why do we encourage users to install site-wide packages using pip? Why is it such an important scenario for a distro-provided Python to be able to modify its global install using non-distro-provided tools and non-distro-provided packages? What's wrong with saying "install for --user", or else "apt install some-different-python-bundle" first and use that?

(To be clear, I'm framing these as confrontational questions to help my understanding. I'm totally willing to accept an answer of "just because", provided whoever is giving that answer actually "owns" dealing with the fallout.)
msg393044 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2021-05-05 21:30
Steve: I think the point of discussing whether "pip install" can
be used to manage system wide packages is moot. It's been like that
for ages, not only for pip, but also for the distutils setup.py install
process and the old Makefile.pre.in approach before that. People
have their reasons, it's what you'd expect to work as a Unix sysadmin
and won't go away anytime soon :-)

So back to the original point...

Filipe: Could you please explain why patching sysconfig.py is not a
long term solution ?

This doesn't involve any changes on the CPython side, is as flexible
as you can get (you can also patch functions defined in sysconfig.py
to do the necessary magic, not only provide a static dict),
doesn't create overhead for Python's startup, works with all the
different command line options for limiting sys.path additions and
avoids security issues with the Python import logic.

It's already clear that sysconfig.py will be the new golden source
for installation related APIs and schemes (perhaps this could be
made even clearer in the docs), so 3rd party packages will adapt
to this once 3.10 is out.
History
Date                 User              Action  Args
2021-05-05 21:30:34  lemburg           set     messages: + msg393044
2021-05-05 20:44:12  steve.dower       set     messages: + msg393041
2021-05-05 16:35:34  FFY00             set     messages: + msg393022
2021-05-05 09:04:02  lemburg           set     messages: + msg392994
2021-05-05 08:29:22  christian.heimes  set     messages: + msg392989
2021-05-05 08:16:59  lemburg           set     messages: + msg392987
2021-05-05 08:01:46  christian.heimes  set     nosy: + christian.heimes; messages: + msg392986
2021-05-05 07:54:00  lemburg           set     messages: + msg392984
2021-05-04 20:58:06  steve.dower       set     messages: + msg392951
2021-05-04 20:52:32  lemburg           set     messages: + msg392950
2021-05-04 20:29:31  steve.dower       set     messages: + msg392948
2021-05-04 20:23:08  lemburg           set     nosy: + lemburg; messages: + msg392946
2021-05-04 20:07:37  steve.dower       set     messages: + msg392942
2021-05-04 20:02:01  steve.dower       set     messages: + msg392941
2021-05-04 12:55:21  FFY00             set     messages: + msg392887
2021-05-04 12:44:52  petr.viktorin     set     nosy: + petr.viktorin; messages: + msg392884
2021-05-03 18:43:58  steve.dower       set     messages: + msg392832
2021-05-03 18:12:28  FFY00             set     messages: + msg392828
2021-05-03 16:58:11  steve.dower       set     messages: + msg392823
2021-04-30 22:48:36  terry.reedy       set     title: Introduce mechanism to allow Python distributors to add custom site install schemes -> Allow Python distributors to add custom site install schemes
2021-04-30 08:24:26  hroncok           set     nosy: + hroncok; messages: + msg392391
2021-04-30 05:59:05  frenzy            set     nosy: + frenzy
2021-04-30 00:12:35  FFY00             set     messages: + msg392367
2021-04-29 20:23:03  steve.dower       set     nosy: + steve.dower; messages: + msg392350
2021-04-29 16:19:02  FFY00             create