classification
Title: Environment variable to set alternate location for pycache tree
Type: enhancement
Stage: patch review
Components:
Versions: Python 3.8
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: barry, brett.cannon, carljm, eric.snow, lukasz.langa, ncoghlan, rhettinger
Priority: normal
Keywords: patch

Created on 2018-05-14 15:12 by carljm, last changed 2018-05-19 02:46 by carljm.

Pull Requests
URL Status Linked Edit
PR 6834 open carljm, 2018-05-14 21:52
Messages (13)
msg316518 - (view) Author: Carl Meyer (carljm) * Date: 2018-05-14 15:12
We would like to set an environment variable that would cause Python to read and write `__pycache__` directories from a separate location on the filesystem (outside the source code tree). We have two reasons for this:

1. In our development setup (with a webserver running in a container on the dev-tree code), the `__pycache__` directories end up root-owned, and managing permissions on them so that they don't disrupt VCS operations on the code repo is untenable. (Currently we use PYTHONDONTWRITEBYTECODE as a workaround, but we have enough code that this costs us multiple seconds of developer time on every restart; we'd like to take advantage of cached bytecode without requiring that it pollute the code tree.)

2. In addition to just _having_ cached bytecode, we'd like to keep it on a ramdisk to minimize filesystem overhead.

Proposal: a `PYTHON_BYTECODE_PATH` environment variable. If set, `source_from_cache` and `cache_from_source` in `importlib._bootstrap_external` will respect it, creating a directory tree under that prefix that mirrors the source tree.
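
To illustrate what "a directory tree under that prefix that mirrors the source tree" could mean, here is a minimal sketch. It is not the code from the patch: `mirrored_cache_path` is a hypothetical helper, and the choice to key the mirrored tree on the absolute source path is an assumption made for illustration only.

```python
import os
import sys


def mirrored_cache_path(source_path, prefix, tag=None):
    """Hypothetical helper: map a source file to a .pyc path under `prefix`.

    Assumption (not taken from the patch): the mirrored tree is keyed on the
    absolute path of the source file, with the drive and root stripped.
    """
    tag = tag or sys.implementation.cache_tag
    head, tail = os.path.split(os.path.abspath(source_path))
    base, _ext = os.path.splitext(tail)
    # Drop any drive letter and leading separators so the source directory
    # can be re-rooted under the prefix.
    _drive, head = os.path.splitdrive(head)
    relative_dir = head.lstrip(os.sep + (os.altsep or ""))
    return os.path.join(prefix, relative_dir, f"{base}.{tag}.pyc")


example = mirrored_cache_path(
    os.path.join(os.sep, "src", "pkg", "mod.py"), "cache", tag="cpython-38"
)
```

With this layout nothing is written next to the sources, so the source tree stays free of `__pycache__` directories regardless of which user runs the interpreter.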
msg316583 - (view) Author: Carl Meyer (carljm) * Date: 2018-05-14 21:27
Per vstinner, Python prefers not to have underscores in environment variable names, for historical reasons. So I'm using `PYTHONBYTECODEPATH` as the env var.

Other open questions: 

1) Does there need to be a corresponding CLI flag, or is env-var-only sufficient?

2) Is it OK to check the environ every time, or do we need to cache its value in a `sys` flag at startup?

Will push an initial version for review that has neither a CLI flag nor a `sys` attribute.
msg316748 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2018-05-16 00:13
FWIW, I've had issues with environment variables in that they affect every version of Python running on a system and seem to defy isolation.  So, if one application needs the environment variable set, it will affect every application, even if it wants to keep its contents private and not leak outside of a virtual environment.

Can your needs be met with just a CLI flag rather than a system-wide environment variable?
msg316758 - (view) Author: Carl Meyer (carljm) * Date: 2018-05-16 02:58
Environment variable seems to make a bit more sense for this, since it's not per-invocation; there's no point writing bytecode cache to a particular location unless the next invocation reads the cache from there.

Our use case includes a webserver process that embeds Python; I'm not sure if we could pass a CLI arg to it or not.

Python has lots of precedent for similar environment variables (e.g. `PYTHONHOME`, `PYTHONDONTWRITEBYTECODE`, `PYTHONPATH`, etc). Compared to those, `PYTHONBYTECODEPATH` is pretty much harmless if it "leaks" to an unintended process.

I asked Brett Cannon in the sprints if I should add a CLI flag in addition to the env var; he suggested it wasn't worth it. I'm not opposed to adding the CLI flag, but I think removing the env var option would be a mistake.
msg316759 - (view) Author: Carl Meyer (carljm) * Date: 2018-05-16 02:59
> a system-wide environment variable

Environment variables aren't system-wide, they are per-process (though they can be inherited by child processes).
msg316874 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2018-05-16 22:16
> Environment variables aren't system-wide, they are 
> per-process (though they can be inherited by child processes).

Yes, that is how they work.  It is not how they are used.  Environment variables are commonly set in shell start-up scripts such as .bashrc and the results then affect every python application that gets run in any shell session.
msg316875 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2018-05-16 22:19
On May 15, 2018, at 22:58, Carl Meyer <report@bugs.python.org> wrote:

> Our use case includes a webserver process that embeds Python; I'm not sure if we could pass a CLI arg to it or not.

I think you pretty much have to have an environment variable, as there are just too many places where you’re going to invoke Python without the ability to set the command line.  We have precedent for having both a switch and environment variable, and I think that makes sense here.
msg316917 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2018-05-17 12:14
Regarding environment variables, note that they get used in two *very* different ways:

1. The "persistent shell setting" case that Raymond describes. While setting PYTHONBYTECODEPATH to always point to a RAM disk could make quite a bit of sense for some developers, it's more likely that this case would be associated with tools like `pipenv shell`.

2. The "inheritable process setting" case, where you prepend the environment variable setting to a shell command, or add it to the env dict in a Python subprocess call.
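
A minimal sketch of the second case, adding the variable to the env dict of a subprocess call. `PYTHONBYTECODEPATH` is the name proposed in this issue, so an interpreter without the patch simply ignores it; the child here only echoes the value back to show that it was inherited. The path `/tmp/bytecode` is an arbitrary example.

```python
import os
import subprocess
import sys

# Copy the parent's environment and add the proposed variable for the child
# only; the parent's own os.environ is left untouched.
child_env = dict(os.environ, PYTHONBYTECODEPATH="/tmp/bytecode")

result = subprocess.run(
    [sys.executable, "-c",
     "import os; print(os.environ.get('PYTHONBYTECODEPATH'))"],
    env=child_env,
    capture_output=True,
    text=True,
    check=True,
)
inherited_value = result.stdout.strip()
```

The shell form of the same thing is to prepend the assignment to the command, as in `PYTHONBYTECODEPATH=/tmp/bytecode python`.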

Anywhere that I used this setting, I'd want it to be passed along to child processes, so an environment variable would be a lot more useful than a command line option.

If we did add an option, then a named -X option would probably make the most sense.

Regarding the state caching: having this be read once at startup would help avoid a lot of potential for weird state inconsistencies where some modules were loaded from one cache directory, while later modules were loaded from a different one.
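
The difference between the two approaches can be sketched as follows. This is only an illustration: `CacheLocation` and `lookup_each_time` are hypothetical names, and a plain dict stands in for the process environment so the example does not modify `os.environ`.

```python
class CacheLocation:
    """Hypothetical holder that snapshots the setting once, at startup."""

    def __init__(self, environ):
        self.prefix = environ.get("PYTHONBYTECODEPATH") or None


def lookup_each_time(environ):
    """Hypothetical alternative: consult the environment on every lookup."""
    return environ.get("PYTHONBYTECODEPATH") or None


env = {"PYTHONBYTECODEPATH": "/tmp/cache-a"}
snapshot = CacheLocation(env)

# The variable changes while the program is running.
env["PYTHONBYTECODEPATH"] = "/tmp/cache-b"

# The snapshot still points at the first location, while the per-lookup
# check now points at the second one, which is the inconsistency described
# above.
early = snapshot.prefix
late = lookup_each_time(env)
```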
msg316925 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2018-05-17 13:16
On May 17, 2018, at 08:14, Nick Coghlan <report@bugs.python.org> wrote:
> 
> If we did add an option, then a named -X option would probably make the most sense.

+1
msg316954 - (view) Author: Carl Meyer (carljm) * Date: 2018-05-17 16:10
Can we have a named -X option that also takes a parameter? I don't see any existing examples of that. This option needs to take the path where bytecode should be written.

Are there strong use-cases for having a CLI arg for this? I don't mind doing the implementation work if there are, but right now I'm struggling to think of any case where it would be better to run `python -C /tmp/bytecode` than `PYTHONBYTECODEPATH=/tmp/bytecode python`. Our existing "takes a path" env variables (`PYTHONHOME` and `PYTHONPATH`) do not have CLI equivalents.
msg316958 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2018-05-17 16:26
Honestly, I don't think there's a strong argument for a CLI option.  I'm perfectly happy with just an environment variable.
msg316959 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2018-05-17 16:27
I believe the main argument for -X options is the fact that cmd on Windows doesn't offer a nice way of setting environment variables as part of the command invocation (hence "-X utf8", for example).

As far as setting values for X options goes, `sys._xoptions` in CPython is a str:Union[bool,str] dict, with the command args split on "=":

$ python3 -X arg=value -c "import sys; print(sys._xoptions)"
{'arg': 'value'}

If no value is given for the arg, then it's just set to the boolean True.

The _xoptions entry shouldn't be the public API though - it's just a way of shuttling settings from the command line through to CPython-specific initialisation code.
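
As a rough illustration of both forms, the sketch below starts a child interpreter with one valued and one bare -X option and reads them back. It assumes CPython (other implementations may not provide `sys._xoptions`); `bytecode_path` is the option name used in the PR for this issue, and `flag` is a made-up name used only to show the no-value case.

```python
import subprocess
import sys

# Code run in the child: look up both options in the CPython-specific dict.
probe = (
    "import sys; "
    "print(sys._xoptions.get('bytecode_path')); "
    "print(sys._xoptions.get('flag'))"
)

result = subprocess.run(
    [sys.executable, "-X", "bytecode_path=/tmp/bytecode", "-X", "flag",
     "-c", probe],
    capture_output=True,
    text=True,
    check=True,
)
# One line per option: the string after "=" for the valued option, and the
# repr of the boolean True for the bare one.
lines = result.stdout.splitlines()
```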
msg317092 - (view) Author: Carl Meyer (carljm) * Date: 2018-05-19 02:46
Cool, thanks for the pointer on -X. PR is updated with `-X bytecode_path=PATH`; don't think it's critical to have it, but it wasn't that hard to add.
History
Date User Action Args
2018-05-19 02:46:19 carljm set messages: + msg317092
2018-05-17 16:27:27 ncoghlan set messages: + msg316959
2018-05-17 16:26:51 barry set messages: + msg316958
2018-05-17 16:10:52 carljm set messages: + msg316954
2018-05-17 13:16:00 barry set messages: + msg316925
2018-05-17 12:14:04 ncoghlan set messages: + msg316917
2018-05-16 22:19:54 barry set messages: + msg316875
2018-05-16 22:16:31 rhettinger set messages: + msg316874
2018-05-16 02:59:45 carljm set messages: + msg316759
2018-05-16 02:58:04 carljm set messages: + msg316758
2018-05-16 00:13:58 rhettinger set nosy: + rhettinger; messages: + msg316748
2018-05-15 21:45:51 lukasz.langa set versions: + Python 3.8
2018-05-14 21:52:23 carljm set keywords: + patch; stage: patch review; pull_requests: + pull_request6517
2018-05-14 21:27:16 carljm set messages: + msg316583
2018-05-14 15:41:01 barry set nosy: + barry
2018-05-14 15:12:49 carljm create