Environment variable to set alternate location for pycache tree #77680

Closed
carljm opened this issue May 14, 2018 · 17 comments
Labels
3.8 (only security fixes), type-feature (a feature request or enhancement)

Comments

@carljm
Member

carljm commented May 14, 2018

BPO 33499
Nosy @warsaw, @brettcannon, @rhettinger, @ncoghlan, @vstinner, @carljm, @ambv, @ericsnowcurrently
PRs
  • bpo-33499: Add PYTHONPYCACHEPREFIX env var for alt bytecode cache location. #6834
  • bpo-33499: PYTHONPYCACHEPREFIX What's New entry #7749
  • bpo-33499: Fix pymain_init_pycache_prefix() #8596
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2018-06-16.04:45:26.184>
    created_at = <Date 2018-05-14.15:12:49.618>
    labels = ['type-feature', '3.8']
    title = 'Environment variable to set alternate location for pycache tree'
    updated_at = <Date 2018-08-01.14:16:51.244>
    user = 'https://github.com/carljm'

    bugs.python.org fields:

    activity = <Date 2018-08-01.14:16:51.244>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2018-06-16.04:45:26.184>
    closer = 'ncoghlan'
    components = []
    creation = <Date 2018-05-14.15:12:49.618>
    creator = 'carljm'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 33499
    keywords = ['patch']
    message_count = 17.0
    messages = ['316518', '316583', '316748', '316758', '316759', '316874', '316875', '316917', '316925', '316954', '316958', '316959', '317092', '319713', '319714', '320070', '322864']
    nosy_count = 8.0
    nosy_names = ['barry', 'brett.cannon', 'rhettinger', 'ncoghlan', 'vstinner', 'carljm', 'lukasz.langa', 'eric.snow']
    pr_nums = ['6834', '7749', '8596']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue33499'
    versions = ['Python 3.8']

    @carljm
    Member Author

    carljm commented May 14, 2018

    We would like to set an environment variable that would cause Python to read and write __pycache__ directories from a separate location on the filesystem (outside the source code tree). We have two reasons for this:

    1. In our development setup (with a webserver running in a container on the dev-tree code), the __pycache__ directories end up root-owned, and managing permissions on them so that they don't disrupt VCS operations on the code repo is untenable. (Currently we use PYTHONDONTWRITEBYTECODE as a workaround, but we have enough code that this costs us multiple seconds of developer time on every restart; we'd like to take advantage of cached bytecode without requiring that it pollute the code tree.)

    2. In addition to just _having_ cached bytecode, we'd like to keep it on a ramdisk to minimize filesystem overhead.

    Proposal: a PYTHON_BYTECODE_PATH environment variable. If set, source_from_cache and cache_from_source in importlib._bootstrap_external will respect it, creating a directory tree under that prefix that mirrors the source tree.
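
A minimal sketch of the mirroring idea in the proposal above, not the code from the eventual patch. The helper name `mirrored_cache_path` and the example paths are hypothetical, and POSIX-style paths are assumed throughout.

```python
import posixpath
import sys


def mirrored_cache_path(prefix, source_path, tag=None):
    """Hypothetical helper: map an absolute source path to a .pyc path
    inside a directory tree under `prefix` that mirrors the source tree."""
    # The cache tag (e.g. "cpython-38") keeps bytecode from different
    # interpreters apart; fall back to the running interpreter's tag.
    tag = tag or sys.implementation.cache_tag
    head, tail = posixpath.split(source_path)
    base, _ext = posixpath.splitext(tail)
    # Strip the leading "/" so that join() appends to the prefix
    # instead of discarding it.
    return posixpath.join(prefix, head.lstrip("/"), f"{base}.{tag}.pyc")


example = mirrored_cache_path(
    "/tmp/bytecode", "/home/dev/app/views.py", tag="cpython-38"
)
print(example)  # /tmp/bytecode/home/dev/app/views.cpython-38.pyc
```

Stripping the leading separator matters because joining an absolute path onto the prefix would otherwise throw the prefix away.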

    @carljm added the type-feature (a feature request or enhancement) label on May 14, 2018
    @carljm
    Member Author

    carljm commented May 14, 2018

    Per vstinner, Python prefers not to have underscores in environment variable names, for historical reasons. So I'm using PYTHONBYTECODEPATH as the env var.

    Other open questions:

    1. Does there need to be a corresponding CLI flag, or is env-var-only sufficient?

    2. Is it OK to check the environ every time, or do we need to cache its value in a sys flag at startup?

    Will push an initial version for review that has neither a CLI flag nor a sys attribute.

    @ambv added the 3.8 (only security fixes) label on May 15, 2018
    @rhettinger
    Contributor

    FWIW, I've had issues with environment variables in that they affect every version of Python running on a system and seem to defy isolation. So, if one application needs the environment variable set, it will affect every application, even one that wants to keep its contents private and not leak outside of a virtual environment.

    Can your needs be met with just a CLI flag rather than a system-wide environment variable?

    @carljm
    Member Author

    carljm commented May 16, 2018

    Environment variable seems to make a bit more sense for this, since it's not per-invocation; there's no point writing bytecode cache to a particular location unless the next invocation reads the cache from there.

    Our use case includes a webserver process that embeds Python; I'm not sure if we could pass a CLI arg to it or not.

    Python has lots of precedent for similar environment variables (e.g. PYTHONHOME, PYTHONDONTWRITEBYTECODE, PYTHONPATH, etc). Compared to those, PYTHONBYTECODEPATH is pretty much harmless if it "leaks" to an unintended process.

    I asked Brett Cannon in the sprints if I should add a CLI flag in addition to the env var; he suggested it wasn't worth it. I'm not opposed to adding the CLI flag, but I think removing the env var option would be a mistake.

    @carljm
    Member Author

    carljm commented May 16, 2018

    > a system-wide environment variable

    Environment variables aren't system-wide, they are per-process (though they can be inherited by child processes).
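
The per-process and inheritance behaviour can be checked with a short sketch. The variable name `DEMO_CACHE_PREFIX` and the helper `run_child` are hypothetical and used only for illustration; the variable has no meaning to Python itself.

```python
import os
import subprocess
import sys

VAR = "DEMO_CACHE_PREFIX"  # hypothetical name, chosen only for this demo
CHILD_CODE = f"import os; print(os.environ.get({VAR!r}))"


def run_child(env):
    """Start a child interpreter with the given environment and
    return what it sees for VAR."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Start from the parent's environment, minus VAR in case it is already set.
base_env = {k: v for k, v in os.environ.items() if k != VAR}

with_var = run_child({**base_env, VAR: "/tmp/bytecode"})
without_var = run_child(base_env)

print(with_var)     # /tmp/bytecode
print(without_var)  # None
```

Only the child that was handed the variable sees it; the second child, started from the same parent, does not.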

    @rhettinger
    Contributor

    > Environment variables aren't system-wide, they are per-process (though they can be inherited by child processes).

    Yes, that is how they work. It is not how they are used. Environment variables are commonly set in shell start-up scripts such as .bashrc and the results then affect every python application that gets run in any shell session.

    @warsaw
    Member

    warsaw commented May 16, 2018

    On May 15, 2018, at 22:58, Carl Meyer <report@bugs.python.org> wrote:

    > Our use case includes a webserver process that embeds Python; I'm not sure if we could pass a CLI arg to it or not.

    I think you pretty much have to have an environment variable, as there are just too many places where you’re going to invoke Python without the ability to set the command line. We have precedent for having both a switch and an environment variable, and I think that makes sense here.

    @ncoghlan
    Contributor

    Regarding environment variables, note that they get used in two *very* different ways:

    1. The "persistent shell setting" case that Raymond describes. While setting PYTHONBYTECODEPATH to always point to a RAM disk could make quite a bit of sense for some developers, it's more likely that this case would be associated with tools like pipenv shell.

    2. The "inheritable process setting" case, where you prepend the environment variable setting to a shell command, or add it to the env dict in a Python subprocess call.

    Anywhere that I used this setting, I'd want it to be passed along to child processes, so an environment variable would be a lot more useful than a command line option.

    If we did add an option, then a named -X option would probably make the most sense.

    Regarding the state caching: having this be read once at startup would help avoid a lot of potential for weird state inconsistencies where some modules were loaded from one cache directory, while later modules were loaded from a different one.
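
The inconsistency risk described above can be illustrated with a small sketch. `DEMO_PYCACHE_PREFIX`, `prefix_checked_each_time` and `StartupConfig` are hypothetical names chosen for illustration; this is not how CPython implements the setting.

```python
import os

ENV_NAME = "DEMO_PYCACHE_PREFIX"  # hypothetical variable for this demo


def prefix_checked_each_time():
    """Strategy 1: consult the environment on every lookup."""
    return os.environ.get(ENV_NAME)


class StartupConfig:
    """Strategy 2: read the environment once and keep the value."""

    def __init__(self, environ):
        self.prefix = environ.get(ENV_NAME)


os.environ[ENV_NAME] = "/tmp/cache-a"
config = StartupConfig(os.environ)   # snapshot taken at "startup"
first = prefix_checked_each_time()   # used for early imports

# Some code changes the variable while the process is running.
os.environ[ENV_NAME] = "/tmp/cache-b"
second = prefix_checked_each_time()  # later imports see a new location

print(first, second)   # /tmp/cache-a /tmp/cache-b
print(config.prefix)   # /tmp/cache-a
```

With the per-lookup strategy, early and late imports would use different cache directories; the snapshot stays stable for the life of the process.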

    @warsaw
    Member

    warsaw commented May 17, 2018

    On May 17, 2018, at 08:14, Nick Coghlan <report@bugs.python.org> wrote:

    > If we did add an option, then a named -X option would probably make the most sense.

    +1

    @carljm
    Member Author

    carljm commented May 17, 2018

    Can we have a named -X option that also takes a parameter? I don't see any existing examples of that. This option needs to take the path where bytecode should be written.

    Are there strong use-cases for having a CLI arg for this? I don't mind doing the implementation work if there are, but right now I'm struggling to think of any case where it would be better to run python -C /tmp/bytecode than PYTHONBYTECODEPATH=/tmp/bytecode python. Our existing "takes a path" env variables (PYTHONHOME and PYTHONPATH) do not have CLI equivalents.

    @warsaw
    Member

    warsaw commented May 17, 2018

    Honestly, I don't think there's a strong argument for a CLI option. I'm perfectly happy with just an environment variable.

    @ncoghlan
    Contributor

    I believe the main argument for -X options is the fact that cmd on Windows doesn't offer a nice way of setting environment variables as part of the command invocation (hence "-X utf8", for example).

    As far as setting values for X options goes, sys._xoptions in CPython is a str:Union[bool,str] dict, with the command args split on "=":

    $ python3 -X arg=value -c "import sys; print(sys._xoptions)"
    {'arg': 'value'}

    If no value is given for the arg, then it's just set to the boolean True.

    The _xoptions entry shouldn't be the public API though - it's just a way of shuttling settings from the command line through to CPython-specific initialisation code.
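
To see both forms side by side, the following sketch launches a child interpreter with one valued and one bare -X option. It assumes CPython, since sys._xoptions is implementation-specific; the option names `arg` and `flag` are arbitrary placeholders.

```python
import json
import subprocess
import sys

# The child dumps its -X options as JSON so the parent can parse them.
child_code = "import json, sys; print(json.dumps(sys._xoptions))"

result = subprocess.run(
    [sys.executable, "-X", "arg=value", "-X", "flag", "-c", child_code],
    capture_output=True,
    text=True,
    check=True,
)
xoptions = json.loads(result.stdout)

print(xoptions)  # {'arg': 'value', 'flag': True}
```

The valued option arrives as a string and the bare one as True, matching the str-to-Union[bool, str] mapping described above.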

    @carljm
    Member Author

    carljm commented May 19, 2018

    Cool, thanks for the pointer on -X. PR is updated with -X bytecode_path=PATH; don't think it's critical to have it, but it wasn't that hard to add.

    @ncoghlan
    Contributor

    New changeset b193fa9 by Nick Coghlan (Carl Meyer) in branch 'master':
    bpo-33499: Add PYTHONPYCACHEPREFIX env var for alt bytecode cache location. (GH-6834)
    b193fa9

    @ncoghlan
    Contributor

    Merged as PYTHONPYCACHEPREFIX=path, -X pycache_prefix=path and sys.pycache_prefix :)
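
A brief sketch of the merged behaviour, assuming CPython 3.8 or later (sys.pycache_prefix does not exist before that). The directory and file names are hypothetical; importlib.util.cache_from_source only computes a path and does not need the file to exist.

```python
import importlib.util
import os
import sys
import tempfile

# Hypothetical source file and cache prefix, both under the temp directory.
source = os.path.join(tempfile.gettempdir(), "demo_project", "views.py")
prefix = os.path.join(tempfile.gettempdir(), "demo_pycache")

previous = sys.pycache_prefix  # requires Python 3.8+
try:
    # Without a prefix, the cache file lives in __pycache__ next to the source.
    sys.pycache_prefix = None
    default_path = importlib.util.cache_from_source(source)

    # With a prefix, the cache file moves to a mirrored tree under it.
    sys.pycache_prefix = prefix
    prefixed_path = importlib.util.cache_from_source(source)
finally:
    sys.pycache_prefix = previous  # restore the interpreter's setting

print(default_path)
print(prefixed_path)
```

Setting PYTHONPYCACHEPREFIX or passing -X pycache_prefix=path at startup should populate the same sys.pycache_prefix attribute, so this runtime assignment is only a convenient way to compare the two layouts in one process.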

    I'll also update PEP-304 with a note saying a variant of the idea was eventually accepted for Python 3.8.

    @ncoghlan
    Contributor

    New changeset 16eb3bc by Nick Coghlan in branch 'master':
    bpo-33499: PYTHONPYCACHEPREFIX What's New entry (GH-7749)
    16eb3bc

    @vstinner
    Member

    vstinner commented Aug 1, 2018

    New changeset fc96437 by Victor Stinner in branch 'master':
    bpo-33499: Fix pymain_init_pycache_prefix() (GH-8596)
    fc96437

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022

    6 participants