Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for settting umask in subprocess children #82598

Closed
gpshead opened this issue Oct 8, 2019 · 13 comments
Closed

Add support for settting umask in subprocess children #82598

gpshead opened this issue Oct 8, 2019 · 13 comments
Assignees
Labels
3.9 only security fixes type-feature A feature request or enhancement

Comments

@gpshead
Copy link
Member

gpshead commented Oct 8, 2019

BPO 38417
Nosy @gpshead, @vstinner, @tiran
PRs
  • bpo-38417: Add umask support to subprocess #16726
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/gpshead'
    closed_at = <Date 2019-10-12.20:26:18.607>
    created_at = <Date 2019-10-08.22:55:39.652>
    labels = ['type-feature', '3.9']
    title = 'Add support for settting umask in subprocess children'
    updated_at = <Date 2019-10-12.20:26:18.604>
    user = 'https://github.com/gpshead'

    bugs.python.org fields:

    activity = <Date 2019-10-12.20:26:18.604>
    actor = 'gregory.p.smith'
    assignee = 'gregory.p.smith'
    closed = True
    closed_date = <Date 2019-10-12.20:26:18.607>
    closer = 'gregory.p.smith'
    components = []
    creation = <Date 2019-10-08.22:55:39.652>
    creator = 'gregory.p.smith'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 38417
    keywords = ['patch']
    message_count = 13.0
    messages = ['354238', '354239', '354240', '354242', '354293', '354326', '354328', '354331', '354333', '354334', '354398', '354552', '354553']
    nosy_count = 3.0
    nosy_names = ['gregory.p.smith', 'vstinner', 'christian.heimes']
    pr_nums = ['16726']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'commit review'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue38417'
    versions = ['Python 3.9']

    @gpshead
    Copy link
    Member Author

    gpshead commented Oct 8, 2019

    Another use of the deprecated unsafe preexec_fn was to call os.umask in the child prior to exec.

    As seen in freeipa/freeipa#3769 (see the code in there).

    We should add an explicit feature for this to avoid people's desire for preexec_fn or for the heavyweight workaround of an intermediate shell calling umask before doing another exec.

    Any common preexec_fn uses that we can encode into supported parameters will help our ability to remove the ill fated preexec_fn misfeature in the future.

    @gpshead gpshead added 3.9 only security fixes type-feature A feature request or enhancement labels Oct 8, 2019
    @vstinner
    Copy link
    Member

    vstinner commented Oct 8, 2019

    We should add an explicit feature for this

    If we need to write a wrapper program for that, I would say that no, we don't "have to" provide something in the stdlib.

    In OpenStack, I wrote prlimit.py which is a preexec-like wrapper program to apply resource limits when calling a program. It's just a pure Python implementation of the Unix prlimit program. The Python implementation is used when the prlimit progrma is not available.

    https://github.com/openstack/oslo.concurrency/blob/master/oslo_concurrency/prlimit.py

    IMHO it's perfectly fine to explain that a wrapper program is needed to implement preexec-like features.

    @gpshead
    Copy link
    Member Author

    gpshead commented Oct 8, 2019

    We don't have to for all possible things, doing this just reduced friction for existing uses. In this case, calling umask in our child ourselves would be easy to support. (easier than the more important setuid/sid/gid/groups-ish stuff that recently went in)

    I'm trying to make sure we track what is blocking people from getting rid of preexec_fn in their existing code so that we can actually deprecate and get rid of the API entirely.

    @vstinner
    Copy link
    Member

    vstinner commented Oct 9, 2019

    I'm trying to make sure we track what is blocking people from getting rid of preexec_fn in their existing code so that we can actually deprecate and get rid of the API entirely.

    If you consider posix_spawn(), I think that a convenient replacement for preexec_fn function would be a wrapper process which would execute *arbitrary Python code* before spawning the program.

    It would not only cover umask case, but also prlimit, and another other custom code.

    Pseudo-code of the wrapper:

      import sys
      code = sys.argv[1]
      argv = sys.argv[2:]
      eval(code)
      os.execv(argv[0], argv)

    The main risk is that the arbitrary code could create an inheritable file descriptor (not all C extensions respect the PEP-446) which would survive after replacing the process memory with the new program.

    Such design would allow to implement it in a third party package (on PyPI) for older Python versions as well.

    --

    Currently, preexec_fn is a direct reference to a callable Python object in the current process. preexec_fn calls it just after fork().

    Here I'm talking about running arbitrary Python code in a freshly spawned Python process. It's different.

    --

    The new problems are:

    • How to ensure that Python is configured as expected? Use -I command line option? Use -S to not import the site module?
    • How to report a Python exception from the child to the parent process? Build a pipe between the two processes and serialize the exception, as we are already doing for preexec_fn?
    • How to report os.execv() failure to the parent? Just something like sys.exit(OSErrno.errno)?
    • Should close_fds be implemented in the wrapper as well? If yes, can the parent avoid closing file descriptors?

    https://github.com/openstack/oslo.concurrency/blob/master/oslo_concurrency/prlimit.py

    This wrapper uses os.execv().

    @gpshead
    Copy link
    Member Author

    gpshead commented Oct 9, 2019

    We should not provide such an "run arbitrary python code before execing the ultimate child" feature in the standard library.

    It is complicated, and assumes you have an ability to execute a new interpreter with its own slow startup time as an intermediate in the child in the first place. (embedded pythons do not have that ability)

    Leave that to someone to implement on top of subprocess as a thing on PyPI.

    @vstinner
    Copy link
    Member

    Another use of the deprecated unsafe preexec_fn was to call os.umask in the child prior to exec.

    What do you mean by "deprecated"? The parameter doesn't seem to be deprecated in the documentation:
    https://docs.python.org/dev/library/subprocess.html#subprocess.Popen

    And when I use it, it doesn't emit any warning:
    ---

    import os, subprocess, sys
    def func(): print(f"func called in {os.getpid()}")
    argv = [sys.executable, "-c", "pass"]
    print(f"parent: {os.getpid()}")
    subprocess.run(argv, preexec_fn=func)

    Output:
    ---

    $ ./python -Werror x.py 
    parent: 22264
    func called in 22265

    If you want to deprecate it, it should be documented as deprecated and emit a DeprecatedWarning, no?

    @vstinner
    Copy link
    Member

    pylint emits a warning on subprocess.Popen(preexec_fn=func):

    W1509: Using preexec_fn keyword which may be unsafe in
    the presence of threads (subprocess-popen-preexec-fn)

    But not when using subprocess.run(preexec_fn=func). Maybe a check is missing in pylint.

    Note: pyflakes doesn't complain about preexec_fn.

    @vstinner
    Copy link
    Member

    We should not provide such an "run arbitrary python code before execing the ultimate child" feature in the standard library.

    Do you want to modify _posixsubprocess to call umask() between fork() and exec()? As it has been done for uid, gid and groups: call setreuid(), setregid() and setgroups()?

    If so, it means that posix_spawn() couldn't be used when the new umask parameter would be used, right?

    @tiran
    Copy link
    Member

    tiran commented Oct 10, 2019

    preexec_fn does not work in subinterpreters, which (amongst others) affects mod_wsgi and similar WSGI implementations. Therefore portable software must not use preexec_fn at all.

    @tiran
    Copy link
    Member

    tiran commented Oct 10, 2019

    Changed in version 3.8: The preexec_fn parameter is no longer supported in subinterpreters. The use of the parameter in a subinterpreter raises RuntimeError. The new restriction may affect applications that are deployed in mod_wsgi, uWSGI, and other embedded environments.

    @gpshead
    Copy link
    Member Author

    gpshead commented Oct 10, 2019

    preexec_fn has been mentally and advisability deprecated for years. :)
    I'll mark it officially with pending deprecation in 3.9 destined to be removed in 3.11. tracking that in its own rollup issue https://bugs.python.org/issue38435

    As far as posix_spawn goes, I expect these kinds of between fork+exec features to be something that prevents posix_spawn from being usable. As are many other things. People who want to use posix_spawn will need to know that and seek to avoid those. That's a documentation issue first and foremost. Our existing POpen API is very old and wasn't designed to make that nice.

    A new API could be made that *only* supports posix_spawn available features if you want an entrypoint that encourages the generally lower latency posix_spawn path. (A subprocess.spawn function similar to subprocess.run perhaps?) That should be taken up within its own enhancement issue.

    @gpshead gpshead self-assigned this Oct 12, 2019
    @gpshead
    Copy link
    Member Author

    gpshead commented Oct 12, 2019

    New changeset f3751ef by Gregory P. Smith in branch 'master':
    bpo-38417: Add umask support to subprocess (GH-16726)
    f3751ef

    @gpshead
    Copy link
    Member Author

    gpshead commented Oct 12, 2019

    Now to see if the more esoteric config buildbots find any platform issues to address...

    @gpshead gpshead closed this as completed Oct 12, 2019
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.9 only security fixes type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants