classification
Title: Add support for settting umask in subprocess children
Type: enhancement Stage: commit review
Components: Versions: Python 3.9
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: gregory.p.smith Nosy List: christian.heimes, gregory.p.smith, vstinner
Priority: normal Keywords: patch

Created on 2019-10-08 22:55 by gregory.p.smith, last changed 2019-10-12 20:26 by gregory.p.smith. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 16726 merged gregory.p.smith, 2019-10-12 07:16
Messages (13)
msg354238 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2019-10-08 22:55
Another use of the deprecated unsafe preexec_fn was to call os.umask in the child prior to exec.

As seen in https://github.com/freeipa/freeipa/pull/3769 (see the code in there).

We should add an explicit feature for this to avoid people's desire for preexec_fn or for the heavyweight workaround of an intermediate shell calling umask before doing another exec.

Any common preexec_fn uses that we can encode into supported parameters will help our ability to remove the ill fated preexec_fn misfeature in the future.
msg354239 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-08 23:00
> We should add an explicit feature for this

If we need to write a wrapper program for that, I would say that no, we don't "have to" provide something in the stdlib.

In OpenStack, I wrote prlimit.py which is a preexec-like wrapper program to apply resource limits when calling a program. It's just a pure Python implementation of the Unix prlimit program. The Python implementation is used when the prlimit progrma is not available.

https://github.com/openstack/oslo.concurrency/blob/master/oslo_concurrency/prlimit.py

IMHO it's perfectly fine to explain that a wrapper program is needed to implement preexec-like features.
msg354240 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2019-10-08 23:06
We don't have to for all possible things, doing this just reduced friction for existing uses.  In this case, calling umask in our child ourselves would be easy to support.  (easier than the more important setuid/sid/gid/groups-ish stuff that recently went in)

I'm trying to make sure we track what is blocking people from getting rid of preexec_fn in their existing code so that we can actually deprecate and get rid of the API entirely.
msg354242 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-09 00:51
> I'm trying to make sure we track what is blocking people from getting rid of preexec_fn in their existing code so that we can actually deprecate and get rid of the API entirely.

If you consider posix_spawn(), I think that a convenient replacement for preexec_fn function would be a wrapper process which would execute *arbitrary Python code* before spawning the program.

It would not only cover umask case, but also prlimit, and another other custom code.

Pseudo-code of the wrapper:

  import sys
  code = sys.argv[1]
  argv = sys.argv[2:]
  eval(code)
  os.execv(argv[0], argv)

The main risk is that the arbitrary code could create an inheritable file descriptor (not all C extensions respect the PEP 446) which would survive after replacing the process memory with the new program.

Such design would allow to implement it in a third party package (on PyPI) for older Python versions as well.

--

Currently, preexec_fn is a direct reference to a callable Python object in the current process. preexec_fn calls it just after fork().

Here I'm talking about running arbitrary Python code in a freshly spawned Python process. It's different.

--

The new problems are:

* How to ensure that Python is configured as expected? Use -I command line option? Use -S to not import the site module? 
* How to report a Python exception from the child to the parent process? Build a pipe between the two processes and serialize the exception, as we are already doing for preexec_fn?
* How to report os.execv() failure to the parent? Just something like sys.exit(OSErrno.errno)?
* Should close_fds be implemented in the wrapper as well? If yes, can the parent avoid closing file descriptors?

> https://github.com/openstack/oslo.concurrency/blob/master/oslo_concurrency/prlimit.py

This wrapper uses os.execv().
msg354293 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2019-10-09 18:55
We should not provide such an "run arbitrary python code before execing the ultimate child" feature in the standard library.

It is complicated, and assumes you have an ability to execute a new interpreter with its own slow startup time as an intermediate in the child in the first place.  (embedded pythons do not have that ability)

Leave that to someone to implement on top of subprocess as a thing on PyPI.
msg354326 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-10 07:52
> Another use of the deprecated unsafe preexec_fn was to call os.umask in the child prior to exec.

What do you mean by "deprecated"? The parameter doesn't seem to be deprecated in the documentation:
https://docs.python.org/dev/library/subprocess.html#subprocess.Popen

And when I use it, it doesn't emit any warning:
---
import os, subprocess, sys
def func(): print(f"func called in {os.getpid()}")
argv = [sys.executable, "-c", "pass"]
print(f"parent: {os.getpid()}")
subprocess.run(argv, preexec_fn=func)
---

Output:
---
$ ./python -Werror x.py 
parent: 22264
func called in 22265
---

If you want to deprecate it, it should be documented as deprecated and emit a DeprecatedWarning, no?
msg354328 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-10 07:57
pylint emits a warning on subprocess.Popen(preexec_fn=func):

   W1509: Using preexec_fn keyword which may be unsafe in
   the presence of threads (subprocess-popen-preexec-fn)

But not when using subprocess.run(preexec_fn=func). Maybe a check is missing in pylint.

Note: pyflakes doesn't complain about preexec_fn.
msg354331 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-10 08:00
> We should not provide such an "run arbitrary python code before execing the ultimate child" feature in the standard library.

Do you want to modify _posixsubprocess to call umask() between fork() and exec()? As it has been done for uid, gid and groups: call setreuid(), setregid() and setgroups()?

If so, it means that posix_spawn() couldn't be used when the new umask parameter would be used, right?
msg354333 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2019-10-10 08:13
preexec_fn does not work in subinterpreters, which (amongst others) affects mod_wsgi and similar WSGI implementations. Therefore portable software must not use preexec_fn at all.
msg354334 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2019-10-10 08:14
Changed in version 3.8: The preexec_fn parameter is no longer supported in subinterpreters. The use of the parameter in a subinterpreter raises RuntimeError. The new restriction may affect applications that are deployed in mod_wsgi, uWSGI, and other embedded environments.
msg354398 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2019-10-10 18:25
preexec_fn has been mentally and advisability deprecated for years. :) 
I'll mark it officially with pending deprecation in 3.9 destined to be removed in 3.11.  tracking that in its own rollup issue https://bugs.python.org/issue38435

As far as posix_spawn goes, I expect these kinds of between fork+exec features to be something that prevents posix_spawn from being usable.  As are many other things.  People who want to use posix_spawn will need to know that and seek to avoid those.  That's a documentation issue first and foremost.  Our existing POpen API is very old and wasn't designed to make that nice.

A new API could be made that *only* supports posix_spawn available features if you want an entrypoint that encourages the generally lower latency posix_spawn path.  (A subprocess.spawn function similar to subprocess.run perhaps?)  That should be taken up within its own enhancement issue.
msg354552 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2019-10-12 20:24
New changeset f3751efb5c8b53b37efbbf75d9422c1d11c01646 by Gregory P. Smith in branch 'master':
bpo-38417: Add umask support to subprocess (GH-16726)
https://github.com/python/cpython/commit/f3751efb5c8b53b37efbbf75d9422c1d11c01646
msg354553 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2019-10-12 20:26
Now to see if the more esoteric config buildbots find any platform issues to address...
History
Date User Action Args
2019-10-12 20:26:18gregory.p.smithsetstatus: open -> closed
resolution: fixed
messages: + msg354553

stage: patch review -> commit review
2019-10-12 20:24:59gregory.p.smithsetmessages: + msg354552
2019-10-12 07:16:24gregory.p.smithsetkeywords: + patch
stage: needs patch -> patch review
pull_requests: + pull_request16306
2019-10-12 07:12:15gregory.p.smithsetassignee: gregory.p.smith
2019-10-10 18:25:51gregory.p.smithsetmessages: + msg354398
2019-10-10 08:14:42christian.heimessetmessages: + msg354334
2019-10-10 08:13:40christian.heimessetmessages: + msg354333
2019-10-10 08:00:58vstinnersetmessages: + msg354331
2019-10-10 07:57:08vstinnersetmessages: + msg354328
2019-10-10 07:52:58vstinnersetmessages: + msg354326
2019-10-10 07:39:23christian.heimessetnosy: + christian.heimes
2019-10-09 18:55:59gregory.p.smithsetmessages: + msg354293
2019-10-09 00:51:47vstinnersetmessages: + msg354242
2019-10-08 23:06:10gregory.p.smithsetpriority: high -> normal

messages: + msg354240
2019-10-08 23:00:46vstinnersetnosy: + vstinner
messages: + msg354239
2019-10-08 22:55:39gregory.p.smithcreate