classification
Title: Disabling changing sys.argv[0] with runpy.run_module(...alter_sys=True)
Type: enhancement Stage:
Components: Library (Lib) Versions: Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: William.Schwartz, eric.snow, mikekap, ncoghlan, piotr.dobrogost
Priority: normal Keywords: patch

Created on 2016-02-18 22:37 by mikekap, last changed 2020-11-10 16:26 by William.Schwartz.

Files
File name Uploaded Description Edit
patch.diff mikekap, 2016-02-21 02:25 Suggested change to runpy review
patch-2.diff mikekap, 2016-10-23 22:42 Larger refactoring review
Messages (11)
msg260483 - (view) Author: Mike Kaplinskiy (mikekap) * Date: 2016-02-18 22:37
For the purposes of pex (https://github.com/pantsbuild/pex), it would be useful to allow calling run_module without sys.argv[0] changing. In general, this behavior is useful if the script intends to re-exec itself (so it needs to know the original arguments that it was started with).

To make run_module more useful in general, I propose adding a `argv` parameter that has the following semantics:
 - (default) If set to a special value runpy.INHERIT (similar to subprocess.STDOUT), produces the current behavior of just changing sys.argv[0].
 - If set to None, does not touch sys.argv at all.
 - If set to an iterable, executes `sys.argv = [module.__file__] + list(argv)`, and properly reverts it once the module is done executing.
msg260492 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2016-02-19 00:36
python-dev thread:

  https://mail.python.org/pipermail/python-dev/2016-February/143374.html

Notably:

  https://github.com/pantsbuild/pex/pull/211
msg260661 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2016-02-22 08:29
This seems like a reasonable enhancement to me, but I'd appreciate your thoughts on how it might relate to a couple of other proposed ideas:

* Supporting execution-by-module-name in pdb et al: https://bugs.python.org/issue9325
* Supporting configurable target environments for module execution: https://bugs.python.org/issue19982

My own current assessment is "this RFE neither helps nor hinders those other ideas", so this can also be read as an attempt to get a potential contributor with an interest in runpy to take a look at those long-languishing ideas to see if they pique your interest ;)
msg260662 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2016-02-22 08:34
The PSF also has a Contributor Licensing Agreement in place for contributions to CPython, so if you could sign that, that would be great: https://www.python.org/psf/contrib/contrib-form/

Highlights from a contributor point of view:
- the CLA explicitly licenses your contributions under an open source license (typically the Apache License 2.0)
- the CLA grants the PSF authority to relicense your contributions under another open source license (this is currently the Python Software Foundation License)
msg260701 - (view) Author: Mike Kaplinskiy (mikekap) * Date: 2016-02-22 22:02
Looks like by signed CLA just made it through, so that should be settled.

For the other bugs, it seems that overloading run_module & run_path seems to be getting a bit cumbersome, so it might make sense to have some sort of Runner object that has things like `Runner(module=..., code=..., path=...)` and then has properties like `argv = ...`, `module_dict = ...` and maybe attributes for the loader and the filepath. "Advanced" usage can use that over the vanilla run_module & run_path.

Either way, I think that's outside the scope of this change. Unfortunately this is already shaving a yak for me (https://github.com/facebook/buck/pull/651#issuecomment-185030156) and I'd rather not go deeper.
msg260761 - (view) Author: Mike Kaplinskiy (mikekap) * Date: 2016-02-24 07:14
So how might I get this patch committed? :)
msg262975 - (view) Author: Mike Kaplinskiy (mikekap) * Date: 2016-04-07 01:54
ping
msg262977 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2016-04-07 03:32
Thanks for the ping.

The actual code changes look OK to me for the initially proposed design, which means the main missing piece would be documentation updates (both to the docstrings and to runpy module reference).

However, thinking about how those docs might be written, I'm starting to think the specific proposed design would be inherently confusing for folks that aren't already familiar with runpy's internals, and it might be better to use a callback API with a few predefined helper functions.

That is, suppose the new parameter was a callback accepting the following parameters:

* "module" - the module about to be executed
* "argv" - sys.argv prior to module execution

And then we have 3 predefined callbacks/callback factories in runpy:

def keep_argv(module, argv):
    return argv

def set_argv0(module, argv):
    return module.__file__ + argv[1:]

def set_argv(argv, *, implicit_argv0=True):
    argv_override = list(argv)
    if implicit_argv0:
        def _set_argv(module, original_argv):
            return [module.__file__] + argv_override
    else:
        def _set_argv(module, original_argv):
            return argv_override
    return _set_argv

Then the three scenarios in your original post would look like:

    runpy.run_module(mod_name, argv=set_argv0)
    runpy.run_module(mod_name, argv=keep_argv)
    runpy.run_module(mod_name, argv=set_argv(iterable))

(and similarly for run_path)

"argv=None" would be the default, and equivalent to specifying "argv=set_argv0"

"argv=set_argv(iterable, implicit_argv0=False)" would allow even more precise control of the argv settings seen by the running module.

Future and/or custom variations would be straightforward, since we're just passing in a callable.

The documentation benefit is that the "argv" parameter docs can just describe the callback signature and the use of "set_argv0" as the default, with the details of the individual behaviours moving to the definitions of the corresponding functions.
msg279288 - (view) Author: Mike Kaplinskiy (mikekap) * Date: 2016-10-23 22:42
Hey Nick,

Sorry for the long delay. Unfortunately Python isn't my main work language anymore so working on this has proved to be quite a context switch. I'm going to try to finish this up now.

The attached patch implements a new pattern for wrapping runpy - one that I hope is a bit more general than just setting argv. In particular, using the new load_module/load_path doesn't automatically change argv at all when calling run. The callers can do pretty much whatever they want before calling run().

From a docs perspective this is quite a bit easies to understand. You call runpy.load_* to find the module, change whatever you want: globals/module dict values/names/etc, and call .run(). We would even expose a convenient `with runpy.ModifiedArgv(argv):` to help with the argv swapping. "Simple" use-cases can continue to use runpy.run_* without needing to get into the save/restore we do here.

Let me know what you think of this approach and I can flesh out the docs around it. Otherwise, I'm more than happy to implement the callback approach you suggested.

Thanks,
Mike.
msg279345 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2016-10-25 02:24
Ah, very nice. (And no worries on taking an as-you-have-time approach to this - you'll see from the dates on some of the referenced issues below that even I'm in that situation where runpy is concerned)

I think you're right that offering a 2-phase load/run API will make a lot of sense to folks already familiar with the find/exec model for imports, and it also aligns with this old design concept for making runpy friendlier to modules that need access to the created globals even if the execution step fails: http://bugs.python.org/issue9325#msg133833

I'd just completely missed that that idea was potentially relevant here as well :)

I'll provide a few more detailed comments inline.

The scale of the refactoring does make me wonder if there might be a way to account for the "target module" idea in http://bugs.python.org/issue19982 though.
msg279709 - (view) Author: Mike Kaplinskiy (mikekap) * Date: 2016-10-30 05:52
Hey Nick,

Definitely agree that this refactor is big enough to try adding target modules. There's a somewhat hidden feature in the second patch that does this: `use_globals_from_sys_modules` takes `sys.globals` from the `sys.modules` entry for the module. It's a constructor arg though. It's only used by `_run_module_as_main`, but making it more official sounds good.

Given that goal, I'm a bit worried about how to accurately describe the behavior of `runnable.globals`. Particularly, what's a good API? Here are a couple of options I'm thinking of:

 - `x = load_module(...); x.module = sys.modules['foo']; x.run()`.
   Pros: allows setting the module that's to be used for maximal customizability.
   Cons: `x.globals` is poorly defined in that scenario. What should it reflect AFTER calling `x.run()`? What should it be before/after setting `x.module`? Should it overwrite __file__, __loader__, etc in the target module? Should it restore the values? When should it do this?
 - `x = load_module(...); x.overwrite_sys_modules = True; x.run()`
   This is a version of the above that perhaps makes it a bit easier to document what happens to which globals when.
 - `x = load_module(..., target_module=sys.modules['foo']); x.run()`
   Pros: less ambiguity about globals. They're always either local or the module's.
   Cons: all other "customizations" can be set after load_module is called. This is very asymmetric from an API perspective. There's also some ambiguity about what happens to the __file__, __line__, etc.
 - `x = load_module(...); x.run(target_module=...);`
   This is pretty much the sum of the cons of the above and more. Mostly here for completeness.

I'm leaning towards the second option for API symmetry. The largest hurdle is defining a behavior w.r.t. globals that is least surprising. Maybe something like - if set to True, the globals in the target will be overwritten (i.e. .update) with the globals in the runner when `run()` is called. If folks want to save/restore anything around globals in the target module, they are free to do so themselves before calling .run().

Separately, what needs this type of behavior, other than for backwards compatibility? Do you know of any specific use-case? It feels like almost everything should be covered by a combination of add_to_sys_modules (i.e. temporary modules in sys.modules) and inspecting runner.globals after execution.

What do you think?

Mike.
History
Date User Action Args
2020-11-10 16:26:03William.Schwartzsetnosy: + William.Schwartz
2020-11-04 21:38:44brett.cannonsetnosy: - brett.cannon
2017-02-08 23:30:20piotr.dobrogostsetnosy: + piotr.dobrogost
2016-10-30 05:52:35mikekapsetmessages: + msg279709
2016-10-25 02:24:54ncoghlansetnosy: + brett.cannon
messages: + msg279345
2016-10-23 22:42:53mikekapsetfiles: + patch-2.diff

messages: + msg279288
2016-04-07 03:32:24ncoghlansetmessages: + msg262977
2016-04-07 01:54:02mikekapsetmessages: + msg262975
2016-02-24 07:14:08mikekapsetmessages: + msg260761
2016-02-22 22:02:04mikekapsetmessages: + msg260701
2016-02-22 08:34:02ncoghlansetmessages: + msg260662
2016-02-22 08:29:24ncoghlansetmessages: + msg260661
2016-02-21 02:25:08mikekapsetfiles: + patch.diff
keywords: + patch
2016-02-19 00:36:10eric.snowsetnosy: + eric.snow
messages: + msg260492
2016-02-18 22:40:04brett.cannonsetnosy: + ncoghlan
2016-02-18 22:37:01mikekapcreate