Disabling changing sys.argv[0] with runpy.run_module(...alter_sys=True) #70576

mikekap · 2016-02-18T22:37:01Z

BPO	26388
Nosy	@ncoghlan, @ericsnowcurrently, @wkschwartz
Files	patch.diff: Suggested change to runpy patch-2.diff: Larger refactoring

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2016-02-18.22:37:01.371>
labels = ['type-feature', 'library']
title = 'Disabling changing sys.argv[0] with runpy.run_module(...alter_sys=True)'
updated_at = <Date 2020-11-10.16:26:03.057>
user = 'https://bugs.python.org/mikekap'

bugs.python.org fields:

activity = <Date 2020-11-10.16:26:03.057>
actor = 'William.Schwartz'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2016-02-18.22:37:01.371>
creator = 'mikekap'
dependencies = []
files = ['41981', '45201']
hgrepos = []
issue_num = 26388
keywords = ['patch']
message_count = 11.0
messages = ['260483', '260492', '260661', '260662', '260701', '260761', '262975', '262977', '279288', '279345', '279709']
nosy_count = 5.0
nosy_names = ['ncoghlan', 'eric.snow', 'piotr.dobrogost', 'William.Schwartz', 'mikekap']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue26388'
versions = ['Python 3.6']

mikekap · 2016-02-18T22:37:01Z

For the purposes of pex (https://github.com/pantsbuild/pex), it would be useful to allow calling run_module without sys.argv[0] changing. In general, this behavior is useful if the script intends to re-exec itself (so it needs to know the original arguments that it was started with).

To make run_module more useful in general, I propose adding a argv parameter that has the following semantics:

(default) If set to a special value runpy.INHERIT (similar to subprocess.STDOUT), produces the current behavior of just changing sys.argv[0].
If set to None, does not touch sys.argv at all.
If set to an iterable, executes sys.argv = [module.__file__] + list(argv), and properly reverts it once the module is done executing.

ericsnowcurrently · 2016-02-19T00:36:10Z

python-dev thread:

https://mail.python.org/pipermail/python-dev/2016-February/143374.html

Notably:

pex-tool/pex#211

ncoghlan · 2016-02-22T08:29:24Z

This seems like a reasonable enhancement to me, but I'd appreciate your thoughts on how it might relate to a couple of other proposed ideas:

Supporting execution-by-module-name in pdb et al: https://bugs.python.org/issue9325
Supporting configurable target environments for module execution: https://bugs.python.org/issue19982

My own current assessment is "this RFE neither helps nor hinders those other ideas", so this can also be read as an attempt to get a potential contributor with an interest in runpy to take a look at those long-languishing ideas to see if they pique your interest ;)

ncoghlan · 2016-02-22T08:34:02Z

The PSF also has a Contributor Licensing Agreement in place for contributions to CPython, so if you could sign that, that would be great: https://www.python.org/psf/contrib/contrib-form/

Highlights from a contributor point of view:

the CLA explicitly licenses your contributions under an open source license (typically the Apache License 2.0)
the CLA grants the PSF authority to relicense your contributions under another open source license (this is currently the Python Software Foundation License)

mikekap · 2016-02-22T22:02:04Z

Looks like by signed CLA just made it through, so that should be settled.

For the other bugs, it seems that overloading run_module & run_path seems to be getting a bit cumbersome, so it might make sense to have some sort of Runner object that has things like Runner(module=..., code=..., path=...) and then has properties like argv = ..., module_dict = ... and maybe attributes for the loader and the filepath. "Advanced" usage can use that over the vanilla run_module & run_path.

Either way, I think that's outside the scope of this change. Unfortunately this is already shaving a yak for me (facebook/buck#651 (comment)) and I'd rather not go deeper.

mikekap · 2016-02-24T07:14:08Z

So how might I get this patch committed? :)

mikekap · 2016-04-07T01:54:03Z

ping

ncoghlan · 2016-04-07T03:32:24Z

Thanks for the ping.

The actual code changes look OK to me for the initially proposed design, which means the main missing piece would be documentation updates (both to the docstrings and to runpy module reference).

However, thinking about how those docs might be written, I'm starting to think the specific proposed design would be inherently confusing for folks that aren't already familiar with runpy's internals, and it might be better to use a callback API with a few predefined helper functions.

That is, suppose the new parameter was a callback accepting the following parameters:

"module" - the module about to be executed
"argv" - sys.argv prior to module execution

And then we have 3 predefined callbacks/callback factories in runpy:

def keep_argv(module, argv):
    return argv

def set_argv0(module, argv):
    return module.__file__ + argv[1:]

def set_argv(argv, *, implicit_argv0=True):
    argv_override = list(argv)
    if implicit_argv0:
        def _set_argv(module, original_argv):
            return [module.__file__] + argv_override
    else:
        def _set_argv(module, original_argv):
            return argv_override
    return _set_argv

Then the three scenarios in your original post would look like:

    runpy.run_module(mod_name, argv=set_argv0)
    runpy.run_module(mod_name, argv=keep_argv)
    runpy.run_module(mod_name, argv=set_argv(iterable))

(and similarly for run_path)

"argv=None" would be the default, and equivalent to specifying "argv=set_argv0"

"argv=set_argv(iterable, implicit_argv0=False)" would allow even more precise control of the argv settings seen by the running module.

Future and/or custom variations would be straightforward, since we're just passing in a callable.

The documentation benefit is that the "argv" parameter docs can just describe the callback signature and the use of "set_argv0" as the default, with the details of the individual behaviours moving to the definitions of the corresponding functions.

mikekap · 2016-10-23T22:42:51Z

Hey Nick,

Sorry for the long delay. Unfortunately Python isn't my main work language anymore so working on this has proved to be quite a context switch. I'm going to try to finish this up now.

The attached patch implements a new pattern for wrapping runpy - one that I hope is a bit more general than just setting argv. In particular, using the new load_module/load_path doesn't automatically change argv at all when calling run. The callers can do pretty much whatever they want before calling run().

From a docs perspective this is quite a bit easies to understand. You call runpy.load_* to find the module, change whatever you want: globals/module dict values/names/etc, and call .run(). We would even expose a convenient `with runpy.ModifiedArgv(argv):` to help with the argv swapping. "Simple" use-cases can continue to use runpy.run_* without needing to get into the save/restore we do here.

Let me know what you think of this approach and I can flesh out the docs around it. Otherwise, I'm more than happy to implement the callback approach you suggested.

Thanks,
Mike.

ncoghlan · 2016-10-25T02:24:53Z

Ah, very nice. (And no worries on taking an as-you-have-time approach to this - you'll see from the dates on some of the referenced issues below that even I'm in that situation where runpy is concerned)

I think you're right that offering a 2-phase load/run API will make a lot of sense to folks already familiar with the find/exec model for imports, and it also aligns with this old design concept for making runpy friendlier to modules that need access to the created globals even if the execution step fails: http://bugs.python.org/issue9325#msg133833

I'd just completely missed that that idea was potentially relevant here as well :)

I'll provide a few more detailed comments inline.

The scale of the refactoring does make me wonder if there might be a way to account for the "target module" idea in http://bugs.python.org/issue19982 though.

mikekap · 2016-10-30T05:52:34Z

Hey Nick,

Definitely agree that this refactor is big enough to try adding target modules. There's a somewhat hidden feature in the second patch that does this: use_globals_from_sys_modules takes sys.globals from the sys.modules entry for the module. It's a constructor arg though. It's only used by _run_module_as_main, but making it more official sounds good.

Given that goal, I'm a bit worried about how to accurately describe the behavior of runnable.globals. Particularly, what's a good API? Here are a couple of options I'm thinking of:

x = load_module(...); x.module = sys.modules['foo']; x.run().
Pros: allows setting the module that's to be used for maximal customizability.
Cons: x.globals is poorly defined in that scenario. What should it reflect AFTER calling x.run()? What should it be before/after setting x.module? Should it overwrite file, loader, etc in the target module? Should it restore the values? When should it do this?
x = load_module(...); x.overwrite_sys_modules = True; x.run()
This is a version of the above that perhaps makes it a bit easier to document what happens to which globals when.
x = load_module(..., target_module=sys.modules['foo']); x.run()
Pros: less ambiguity about globals. They're always either local or the module's.
Cons: all other "customizations" can be set after load_module is called. This is very asymmetric from an API perspective. There's also some ambiguity about what happens to the file, line, etc.
x = load_module(...); x.run(target_module=...);
This is pretty much the sum of the cons of the above and more. Mostly here for completeness.

I'm leaning towards the second option for API symmetry. The largest hurdle is defining a behavior w.r.t. globals that is least surprising. Maybe something like - if set to True, the globals in the target will be overwritten (i.e. .update) with the globals in the runner when run() is called. If folks want to save/restore anything around globals in the target module, they are free to do so themselves before calling .run().

Separately, what needs this type of behavior, other than for backwards compatibility? Do you know of any specific use-case? It feels like almost everything should be covered by a combination of add_to_sys_modules (i.e. temporary modules in sys.modules) and inspecting runner.globals after execution.

What do you think?

Mike.

mikekap mannequin added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Feb 18, 2016

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Disabling changing sys.argv[0] with runpy.run_module(...alter_sys=True) #70576

Disabling changing sys.argv[0] with runpy.run_module(...alter_sys=True) #70576

mikekap mannequin commented Feb 18, 2016

mikekap mannequin commented Feb 18, 2016

ericsnowcurrently commented Feb 19, 2016

ncoghlan commented Feb 22, 2016

ncoghlan commented Feb 22, 2016

mikekap mannequin commented Feb 22, 2016

mikekap mannequin commented Feb 24, 2016

mikekap mannequin commented Apr 7, 2016

ncoghlan commented Apr 7, 2016

mikekap mannequin commented Oct 23, 2016

ncoghlan commented Oct 25, 2016

mikekap mannequin commented Oct 30, 2016

Disabling changing sys.argv[0] with runpy.run_module(...alter_sys=True) #70576

Disabling changing sys.argv[0] with runpy.run_module(...alter_sys=True) #70576

Comments

mikekap mannequin commented Feb 18, 2016

mikekap mannequin commented Feb 18, 2016

ericsnowcurrently commented Feb 19, 2016

ncoghlan commented Feb 22, 2016

ncoghlan commented Feb 22, 2016

mikekap mannequin commented Feb 22, 2016

mikekap mannequin commented Feb 24, 2016

mikekap mannequin commented Apr 7, 2016

ncoghlan commented Apr 7, 2016

mikekap mannequin commented Oct 23, 2016

ncoghlan commented Oct 25, 2016

mikekap mannequin commented Oct 30, 2016