Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disabling changing sys.argv[0] with runpy.run_module(...alter_sys=True) #70576

Open
mikekap mannequin opened this issue Feb 18, 2016 · 11 comments
Open

Disabling changing sys.argv[0] with runpy.run_module(...alter_sys=True) #70576

mikekap mannequin opened this issue Feb 18, 2016 · 11 comments
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@mikekap
Copy link
Mannequin

mikekap mannequin commented Feb 18, 2016

BPO 26388
Nosy @ncoghlan, @ericsnowcurrently, @wkschwartz
Files
  • patch.diff: Suggested change to runpy
  • patch-2.diff: Larger refactoring
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2016-02-18.22:37:01.371>
    labels = ['type-feature', 'library']
    title = 'Disabling changing sys.argv[0] with runpy.run_module(...alter_sys=True)'
    updated_at = <Date 2020-11-10.16:26:03.057>
    user = 'https://bugs.python.org/mikekap'

    bugs.python.org fields:

    activity = <Date 2020-11-10.16:26:03.057>
    actor = 'William.Schwartz'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2016-02-18.22:37:01.371>
    creator = 'mikekap'
    dependencies = []
    files = ['41981', '45201']
    hgrepos = []
    issue_num = 26388
    keywords = ['patch']
    message_count = 11.0
    messages = ['260483', '260492', '260661', '260662', '260701', '260761', '262975', '262977', '279288', '279345', '279709']
    nosy_count = 5.0
    nosy_names = ['ncoghlan', 'eric.snow', 'piotr.dobrogost', 'William.Schwartz', 'mikekap']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue26388'
    versions = ['Python 3.6']

    @mikekap
    Copy link
    Mannequin Author

    mikekap mannequin commented Feb 18, 2016

    For the purposes of pex (https://github.com/pantsbuild/pex), it would be useful to allow calling run_module without sys.argv[0] changing. In general, this behavior is useful if the script intends to re-exec itself (so it needs to know the original arguments that it was started with).

    To make run_module more useful in general, I propose adding a argv parameter that has the following semantics:

    • (default) If set to a special value runpy.INHERIT (similar to subprocess.STDOUT), produces the current behavior of just changing sys.argv[0].
    • If set to None, does not touch sys.argv at all.
    • If set to an iterable, executes sys.argv = [module.__file__] + list(argv), and properly reverts it once the module is done executing.

    @mikekap mikekap mannequin added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Feb 18, 2016
    @ericsnowcurrently
    Copy link
    Member

    @ncoghlan
    Copy link
    Contributor

    This seems like a reasonable enhancement to me, but I'd appreciate your thoughts on how it might relate to a couple of other proposed ideas:

    My own current assessment is "this RFE neither helps nor hinders those other ideas", so this can also be read as an attempt to get a potential contributor with an interest in runpy to take a look at those long-languishing ideas to see if they pique your interest ;)

    @ncoghlan
    Copy link
    Contributor

    The PSF also has a Contributor Licensing Agreement in place for contributions to CPython, so if you could sign that, that would be great: https://www.python.org/psf/contrib/contrib-form/

    Highlights from a contributor point of view:

    • the CLA explicitly licenses your contributions under an open source license (typically the Apache License 2.0)
    • the CLA grants the PSF authority to relicense your contributions under another open source license (this is currently the Python Software Foundation License)

    @mikekap
    Copy link
    Mannequin Author

    mikekap mannequin commented Feb 22, 2016

    Looks like by signed CLA just made it through, so that should be settled.

    For the other bugs, it seems that overloading run_module & run_path seems to be getting a bit cumbersome, so it might make sense to have some sort of Runner object that has things like Runner(module=..., code=..., path=...) and then has properties like argv = ..., module_dict = ... and maybe attributes for the loader and the filepath. "Advanced" usage can use that over the vanilla run_module & run_path.

    Either way, I think that's outside the scope of this change. Unfortunately this is already shaving a yak for me (facebook/buck#651 (comment)) and I'd rather not go deeper.

    @mikekap
    Copy link
    Mannequin Author

    mikekap mannequin commented Feb 24, 2016

    So how might I get this patch committed? :)

    @mikekap
    Copy link
    Mannequin Author

    mikekap mannequin commented Apr 7, 2016

    ping

    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Apr 7, 2016

    Thanks for the ping.

    The actual code changes look OK to me for the initially proposed design, which means the main missing piece would be documentation updates (both to the docstrings and to runpy module reference).

    However, thinking about how those docs might be written, I'm starting to think the specific proposed design would be inherently confusing for folks that aren't already familiar with runpy's internals, and it might be better to use a callback API with a few predefined helper functions.

    That is, suppose the new parameter was a callback accepting the following parameters:

    • "module" - the module about to be executed
    • "argv" - sys.argv prior to module execution

    And then we have 3 predefined callbacks/callback factories in runpy:

    def keep_argv(module, argv):
        return argv
    
    def set_argv0(module, argv):
        return module.__file__ + argv[1:]
    
    def set_argv(argv, *, implicit_argv0=True):
        argv_override = list(argv)
        if implicit_argv0:
            def _set_argv(module, original_argv):
                return [module.__file__] + argv_override
        else:
            def _set_argv(module, original_argv):
                return argv_override
        return _set_argv

    Then the three scenarios in your original post would look like:

        runpy.run_module(mod_name, argv=set_argv0)
        runpy.run_module(mod_name, argv=keep_argv)
        runpy.run_module(mod_name, argv=set_argv(iterable))

    (and similarly for run_path)

    "argv=None" would be the default, and equivalent to specifying "argv=set_argv0"

    "argv=set_argv(iterable, implicit_argv0=False)" would allow even more precise control of the argv settings seen by the running module.

    Future and/or custom variations would be straightforward, since we're just passing in a callable.

    The documentation benefit is that the "argv" parameter docs can just describe the callback signature and the use of "set_argv0" as the default, with the details of the individual behaviours moving to the definitions of the corresponding functions.

    @mikekap
    Copy link
    Mannequin Author

    mikekap mannequin commented Oct 23, 2016

    Hey Nick,

    Sorry for the long delay. Unfortunately Python isn't my main work language anymore so working on this has proved to be quite a context switch. I'm going to try to finish this up now.

    The attached patch implements a new pattern for wrapping runpy - one that I hope is a bit more general than just setting argv. In particular, using the new load_module/load_path doesn't automatically change argv at all when calling run. The callers can do pretty much whatever they want before calling run().

    From a docs perspective this is quite a bit easies to understand. You call runpy.load_* to find the module, change whatever you want: globals/module dict values/names/etc, and call .run(). We would even expose a convenient `with runpy.ModifiedArgv(argv):` to help with the argv swapping. "Simple" use-cases can continue to use runpy.run_* without needing to get into the save/restore we do here.

    Let me know what you think of this approach and I can flesh out the docs around it. Otherwise, I'm more than happy to implement the callback approach you suggested.

    Thanks,
    Mike.

    @ncoghlan
    Copy link
    Contributor

    Ah, very nice. (And no worries on taking an as-you-have-time approach to this - you'll see from the dates on some of the referenced issues below that even I'm in that situation where runpy is concerned)

    I think you're right that offering a 2-phase load/run API will make a lot of sense to folks already familiar with the find/exec model for imports, and it also aligns with this old design concept for making runpy friendlier to modules that need access to the created globals even if the execution step fails: http://bugs.python.org/issue9325#msg133833

    I'd just completely missed that that idea was potentially relevant here as well :)

    I'll provide a few more detailed comments inline.

    The scale of the refactoring does make me wonder if there might be a way to account for the "target module" idea in http://bugs.python.org/issue19982 though.

    @mikekap
    Copy link
    Mannequin Author

    mikekap mannequin commented Oct 30, 2016

    Hey Nick,

    Definitely agree that this refactor is big enough to try adding target modules. There's a somewhat hidden feature in the second patch that does this: use_globals_from_sys_modules takes sys.globals from the sys.modules entry for the module. It's a constructor arg though. It's only used by _run_module_as_main, but making it more official sounds good.

    Given that goal, I'm a bit worried about how to accurately describe the behavior of runnable.globals. Particularly, what's a good API? Here are a couple of options I'm thinking of:

    • x = load_module(...); x.module = sys.modules['foo']; x.run().
      Pros: allows setting the module that's to be used for maximal customizability.
      Cons: x.globals is poorly defined in that scenario. What should it reflect AFTER calling x.run()? What should it be before/after setting x.module? Should it overwrite file, loader, etc in the target module? Should it restore the values? When should it do this?
    • x = load_module(...); x.overwrite_sys_modules = True; x.run()
      This is a version of the above that perhaps makes it a bit easier to document what happens to which globals when.
    • x = load_module(..., target_module=sys.modules['foo']); x.run()
      Pros: less ambiguity about globals. They're always either local or the module's.
      Cons: all other "customizations" can be set after load_module is called. This is very asymmetric from an API perspective. There's also some ambiguity about what happens to the file, line, etc.
    • x = load_module(...); x.run(target_module=...);
      This is pretty much the sum of the cons of the above and more. Mostly here for completeness.

    I'm leaning towards the second option for API symmetry. The largest hurdle is defining a behavior w.r.t. globals that is least surprising. Maybe something like - if set to True, the globals in the target will be overwritten (i.e. .update) with the globals in the runner when run() is called. If folks want to save/restore anything around globals in the target module, they are free to do so themselves before calling .run().

    Separately, what needs this type of behavior, other than for backwards compatibility? Do you know of any specific use-case? It feels like almost everything should be covered by a combination of add_to_sys_modules (i.e. temporary modules in sys.modules) and inspecting runner.globals after execution.

    What do you think?

    Mike.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants