classification
Title: Add feature to allow code execution prior to __main__ invocation
Type: enhancement Stage: needs patch
Components: Interpreter Core Versions: Python 3.4
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Arfrever, asvetlov, barry, bfroehle, bkabrda, chris.jerdonek, eric.snow, gregory.p.smith, kristjan.jonsson, ncoghlan, nedbat, pitrou, serhiy.storchaka
Priority: normal Keywords:

Created on 2012-05-14 07:43 by ncoghlan, last changed 2013-03-19 00:25 by gregory.p.smith.

Messages (26)
msg160597 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-05-14 07:43
Reading http://nedbatchelder.com/code/coverage/subprocess.html, it occurred to me that there are various tracing and profiling operations that could be cleanly handled with significantly less work on the part of the tracing/profiling tool authors if the interpreter supported a "-C" option that was like the existing "-c" option, but *didn't* terminate the options list.

The interpreter would invoke such commands after the interpreter is fully initialised, but before it begins the processing to find and execute __main__.

Then, to use subprocess coverage with coverage.py as an example, you could just run a command like:

"python -C 'import coverage; coverage.process_startup()' worker.py"

Other things you could usefully do in such an invocation include reconfiguring sys.std(in|out|err) to match the settings used on the invoking side (e.g. to ensure Unicode data is tunnelled correctly), configuring the logging module with a custom configuration, configuring the warnings module programmatically, enabling a memory profiler, etc.

Providing a function that could be called from -C and then uses an atexit() handler to do any necessary post-processing may be significantly simpler than trying to use runpy.run_(path|module) to achieve a similar effect.
msg160642 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-05-14 17:22
It would be nice to have a comparison of the available alternatives. It's not obvious that asking people to type some "-C ..." boilerplate to get code coverage is very user-friendly. Or am I misunderstanding the request?
msg160647 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-05-14 17:49
This can be achieved without introducing a new interpreter option, by using a special module:

python -m prerun <code> <script> <args>...

(It may also be a runpy option).
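For reference, a minimal sketch of such a module, assuming the hypothetical name prerun and the argument order given above, could lean on runpy.run_path(). It is not a complete emulation of direct script execution (for example, it does not put the script's directory at sys.path[0]), and it cannot reach subprocesses that the script launches by itself.

```python
"""prerun.py: hypothetical helper, used as
    python -m prerun <code> <script> [args...]
"""
import runpy
import sys


def main(argv=None):
    argv = sys.argv[1:] if argv is None else list(argv)
    if len(argv) < 2:
        raise SystemExit("usage: python -m prerun <code> <script> [args...]")
    code, script, *script_args = argv
    # 1. Run the setup code first, in its own throwaway namespace.
    exec(compile(code, "<prerun>", "exec"), {"__name__": "__prerun__"})
    # 2. Give the script the argv it would see if launched directly.
    sys.argv = [script, *script_args]
    # 3. Execute the script as __main__ and hand back its globals.
    return runpy.run_path(script, run_name="__main__")


# A real prerun.py would finish with the usual guard:
#     if __name__ == "__main__":
#         main()
```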
msg160657 - Author: Ned Batchelder (nedbat) * Date: 2012-05-14 19:02
The difficulty that coverage faces is not measuring Python programs started from the command line like this: for those, you can use "coverage run myprog.py" or "python -m coverage run myprog.py".

The difficulty is when there are subprocesses running python programs.  Read http://nedbatchelder.com/code/coverage/subprocess.html for the two current hacks used to invoke coverage on subprocesses.  If -C is implemented, it should have a PYTHONRUNFIRST environment variable (or the like) to make these hacks unnecessary.
msg160680 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-05-15 04:41
As Ned notes, to cover *implicit* creation of Python subprocesses, an environment-based solution would be needed to ensure the subprocesses adopt the desired settings. The advantage that has over the current workarounds is that it can be scoped to affect only a particular invocation of the parent process (and the subprocesses it launches), rather than an entire Python installation.

As Serhiy notes, for *direct* invocation, you could use a custom module, but that wouldn't help with the subprocess case. In terms of implementation strategy, as with the -m switch, I'd probably bootstrap into runpy as soon as "-C" is encountered anyway to avoid the need for additional complexity in the C code.

One useful side effect is that this would eliminate some of the demand for -X options, since anything with a Python-level API could just use -C instead. To use faulthandler as an example:

Current: "python -Xfaulthandler"
Alternate: "python -C 'import faulthandler; faulthandler.enable()'"

While the second is longer, the advantage is that it's the same as the way you enable faulthandler from Python, so there's no need to learn a special command (and, since it's a -X option, the current way doesn't show up in the output of "python --help").

This would also cleanly resolve the request in issue 6958 to allow configuration of the logging module from the command line by allowing you to write things like:

   python -C "import logging; logging.basicConfig()" script.py

The interaction with other logging configuration is obvious (it follows the normal rules for multiple configuration attempts) and doesn't require the invention of new complex rules.

Warnings ends up in a similar boat. Simple options like -Wall, -Wdefault or -Werror are easy to use, but more complex configuration options are tricky. With "-C" you can do whatever you like using the Python API.
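As an example of such a "more complex configuration": the message and module fields of -W are matched literally, whereas warnings.filterwarnings() accepts regular expressions, and several ordered filters can be installed in one go. A hypothetical configure_warnings() helper that a -C snippet might call:

```python
import warnings


def configure_warnings():
    """Hypothetical setup helper; the most recently added filter is tried first."""
    # Turn one family of deprecation messages into hard errors (regex match).
    warnings.filterwarnings(
        "error",
        message=r"deprecated call to \w+",
        category=DeprecationWarning,
    )
    # Silence a noisy category entirely.
    warnings.filterwarnings("ignore", category=ResourceWarning)
```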

Other tricks also become possible without the need for launch scripts, like testing import logic and fallbacks by messing with sys.path and sys.modules, or patching builtins or other modules.
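For instance, setting an entry in sys.modules to None makes any later import of that name fail with ImportError (ModuleNotFoundError on Python 3.6+), which is a cheap way to exercise a fallback path. A minimal sketch, using csv only as a stand-in for an optional dependency and a hypothetical block_import() helper:

```python
import sys


def block_import(name):
    """Hypothetical helper: make every later `import name` fail."""
    sys.modules[name] = None


def load_optional_csv():
    # Code under test: use the module if present, fall back gracefully if not.
    try:
        import csv
    except ImportError:
        return None
    return csv
```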
msg160700 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-05-15 09:36
> As Ned notes, to cover *implicit* creation of Python subprocesses an
> environment based solution would be needed to ensure the subprocesses
> adopt the desired settings.

So why aren't you proposing an environment-based solution instead? :)
To use the "-C" option, you have to modify all places which launch a
Python subprocess.
msg160702 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-05-15 09:46
Because I was thinking about a specific case where I *could* configure how the subprocesses were invoked (launching a test server for a web application). It took Ned's comment to remind me of the original use case (i.e. coverage statistics for subprocesses created by an arbitrary application, *not* a custom test harness).

What this would allow is the elimination of a whole class of ad hoc feature requests - any process global configuration setting with a Python API would automatically also receive a command line API (via -C) and an environment API (via PYTHONRUNFIRST).

Some existing options (like -Xfaulthandler) might never have been added if -C had been available.

That's why I changed the issue title (and am now updating the specific suggestion).
msg160703 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-05-15 09:51
Actually, there's another use case for you:

export PYTHONRUNFIRST="import faulthandler; faulthandler.enable()"
application.py

All subprocesses launched by the application will now have faulthandler enabled, *without* modifying the application. Doing this in a shell session means that faulthandler will be enabled for all Python processes you launch.

Obviously, care would need to be taken to ensure PYTHONRUNFIRST is ignored for setuid scripts (and it would respect -E as with any other environment variable).
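Today the closest approximation is a sitecustomize module or a .pth file that calls something like the following. It is a sketch only: PYTHONRUNFIRST is the hypothetical variable proposed above, and comparing real and effective user/group IDs is just one assumed way of detecting a setuid/setgid process on POSIX.

```python
import os
import sys


def run_first_from_env(environ=None, var="PYTHONRUNFIRST"):
    """Run the code held in a hypothetical PYTHONRUNFIRST-style variable.

    Returns True if code was executed, False otherwise.
    """
    # Respect -E, as the other PYTHON* variables do.
    if sys.flags.ignore_environment:
        return False
    # Assumed setuid/setgid check (POSIX only): skip if the IDs differ.
    if hasattr(os, "geteuid"):
        if os.getuid() != os.geteuid() or os.getgid() != os.getegid():
            return False
    environ = os.environ if environ is None else environ
    code = environ.get(var)
    if not code:
        return False
    exec(compile(code, f"<{var}>", "exec"), {"__name__": "__runfirst__"})
    return True
```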
msg160710 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-05-15 11:05
For faulthandler and coverage, a more convenient option would be "-M"
(run a module with __name__='__premain__' (or something of the sort)
and continue command line processing).
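Roughly, the "-M" semantics could be approximated with runpy.run_module() and a custom run_name; the catch, as Nick points out in his reply, is that only modules written to check for the __premain__ name would do anything useful. A sketch (the -M flag and the __premain__ name are Serhiy's proposal, not existing features, and run_premain is a hypothetical helper):

```python
import runpy


def run_premain(module_name):
    """Hypothetical -M helper: run a module with __name__ == '__premain__'."""
    return runpy.run_module(module_name, run_name="__premain__")


# A module would have to opt in explicitly, e.g.:
#
#     if __name__ == "__premain__":
#         import faulthandler
#         faulthandler.enable()
```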
msg160713 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-05-15 11:20
No, that increases complexity and coupling, because it would only work for modules that were designed to work that way. Execution of a simple statement will work for any global state that can be modified from pure Python code (including invocation of more complex configuration settings from a custom Python module).

For a mature application, you wouldn't do it this way because you'd have other more polished mechanisms in place, but for debugging, experimentation and dealing with recalcitrant third party software, it could help deal with various problems without having to edit the code.
msg166722 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-07-29 07:01
I've switched back to being -1 on the PYTHONRUNFIRST idea. There are no ACLs for environment variables, so the security implications scare me too much for me to support the feature.

The simple -C option doesn't have that problem, though, and could be used as infrastructure in a process infrastructure framework to provide enhanced configuration of Python subprocesses.
msg166747 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-07-29 11:03
> I've switched back to being -1 on the PYTHONRUNFIRST idea. There are
> no ACLs for environment variables, so the security implications scare
> me too much for me to support the feature.

I'm quite sure PYTHONHOME and PYTHONPATH already allow you to mess quite
freely. That's why we have the -E flag.

I'm -0.5 myself, though, for the reason that it complicates the startup
process a little bit more, without looking very compelling. It smells
disturbingly like LD_PRELOAD to me.

> The simple -C option doesn't have that problem, though, and could be
> used as infrastructure in a process infrastructure framework to
> provide enhanced configuration of Python subprocesses.

What do you mean exactly?
msg166752 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-07-29 12:37
Nothing too complicated - just noting that a test suite like ours that launches Python subprocesses to test process global state handling could fairly easily arrange to pass appropriate -C options to trigger things like recording coverage data or profiling options.

I'll also note that if you put a "preinit.py" on sys.path (e.g. in the current directory if using -m for invocation), you could easily do "-C 'import preinit'" to do arbitrarily complex custom setups, including preconfiguring your test framework.

A lot of my thoughts on this come out of looking into migrating the various stdlib modules like trace, pdb and profile over to supporting everything that runpy (and hence the main executable) supports, and a lot of the complexity lies in the mechanics of how to daisy chain the two "__main__" modules together. Running a bit of extra code in __main__ as supplied on the command line before kicking off the full import process helps avoid a lot of pain.
msg166755 - Author: Ned Batchelder (nedbat) * Date: 2012-07-29 13:04
> I'm -0.5 myself, though, for the reason that it complicates the startup
> process a little bit more, without looking very compelling. It smells
> disturbingly like LD_PRELOAD to me.

Antoine, do you have a suggestion for how to solve the coverage.py problem?  To re-iterate: imagine you have a large test suite, and it spawns python processes during the tests.  Mercurial, for example, is like this.  You want to measure the coverage of your test suite.  This means that not only do you have to invoke the main suite with "python coverage.py run tests.py" instead of "python tests.py", but all the subprocess invocations need to invoke coverage.py as well.

We are looking for ways to make this as transparent as possible to the tests themselves, just as coverage measurement is now for test suites that don't spawn python subprocesses.

http://nedbatchelder.com/code/coverage/subprocess.html describes the two current hacks people can use to invoke coverage on subprocesses.  I was hoping for a cleaner more natural solution.
msg166762 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-07-29 14:18
> > I'm -0.5 myself, though, for the reason that it complicates the startup
> > process a little bit more, without looking very compelling. It smells
> > disturbingly like LD_PRELOAD to me.
> 
> Antoine, do you have a suggestion for how to solve the coverage.py
> problem?  To re-iterate: imagine you have a large test suite, and it
> spawns python processes during the tests.  Mercurial, for example, is
> like this.  You want to measure the coverage of your test suite.  This
> means that not only do you have to invoke the main suite with "python
> coverage.py run tests.py" instead of "python tests.py", but all the
> subprocess invocations need to invoke coverage.py as well.

Ok, sorry then, I retract what I said. I agree the use case is
legitimate.
msg166763 - Author: Chris Jerdonek (chris.jerdonek) * (Python committer) Date: 2012-07-29 14:19
Ned, two questions: in the scenario you just described, is it a requirement that your test suite's code need not be modified (even minimally) to support coverage of subprocesses?

And can the solution assume that the test suite is spawning the Python processes in certain standard ways, or must it address all possible ways (even convoluted ones)?  If the former, what are the standard ways?  I am referring to things like the path to the Python interpreter invoked and how that is obtained, whether the subprocess module is always used, etc.

These questions are meant to help pin down the scope of what needs to be satisfied.
msg166765 - Author: Ned Batchelder (nedbat) * Date: 2012-07-29 14:38
Chris:

The real problem is that it isn't the "test suite" that spawns the processes; the tests invoke product code, and the product code spawns Python.  So modifying the Python-spawning really means modifying the product code to do something different under coverage, which most developers (rightfully) won't want to do.

My preference is not to assume a particular style of spawning subprocesses, since coverage.py tries quite hard to be applicable to any Python project.
msg166772 - Author: Chris Jerdonek (chris.jerdonek) * (Python committer) Date: 2012-07-29 14:54
I understand that.  Sorry, I meant to say "code under test."

If you make no assumptions about spawning subprocesses, does this mean, for example, that the solution must satisfy the case of subprocesses invoking a different version of Python, or invoking the same version of Python in a different virtual environment?
msg166779 - Author: Ned Batchelder (nedbat) * Date: 2012-07-29 15:56
Chris, I'm not sure how to answer your questions.  The more powerful and flexible, the better.  There is no "must" here.  I'm looking for a way to avoid the hacks coverage.py has used in the past to measure coverage in subprocesses.  A language feature that allowed me to externally configure the interpreter to run some of my code first would allow me to do that.
msg166785 - Author: Chris Jerdonek (chris.jerdonek) * (Python committer) Date: 2012-07-29 16:41
Okay, then in the interest of understanding why various alternatives fail, I'll just throw out the suggestion or question that I had in mind because I don't see it mentioned above or on the web page.

Why wouldn't it work to define an alias or script that invokes the desired "python -m ...", and then when you call your test suite (using a potentially different "-m ..."), you set sys.executable to that script so that subprocesses in your code under test will invoke it?  (coverage.py could do all of this under the hood so that the user need not be aware of it.)  Or would this work just fine, but that it's an example of the kind of hack that you're trying to avoid?
msg166841 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-07-30 00:55
There are two different use cases here. "-C" tackles one of them, "PYTHONRUNFIRST" the other.

My original use case came from working on the Python test suite. In that suite, we have "test.script_helper" which spawns Python subprocesses in order to test various aspects of the startup machinery. I can easily modify script_helper to pass an extra -C argument when gathering coverage data, so I don't need any implicit magic.

The -C option also simplifies a whole host of things by letting you use the Python API to perform preconfiguration of various subsystems before executing __main__ normally rather than having to either write a custom launch script (difficult to do with full generality) or adding even more arcane command line options.

However, the -C option doesn't cover the case of *implicit* invocation of subprocesses. This is where the PYTHONRUNFIRST suggestion comes in - the idea would be that, unless -E is specified, then -C $PYTHONRUNFIRST would be implied.

To be honest, I *don't* think this latter capability should be built into the core implementation. Instead, I think it is more appropriate for it to be handled at a virtual environment level, so that it doesn't inadvertently affect invocation of other applications (like hg) that merely happen to be written in Python. Scoping it to a venv would also lessen many of my security concerns with the idea. A simple way to do this would be if pyvenv.cfg could contain a customisation snippet that was executed prior to __main__ invocation (building off the -C machinery).
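One way the venv-scoped variant might look, purely as a sketch: pyvenv.cfg is a simple "key = value" file, so a startup hook could read a snippet from it. The runfirst key and the read_venv_runfirst() helper are hypothetical; no such key exists in the real venv machinery.

```python
import os
import sys


def read_venv_runfirst(prefix=None, key="runfirst"):
    """Return the hypothetical `runfirst` snippet from pyvenv.cfg, or None."""
    prefix = sys.prefix if prefix is None else prefix
    cfg_path = os.path.join(prefix, "pyvenv.cfg")
    if not os.path.isfile(cfg_path):
        return None  # no pyvenv.cfg here, so not a venv
    with open(cfg_path, encoding="utf-8") as fh:
        for line in fh:
            name, sep, value = line.partition("=")
            if sep and name.strip().lower() == key:
                return value.strip()
    return None
```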
msg166843 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-07-30 01:06
On Monday 30 July 2012 at 00:55 +0000, Nick Coghlan wrote:
> However, the -C option doesn't cover the case of *implicit* invocation
> of subprocesses. This is where the PYTHONRUNFIRST suggestion comes in
> - the idea would be that, unless -E is specified, then -C $PYTHONRUNFIRST
> would be implied.
> 
> To be honest, I *don't* think this latter capability should be built
> into the core implementation. Instead, I think it is more appropriate
> for it to be handled at a virtual environment level, so that it
> doesn't inadvertently affect invocation of other applications (like
> hg) that merely happen to be written in Python.

Well, it shouldn't if you don't start doing "export PYTHONRUNFIRST=...",
but instead set it from the calling Python process (possibly from
coverage itself).

Having to create virtual environments and whatnot just to enjoy this
feature sounds terribly tedious.
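Setting the variable from the calling Python process, rather than exporting it in a shell, keeps it scoped to that one child process (and whatever the child launches in turn). With subprocess that is just a modified copy of the environment. In the sketch below PYTHONRUNFIRST is still the hypothetical variable and run_child_with() a hypothetical helper; current interpreters simply pass the variable through unused, so the child here only echoes it back to show what it received.

```python
import os
import subprocess
import sys


def run_child_with(code_to_run_first, child_args):
    """Launch a Python child whose environment (only) carries PYTHONRUNFIRST."""
    child_env = dict(os.environ, PYTHONRUNFIRST=code_to_run_first)
    return subprocess.run(
        [sys.executable, *child_args],
        env=child_env,
        capture_output=True,
        text=True,
        check=True,
    )
```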
msg167005 - Author: Chris Jerdonek (chris.jerdonek) * (Python committer) Date: 2012-07-31 14:55
> However, the -C option doesn't cover the case of *implicit* invocation of subprocesses. This is where the PYTHONRUNFIRST suggestion comes in

Would a more general solution than PYTHONRUNFIRST be something like a suitably named PYTHONRUNINSTEAD? This would be an arbitrary script to run in place of python any time python was invoked. Alternatively (and less powerfully), it could be a default set of command options to pass to the Python executable.  Both of these seem more general than PYTHONRUNFIRST because the 'runas' command could itself be `python -C $PYTHONRUNFIRST ....`

> unless -E is specified, then -C $PYTHONRUNFIRST would be implied. To be honest, I *don't* think this latter capability should be built into the core implementation.... so that it doesn't inadvertently affect invocation of other applications (like hg)

It seems what you're saying is that you'd want PYTHONRUNFIRST to run only in special situations, rather than as the default.  Is there a sense then in which a functionality inverse to -E could be provided?  The idea would be that, when running Python, you could somehow instruct that an option like PYTHONRUN* would take effect only for the subprocesses spawned by the main process you're invoking (kind of like a context manager for the invocation of Python itself)?

The advantage of this approach would be that a special PYTHONRUNFIRST setting wouldn't take effect unless you explicitly say so.
msg167066 - Author: Ned Batchelder (nedbat) * Date: 2012-08-01 01:07
I agree with Antoine: I don't see why this should be a feature of virtualenvs.  It's easy to use environment variables in a tightly-controlled way.  We don't worry that any of the other environment variables that affect Python execution will somehow escape into the wild and change how Mercurial (or anything else) works.
msg176889 - Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2012-12-04 09:59
offtopic:  
Noticed something pretty annoying:
If a package uses relative imports, e.g.
from . import sibling_module,
then it is impossible to run that package as a script, even with the __main__ trick.
msg176915 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-12-04 14:20
Why post that complaint here? If there's a case where __main__.__package__
isn't being set correctly by -m, file a separate bug.
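For what it's worth, the difference is easy to reproduce: running a package directory directly executes its __main__.py with no parent package, so "from . import sibling_module" fails, whereas "python -m pkg" sets __main__.__package__ to the package name and the relative import works. The package name demo_pkg and the demo_relative_import() helper are made up for this demonstration.

```python
import os
import subprocess
import sys
import tempfile


def demo_relative_import():
    """Build a throwaway package, run it both ways, return both results."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = os.path.join(tmp, "demo_pkg")
        os.mkdir(pkg)
        with open(os.path.join(pkg, "__init__.py"), "w") as fh:
            fh.write("")
        with open(os.path.join(pkg, "sibling_module.py"), "w") as fh:
            fh.write("GREETING = 'hello'\n")
        with open(os.path.join(pkg, "__main__.py"), "w") as fh:
            fh.write("from . import sibling_module\n"
                     "print(__package__, sibling_module.GREETING)\n")
        # 1. As a directory: __main__.py runs with no parent package.
        as_dir = subprocess.run([sys.executable, pkg],
                                capture_output=True, text=True)
        # 2. With -m from the parent directory: __package__ == 'demo_pkg'.
        as_module = subprocess.run([sys.executable, "-m", "demo_pkg"],
                                   cwd=tmp, capture_output=True, text=True)
    return as_dir, as_module
```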
History
Date | User | Action | Args
2013-03-19 00:25:28 | gregory.p.smith | set | nosy: + gregory.p.smith
2012-12-04 14:20:31 | ncoghlan | set | messages: + msg176915
2012-12-04 09:59:24 | kristjan.jonsson | set | messages: + msg176889
2012-11-29 17:25:16 | barry | set | nosy: + barry
2012-11-23 09:38:40 | kristjan.jonsson | set | nosy: + kristjan.jonsson
2012-11-13 03:53:10 | eric.snow | set | nosy: + eric.snow
2012-11-01 04:18:41 | bfroehle | set | nosy: + bfroehle
2012-08-29 19:52:22 | asvetlov | set | nosy: + asvetlov
2012-08-18 13:31:14 | Arfrever | set | nosy: + Arfrever
2012-08-01 01:07:52 | nedbat | set | messages: + msg167066
2012-07-31 14:55:57 | chris.jerdonek | set | messages: + msg167005
2012-07-30 01:06:19 | pitrou | set | messages: + msg166843
2012-07-30 00:55:39 | ncoghlan | set | messages: + msg166841
2012-07-29 16:41:24 | chris.jerdonek | set | messages: + msg166785
2012-07-29 15:56:46 | nedbat | set | messages: + msg166779
2012-07-29 14:54:46 | chris.jerdonek | set | messages: + msg166772
2012-07-29 14:38:10 | nedbat | set | messages: + msg166765
2012-07-29 14:19:14 | chris.jerdonek | set | messages: + msg166763
2012-07-29 14:18:58 | pitrou | set | messages: + msg166762
2012-07-29 13:46:18 | chris.jerdonek | set | nosy: + chris.jerdonek
2012-07-29 13:04:41 | nedbat | set | messages: + msg166755
2012-07-29 12:37:21 | ncoghlan | set | messages: + msg166752
2012-07-29 11:03:54 | pitrou | set | messages: + msg166747
2012-07-29 07:01:18 | ncoghlan | set | messages: + msg166722
2012-06-20 12:33:23 | bkabrda | set | nosy: + bkabrda
2012-06-12 12:00:33 | ncoghlan | set | versions: + Python 3.4, - Python 3.3
2012-05-15 11:20:32 | ncoghlan | set | messages: + msg160713
2012-05-15 11:05:01 | serhiy.storchaka | set | messages: + msg160710
2012-05-15 09:51:07 | ncoghlan | set | messages: + msg160703
2012-05-15 09:46:15 | ncoghlan | set | messages: + msg160702; title: Enhanced command line features for the runpy module -> Add feature to allow code execution prior to __main__ invocation
2012-05-15 09:36:43 | pitrou | set | messages: + msg160700; title: Add feature to allow code execution prior to __main__ invocation -> Enhanced command line features for the runpy module
2012-05-15 04:42:10 | ncoghlan | set | title: Enhanced command line features for the runpy module -> Add feature to allow code execution prior to __main__ invocation
2012-05-15 04:41:00 | ncoghlan | set | messages: + msg160680; title: Add -C option to run code at Python startup -> Enhanced command line features for the runpy module
2012-05-14 19:02:52 | nedbat | set | messages: + msg160657
2012-05-14 17:49:27 | serhiy.storchaka | set | nosy: + serhiy.storchaka; messages: + msg160647
2012-05-14 17:22:42 | pitrou | set | nosy: + pitrou; messages: + msg160642
2012-05-14 10:57:59 | nedbat | set | nosy: + nedbat
2012-05-14 07:45:40 | ncoghlan | set | components: + Interpreter Core
2012-05-14 07:45:19 | ncoghlan | set | stage: needs patch; type: enhancement; versions: + Python 3.3
2012-05-14 07:43:38 | ncoghlan | create |