Issue 14803: Add feature to allow code execution prior to __main__ invocation

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/59008

classification

Title:	Add feature to allow code execution prior to __main__ invocation
Type:	enhancement	Stage:	needs patch
Components:	Interpreter Core	Versions:	Python 3.4

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	Arfrever, asvetlov, barry, bfroehle, bkabrda, chris.jerdonek, eric.snow, gregory.p.smith, ionelmc, kristjan.jonsson, ncoghlan, nedbat, pitrou, serhiy.storchaka
Priority:	normal	Keywords:

Created on 2012-05-14 07:43 by ncoghlan, last changed 2022-04-11 14:57 by admin.

Messages (27)
msg160597 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2012-05-14 07:43
Reading http://nedbatchelder.com/code/coverage/subprocess.html, it occurred to me that there are various tracing and profiling operations that could be cleanly handled with significantly less work on the part of the tracing/profiling tool authors if the interpreter supported a "-C" option that was like the existing "-c" option, but didn't terminate the options list. The interpreter would invoke such commands after it is fully initialised, but before it begins the processing to find and execute __main__.

Then, to use subprocess coverage with coverage.py as an example, you could just run a command like:

python -C 'import coverage; coverage.process_startup()' worker.py

Other things you could usefully do in such an invocation are to reconfigure sys.std(in|out|err) to match the settings used on the invoking side (e.g. to ensure Unicode data is tunnelled correctly), configure the logging module with a custom configuration, configure the warnings module programmatically, enable a memory profiler, etc.

Providing a function that could be called from -C and then uses an atexit handler to do any necessary post-processing may be significantly simpler than trying to use runpy.run_(path|module) to achieve a similar effect.
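The "function called at startup plus atexit handler" pattern mentioned above can be sketched as follows. This is only an illustration: `start_timing` is a hypothetical name, and a simple timer stands in for whatever a real tracing or profiling tool would do.

```python
import atexit
import time


def start_timing(report=print):
    """Hypothetical startup hook: note the start time, report at interpreter exit."""
    started = time.perf_counter()

    def finish():
        # Post-processing step; atexit runs it at normal interpreter shutdown.
        report(f"elapsed: {time.perf_counter() - started:.3f}s")

    atexit.register(finish)
    # Returned so callers can run or unregister the handler explicitly.
    return finish
```

With the proposed option, a tool shipping such a function in a (hypothetical) module `mytool` might be enabled with something like `python -C 'import mytool; mytool.start_timing()' worker.py`.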
msg160642 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2012-05-14 17:22
It would be nice to have a comparison of the available alternatives. It's not obvious that asking people to type some "-C ..." boilerplate to get code coverage is very user-friendly. Or am I misunderstanding the request?
msg160647 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2012-05-14 17:49
This can be achieved without introducing a new interpreter option, using a special module:

python -m prerun <code> <script> <args>...

(It may also be a runpy option.)
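A minimal sketch of what such a module could look like, assuming it simply executes the code string and then hands over to `runpy.run_path`. The module name `prerun` comes from the message above; the function names and the `__prerun__` namespace name are made up for illustration.

```python
import runpy
import sys


def prerun(code, script, args=()):
    """Execute `code`, then run `script` as __main__ with `args` as its arguments."""
    # The snippet runs in a throwaway namespace; only its side effects on
    # process-global state (sys, logging, warnings, ...) carry over.
    exec(compile(code, "<prerun>", "exec"), {"__name__": "__prerun__"})
    # Give the script the argv it would have seen if invoked directly.
    saved_argv = sys.argv[:]
    sys.argv = [script, *args]
    try:
        return runpy.run_path(script, run_name="__main__")
    finally:
        sys.argv = saved_argv


def main(argv):
    """Hypothetical command line entry point: <code> <script> <args>..."""
    if len(argv) < 2:
        raise SystemExit("usage: prerun <code> <script> [args...]")
    prerun(argv[0], argv[1], argv[2:])
```

As noted later in the thread, a launcher like this only helps when you control how the process is invoked; it does nothing for subprocesses spawned implicitly.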
msg160657 - (view)	Author: Ned Batchelder (nedbat) *	Date: 2012-05-14 19:02
The difficulty that coverage faces is not measuring Python programs started from the command line like this: for those, you can use "coverage run myprog.py" or "python -m coverage run myprog.py". The difficulty is when there are subprocesses running Python programs. Read http://nedbatchelder.com/code/coverage/subprocess.html for the two current hacks used to invoke coverage on subprocesses. If -C is implemented, it should have a PYTHONRUNFIRST environment variable (or the like) to make these hacks unnecessary.
msg160680 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2012-05-15 04:41
As Ned notes, to cover implicit creation of Python subprocesses an environment based solution would be needed to ensure the subprocesses adopt the desired settings. The advantage that has over the current workarounds is that it can be scoped to only affect the parent process when it is executed rather than affecting an entire Python installation.

As Serhiy notes, for direct invocation, you could use a custom module, but that wouldn't help with the subprocess case.

In terms of implementation strategy, as with the -m switch, I'd probably bootstrap into runpy as soon as "-C" is encountered anyway to avoid the need for additional complexity in the C code.

One thing that would be useful with this is that it would eliminate some of the demand for -X options, since anything with a Python level API could just use -C instead. To use faulthandler as an example:

Current: "python -Xfaulthandler"
Alternate: "python -C 'import faulthandler; faulthandler.enable()'"

While the second is longer, the advantage is that it's the same as the way you enable faulthandler from Python, so there's no need to learn a special command (and, since it's a -X option, the current way doesn't show up in the output of "python --help").

This would also cleanly resolve the request in issue 6958 to allow configuration of the logging module from the command line by allowing you to write things like:

python -C "import logging; logging.basicConfig()" script.py

The interaction with other logging configuration is obvious (it follows the normal rules for multiple configuration attempts) and doesn't require the invention of new complex rules.

Warnings ends up in a similar boat. Simple options like -Wall, -Wdefault or -Werror are easy to use, but more complex configuration options are tricky. With "-C" you can do whatever you like using the Python API.

Other tricks also become possible without the need for launch scripts, like testing import logic and fallbacks by messing with sys.path and sys.modules, or patching builtins or other modules.
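As an illustration of the "more complex configuration" case for warnings, here is a rough sketch of a setup that is straightforward through the Python API. The function name `configure_warnings` and the `legacy_api` message pattern are hypothetical, chosen only for the example.

```python
import warnings


def configure_warnings():
    """Hypothetical startup configuration for the warnings module."""
    # Turn every DeprecationWarning into an exception...
    warnings.filterwarnings("error", category=DeprecationWarning)
    # ...except those whose message matches a regular expression.
    # filterwarnings() inserts at the front of the filter list, so this
    # later, more specific rule takes precedence over the one above.
    warnings.filterwarnings(
        "ignore",
        message=r".*\blegacy_api\b.*",
        category=DeprecationWarning,
    )
```

Under the proposal, this could live in a small helper module and be triggered with a one-line `-C` snippet that imports it and calls the function.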
msg160700 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2012-05-15 09:36
> As Ned notes, to cover implicit creation of Python subprocesses an
> environment based solution would be needed to ensure the subprocesses
> adopt the desired settings.

So why aren't you proposing an environment-based solution instead? :) To use the "-C" option, you have to modify all places which launch a Python subprocess.
msg160702 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2012-05-15 09:46
Because I was thinking about a specific case where I could configure how the subprocesses were invoked (launching a test server for a web application). It took Ned's comment to remind me of the original use case (i.e. coverage statistics for subprocesses created by an arbitrary application, not a custom test harness).

What this would allow is the elimination of a whole class of ad hoc feature requests - any process global configuration setting with a Python API would automatically also receive a command line API (via -C) and an environment API (via PYTHONRUNFIRST). Some existing options (like -Xfaulthandler) may never have been added if -C was available.

That's why I changed the issue title (and am now updating the specific suggestion).
msg160703 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2012-05-15 09:51
Actually, there's another use case for you:

export PYTHONRUNFIRST="import faulthandler; faulthandler.enable()"
application.py

All subprocesses launched by the application will now have faulthandler enabled, without modifying the application. Doing this in a shell session means that faulthandler will be enabled for all Python processes you launch.

Obviously, care would need to be taken to ensure PYTHONRUNFIRST is ignored for setuid scripts (and it would respect -E as with any other environment variable).
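The proposed behaviour can be approximated today from a `sitecustomize` module, which the standard `site` module imports at startup when one is importable. The sketch below is an emulation under that assumption, not the proposed interpreter feature: `run_first` and the `__runfirst__` namespace name are made up, and the setuid concern raised above is not handled at all.

```python
import os
import sys


def run_first(environ=None, ignore_environment=None):
    """Execute the code held in PYTHONRUNFIRST, if any. Returns True if code ran."""
    if environ is None:
        environ = os.environ
    if ignore_environment is None:
        # Set by -E (and -I); the proposal says the variable should respect it.
        ignore_environment = bool(sys.flags.ignore_environment)
    if ignore_environment:
        return False
    code = environ.get("PYTHONRUNFIRST")
    if not code:
        return False
    # Run in a throwaway namespace; only global side effects persist.
    exec(compile(code, "<PYTHONRUNFIRST>", "exec"), {"__name__": "__runfirst__"})
    return True
```

Calling `run_first()` from a `sitecustomize.py` on the import path would then apply the variable to every Python process that inherits the environment and imports that module, which is exactly the breadth of effect the security discussion below is about.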
msg160710 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2012-05-15 11:05
For faulthandler and coverage, a "-M" option would be more convenient (run a module with __name__='__premain__' (or something of the sort) and continue command line processing).
msg160713 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2012-05-15 11:20
No, that increases complexity and coupling, because it would only work for modules that were designed to work that way. Execution of a simple statement will work for any global state that can be modified from pure Python code (including invocation of more complex configuration settings from a custom Python module). For a mature application, you wouldn't do it this way because you'd have other more polished mechanisms in place, but for debugging, experimentation and dealing with recalcitrant third party software, it could help deal with various problems without having to edit the code.
msg166722 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2012-07-29 07:01
I've switched back to being -1 on the PYTHONRUNFIRST idea. There are no ACLs for environment variables, so the security implications scare me too much for me to support the feature. The simple -C option doesn't have that problem, though, and could be used as infrastructure in a process infrastructure framework to provide enhanced configuration of Python subprocesses.
msg166747 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2012-07-29 11:03
> I've switched back to being -1 on the PYTHONRUNFIRST idea. There are
> no ACLs for environment variables, so the security implications scare
> me too much for me to support the feature.

I'm quite sure PYTHONHOME and PYTHONPATH already allow you to mess quite freely. That's why we have the -E flag.

I'm -0.5 myself, though, for the reason that it complicates the startup process a little bit more, without looking very compelling. It smells disturbingly like LD_PRELOAD to me.

> The simple -C option doesn't have that problem, though, and could be
> used as infrastructure in a process infrastructure framework to
> provide enhanced configuration of Python subprocesses.

What do you mean exactly?
msg166752 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2012-07-29 12:37
Nothing too complicated - just noting that a test suite like ours that launches Python subprocesses to test process global state handling could fairly easily arrange to pass appropriate -C options to trigger things like recording coverage data or profiling options. I'll also note that if you put a "preinit.py" on sys.path (e.g. in the current directory if using -m for invocation), you could easily do "-C 'import preinit'" to do arbitrarily complex custom setups, including preconfiguring your test framework. A lot of my thoughts on this come out of looking into migrating the various stdlib modules like trace, pdb and profile over to supporting everything that runpy (and hence the main executable) supports, and a lot of the complexity lies in the mechanics of how to daisy chain the two "__main__" modules together. Running a bit of extra code in __main__ as supplied on the command line before kicking off the full import process helps avoid a lot of pain.
msg166755 - (view)	Author: Ned Batchelder (nedbat) *	Date: 2012-07-29 13:04
> I'm -0.5 myself, though, for the reason that it complicates the startup
> process a little bit more, without looking very compelling. It smells
> disturbingly like LD_PRELOAD to me.

Antoine, do you have a suggestion for how to solve the coverage.py problem? To re-iterate: imagine you have a large test suite, and it spawns python processes during the tests. Mercurial, for example, is like this. You want to measure the coverage of your test suite. This means not only do you have to invoke the main suite with "python coverage.py run tests.py" instead of "python tests.py", but all the subprocess invocations need to invoke coverage.py as well.

We are looking for ways to make this as transparent as possible to the tests themselves, just as coverage measurement is now for test suites that don't spawn python subprocesses.

http://nedbatchelder.com/code/coverage/subprocess.html describes the two current hacks people can use to invoke coverage on subprocesses. I was hoping for a cleaner, more natural solution.
msg166762 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2012-07-29 14:18
> > I'm -0.5 myself, though, for the reason that it complicates the startup
> > process a little bit more, without looking very compelling. It smells
> > disturbingly like LD_PRELOAD to me.
>
> Antoine, do you have a suggestion for how to solve the coverage.py
> problem? To re-iterate: imagine you have a large test suite, and it
> spawns python processes during the tests. Mercurial, for example, is
> like this. You want to measure the coverage of your test suite. This
> means not only do you have to invoke the main suite with "python
> coverage.py run tests.py" instead of "python tests.py", but all the
> subprocess invocations need to invoke coverage.py as well.

Ok, sorry then, I retract what I said. I agree the use case is legitimate.
msg166763 - (view)	Author: Chris Jerdonek (chris.jerdonek) *	Date: 2012-07-29 14:19
Ned, two questions: in the scenario you just described, is it a requirement that your test suite's code need not be modified (even minimally) to support coverage of subprocesses? And can the solution assume that the test suite is spawning the Python processes in certain standard ways, or must it address all possible ways (even convoluted ones)? If the former, what are the standard ways? I am referring to things like the path to the Python interpreter invoked and how that is obtained, whether the subprocess module is always used, etc. These questions are meant to help pin down the scope of what needs to be satisfied.
msg166765 - (view)	Author: Ned Batchelder (nedbat) *	Date: 2012-07-29 14:38
Chris: The real problem is that it isn't the "test suite" that spawns the processes: the tests invoke product code, and the product code spawns Python. So modifying the Python-spawning really means modifying the product code to do something different under coverage, which most developers (rightfully) won't want to do. My preference is not to assume a particular style of spawning subprocesses, since coverage.py tries quite hard to be applicable to any Python project.
msg166772 - (view)	Author: Chris Jerdonek (chris.jerdonek) *	Date: 2012-07-29 14:54
I understand that. Sorry, I meant to say "code under test." If you make no assumptions about spawning subprocesses, does this mean, for example, that the solution must satisfy the case of subprocesses invoking a different version of Python, or invoking the same version of Python in a different virtual environment?
msg166779 - (view)	Author: Ned Batchelder (nedbat) *	Date: 2012-07-29 15:56
Chris, I'm not sure how to answer your questions. The more powerful and flexible, the better. There is no "must" here. I'm looking for a way to avoid the hacks coverage.py has used in the past to measure coverage in subprocesses. A language feature that allowed me to externally configure the interpreter to run some of my code first would allow me to do that.
msg166785 - (view)	Author: Chris Jerdonek (chris.jerdonek) *	Date: 2012-07-29 16:41
Okay, then in the interest of understanding why various alternatives fail, I'll just throw out the suggestion or question that I had in mind because I don't see it mentioned above or on the web page. Why wouldn't it work to define an alias or script that invokes the desired "python -m ...", and then when you call your test suite (using a potentially different "-m ..."), you set sys.executable to that script so that subprocesses in your code under test will invoke it? (coverage.py could do all of this under the hood so that the user need not be aware of it.) Or would this work just fine, but that it's an example of the kind of hack that you're trying to avoid?
msg166841 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2012-07-30 00:55
There are two different use cases here. "-C" tackles one of them, "PYTHONRUNFIRST" the other.

My original use case came from working on the Python test suite. In that suite, we have "test.script_helper" which spawns Python subprocesses in order to test various aspects of the startup machinery. I can easily modify script_helper to pass an extra -C argument when gathering coverage data, so I don't need any implicit magic. The -C option also simplifies a whole host of things by letting you use the Python API to perform preconfiguration of various subsystems before executing __main__ normally rather than having to either write a custom launch script (difficult to do with full generality) or adding even more arcane command line options.

However, the -C option doesn't cover the case of implicit invocation of subprocesses. This is where the PYTHONRUNFIRST suggestion comes in - the idea would be that, unless -E is specified, then -C $PYTHONRUNFIRST would be implied.

To be honest, I don't think this latter capability should be built into the core implementation. Instead, I think it is more appropriate for it to be handled at a virtual environment level, so that it doesn't inadvertently affect invocation of other applications (like hg) that merely happen to be written in Python. Scoping it to a venv would also lessen many of my security concerns with the idea.

A simple way to do this would be if pyvenv.cfg could contain a customisation snippet that was executed prior to __main__ invocation (building off the -C machinery).
msg166843 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2012-07-30 01:06
On Monday 30 July 2012 at 00:55 +0000, Nick Coghlan wrote:
> However, the -C option doesn't cover the case of implicit invocation
> of subprocesses. This is where the PYTHONRUNFIRST suggestion comes in
> - the idea would be that, unless -E is specified, then -C $PYTHONRUNFIRST
> would be implied.
>
> To be honest, I don't think this latter capability should be built
> into the core implementation. Instead, I think it is more appropriate
> for it to be handled at a virtual environment level, so that it
> doesn't inadvertently affect invocation of other applications (like
> hg) that merely happen to be written in Python.

Well, it shouldn't if you don't start doing "export PYTHONRUNFIRST=...", but instead set it from the calling Python process (possibly from coverage itself). Having to create virtual environments and whatnot just to enjoy this feature sounds terribly tedious.
msg167005 - (view)	Author: Chris Jerdonek (chris.jerdonek) *	Date: 2012-07-31 14:55
> However, the -C option doesn't cover the case of implicit invocation of subprocesses. This is where the PYTHONRUNFIRST suggestion comes in

Would a more general solution than PYTHONRUNFIRST be something like a suitably named PYTHONRUNINSTEAD? This would be an arbitrary script to run in place of python any time python was invoked. Alternatively (and less powerfully), it could be a default set of command options to pass to the Python executable. Both of these seem more general than PYTHONRUNFIRST because the 'runas' command could itself be `python -C $PYTHONRUNFIRST ....`

> unless -E is specified, then -C $PYTHONRUNFIRST would be implied. To be honest, I don't think this latter capability should be built into the core implementation.... so that it doesn't inadvertently affect invocation of other applications (like hg)

It seems what you're saying is that you'd want PYTHONRUNFIRST to run only in special situations, rather than as the default. Is there a sense then in which a functionality inverse to -E could be provided? The idea would be that, when running Python, you could somehow instruct that an option like PYTHONRUN* would take effect only for the subprocesses spawned by the main process you're invoking (kind of like a context manager for the invocation of Python itself)? The advantage of this approach would be that a special PYTHONRUNFIRST setting wouldn't take effect unless you explicitly say so.
msg167066 - (view)	Author: Ned Batchelder (nedbat) *	Date: 2012-08-01 01:07
I agree with Antoine: I don't see why this should be a feature of virtualenvs. It's easy to use environment variables in a tightly-controlled way. We don't worry that any of the other environment variables that affect Python execution will somehow escape into the wild and change how Mercurial (or anything else) works.
msg176889 - (view)	Author: Kristján Valur Jónsson (kristjan.jonsson) *	Date: 2012-12-04 09:59
offtopic: Noticed something pretty annoying: If a package uses relative imports, e.g. from . import sibling_module, then it is impossible to run that package as a script, even with the __main__ trick.
msg176915 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2012-12-04 14:20
Why post that complaint here? If there's a case where __main__.__package__ isn't being set correctly by -m, file a separate bug.
msg368728 - (view)	Author: Ionel Cristian Mărieș (ionelmc)	Date: 2020-05-12 13:06
Note that coveragepy ain't the sole usecase for this.

https://pypi.org/project/manhole/ - a debugging tool
https://pypi.org/project/hunter/ - a tracer

In addition to those there's https://pypi.org/project/pytest-cov/ which packages the pth trick so coverage works consistently in all scenarios without putting users through the trouble of messing with their python installation.

Also, the amount of activity and enthusiasm on changing something that already works while other inconsistencies in python's core like issue23990 are ignored is disheartening.
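For readers unfamiliar with the "pth trick": the standard `site` module executes any line of a `.pth` file that starts with `import` followed by a space or tab. The sketch below shows only that general mechanism, not how any of the packages above implement it; `install_pth_hook` is a hypothetical helper name.

```python
import os


def install_pth_hook(directory, name, code):
    """Write <name>.pth into `directory` so that site runs `code` when it scans it."""
    # site only executes .pth lines beginning with "import" plus whitespace,
    # so the hook must be a single line that starts with an import statement
    # (further statements can follow after ";").
    if "\n" in code or not code.startswith("import "):
        raise ValueError("hook must be a single line starting with 'import '")
    path = os.path.join(directory, name + ".pth")
    with open(path, "w", encoding="utf-8") as f:
        f.write(code + "\n")
    return path
```

A `.pth` file placed in a site-packages directory is processed at every interpreter startup (unless `-S` is used), which is what makes the trick reach subprocesses. `site.addsitedir(directory)` can be used to trigger the same processing by hand for a directory of your choosing.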

History
Date	User	Action	Args
2022-04-11 14:57:30	admin	set	github: 59008
2020-05-12 13:06:24	ionelmc	set	nosy: + ionelmc messages: + msg368728
2019-03-14 14:36:07	ncoghlan	link	issue33944 dependencies
2013-03-19 00:25:28	gregory.p.smith	set	nosy: + gregory.p.smith
2012-12-04 14:20:31	ncoghlan	set	messages: + msg176915
2012-12-04 09:59:24	kristjan.jonsson	set	messages: + msg176889
2012-11-29 17:25:16	barry	set	nosy: + barry
2012-11-23 09:38:40	kristjan.jonsson	set	nosy: + kristjan.jonsson
2012-11-13 03:53:10	eric.snow	set	nosy: + eric.snow
2012-11-01 04:18:41	bfroehle	set	nosy: + bfroehle
2012-08-29 19:52:22	asvetlov	set	nosy: + asvetlov
2012-08-18 13:31:14	Arfrever	set	nosy: + Arfrever
2012-08-01 01:07:52	nedbat	set	messages: + msg167066
2012-07-31 14:55:57	chris.jerdonek	set	messages: + msg167005
2012-07-30 01:06:19	pitrou	set	messages: + msg166843
2012-07-30 00:55:39	ncoghlan	set	messages: + msg166841
2012-07-29 16:41:24	chris.jerdonek	set	messages: + msg166785
2012-07-29 15:56:46	nedbat	set	messages: + msg166779
2012-07-29 14:54:46	chris.jerdonek	set	messages: + msg166772
2012-07-29 14:38:10	nedbat	set	messages: + msg166765
2012-07-29 14:19:14	chris.jerdonek	set	messages: + msg166763
2012-07-29 14:18:58	pitrou	set	messages: + msg166762
2012-07-29 13:46:18	chris.jerdonek	set	nosy: + chris.jerdonek
2012-07-29 13:04:41	nedbat	set	messages: + msg166755
2012-07-29 12:37:21	ncoghlan	set	messages: + msg166752
2012-07-29 11:03:54	pitrou	set	messages: + msg166747
2012-07-29 07:01:18	ncoghlan	set	messages: + msg166722
2012-06-20 12:33:23	bkabrda	set	nosy: + bkabrda
2012-06-12 12:00:33	ncoghlan	set	versions: + Python 3.4, - Python 3.3
2012-05-15 11:20:32	ncoghlan	set	messages: + msg160713
2012-05-15 11:05:01	serhiy.storchaka	set	messages: + msg160710
2012-05-15 09:51:07	ncoghlan	set	messages: + msg160703
2012-05-15 09:46:15	ncoghlan	set	messages: + msg160702 title: Enhanced command line features for the runpy module -> Add feature to allow code execution prior to __main__ invocation
2012-05-15 09:36:43	pitrou	set	messages: + msg160700 title: Add feature to allow code execution prior to __main__ invocation -> Enhanced command line features for the runpy module
2012-05-15 04:42:10	ncoghlan	set	title: Enhanced command line features for the runpy module -> Add feature to allow code execution prior to __main__ invocation
2012-05-15 04:41:00	ncoghlan	set	messages: + msg160680 title: Add -C option to run code at Python startup -> Enhanced command line features for the runpy module
2012-05-14 19:02:52	nedbat	set	messages: + msg160657
2012-05-14 17:49:27	serhiy.storchaka	set	nosy: + serhiy.storchaka messages: + msg160647
2012-05-14 17:22:42	pitrou	set	nosy: + pitrou messages: + msg160642
2012-05-14 10:57:59	nedbat	set	nosy: + nedbat
2012-05-14 07:45:40	ncoghlan	set	components: + Interpreter Core
2012-05-14 07:45:19	ncoghlan	set	stage: needs patch type: enhancement versions: + Python 3.3
2012-05-14 07:43:38	ncoghlan	create