Issue 9325: Add an option to pdb/trace/profile to run library module as a script


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/53571

classification

Title:	Add an option to pdb/trace/profile to run library module as a script
Type:	enhancement	Stage:	needs patch
Components:	Library (Lib)	Versions:	Python 3.4

process

Status:	open	Resolution:
Dependencies:	21862 32206 32512 32515	Superseder:
Assigned To:		Nosy List:	Greg.Slodkowicz, Segev Finer, belopolsky, eric.araujo, eric.snow, georg.brandl, giampaolo.rodola, mariocj89, ncoghlan, piotr.dobrogost, terry.reedy
Priority:	normal	Keywords:

Created on 2010-07-21 20:24 by belopolsky, last changed 2022-04-11 14:57 by admin.

Messages (20)
msg111111 - (view)	Author: Alexander Belopolsky (belopolsky) *	Date: 2010-07-21 20:24
The -m interpreter option allows one to run a library module as a script, but if you want to debug, profile or trace the execution of the same module, you must supply the path to the module source file on the command line. The resulting execution may also differ from a python -m run, especially when the module is located within a package.

I would like to be able to do

$ python -m trace <trace options> --run-module <module name>

and the same with pdb and profile in place of trace.
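The difference between module-style and path-style execution can be reproduced with the public runpy API. The following is a minimal sketch, assuming a throwaway package (the names demo_pkg_9325, helper and tool are made up for illustration) whose module uses a relative import:

```python
import os
import runpy
import sys
import tempfile

# Build a throwaway package; all names here are hypothetical.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg_9325")
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("")
with open(os.path.join(pkg_dir, "helper.py"), "w") as f:
    f.write("VALUE = 42\n")
with open(os.path.join(pkg_dir, "tool.py"), "w") as f:
    f.write("from .helper import VALUE\nresult = VALUE\n")

sys.path.insert(0, root)

# Module-style execution (what "python -m" does): the package context
# is known, so the relative import resolves.
by_module = runpy.run_module("demo_pkg_9325.tool", run_name="__main__")

# Path-style execution (what a tool does when handed the source file):
# there is no parent package, so the relative import fails.
try:
    runpy.run_path(os.path.join(pkg_dir, "tool.py"), run_name="__main__")
    by_path_error = None
except ImportError as exc:
    by_path_error = exc
```

On recent Python 3 versions the path-style run should fail with an ImportError about a missing parent package, which is one concrete way the two invocations diverge.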
msg117102 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2010-09-21 21:26
I've thought about this in the past, but never really pursued it due to the question of what to do with the __main__ namespace. There are three options here:

1. Use runpy.run_module to run the module in a fresh __main__ namespace
2. Use runpy.run_module to run the module under its own name
3. Use runpy._run_module_as_main to run the module in the real __main__ namespace

Option 3 is probably a bad idea (due to the risk of clobbering globals from pdb/trace/profile/doctest/etc) but failing to do it that way creates a difference between the way the actual -m switch works and what these modules will be doing. That said, I haven't looked closely at what these modules do for ordinary scripts, where much the same problem will already arise.

If option 1 is adequate for this purpose, then it shouldn't be that hard to add - it's just that I've never done the investigation to see if it would be adequate.
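The first two options above map onto the documented run_name parameter of runpy.run_module. A small sketch, assuming a temporary module named demo_mod_9325 (a made-up name) that simply records the __name__ it was run under; option 3 is left out because it relies on a private function:

```python
import os
import runpy
import sys
import tempfile

# Hypothetical one-line module that records the name it runs under.
root = tempfile.mkdtemp()
with open(os.path.join(root, "demo_mod_9325.py"), "w") as f:
    f.write("name_seen = __name__\n")
sys.path.insert(0, root)

# Option 1: run in a fresh namespace that presents itself as __main__.
# alter_sys=True temporarily swaps sys.modules["__main__"] and sys.argv[0].
opt1 = runpy.run_module("demo_mod_9325", run_name="__main__", alter_sys=True)

# Option 2: run under the module's own name (the default run_name).
opt2 = runpy.run_module("demo_mod_9325")

# Neither option writes into the caller's real __main__ module.
leaked = hasattr(sys.modules["__main__"], "name_seen")
```

Both calls return the resulting globals dictionary, so the caller can inspect the namespace afterwards, provided the module ran to completion.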
msg118020 - (view)	Author: Alexander Belopolsky (belopolsky) *	Date: 2010-10-05 16:59
I am afraid, for ordinary scripts these modules effectively use option 3. I think these modules should remove their own scaffolding from "real" __main__ before loading any traced code. I am not sure how this can be achieved, though.
msg118032 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2010-10-05 20:45
On Wed, Oct 6, 2010 at 2:59 AM, Alexander Belopolsky <report@bugs.python.org> wrote:
>
> Alexander Belopolsky <belopolsky@users.sourceforge.net> added the comment:
>
> I am afraid, for ordinary scripts these modules effectively use option 3. I think these modules should remove its own scaffolding from "real" __main__ before loading any traced code. I am not sure how this can be achieved, though.

If you use runpy.run_module or runpy.run_path, they will switch the existing __main__ out of sys.modules, replacing it with a temporary module. However, that approach is currently slightly broken, in that it leaves the temporary module namespace inaccessible if the module execution fails with an exception (hence the existence of _run_module_as_main).

I've thought of a few ways to fix that, but never explored any of them:

1. allow the module to be used for execution to be passed in to run_module and run_path as a new optional parameter
2. allow a list (or other mutable container) to be passed in as an output parameter, and stick the temporary module in there
3. define a thread-local variable for the runpy module that stores the last module namespace executed via runpy in the current thread (and a convenience API for retrieving it)

Option 2 strikes me as rather clumsy, so we can probably skip that. I find option 3 to be quite elegant in a sys.exc_info() kind of way, but option 1 is probably simpler.
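The first idea in the list above (letting the caller pass in the module to execute in) can be approximated outside runpy with importlib. This is only a sketch under assumptions: run_module_in is a hypothetical helper, not a runpy function, and failing_mod_9325 is a made-up module that raises partway through:

```python
import importlib.util
import os
import sys
import tempfile
import types

# Hypothetical module that sets a global and then fails.
root = tempfile.mkdtemp()
with open(os.path.join(root, "failing_mod_9325.py"), "w") as f:
    f.write("progress = 'before failure'\nraise RuntimeError('boom')\n")
sys.path.insert(0, root)


def run_module_in(mod_name, target):
    """Hypothetical helper: execute mod_name's code in target's namespace."""
    spec = importlib.util.find_spec(mod_name)
    code = spec.loader.get_code(mod_name)
    target.__dict__.update(
        __name__="__main__",
        __file__=spec.origin,
        __spec__=spec,
        __loader__=spec.loader,
        __package__=spec.parent,
    )
    exec(code, target.__dict__)


# The caller owns the module object, so its namespace stays reachable
# even when execution ends with an exception.
target = types.ModuleType("__main__")
survived = None
try:
    run_module_in("failing_mod_9325", target)
except RuntimeError:
    survived = target.progress
```

Because the target module is created by the caller, a post-mortem debugger could still look at its globals after the failure, which is the property the temporary module inside runpy does not offer.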
msg133482 - (view)	Author: Greg Słodkowicz (Greg.Slodkowicz)	Date: 2011-04-10 21:26
Following Nick's advice, I extended runpy.run_module to accept an extra parameter to be used as replacement __main__ namespace. Having this, I can make this temporary __main__ accessible in main() in modules like trace/profile/pdb even if module execution fails with an exception. The problem is that it's visible only in the calling function but not in the global namespace. One way to make it accessible for post mortem debugging would be to create the replacement __main__ module in the global namespace and then pass as a parameter to main(), but this seems clumsy. So maybe the way to go is to have runpy store last used __main__, sys.exc_info() style. In this case, would this be the correct way to store it in runpy?

    try:
        import threading
    except ImportError:
        temp_main = None
    else:
        local_storage = threading.local()
        local_storage.temp_main = None
        temp_main = local_storage.temp_main
msg133833 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2011-04-15 15:03
Good point about the extra parameter just pushing the problem one layer up the stack rather than completely solving the problem.

However, on further reflection, I've realised that I really don't like having runpy import the threading module automatically, since that means even single-threaded applications run via "-m" will end up initialising the thread support, including the GIL. That's something we try reasonably hard to avoid doing in applications that don't actually need it (it does happen in library modules that genuinely need thread-local storage, such as the decimal module).

If you look at the way Pdb._runscript currently works, it imports __main__ and then cleans it out ready to let the child script run. So replacing that with a simple module level global that refers to the runpy execution namespace would probably be an improvement.

Looking at this use case more closely, though, shows that it isn't as simple as handing the whole task over to the runpy module, as the debugger needs access to the filename before it starts executing code in order to configure the trace function correctly.

That means runpy needs to support a two stage execution process that allows a client script like pdb to retrieve details of the code to be executed, and then subsequently request that it be executed in a specific namespace.

My first thought is to switch to a more object-oriented API along the lines of the following:

- get_path_runner()
- get_module_runner()
  These functions would parallel the current run_module() and run_path() functions, but would return a CodeRunner object instead of directly executing the specified module
- CodeRunner.run(module=None)
  This method would actually execute the code, using the specified namespace if given, or an automatic temporary namespace otherwise.

CodeRunner would store sufficient state to support the delayed execution, as well as providing access to key pieces of information (such as the filename) before code execution actually occurs.

pdb could then largely be left alone from a semantic point of view (i.e. still execute everything in the true __main__ module), except that its current code for finding the script to execute would be replaced by a call to runpy.get_runner_for_path(), a new "-m" switch would be added that tweaked that path to invoke runpy.get_runner_for_module() instead, the debugger priming step would query the CodeRunner object for the filename, and finally, the actual code execution step would invoke the run() method of the CodeRunner object (passing in __main__ itself as the target module).
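The two-stage idea described in this message can be sketched roughly as follows. CodeRunner and get_module_runner are the names proposed above and do not exist in runpy; this illustration builds them on importlib.util.find_spec and the loader's get_code(), which is an assumption about how such an API might be implemented, not a description of any actual patch:

```python
import importlib.util
import types


class CodeRunner:
    """Hypothetical two-stage runner; not an actual runpy API."""

    def __init__(self, spec, code):
        self.spec = spec
        self.code = code

    @property
    def filename(self):
        # Available before anything is executed, so a debugger can
        # configure its trace function first.
        return self.code.co_filename

    def run(self, module=None):
        # Use the caller's module if given, else a temporary one.
        if module is None:
            module = types.ModuleType("__main__")
        namespace = module.__dict__
        namespace.update(
            __name__="__main__",
            __file__=self.spec.origin,
            __spec__=self.spec,
            __loader__=self.spec.loader,
            __package__=self.spec.parent,
        )
        exec(self.code, namespace)
        return namespace


def get_module_runner(mod_name):
    """Stage one: locate and compile the module without running it."""
    spec = importlib.util.find_spec(mod_name)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot locate module {mod_name!r}")
    code = spec.loader.get_code(mod_name)
    if code is None:
        raise ImportError(f"no code object available for {mod_name!r}")
    return CodeRunner(spec, code)


# Stage one: inspect. Stage two: execute in a temporary namespace.
runner = get_module_runner("colorsys")
filename_before_run = runner.filename
namespace = runner.run()
```

A client such as pdb would pass its real __main__ module to run() instead of relying on the temporary one; colorsys is used here only because it is a small standard library module with no side effects when executed.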
msg134266 - (view)	Author: Greg Słodkowicz (Greg.Slodkowicz)	Date: 2011-04-22 15:40
Thanks, Nick. Before your last comment, I hadn't looked much into Pdb, instead focusing on profile.py and trace.py because they looked like simpler cases. I think the approach with CodeRunner objects would work just fine for profile and trace but Pdb uses run() inherited from Bdb. In order to make it work with a CodeRunner object, it seems run() would have to be reimplemented in Pdb (effectively becoming a 'runCodeRunner()'), and we could probably do without _runscript(). Is that what you had in mind?
msg198453 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2013-09-26 22:34
Issue 17473 had a longer list of relevant modules:

- pdb
- profile
- doctest
- trace
- modulefinder
- tabnanny
- pyclbr
- dis
msg198454 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2013-09-26 22:38
Also, the ModuleSpec PEP (PEP 451) should make the proposed refactoring much simpler, since the code runner could just expose data from the module spec.
msg198459 - (view)	Author: Eric Snow (eric.snow) *	Date: 2013-09-27 00:38
Soon, my precious, soon...
msg206429 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2013-12-17 12:48
Issue 19982 suggests a different way of refactoring the runpy APIs inspired by PEP 451: passing in a "target" module to be used, rather than creating a temporary one from scratch.
msg305716 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2017-11-07 07:55
Issue 21862 is a related issue specifically for the cProfile module. In that case, cProfile's command line invocation doesn't use the main module, so the patch is able to just create a synthetic call to runpy.run_module as a string and compile that as the code to be profiled.
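The synthetic-call approach described here can be sketched as follows. This is an illustration of the technique rather than a copy of the actual patch; profile_module is a hypothetical helper name, and colorsys stands in for the module to be profiled:

```python
import cProfile
import io
import pstats
import runpy


def profile_module(mod_name):
    """Hypothetical helper: profile a module run as if via "python -m"."""
    # The code to be profiled is just a call to runpy.run_module,
    # written as a string and compiled.
    code = compile(
        "run_module(modname, run_name='__main__')", "<synthetic>", "exec"
    )
    globs = {"run_module": runpy.run_module, "modname": mod_name}

    profiler = cProfile.Profile()
    profiler.runctx(code, globs, globs)
    return pstats.Stats(profiler, stream=io.StringIO())


stats = profile_module("colorsys")
```

Because the profiled code runs in its own small globals dictionary, the profiler never needs to touch the real __main__ module, which is why this case is simpler than pdb.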
msg309492 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2018-01-05 07:34
Issue 32206 covers doing this for `pdb`. It relies on directly accessing private APIs in the `runpy` module, but we can live with that, since `pdb` is part of the standard library.
msg309567 - (view)	Author: Mario Corchero (mariocj89) *	Date: 2018-01-06 18:44
pdb and cProfile are covered. If no one is working on it, do you want me to try to put through a patch for "profile" and "trace"? Should I create a separate issue if so?

From Issue 17473 it will leave only:

- doctest: which might be controversial
- dis: main is executing a "_test" function. That said, running -m dis might be useful.

The rest might not need the change as:

- modulefinder: __main__ is for test purposes
- tabnanny: works on files and directories
- pyclbr: already works with modules
msg309617 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2018-01-07 13:50
+1 for creating separate issues and linking them from this one - while the risk of breaking anything seems low, if we do cause a regression, multiple issues and PRs provide better traceability than one giant issue for everything. (I'm also not aware of anyone else actively working on this since Sanyam's cProfile PR, so go ahead and create issues and PRs for any which you're interested in working on)

As far as `dis` specifically goes, while the function name in `dis` is "_test()" and it doesn't provide a meaningful help message, it's a genuinely useful CLI operation: it disassembles whatever file you provide, or `stdin` if you don't provide one:

    $ echo "print('Hello')" | python3 -m dis
      1           0 LOAD_NAME                0 (print)
                  2 LOAD_CONST               0 ('Hello')
                  4 CALL_FUNCTION            1
                  6 POP_TOP
                  8 LOAD_CONST               1 (None)
                 10 RETURN_VALUE

So a `-m` option does make sense in `dis`, but it should probably be accompanied by some other changes as well (like a better name for the private function, and `--help` support).
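For comparison, roughly the same disassembly can be obtained from Python code through the documented dis.Bytecode class. A minimal sketch; the exact opcode names and offsets vary between Python versions, so no output is shown:

```python
import dis

source = "print('Hello')"

# dis.Bytecode compiles source strings and yields Instruction objects.
bytecode = dis.Bytecode(source)
instructions = list(bytecode)

opnames = [ins.opname for ins in instructions]
argvals = [ins.argval for ins in instructions]

# Human-readable listing, similar in layout to "python -m dis".
listing = bytecode.dis()
```

The listing string can be printed directly, while the Instruction objects are more convenient for programmatic inspection.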
msg309632 - (view)	Author: Mario Corchero (mariocj89) *	Date: 2018-01-07 19:14
I've created an issue + PR for profile which basically ports the change in cProfile: issue32512

I am not able to add it as a dependency on this one (rights issue probably).
msg309639 - (view)	Author: Mario Corchero (mariocj89) *	Date: 2018-01-07 21:00
Just finished a draft on the one for trace: issue32515
msg309656 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2018-01-08 02:18
Thanks. I've added the dependencies, and also granted you triage permissions on the tracker, so you should be able to edit dependencies yourself in the future.
msg309781 - (view)	Author: Mario Corchero (mariocj89) *	Date: 2018-01-10 21:20
Thanks Nick. I've sent patches for all of them but `dis`. `dis` does not "run" the code. Adding the -m option is basically identical to just running it on the __main__.py if the module is runnable or on the __init__ if it is not. If you think there is still value on that, I am happy to send a PR for it.
msg309785 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2018-01-10 23:52
While I do think it makes sense to enhance `dis` in this regard, I'm also thinking it might be better to have that automatically fall back to a `python -m inspect module:qualname` style lookup in the event that `os.path.exists(infile)` is false. So considering it out of scope for this issue makes sense.
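The fallback suggested here might look something like the sketch below. load_target is a hypothetical helper, and the use of pkgutil.resolve_name (available from Python 3.9) for the `module:qualname` form is an assumption about one possible implementation, not something decided in this issue:

```python
import importlib.util
import os
import pkgutil


def load_target(infile):
    """Hypothetical helper: return something dis.dis() can disassemble."""
    if os.path.exists(infile):
        # Existing behaviour: treat the argument as a source file path.
        with open(infile, "rb") as f:
            return compile(f.read(), infile, "exec")

    mod_name, _, qualname = infile.partition(":")
    if qualname:
        # "module:qualname" form: import the module and look up the object.
        return pkgutil.resolve_name(infile)

    # Bare module name: fetch its code object without executing it.
    spec = importlib.util.find_spec(mod_name)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot locate module {mod_name!r}")
    return spec.loader.get_code(mod_name)


func = load_target("colorsys:rgb_to_hsv")
module_code = load_target("colorsys")
```

Either result can then be handed to dis.dis(); this assumes no file named colorsys exists in the current directory, since a real path takes precedence.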

History
Date	User	Action	Args
2022-04-11 14:57:04	admin	set	github: 53571
2018-01-11 17:34:52	belopolsky	set	assignee: belopolsky ->
2018-01-10 23:52:52	ncoghlan	set	messages: + msg309785
2018-01-10 21:20:40	mariocj89	set	messages: + msg309781
2018-01-08 02:18:30	ncoghlan	set	dependencies: + Add an option to profile to run library module as a script, Add an option to trace to run module as a script messages: + msg309656
2018-01-07 21:00:41	mariocj89	set	messages: + msg309639
2018-01-07 19:14:37	mariocj89	set	messages: + msg309632
2018-01-07 13:50:19	ncoghlan	set	messages: + msg309617
2018-01-06 18:44:05	mariocj89	set	nosy: + mariocj89 messages: + msg309567
2018-01-05 07:34:30	ncoghlan	set	dependencies: + cProfile command-line should accept "-m module_name" as an alternative to script path, Run modules with pdb messages: + msg309492
2017-11-07 07:55:02	ncoghlan	set	messages: + msg305716
2017-07-29 21:05:58	Segev Finer	set	nosy: + Segev Finer
2016-02-22 08:31:44	piotr.dobrogost	set	nosy: + piotr.dobrogost
2014-07-12 14:54:40	berker.peksag	set	nosy: - berker.peksag
2013-12-17 12:48:23	ncoghlan	set	messages: + msg206429
2013-09-27 08:05:55	berker.peksag	set	nosy: + berker.peksag versions: + Python 3.4, - Python 3.3
2013-09-27 00:38:39	eric.snow	set	messages: + msg198459
2013-09-26 22:38:35	ncoghlan	set	messages: + msg198454
2013-09-26 22:34:55	ncoghlan	set	messages: + msg198453
2013-09-26 22:33:17	ncoghlan	link	issue17473 superseder
2012-11-13 04:59:33	eric.snow	set	nosy: + eric.snow
2012-07-15 03:55:15	eli.bendersky	set	nosy: - eli.bendersky
2011-04-23 15:41:15	eric.araujo	set	nosy: + eric.araujo versions: + Python 3.3, - Python 3.2
2011-04-22 15:40:06	Greg.Slodkowicz	set	messages: + msg134266
2011-04-15 15:03:09	ncoghlan	set	messages: + msg133833
2011-04-10 21:26:55	Greg.Slodkowicz	set	messages: + msg133482
2011-04-01 08:33:09	Greg.Slodkowicz	set	nosy: + Greg.Slodkowicz
2010-10-06 12:26:05	giampaolo.rodola	set	nosy: + giampaolo.rodola
2010-10-05 20:45:28	ncoghlan	set	messages: + msg118032
2010-10-05 16:59:17	belopolsky	set	messages: + msg118020
2010-09-21 21:26:19	ncoghlan	set	messages: + msg117102
2010-09-21 18:37:17	belopolsky	set	nosy: + terry.reedy, ncoghlan
2010-07-30 15:14:33	belopolsky	set	nosy: + georg.brandl
2010-07-21 20:25:50	belopolsky	set	nosy: + eli.bendersky
2010-07-21 20:24:58	belopolsky	create