Message 133833 - Python tracker



Author	ncoghlan
Recipients	Greg.Slodkowicz, belopolsky, eli.bendersky, georg.brandl, giampaolo.rodola, ncoghlan, terry.reedy
Date	2011-04-15.15:03:07
Message-id	<1302879789.88.0.292379565108.issue9325@psf.upfronthosting.co.za>
In-reply-to

Content

Good point about the extra parameter just pushing the problem one layer up the stack rather than completely solving the problem.

However, on further reflection, I've realised that I really don't like having runpy import the threading module automatically, since that means even single-threaded applications run via "-m" will end up initialising the thread support, including the GIL. That's something we try reasonably hard to avoid doing in applications that don't actually need it (it does happen in library modules that genuinely need thread-local storage, such as the decimal module).

If you look at the way Pdb._runscript currently works, it imports __main__ and then cleans it out ready to let the child script run. So replacing that with a simple module-level global that refers to the runpy execution namespace would probably be an improvement.
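
As a rough illustration of the "import __main__ and clean it out" approach described above, here is a minimal sketch. The helper name reset_namespace is hypothetical, and the demo uses a stand-in module object rather than the real __main__ so that it does not wipe the namespace it runs in.

```python
import builtins
import types


def reset_namespace(module, filename):
    """Empty a module's namespace and re-seed it like a fresh __main__.

    Hypothetical helper: it mirrors the idea described above, not the
    exact code in pdb.
    """
    module.__dict__.clear()
    module.__dict__.update({
        "__name__": "__main__",
        "__file__": filename,
        "__builtins__": builtins,
    })
    return module.__dict__


# Stand-in module so the demo does not clear the real __main__.
stand_in = types.ModuleType("__main__")
stand_in.leftover = "debugger state"

namespace = reset_namespace(stand_in, "child_script.py")

# The child script's code then runs in the cleaned-out namespace.
exec(compile("answer = 6 * 7", "child_script.py", "exec"), namespace)
```

After the reset, nothing left behind by the debugger is visible to the child code, which is the property the clean-out step is there to provide.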

Looking at this use case more closely, though, shows that it isn't as simple as handing the whole task over to the runpy module, as the debugger needs access to the filename before it starts executing code in order to configure the trace function correctly.

That means runpy needs to support a two-stage execution process that allows a client script like pdb to retrieve details of the code to be executed, and then subsequently request that it be executed in a specific namespace. My first thought is to switch to a more object-oriented API along the lines of the following:

- get_path_runner()
- get_module_runner()
    These functions would parallel the current run_path() and run_module() functions, but would return a CodeRunner object instead of directly executing the specified code.

- CodeRunner.run(module=None)
    This method would actually execute the code, using the specified namespace if given, or an automatic temporary namespace otherwise.

CodeRunner would store sufficient state to support the delayed execution, as well as providing access to key pieces of information (such as the filename) before code execution actually occurs.
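
To make the shape of this proposal concrete, here is a minimal sketch under stated assumptions. None of these names exist in runpy; CodeRunner, get_path_runner() and get_module_runner() are written here purely to illustrate the idea, the run_name parameter is an extra assumption, and the sketch ignores things the real run_path() and run_module() handle (directories, zip files, packages with a __main__ submodule). It also assumes the module's loader provides get_code().

```python
import importlib.util
import types


class CodeRunner:
    """Hypothetical holder for code that has been located but not yet run."""

    def __init__(self, code, run_name="__main__"):
        self.code = code
        self.run_name = run_name

    @property
    def filename(self):
        # Available before execution, e.g. for configuring a trace function.
        return self.code.co_filename

    def run(self, module=None):
        # Use the given module's namespace, or a temporary module otherwise.
        if module is None:
            module = types.ModuleType(self.run_name)
        namespace = module.__dict__
        namespace.update(__name__=self.run_name, __file__=self.filename)
        exec(self.code, namespace)
        return namespace


def get_path_runner(path, run_name="__main__"):
    # Stage one for a script path: read and compile, but do not execute.
    with open(path, "rb") as f:
        code = compile(f.read(), path, "exec")
    return CodeRunner(code, run_name)


def get_module_runner(mod_name, run_name="__main__"):
    # Stage one for a module name: locate it and fetch its code object.
    spec = importlib.util.find_spec(mod_name)
    if spec is None or spec.loader is None:
        raise ImportError(f"No module named {mod_name!r}")
    code = spec.loader.get_code(mod_name)
    if code is None:
        raise ImportError(f"No code object available for {mod_name!r}")
    return CodeRunner(code, run_name)
```

With something like this, a caller can inspect runner.filename first and only later call runner.run(), optionally passing the module whose namespace should receive the results.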

pdb could then largely be left alone from a semantic point of view (i.e. it would still execute everything in the true __main__ module), with the following changes: its current code for finding the script to execute would be replaced by a call to runpy.get_path_runner(); a new "-m" switch would be added that tweaks that step to invoke runpy.get_module_runner() instead; the debugger priming step would query the CodeRunner object for the filename; and finally, the actual code execution step would invoke the run() method of the CodeRunner object (passing in __main__ itself as the target module).
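
The priming and execution steps might look roughly like the sketch below. It is only an illustration: the filename constant stands in for what the proposed CodeRunner would report, the source string stands in for the child script, and a plain sys.settrace() function stands in for the debugger. The target module is also a stand-in rather than the real __main__.

```python
import sys
import types

# Stage one: the code is located and compiled, so its filename is known
# before anything runs. (Stand-in for querying a CodeRunner.)
source = "x = 1\ny = x + 1\n"
filename = "<child script>"
code = compile(source, filename, "exec")

traced_lines = []


def tracer(frame, event, arg):
    # Priming step: only follow frames that belong to the target script.
    if frame.f_code.co_filename != filename:
        return None
    if event == "line":
        traced_lines.append(frame.f_lineno)
    return tracer


# Stage two: execute the code in a chosen module namespace while tracing.
target = types.ModuleType("__main__")  # stand-in for the true __main__
sys.settrace(tracer)
try:
    exec(code, target.__dict__)
finally:
    sys.settrace(None)
```

The point of the two stages is visible here: the trace function can only be written because the filename is available before exec() is called.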

History
Date	User	Action	Args
2011-04-15 15:03:09	ncoghlan	set	recipients: + ncoghlan, georg.brandl, terry.reedy, belopolsky, giampaolo.rodola, eli.bendersky, Greg.Slodkowicz
2011-04-15 15:03:09	ncoghlan	set	messageid: <1302879789.88.0.292379565108.issue9325@psf.upfronthosting.co.za>
2011-04-15 15:03:09	ncoghlan	link	issue9325 messages
2011-04-15 15:03:07	ncoghlan	create