classification
Title: REPL doesn't ensure builtins are available when implicitly recreating __main__
Type: behavior Stage: test needed
Components: Versions: Python 3.8, Python 3.7, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: ncoghlan, terry.reedy
Priority: normal Keywords:

Created on 2019-04-19 01:01 by ncoghlan, last changed 2019-04-20 02:26 by terry.reedy.

Messages (5)
msg340516 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2019-04-19 01:01
While trying to create an example for a pickle bug discussion, I deliberately dropped `__main__` out of sys.modules, and the REPL session lost all of its runtime state.

Simplified reproducer:


```
>>> import sys
>>> mod = sys.modules[__name__]
>>> sys.modules[__name__] = object()
>>> dir()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'dir' is not defined

```

(Initially encountered on Python 2.7, reproduced on Python 3.7)

If I'd just dropped the reference to `__main__` entirely, that would make sense (since modules clear their namespaces when they go away), but I didn't: I saved a reference in a local variable first.

So it appears the CPython REPL isn't keeping a strong reference to either `__main__` or `__main__.__dict__` between statements, so the cyclic GC kicked in and decided the module could be destroyed.
msg340517 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2019-04-19 01:13
Additional info showing the module getting reset back to the state of a freshly created module namespace:

```
>>> dir()
['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__']
>>> __builtins__
<module 'builtins' (built-in)>
>>> import sys
>>> mod = sys.modules[__name__]
>>> sys.modules[__name__] = object()
>>> __builtins__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name '__builtins__' is not defined
>>> __annotations__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name '__annotations__' is not defined
>>> __doc__
>>> __loader__
>>> __name__
'__main__'
>>> __package__
>>> __spec__
>>> mod
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'mod' is not defined
```
msg340518 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2019-04-19 01:14
The ``sys`` import gets cleared as well (accidentally omitted from the previous comment):

```
>>> sys
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'sys' is not defined
```
msg340521 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2019-04-19 01:51
The relevant functions:

* PyRun_InteractiveLoopFlags: https://github.com/python/cpython/blob/e8113f51a8bdf33188ee30a1c038a298329e7bfa/Python/pythonrun.c#L89
* PyRun_InteractiveOneObjectEx: https://github.com/python/cpython/blob/e8113f51a8bdf33188ee30a1c038a298329e7bfa/Python/pythonrun.c#L180

So it turns out I was wrong: nothing is getting cleared anywhere, but instead each statement in the REPL is *importing* `__main__` again in order to find the namespace to use for the statement execution.

Because of the specific API it uses to do that, a non-module object like the one I injected gets replaced with a regular (empty) module object: https://github.com/python/cpython/blob/027b09c5a13aac9e14a3b43bb385298d549c3833/Python/import.c#L791

However, it *doesn't* have the extra code needed to make the `builtins` available: https://github.com/python/cpython/blob/027b09c5a13aac9e14a3b43bb385298d549c3833/Python/import.c#L932

So I now think the only actual *bug* here is that the REPL isn't making sure that `__builtins__` is set appropriately - the rest can be chalked up to implementation-defined behaviour around what happens if `__main__` gets replaced or removed in sys.modules while the REPL is running.
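
The lookup behaviour described in this message can be approximated in pure Python. This is only a rough sketch under stated assumptions: `add_module` and `fake_modules` are hypothetical names used for illustration, and the function is an analogue of the C-level lookup, not CPython's actual code. A private dict stands in for sys.modules so the running interpreter is not disturbed.

```python
import types


def add_module(modules, name):
    """Hypothetical analogue of the lookup described above: reuse an existing
    module object, otherwise create a fresh, empty module and register it."""
    existing = modules.get(name)
    if isinstance(existing, types.ModuleType):
        return existing
    fresh = types.ModuleType(name)
    modules[name] = fresh
    return fresh


# A stand-in for sys.modules so the real interpreter state is untouched.
fake_modules = {}
main = add_module(fake_modules, "__main__")
main.__dict__["mod"] = main          # user state, as in the reproducer
fake_modules["__main__"] = object()  # replace the entry with a non-module

recreated = add_module(fake_modules, "__main__")
print(recreated is main)                      # False: a new module was created
print("mod" in recreated.__dict__)            # False: the user's names are gone
print("__builtins__" in recreated.__dict__)   # False: nothing added builtins
```

A freshly created module carries `__name__`, `__doc__`, `__package__`, `__loader__` and `__spec__`, but no `__builtins__`, which is consistent with the names that still resolved in the transcript in msg340517.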
msg340561 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2019-04-20 02:26
To me, the failure of dir() in message 1 is surprising and possibly a bug. I always thought of a module's globals = locals = dict() instance as continuous across statements, whether in batch or interactive mode.  In batch mode

import sys
mod = sys.modules[__name__]
sys.modules[__name__] = object()
print(dir())

works.  Adding '-i' to the command line is supposed to allow one to enter interactive statements to be executed in the same namespace.
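
One way to check the batch-mode claim without touching the current session is to run the statements in a child interpreter. This is a sketch, not part of the original report: it uses a variant of the script that performs the same replacement as the original reproducer, and passing it with `-c` rather than as a file is an assumption made for brevity.

```python
import subprocess
import sys

# Same statements as the reproducer, with the dir() result printed.
SCRIPT = """\
import sys
mod = sys.modules[__name__]
sys.modules[__name__] = object()
print(sorted(dir()))
"""

# Run in a fresh interpreter so the current process is not affected.
result = subprocess.run(
    [sys.executable, "-c", SCRIPT],
    capture_output=True,
    text=True,
)
print(result.returncode)
print(result.stdout)
```

If the namespace really is continuous in batch mode, the printed list should still contain both `mod` and `sys`.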

In IDLE's Shell, dir() in msg 1 executes normally.  This is because idlelib.run.Executive() initializes the instance by caching globals().
    self.locals = __main__.__dict__
Then self.runcode(self, code) executes user statements with
    exec(code, self.locals)
With exec in the old statement form of 'exec code in self.locals', this pair predates the first patch git has access to, on 5/26/2002 (GvR, committed by Chui Tey).

Could, and should, Python do similarly, and keep a reference to the module namespace? What did Python do in 2002?  What do other implementations and simulated Shells do now?
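
The namespace-caching pattern described for IDLE can be sketched as follows. `MiniExecutive` and `fake_modules` are hypothetical names for illustration only; this is not idlelib's code, and a private dict again stands in for sys.modules.

```python
import types


class MiniExecutive:
    """Hypothetical stand-in for the pattern described above, not idlelib code."""

    def __init__(self, module):
        # Cache the namespace dict once, instead of looking the module up again.
        self.locals = module.__dict__

    def runcode(self, code):
        # Every statement runs in the same cached dict.
        exec(code, self.locals)


# A private registry stands in for sys.modules so real state is untouched.
main = types.ModuleType("__main__")
fake_modules = {"__main__": main}

shell = MiniExecutive(main)
shell.runcode("x = 1")

# Replacing the registry entry does not affect the cached namespace.
fake_modules["__main__"] = object()
shell.runcode("y = x + 1")
shell.runcode("names = dir()")

print(shell.locals["y"])              # 2
print("x" in shell.locals["names"])   # True
```

Because the dict is held directly, later statements never consult the registry. Note also that `exec()` inserts `__builtins__` into a globals dict that lacks it, so `dir` remains callable in this sketch regardless of how the namespace was created.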
History
Date                 User         Action  Args
2019-04-20 02:26:37  terry.reedy  set     nosy: + terry.reedy; messages: + msg340561
2019-04-19 01:55:51  ncoghlan     set     title: Dropping __main__ from sys.modules clears the REPL namespace -> REPL doesn't ensure builtins are available when implicitly recreating __main__
2019-04-19 01:51:50  ncoghlan     set     messages: + msg340521
2019-04-19 01:14:27  ncoghlan     set     messages: + msg340518
2019-04-19 01:13:30  ncoghlan     set     messages: + msg340517
2019-04-19 01:01:55  ncoghlan     create