classification
Title: Dotted name re-import does not rebind after deletion
Type: behavior
Stage: resolved
Components: Interpreter Core, Library (Lib)
Versions: Python 3.6, Python 3.5, Python 2.7
process
Status: closed
Resolution: rejected
Dependencies:
Superseder:
Assigned To:
Nosy List: Decorater, brett.cannon, eric.snow, gvanrossum, ncoghlan, r.david.murray, terry.reedy
Priority: normal
Keywords:

Created on 2016-07-14 20:48 by terry.reedy, last changed 2016-07-17 00:48 by terry.reedy. This issue is now closed.

Messages (16)
msg270438 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2016-07-14 20:48
https://docs.python.org/3/reference/simple_stmts.html#the-import-statement says that import <module>, where <module> can optionally be a dotted name referring to a module within a package, does two things:

1. Find a module object corresponding to <module>, creating it if necessary.
2. Bind the object to the name in the local namespace.  In short, 'import x' is shorthand for "x = __import__('x', ...)".

AFAIK, this works for simple names, including re-imports after name deletion.

>>> import email; email
<module 'email' from 'C:\\Programs\\Python36\\lib\\email\\__init__.py'>
>>> del email
>>> import email; email
<module 'email' from 'C:\\Programs\\Python36\\lib\\email\\__init__.py'>

However, the same is not true for dotted names.

>>> import email.charset; email.charset
<module 'email.charset' from 'C:\\Programs\\Python36\\lib\\email\\charset.py'>
>>> del email.charset
>>> import email.charset; email.charset
Traceback (most recent call last):
  File "<pyshell#5>", line 1, in <module>
    import email.charset; email.charset
AttributeError: module 'email' has no attribute 'charset'

It appears that for dotted names, when step 1 is cut short by finding the cached module, step 2 is (improperly) omitted.  I consider this a bug in the code rather than the doc.  I think the name binding should not depend on how the module was found.  I don't know whether the bug is somewhere in importlib or in the core machinery that uses it.

This bug, as it affects tkinter package modules, prevented and AFAIK still prevents me from fixing #25507 in 3.x versions prior to 3.6 (Tkinter in 2.x was not a package).  (For 3.6, I can and will refactor idlelib to eliminate the need for the deletion by preventing excess imports.)
msg270440 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-07-14 20:57
I don't think this is a bug. You're paraphrasing step 2 incorrectly. The local name-binding behavior just creates a local named "email" -- it doesn't concern itself with ensuring that the email package has an attribute "charset". That attribute is set as a side effect of *loading* the charset submodule, not as a side effect of every import statement.
msg270441 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-07-14 21:04
Or, to put it another way, you deleted the charset attribute from the email module.  On the second import, import finds email in sys.modules, so it doesn't reimport it, so charset doesn't get recreated.  So, the difference is you didn't do 'del email'.
msg270442 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-07-14 21:08
Heh, actually strike that last sentence.  Doing a del doesn't remove the module from sys.modules, so the version of the email module with charset deleted is still there, and doing del email and import email or import email.charset will not change the fact that charset is still deleted in the extant module.
msg270443 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-07-14 21:09
> On the second import, import finds email in sys.modules

Actually, it only takes the shortcut because it also finds
email.charset in sys.modules.
msg270444 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-07-14 21:14
So, Guido answered your actual question, and I was confused :(  The important point is that email.charset still exists in sys.modules, so import doesn't reload it, and as Guido says module load is the thing that creates the attribute mapping.
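The point about sys.modules can be checked directly. The following is a minimal sketch, assuming a CPython 3 interpreter; the variable names (cached, still_cached, rebound) are illustrative only and do not come from the thread.

```python
import sys

import email.charset  # first import loads the submodule and sets email.charset

cached = sys.modules["email.charset"]
del email.charset  # removes only the attribute on the package object

# The module object is still in the import cache.
still_cached = "email.charset" in sys.modules

import email.charset  # cache hit: nothing is loaded, so nothing is re-attached
rebound = hasattr(email, "charset")

print(still_cached, rebound)  # True False

# Restore the attribute so later code in this process is unaffected.
email.charset = cached
```

The last line also shows that the deleted attribute can be re-attached by hand from the cache without reloading anything.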
msg270456 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2016-07-15 02:08
I don't understand the statement that having 'import email.charset' set email.charset is an optional side effect.  Without the name setting, the module is inaccessible except through sys.modules, which is not obvious to beginners.  To me, making a module directly accessible is the main point of an import.  As it is now, the second 'import email.charset' in the example is equivalent to 'pass'.

The test example is simplified to show the core behavior.  The real life problem arises when the second 'import package.submodule' is in a different module, perhaps the main module, or in the case of an IDE such as IDLE, in user code exec()ed as if it were running in a main module.

When a module runs 'import pack.sub' (for now, only the first time), the 'pack' module is 'monkey-patched' by externally injecting 'sub'.  After this, without a deletion, the normally buggy "import pack; print(pack.sub)" executed anywhere in the process will work.
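The process-wide effect described here can be reproduced with a throwaway package. This is only a sketch: demo_pack, demo_pack.sub and demo_helper are hypothetical names created in a temporary directory for the illustration, standing in for 'pack', 'pack.sub' and some unrelated module that happens to import the submodule.

```python
import importlib
import pathlib
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    # Hypothetical package layout: demo_pack/__init__.py and demo_pack/sub.py
    (root / "demo_pack").mkdir()
    (root / "demo_pack" / "__init__.py").write_text("")
    (root / "demo_pack" / "sub.py").write_text("VALUE = 1\n")
    # A separate module that imports the submodule for its own purposes.
    (root / "demo_helper.py").write_text("import demo_pack.sub\n")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        import demo_pack
        before = hasattr(demo_pack, "sub")  # package alone: no 'sub' attribute

        import demo_helper  # loads demo_pack.sub as a side effect
        after = hasattr(demo_pack, "sub")   # 'sub' is now visible everywhere
    finally:
        sys.path.remove(tmp)
        for name in ("demo_pack", "demo_pack.sub", "demo_helper"):
            sys.modules.pop(name, None)

print(before, after)  # False True
```

Any code in the process that only did 'import demo_pack' would see demo_pack.sub after demo_helper has been imported, which is the IDLE false positive in miniature.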

Since June 2015, there have been 4 StackOverflow questions about IDLE giving implicit false positives by running code that fails when run directly in Python.  I believe that this is more questions during this period than for any other IDLE issue.

Currently, if a module author tries to be conscientious and clean up any injections it caused, then the process-global effect is to disable normally correct code.

The point of this issue, however classified, is to make it possible for a module to access the modules of a different package without causing mysterious action-at-a-distance effects.  Sorry I did not initially explain this better here.
msg270465 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-07-15 05:54
Are you sure you realize that "import email.charset" doesn't create a
local variable named "email.charset"? It creates a local variable
named "email" which happens to have an attribute "charset".
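A small sketch of which name the statement actually binds, run in a scratch namespace so the result is easy to inspect (the names namespace, bound and top are illustrative only):

```python
# Execute the import in an empty namespace and list the names it created.
namespace = {}
exec("import email.charset", namespace)
bound = sorted(name for name in namespace if not name.startswith("__"))
print(bound)  # ['email']

# __import__ with a dotted name and no fromlist returns the top-level package.
top = __import__("email.charset")
print(top.__name__)  # email
```

Only the top-level name is bound in the importing namespace; reaching the submodule goes through attribute lookup on the package object.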

The problem with "import pack; print(pack.sub)" being unpredictable is
explainable though annoying, but I don't think it can be avoided.
After all there's no intrinsic reason why module "pack" couldn't have
some attribute named "sub" unrelated to a submodule.

I'm not sure what you mean by injections or cleaning them up. Perhaps
there's some overzealousness here that causes this broken behavior?

Is there somewhere in the IDLE code that you'd like me to look at?
msg270467 - Author: Decorater (Decorater) * Date: 2016-07-15 06:46
I think they got the second example somewhat wrong.
I think this is what they wanted.

>>> import email.charset; email.charset
<module 'email.charset' from 'C:\\Programs\\Python36\\lib\\email\\charset.py'>
>>> del email #to actually delete the whole thing.
>>> import email.charset; email.charset
msg270470 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2016-07-15 08:08
Decorater: as I tried to say in msg270456, the two examples are artificial tests stripped to the bare minimum.  The choice of email and charset is equally arbitrary, apart from the fact that they exist in 2.7 as well as current 3.x.

The real situation where this issue came up for me is the subprocess that IDLE uses to execute user code, separate as much as possible from IDLE code.  idlelib.run imports tkinter.  It then imports a few other idlelib modules that collectively import at least tkinter.font, tkinter.messagebox, and tkinter.simpledialog.  These indirect imports change the tkinter module.  Consequently, when users submit code such as

import tkinter as tk
root = tk.Tk()
myfont = tk.font.Font(...)  # tk.font should raise AttributeError

it runs.  But when they try to run their 'correct' program directly with Python, they get the AttributeError and must add, for instance, 'import tkinter.font'.  IDLE should help people write standard python code, not python with a few custom augmentations.

My first attempt to fix this was to have run execute 'del tkinter.font', etcetera after the indirect imports are done.  Then users would get the AttributeError they should.  But currently, they would continue getting the AttributeError even after adding the needed import. This is because the first font import in *their* code is the second or later font import in the process.  My second example is a condensed version of this failure.
msg270472 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2016-07-15 08:36
Guido: I am aware that 'import tkinter.font' can create 0, 1, or 2 new modules and 0, 1, or 2 new name bindings, one in the *importing* module and one in the *imported* module. You are correct that point 2 in the doc only talks about the importing module, making my paraphrase not correct for dotted names.  For the latter, the doc says:

"If the module being imported is not a top level module, then the name of the top level package that contains the module is bound in the local namespace as a reference to the top level package. The imported module must be accessed using its full qualified name rather than directly."

To me, this implies that the top level and intermediate packages will be modified as needed so that the 'full qualified name' works.
msg270473 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2016-07-15 08:39
To fully remove an imported submodule, you also need to purge it from the sys.modules cache:

    >>> import email.charset; email.charset
    <module 'email.charset' from '/usr/lib64/python3.5/email/charset.py'>
    >>> import sys
    >>> del email.charset; del sys.modules["email.charset"]
    >>> import email.charset; email.charset
    <module 'email.charset' from '/usr/lib64/python3.5/email/charset.py'>

The reason we don't provide utilities in importlib to purge modules that way is because not all modules support being forcibly reloaded like this, and unlike imp.reload(), the errors happen at the point of importing it again later, not at the point where you purge it.
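As a sketch of the double deletion shown in the transcript above, the two steps can be wrapped in a small helper. purge_submodule is a hypothetical name, not an importlib API, and the caveat about modules that do not tolerate being loaded a second time applies to it unchanged.

```python
import sys


def purge_submodule(package, name):
    """Drop a submodule from both its parent package and the import cache.

    Hypothetical helper for illustration only; the next import of the
    submodule will execute its code again and build a new module object.
    """
    if hasattr(package, name):
        delattr(package, name)
    sys.modules.pop(f"{package.__name__}.{name}", None)


import email.charset

old = email.charset
purge_submodule(email, "charset")

import email.charset  # cache miss: the submodule is loaded and re-attached

new = email.charset
print(new is old)  # False: a fresh module object was created
```

Code that still holds a reference to the old module object keeps using it, which is one of the ways a forced reload can surprise.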

However, if you can figure out precisely which "tk" submodules IDLE implicitly imports, you can do one of three things for each one:

1. Change IDLE to avoid importing it in the subprocess where user code runs;
2. Test it supports reloading, and do "del tk.<submodule>; del sys.modules['tk.<submodule>']" in the IDLE subprocess after you're finished with it; or
3. Change tk.__init__ to implicitly import that submodule (such that code that only imports "tk" will still be able to access it)
msg270480 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2016-07-15 12:08
I am pursuing 1. for 3.6.  The first patch, moving 4 objects from pyshell to run and an import from run to pyshell, reduced a bloated len(sys.modules) from 193 to 156. I hope to get under 100.  I will test 2., separately for each affected tkinter module, for 2.7 and 3.5.
msg270510 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2016-07-15 20:48
Nick, do I understand correctly that if the reimport executes and I can access the module, everything should be okay?
msg270542 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2016-07-16 06:45
Terry: it's not a 100% guarantee, but it should be sufficient for your purposes.  The more obscure failure modes mostly relate to C level globals, Python level module globals, pickling, and module import having side effects on state in other modules, and it's unlikely you'll hit any of those here as long as the main "tk" module and any modules it implicitly imports stay loaded.  If you do end up getting bug reports about this, we can treat those as bugs in the affected modules.

As far as the module count goes, a plain "import tkinter" gets the imported module count up to 63, so that's presumably an absolute lower bound on your efforts.
msg270599 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2016-07-17 00:48
I understand the reluctance to generically encourage something that does not always work.  With Nick's promise to help examine any particular problems with deletion of tkinter modules, should they arise, I feel comfortable closing this. I already tested and applied double deletion to both 3.5 and 3.6.

The tkinter import example illustrates potential benefits from refactoring (which I will continue in #27534).  For me, on Win10, 'import tkinter' in a fresh interactive interpreter boosts len(sys.modules) from 41 to 65.  Except when run cannot start, it only uses tkinter to call _tkinter.tkapp().eval('update') in its interactive input loop. Importing _tkinter instead (a slight risk) would not import anything else.
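A rough way to repeat this kind of measurement is to count sys.modules entries before and after an import in a fresh child interpreter. The sketch below assumes a non-interactive child started with -c, so the absolute counts will differ from the interactive figures quoted above and will vary with platform and Python version; modules_added_by is a hypothetical helper name.

```python
import subprocess
import sys


def modules_added_by(module_name):
    """Return (before, after) sizes of sys.modules around one import,
    measured in a fresh child interpreter."""
    code = (
        "import sys\n"
        "before = len(sys.modules)\n"
        f"import {module_name}\n"
        "print(before, len(sys.modules))\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    before, after = map(int, result.stdout.split())
    return before, after


# 'email' is used here because tkinter is not available in every build.
before, after = modules_added_by("email")
print(f"import email: {before} -> {after} modules")
```

Passing "tkinter" instead gives the comparison discussed in the thread on systems where Tk is installed.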

Guido: when I need help with some of the more obscure IDLE code, I will ask.
History
Date                 User            Action  Args
2016-07-17 00:48:38  terry.reedy     set     status: open -> closed; resolution: rejected; messages: + msg270599; stage: test needed -> resolved
2016-07-16 06:45:52  ncoghlan        set     messages: + msg270542
2016-07-15 20:48:13  terry.reedy     set     messages: + msg270510
2016-07-15 12:08:22  terry.reedy     set     messages: + msg270480
2016-07-15 08:39:03  ncoghlan        set     messages: + msg270473
2016-07-15 08:36:00  terry.reedy     set     messages: + msg270472
2016-07-15 08:08:41  terry.reedy     set     messages: + msg270470
2016-07-15 06:46:20  Decorater       set     nosy: + Decorater; messages: + msg270467
2016-07-15 05:54:18  gvanrossum      set     messages: + msg270465
2016-07-15 02:08:44  terry.reedy     set     messages: + msg270456
2016-07-14 21:14:25  r.david.murray  set     messages: + msg270444
2016-07-14 21:09:09  gvanrossum      set     messages: + msg270443
2016-07-14 21:08:57  r.david.murray  set     messages: + msg270442
2016-07-14 21:04:25  r.david.murray  set     nosy: + r.david.murray; messages: + msg270441
2016-07-14 20:57:43  gvanrossum      set     nosy: + gvanrossum; messages: + msg270440
2016-07-14 20:48:35  terry.reedy     create