Dotted name re-import does not rebind after deletion #71702

Closed
terryjreedy opened this issue Jul 14, 2016 · 16 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs); stdlib (Python modules in the Lib dir); type-bug (An unexpected behavior, bug, or error)

Comments

@terryjreedy
Member

BPO 27515
Nosy @gvanrossum, @brettcannon, @terryjreedy, @ncoghlan, @bitdancer, @ericsnowcurrently, @AraHaan

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = <Date 2016-07-17.00:48:38.789>
created_at = <Date 2016-07-14.20:48:35.468>
labels = ['interpreter-core', 'type-bug', 'library']
title = 'Dotted name re-import does not rebind after deletion'
updated_at = <Date 2016-07-17.00:48:38.769>
user = 'https://github.com/terryjreedy'

bugs.python.org fields:

activity = <Date 2016-07-17.00:48:38.769>
actor = 'terry.reedy'
assignee = 'none'
closed = True
closed_date = <Date 2016-07-17.00:48:38.789>
closer = 'terry.reedy'
components = ['Interpreter Core', 'Library (Lib)']
creation = <Date 2016-07-14.20:48:35.468>
creator = 'terry.reedy'
dependencies = []
files = []
hgrepos = []
issue_num = 27515
keywords = []
message_count = 16.0
messages = ['270438', '270440', '270441', '270442', '270443', '270444', '270456', '270465', '270467', '270470', '270472', '270473', '270480', '270510', '270542', '270599']
nosy_count = 7.0
nosy_names = ['gvanrossum', 'brett.cannon', 'terry.reedy', 'ncoghlan', 'r.david.murray', 'eric.snow', 'Decorater']
pr_nums = []
priority = 'normal'
resolution = 'rejected'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue27515'
versions = ['Python 2.7', 'Python 3.5', 'Python 3.6']

@terryjreedy
Member Author

https://docs.python.org/3/reference/simple_stmts.html#the-import-statement says that import <module>, where <module> can optionally be a dotted name referring to a module within a package, does two things:

  1. Find a module object corresponding to <module>, creating it if necessary.
  2. Bind the object to the name in the local namespace. In short, 'import x' is shorthand for "x = __import__('x', ...)".

AFAIK, this works for simple names, including re-imports after name deletion.

>>> import email; email
<module 'email' from 'C:\\Programs\\Python36\\lib\\email\\__init__.py'>
>>> del email
>>> import email; email
<module 'email' from 'C:\\Programs\\Python36\\lib\\email\\__init__.py'>

However, the same is not true for dotted names.

>>> import email.charset; email.charset
<module 'email.charset' from 'C:\\Programs\\Python36\\lib\\email\\charset.py'>
>>> del email.charset
>>> import email.charset; email.charset
Traceback (most recent call last):
  File "<pyshell#5>", line 1, in <module>
    import email.charset; email.charset
AttributeError: module 'email' has no attribute 'charset'

It appears that for dotted names, when step 1 is cut short by finding the cached module, step 2 is (improperly) omitted. I consider this a bug in the code rather than the doc. I think the name binding should not depend on how the module was found. I don't know whether the bug is somewhere in importlib or in the core machinery that uses it.

This bug, in relation to tkinter package modules, prevented, and AFAIK still prevents, me from fixing the bug of bpo-25507 in 3.x versions prior to 3.6 (Tkinter in 2.x was not a package). (For 3.6, I can and will refactor idlelib to eliminate the need for the deletion by preventing excess imports.)

@terryjreedy added the interpreter-core, stdlib, and type-bug labels on Jul 14, 2016
@gvanrossum
Member

I don't think this is a bug. You're paraphrasing step 2 incorrectly. The local name-binding behavior just creates a local named "email" -- it doesn't concern itself with ensuring that the email package has an attribute "charset". That attribute is set as a side effect of *loading* the charset submodule, not as a side effect of every import statement.
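
The name-binding point can be checked with a minimal sketch. The helper `import_dotted` is a hypothetical name introduced here for illustration; it runs the import statement in a fresh namespace so the bound names can be inspected.

```python
import sys

def import_dotted(name):
    """Run 'import <name>' in a fresh namespace and return that namespace.

    Hypothetical helper for illustration; not part of any library.
    """
    namespace = {}
    exec(f"import {name}", namespace)
    namespace.pop("__builtins__", None)  # exec() adds this entry itself
    return namespace

ns = import_dotted("email.charset")
bound_names = sorted(ns)            # names the import statement bound
top = __import__("email.charset")   # with no fromlist, the top-level package
```

Only `email` ends up bound in the importing namespace, and `__import__('email.charset')` returns the `email` package rather than the submodule.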

@bitdancer
Member

Or, to put it another way, you deleted the charset attribute from the email module. On the second import, import finds email in sys.modules, so it doesn't reimport it, so charset doesn't get recreated. So, the difference is you didn't do 'del email'.

@bitdancer
Member

Heh, actually strike that last sentence. Doing a del doesn't remove the module from sys.modules, so the version of the email module with charset deleted is still there, and doing del email and import email or import email.charset will not change the fact that charset is still deleted in the extant module.
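
A small sketch of that observation, assuming nothing beyond the standard library: `del email` only unbinds the name, so the following import is a cache hit that hands back the same module object. The function name `rebind_after_del` is made up for this example.

```python
import sys

def rebind_after_del():
    """Show that 'del email' unbinds a name but leaves sys.modules alone."""
    import email
    first = email
    del email                               # removes the local name only
    still_cached = "email" in sys.modules   # the module object survives
    import email                            # cache hit: same object is rebound
    return still_cached, email is first

result = rebind_after_del()
```

Any attribute deleted from that module object therefore stays deleted across the `del` and re-import.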

@gvanrossum
Member

On the second import, import finds email in sys.modules

Actually, it only takes the shortcut because it also finds
email.charset in sys.modules.

@bitdancer
Member

So, Guido answered your actual question, and I was confused :( The important point is that email.charset still exists in sys.modules, so import doesn't reload it, and as Guido says module load is the thing that creates the attribute mapping.
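
The cache-hit behavior can be reproduced without touching a stdlib package. The sketch below builds a throwaway package in a temporary directory; `demo_pack`, `sub`, and both function names are assumptions made up for this example.

```python
import importlib
import os
import sys
import tempfile

def make_package(root):
    """Create a throwaway package 'demo_pack' with one submodule 'sub'."""
    pkg_dir = os.path.join(root, "demo_pack")
    os.mkdir(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w"):
        pass
    with open(os.path.join(pkg_dir, "sub.py"), "w") as f:
        f.write("VALUE = 1\n")

def cached_import_skips_rebinding():
    with tempfile.TemporaryDirectory() as root:
        make_package(root)
        sys.path.insert(0, root)
        importlib.invalidate_caches()
        try:
            import demo_pack.sub                  # first import loads the submodule
            loaded = hasattr(demo_pack, "sub")    # attribute set during loading
            del demo_pack.sub                     # remove only the attribute
            import demo_pack.sub                  # found in sys.modules, not reloaded
            cached = "demo_pack.sub" in sys.modules
            rebound = hasattr(demo_pack, "sub")
            return loaded, cached, rebound
        finally:
            # Undo the demo's changes to interpreter-wide state.
            sys.path.remove(root)
            sys.modules.pop("demo_pack.sub", None)
            sys.modules.pop("demo_pack", None)

outcome = cached_import_skips_rebinding()
```

The attribute exists after the first import, the submodule stays in `sys.modules`, and the second import leaves the attribute missing.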

@terryjreedy
Member Author

I don't understand the statement that having 'import email.charset' set email.charset is an optional side effect. Without the name setting, the module is inaccessible except through sys.modules, which is not obvious to beginners. To me, making a module directly accessible is the main point of an import. As it is now, the second 'import email.charset' in the example is equivalent to 'pass'.

The test example is simplified to show the core behavior. The real life problem arises when the second 'import package.submodule' is in a different module, perhaps the main module, or in the case of an IDE such as IDLE, in user code exec()ed as if it were running in a main module.

When a module runs 'import pack.sub' (for now, only the first time), the 'pack' module is 'monkey-patched' by externally injecting 'sub'. After this, without a deletion, the normally buggy "import pack; print(pack.sub)" executed anywhere in the process will work.
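
The process-wide effect described above can be sketched with two separate namespaces standing in for two modules. The package `demo_pack`, its submodule `sub`, and the function names are hypothetical, created only for this illustration.

```python
import importlib
import os
import sys
import tempfile

def write_package(root):
    """Create a throwaway package 'demo_pack' with one submodule 'sub'."""
    pkg_dir = os.path.join(root, "demo_pack")
    os.mkdir(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w"):
        pass
    with open(os.path.join(pkg_dir, "sub.py"), "w") as f:
        f.write("VALUE = 1\n")

def action_at_a_distance():
    with tempfile.TemporaryDirectory() as root:
        write_package(root)
        sys.path.insert(0, root)
        importlib.invalidate_caches()
        try:
            user_ns = {}                              # stands in for user code
            exec("import demo_pack", user_ns)
            before = hasattr(user_ns["demo_pack"], "sub")
            other_ns = {}                             # stands in for another module
            exec("import demo_pack.sub", other_ns)
            after = hasattr(user_ns["demo_pack"], "sub")
            return before, after
        finally:
            # Undo the demo's changes to interpreter-wide state.
            sys.path.remove(root)
            for name in ("demo_pack.sub", "demo_pack"):
                sys.modules.pop(name, None)

effect = action_at_a_distance()
```

The user namespace never imports the submodule, yet `demo_pack.sub` becomes reachable from it once any other code in the process does.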

Since June 2015, there have been 4 StackOverflow questions about IDLE giving implicit false positives by running code that fails when run directly in Python. I believe that is more questions during this period than for any other IDLE issue.

Currently, if a module author tries to be conscientious and clean up any injections it caused, then the process-global effect is to disable normally correct code.

The point of this issue, however classified, is to make it possible for a module to access the modules of a different package without causing mysterious action-at-a-distance effects. Sorry I did not initially explain this better here.

@gvanrossum
Member

Are you sure you realize that "import email.charset" doesn't create a
local variable named "email.charset"? It creates a local variable
named "email" which happens to have an attribute "charset".

The problem with "import pack; print(pack.sub)" being unpredictable is
explainable though annoying, but I don't think it can be avoided.
After all there's no intrinsic reason why module "pack" couldn't have
some attribute named "sub" unrelated to a submodule.

I'm not sure what you mean by injections or cleaning them up. Perhaps
there's some overzealousness here that causes this broken behavior?

Is there somewhere in the IDLE code that you'd like me to look at?

@AraHaan
Mannequin

AraHaan mannequin commented Jul 15, 2016

I think they got the 2nd example somewhat wrong.
I think this is what they wanted:

>>> import email.charset; email.charset
<module 'email.charset' from 'C:\\Programs\\Python36\\lib\\email\\charset.py'>
>>> del email #to actually delete the whole thing.
>>> import email.charset; email.charset

@terryjreedy
Member Author

Decorator: as I tried to say in msg270456, the two examples are artificial tests stripped to the bare minimum. So is use of email and charset, other than the fact that it exists in 2.7 as well as current 3.x.

The real situation where this issue came up for me is the subprocess that IDLE uses to execute user code, separate as much as possible from IDLE code. idlelib.run imports tkinter. It then imports a few other idlelib modules that collectively import at least tkinter.font, tkinter.messagebox, and tkinter.simpledialog. These indirect imports change the tkinter module. Consequently, when users submit code such as

import tkinter as tk
root = tk.Tk()
myfont = tk.font.Font(...)  # tk.font should raise AttributeError

it runs. But when they try to run their 'correct' program directly with Python, they get the AttributeError and must add, for instance, 'import tkinter.font'. IDLE should help people write standard Python code, not Python with a few custom augmentations.

My first attempt to fix this was to have run execute 'del tkinter.font', etc., after the indirect imports are done. Then users would get the AttributeError they should. But currently, they would continue getting the AttributeError even after adding the needed import. This is because the first font import in *their* code is the second or later font import in the process. My second example is a condensed version of this failure.

@terryjreedy
Member Author

Guido: I am aware that 'import tkinter.font' can create 0, 1, or 2 new modules and 0, 1, or 2 new name bindings, one in the *importing* module and one in the *imported* module. You are correct that point 2 in the doc only talks about the importing module, making my paraphrase incorrect for dotted names. For the latter, the doc says:

"If the module being imported is not a top level module, then the name of the top level package that contains the module is bound in the local namespace as a reference to the top level package. The imported module must be accessed using its full qualified name rather than directly."

To me, this implies that the top level and intermediate packages will be modified as needed so that the 'full qualified name' works.

@ncoghlan
Contributor

To fully remove an imported submodule, you also need to purge it from the sys.modules cache:

    >>> import email.charset; email.charset
    <module 'email.charset' from '/usr/lib64/python3.5/email/charset.py'>
    >>> import sys
    >>> del email.charset; del sys.modules["email.charset"]
    >>> import email.charset; email.charset
    <module 'email.charset' from '/usr/lib64/python3.5/email/charset.py'>

The reason we don't provide utilities in importlib to purge modules that way is because not all modules support being forcibly reloaded like this, and unlike imp.reload(), the errors happen at the point of importing it again later, not at the point where you purge it.

However, if you can figure out precisely which "tk" submodules IDLE implicitly imports, you can do one of three things for each one:

  1. Change IDLE to avoid importing it in the subprocess where user code runs;
  2. Test it supports reloading, and do "del tk.<submodule>; del sys.modules['tk.<submodule>']" in the IDLE subprocess after you're finished with it; or
  3. Change tk.__init__ to implicitly import that submodule (such that code that only imports "tk" will still be able to access it)
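
Option 2 above, the double deletion, can be sketched as a small helper. The names `purge_submodule` and `reimport_fresh` are hypothetical (importlib deliberately offers no such utility, as noted above), and the sketch assumes the target module tolerates being loaded a second time.

```python
import importlib
import sys

def purge_submodule(qualified_name):
    """Drop a submodule from sys.modules and from its parent package.

    Hypothetical helper: returns the purged module object, or None if
    the submodule was not loaded.
    """
    parent_name, _, child = qualified_name.rpartition(".")
    module = sys.modules.pop(qualified_name, None)
    parent = sys.modules.get(parent_name)
    # Only delete the parent attribute if it really is the purged module.
    if module is not None and parent is not None:
        if getattr(parent, child, None) is module:
            delattr(parent, child)
    return module

def reimport_fresh(qualified_name):
    """Purge, then import again so that a new module object is loaded."""
    purge_submodule(qualified_name)
    return importlib.import_module(qualified_name)
```

Because the purged module is really loaded again, the parent attribute is recreated as a side effect of loading. Code that kept a reference to the old module object keeps using the old one.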

@terryjreedy
Member Author

I am pursuing 1. for 3.6. The first patch, moving 4 objects from pyshell to run and an import from run to pyshell, reduced a bloated len(sys.modules) from 193 to 156. I hope to get under 100. I will test 2., separately for each affected tkinter module, for 2.7 and 3.5.

@terryjreedy
Member Author

Nick, do I understand correctly that if the reimport executes and I can access the module, everything should be okay?

@ncoghlan
Contributor

Terry: it's not a 100% guarantee, but it should be sufficient for your purposes. The more obscure failure modes mostly relate to C level globals, Python level module globals, pickling, and module import having side effects on state in other modules, and it's unlikely you'll hit any of those here as long as the main "tk" module and any modules it implicitly imports stay loaded. If you do end up getting bug reports about this, we can treat those as a bug in the affected modules.

As far as the module count goes, a plain "import tkinter" gets the imported module count up to 63, so that's presumably an absolute lower bound on your efforts.

@terryjreedy
Member Author

I understand the reluctance to generically encourage something that does not always arise. With Nick's promise to help examine any particular problems with deletion of tkinter modules, should they arise, I feel comfortable closing this. I already tested and applied double deletion to both 3.5 and 3.6.

The tkinter import example illustrates potential benefits from refactoring (which I will continue in bpo-27534). For me, on Win10, 'import tkinter' in a fresh interactive interpreter boosts 'len(sys.modules)' from 41 to 65. Except when run cannot start, it only uses tkinter to call _tkinter.tkapp().eval('update') in its interactive input loop. Importing _tkinter instead (a slight risk) would not import anything else.
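
Module counts like these can be measured with a short script that starts a fresh interpreter for each measurement. This is a sketch: `module_count` is a hypothetical helper, and `json` stands in for tkinter because tkinter needs a Tk installation. Exact numbers vary by platform and Python version.

```python
import subprocess
import sys

def module_count(statement=""):
    """Return len(sys.modules) in a fresh interpreter after running statement."""
    code = f"{statement}\nimport sys\nprint(len(sys.modules))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())

baseline = module_count()                # modules loaded at startup
with_json = module_count("import json")  # startup plus json's import tree
```

The difference between the two counts approximates how many modules the import pulls in.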

Guido: when I need help with some of the more obscure IDLE code, I will ask.

@ezio-melotti transferred this issue from another repository on Apr 10, 2022