This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: RFE: emit a warning when a module in a traceback shadows a stdlib module
Type: enhancement Stage:
Components: Versions: Python 3.5
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: brett.cannon, eric.snow, georg.brandl, larry, ncoghlan, rbcollins, steve.dower, terry.reedy
Priority: normal Keywords:

Created on 2015-03-30 01:00 by ncoghlan, last changed 2022-04-11 14:58 by admin.

Messages (7)
msg239543 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2015-03-30 01:00
A colleague just ran into the issue where they created a "json.py" module in their local directory and broke a previously working program. I picked up on the mistake when I saw the following traceback snippet:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python2.7/site-packages/praw/__init__.py", line 27, in <module>
        import json
      File "json.py", line 1, in <module>
        r = requests.get("http://www.reddit.com/user/spilcm/about/.json")

However, actually making the connection required thinking "Wait, why is the stdlib JSON module calling out to Reddit?" followed immediately by "Oh, that's not the stdlib json module".

That connection would potentially be easier for new Python users to make if there was an inline notification printed after the traceback warning of the stdlib module shadowing. If nothing else, it would give them something specific to search for to lead them to a discussion of name shadowing and the problems that can arise when you name one of your local modules the same as a standard library module.

Offering such a feature would necessarily rely on having a standard way of getting a programmatic list of the standard library modules available to the running interpreter version without actually loading them all, a question which is discussed further in https://stackoverflow.com/questions/6463918/how-can-i-get-a-list-of-all-the-python-standard-library-modules and https://github.com/jackmaney/python-stdlib-list

Since the contents of the standard library for a given release is immutable, and we'd like to keep any such mechanism (if we decide to provide it) very simple so we can easily access it from the low level interpreter traceback machinery, my suggestion would be along the lines of:

1. Update the interpreter build process to emit a flat text file containing a complete list of fully qualified stdlib module names (including submodules), one name per line (this will be generated at build time, so it *won't* contain filenames, just the module names - however, it could theoretically be combined with importlib.util.find_spec to generate a PEP 376 style RECORD file for the non-builtin parts of the standard library at installation time)
2. Rather than exposing the underlying machinery to end users directly, instead expose a function in sysconfig to read that list of modules into memory via a read-only line based iterator. (There may be an internal API to allow access to the content without going through sysconfig)
3. Updating the two main traceback display mechanisms (the builtin traceback display and the traceback module) to check that list for potential conflicts after displaying a traceback

(This idea could benefit from discussion on python-ideas and potentially even escalate into a full PEP, but I don't currently have the personal bandwidth available to pursue that course. Accordingly, I'm just filing the idea here in case someone else finds it intriguing and has more time available to pursue it)
msg239544 - (view) Author: Robert Collins (rbcollins) * (Python committer) Date: 2015-03-30 01:03
Why limit this to just stdlib shadowing?

A local module can shadow a top level module-or-package across the board. If we don't limit it to stdlib names, it becomes a lot easier to implement.
msg239545 - (view) Author: Robert Collins (rbcollins) * (Python committer) Date: 2015-03-30 01:07
Implementation wise: this is not part of the generic rendering-of-tracebacks; I'd like to make the traceback new stuff be tastefully extensible - I'd be inclined to do this with a per-frame-callback on render (so we don't pay overhead on unrendered tb's) and a global variable that controls the default hooks loaded into the objects.

That would:
 - allow folk using the raw API to control what hooks are at play
 - allow third-parties to install additional things globally, to influence code using the existing old tb apis
 - make testing of the warning specific code be very straight forward and isolated
msg239546 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2015-03-30 01:14
I proposed limiting it to stdlib names as that's the case where we see the most beginner confusion (experimenting with sockets in a file named "socket.py", etc), and the one where we can generate a comprehensive list of known module names ahead of time (so in most cases we won't need to touch the filesystem to determine there's no shadowing happening once the list of stdlib modules has been pulled in to memory).

Handling the general case means we have to instead rely on rescanning the import system at runtime. On the other hand, we're already doing something along those lines through linecache to handle retrieving source lines, and the import system itself already has caches everywhere (hence importlib.invalidate_caches()), so adding yet-another-cache to be able to check for module shadowing efficiently wouldn't be that big a deal.

Adding Brett & Eric to get some additional perspectives on the idea of doing checks for name shadowing through a new importlib.util API rather than a separate stdlib specific mechanism.
msg239547 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2015-03-30 01:16
On the implementation front, +1 for looking at a per-frame callback API rather than hardcoding this directly into the existing traceback rendering code.
msg239549 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2015-03-30 01:23
I had an idea for a possible importlib.util API to support this capability: an "ignore_entries=0" default arg to https://docs.python.org/dev/library/importlib.html#importlib.util.find_spec

The new arg would say how many found entries to skip when looking for the module, defaulting to returning the first hit as it does now.

The challenging part would be that all the implicit import system caching is currently built around the notion of "there can be only one", so we'd likely want to introduce something like a new name shadowing cache that mapped "(modname, ignored_entries)" pairs to module specs (and was cleared by importlib.invalidate_caches() along with everything else).
msg240032 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2015-04-03 20:22
This is a fairly common newbie bug.  random.py seems to be  favorite name.
History
Date User Action Args
2022-04-11 14:58:14adminsetgithub: 67997
2015-04-14 14:12:57steve.dowersetnosy: + steve.dower
2015-04-03 20:22:58terry.reedysetnosy: + terry.reedy
messages: + msg240032
2015-03-30 01:23:56ncoghlansetmessages: + msg239549
2015-03-30 01:16:14ncoghlansetmessages: + msg239547
2015-03-30 01:14:59ncoghlansetnosy: + brett.cannon, eric.snow
messages: + msg239546
2015-03-30 01:07:17rbcollinssetmessages: + msg239545
2015-03-30 01:03:37rbcollinssetmessages: + msg239544
2015-03-30 01:00:31ncoghlancreate