classification
Title: modulefinder chokes on numpy - dereferencing None in spec.loader
Type: behavior
Stage: patch review
Components: Library (Lib)
Versions: Python 3.9, Python 3.8
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: FFY00, Greg Whiteley, barry, cajetan.rodrigues, dkasak, eric.smith, eric.snow
Priority: normal
Keywords: patch

Created on 2020-04-21 07:33 by Greg Whiteley, last changed 2021-05-29 17:16 by FFY00.

Files
fulllog.txt, uploaded by Greg Whiteley, 2020-04-21 07:33
Pull Requests
PR 19917 (open), linked by cajetan.rodrigues, 2020-05-05 08:43
Messages (5)
msg366912 - Author: Greg Whiteley (Greg Whiteley) Date: 2020-04-21 07:33
Issue:

Running ModuleFinder.run_script() on a script that imports numpy (versions 1.16.1 to 1.18.3, maybe more) fails with a traceback.  See steps to reproduce below.

I do not see this problem on Python versions earlier than 3.8 (tested 3.4, 3.5, 3.6 on Ubuntu LTS releases), but the relevant code changed around 3.8.

The failure occurs at this line of modulefinder.py:

https://github.com/python/cpython/blame/master/Lib/modulefinder.py#L79

    if spec.loader.is_package(name):
        return None, os.path.dirname(file_path), ("", "", _PKG_DIRECTORY)

I can work around it by changing that line to check for None first:

    if spec.loader is not None and spec.loader.is_package(name):
        return None, os.path.dirname(file_path), ("", "", _PKG_DIRECTORY)
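
The guarded condition can be tried in isolation. What follows is only a minimal sketch: the helper name `loader_says_package` is hypothetical (it is not part of modulefinder), and a `types.SimpleNamespace` object stands in for a spec whose loader is None.

```python
import importlib.machinery
import types


def loader_says_package(spec, name):
    """Return a true value only if the spec has a loader that reports a package.

    Hypothetical helper; it mirrors the guarded condition from the workaround.
    """
    return spec.loader is not None and spec.loader.is_package(name)


# Stand-in for a spec without a loader, the case that raised AttributeError.
loaderless_spec = types.SimpleNamespace(loader=None)
print(loader_says_package(loaderless_spec, "anything"))

# A regular stdlib package, located through the same finder modulefinder uses.
json_spec = importlib.machinery.PathFinder.find_spec("json")
print(loader_says_package(json_spec, "json"))
```

With the guard in place the loaderless case simply evaluates as false instead of raising. What modulefinder should then return for such a spec is a separate question.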


Environment:

Ubuntu 20.04 with default python3, python3-pip

Steps to reproduce:

# note: every numpy version I've tried, 1.16.1 and greater, fails - I included 1.18.3 to be precise for reproduction
$ pip3 install "numpy==1.18.3"
$ cat test.py
import numpy

$ python3
>>> from modulefinder import ModuleFinder
>>> finder = ModuleFinder()
>>> finder.run_script("test.py")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.8/modulefinder.py", line 165, in run_script
    self.load_module('__main__', fp, pathname, stuff)
...

300 lines of stack elided - see attached fulllog.txt

...
  File "/usr/lib/python3.8/modulefinder.py", line 433, in scan_code
    self._safe_import_hook(name, m, fromlist, level=0)
  File "/usr/lib/python3.8/modulefinder.py", line 378, in _safe_import_hook
    self.import_hook(name, caller, level=level)
  File "/usr/lib/python3.8/modulefinder.py", line 177, in import_hook
    q, tail = self.find_head_package(parent, name)
  File "/usr/lib/python3.8/modulefinder.py", line 233, in find_head_package
    q = self.import_module(head, qname, parent)
  File "/usr/lib/python3.8/modulefinder.py", line 320, in import_module
    fp, pathname, stuff = self.find_module(partname,
  File "/usr/lib/python3.8/modulefinder.py", line 511, in find_module
    return _find_module(name, path)
  File "/usr/lib/python3.8/modulefinder.py", line 78, in _find_module
    if spec.loader.is_package(name):
AttributeError: 'NoneType' object has no attribute 'is_package'
>>> 


Obviously I can't tell if numpy or modulefinder is the real culprit.

Let me know if I can give any more information.
msg367856 - Author: Eric Snow (eric.snow) * (Python committer) Date: 2020-05-01 15:40
Ah, namespace packages. :)  Yeah, the code is not taking the "spec.loader is None" case [1] into account.  I expect the fix would be to add handling of that case a few lines up in the code, right after handling BuiltinImporter and FrozenImporter.  Offhand I'm not sure if the "type" should be _PKG_DIRECTORY or some new one just for namespace packages.  How does imp.find_module() (on which modulefinder._find_module() is based) respond to namespace packages?

[1] https://docs.python.org/3/library/importlib.html#importlib.machinery.ModuleSpec.loader
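
To look at the spec in question without installing numpy, a small sketch along the following lines may help. It assumes a throwaway directory name, `nspkg_demo`, chosen here for illustration, and it only prints the loader rather than asserting its value, since that is the detail this report is about and it may differ between Python versions.

```python
import importlib.machinery
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # A directory without __init__.py is treated as a namespace package portion.
    os.mkdir(os.path.join(tmp, "nspkg_demo"))  # hypothetical package name

    # The same finder call that modulefinder._find_module() makes.
    spec = importlib.machinery.PathFinder.find_spec("nspkg_demo", [tmp])
    locations = list(spec.submodule_search_locations)

    print("loader:", spec.loader)
    print("search locations:", locations)
```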
msg367943 - Author: Cajetan Rodrigues (cajetan.rodrigues) * Date: 2020-05-02 17:42
Reproduced on Python 3.9.0a5+.

In my tests, imp.find_module() simply raised an ImportError for an implicit namespace package (one without an __init__.py).

About the "type_", I think it should be consistent with _PKG_DIRECTORY, since PEP 420 states the following[1]:

```
A namespace package is not fundamentally different from a regular package. It is just a different way of creating packages. Once a namespace package is created, there is no functional difference between it and a regular package.
```

I'd be happy to submit a patch if you think this is alright.


[1] https://www.python.org/dev/peps/pep-0420/#id24
msg368112 - Author: Cajetan Rodrigues (cajetan.rodrigues) * Date: 2020-05-05 08:47
It turns out that using _PKG_DIRECTORY as the type for namespace packages made `_find_module` search for an __init__.py inside them, which broke it. It had no way to tell a namespace package from a regular one (the difference being the missing __init__.py).

I've raised a PR with a new type _NSP_DIRECTORY for namespace-directories.
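
The distinction a separate type has to capture can be sketched as follows. This is not the code from the PR: the string labels, the `classify` function, and the demo names are all hypothetical, and the sketch assumes that a namespace package spec has search locations but no file location (`has_location` is false).

```python
import importlib.machinery
import os
import tempfile

# Hypothetical labels for illustration only; the real type constants live in
# modulefinder and the PR, and their values are not reproduced here.
REGULAR_PACKAGE = "regular package"
NAMESPACE_PACKAGE = "namespace package"
PLAIN_MODULE = "module"


def classify(name, path):
    """Classify what PathFinder locates for `name` on `path`."""
    spec = importlib.machinery.PathFinder.find_spec(name, path)
    if spec is None:
        raise ImportError(f"No module named {name!r}", name=name)
    if spec.submodule_search_locations is None:
        return PLAIN_MODULE
    # A namespace package has search locations but no file location,
    # because there is no __init__.py for its spec to point at.
    if not spec.has_location:
        return NAMESPACE_PACKAGE
    return REGULAR_PACKAGE


with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "ns_demo"))  # no __init__.py
    os.mkdir(os.path.join(tmp, "pkg_demo"))
    open(os.path.join(tmp, "pkg_demo", "__init__.py"), "w").close()
    open(os.path.join(tmp, "mod_demo.py"), "w").close()

    results = {n: classify(n, [tmp]) for n in ("ns_demo", "pkg_demo", "mod_demo")}
    print(results)
```

Keeping the namespace case apart from the regular package case means later code never has to go looking for an __init__.py that does not exist.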
msg380087 - Author: Denis Kasak (dkasak) Date: 2020-10-31 18:26
Anything still left to do that is stalling this? I just got bitten by it when trying to use modulefinder.
History
Date | User | Action | Args
2021-05-29 17:16:22 | FFY00 | set | nosy: + FFY00
2020-10-31 18:26:00 | dkasak | set | nosy: + dkasak; messages: + msg380087
2020-05-05 08:47:08 | cajetan.rodrigues | set | messages: + msg368112
2020-05-05 08:43:35 | cajetan.rodrigues | set | keywords: + patch; stage: test needed -> patch review; pull_requests: + pull_request19231
2020-05-04 16:39:26 | brett.cannon | set | nosy: - brett.cannon
2020-05-02 17:42:50 | cajetan.rodrigues | set | nosy: + cajetan.rodrigues; messages: + msg367943
2020-05-01 15:41:08 | eric.snow | set | stage: test needed; versions: + Python 3.9
2020-05-01 15:40:46 | eric.snow | set | nosy: + barry, brett.cannon, eric.smith, eric.snow; messages: + msg367856
2020-04-21 07:33:37 | Greg Whiteley | create |