This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: modulefinder should use import hooks properly
Type: enhancement Stage: patch review
Components: Library (Lib) Versions: Python 3.9
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: brandtbucher, plokmijnuhby
Priority: normal Keywords: patch

Created on 2019-11-06 16:05 by plokmijnuhby, last changed 2022-04-11 14:59 by admin.

Pull Requests
URL Status Linked Edit
PR 17326 closed plokmijnuhby, 2019-11-21 19:55
Messages (8)
msg356143 - (view) Author: Dominic Littlewood (plokmijnuhby) * Date: 2019-11-06 16:05
modulefinder currently will detect modules imported by a script dynamically by running the script, and to do this it has to open the script as a file.

This would also fix what appears to be a bug where modulefinder failed to open files in bytes mode sometimes, causing it to fail for some modules (like functools) if the wrong encoding is used. I say "appears to be", because the structure of the module seems to imply this was intended behaviour, but I can't think why anyone would want this.

This is similar to two other issues I'm posting, but they're in different modules, so different bugs.
msg356547 - (view) Author: Dominic Littlewood (plokmijnuhby) * Date: 2019-11-13 19:14
Okay, I've started putting together a proper PR, and I've had some thoughts.


There's a useful script at the bottom of the importlib documentation that readers should consult. This can be used to correctly find the spec for a module, and therefore the loader. (AddPackagePath and ReplacePackage are not needed, and should be deprecated.) For modules already on sys.path, the loader can be identified from module.__loader__.

If the loader is an InspectLoader, the code can be retrieved and examined to see what is imported. (Remember to check whether None is returned.) If not, we will assume nothing is imported - which is what modulefinder currently does with extension modules.

Since run_script takes a file path rather than a module name, it will have to find the appropriate loader for the task. This will be a simple choice between SourceFileLoader and SourcelessFileLoader, depending on what kind of file is being used.


It is also possible to use functions in importlib._bootstrap instead.
This enhancement would indirectly cause open_code to be used properly, so the bug I originally posted is redundant.
msg356746 - (view) Author: Dominic Littlewood (plokmijnuhby) * Date: 2019-11-16 09:20
Okay, I've encountered a snag. To use meta path finders, if the path is None, it's sometimes okay to substitute self.path, and sometimes not. The only way to make absolutely sure the finder works correctly is to change sys.path. To use namespace packages (and possibly some other types of modules finders might produce), it's also necessary to adjust sys.modules. However, both of these things may cause problems if a hook wants to import something in the standard way so it can run properly. This will require some thought.
msg356908 - (view) Author: Dominic Littlewood (plokmijnuhby) * Date: 2019-11-18 20:37
I now have a PR which - dare I say it - appears to be working as it should. I'll just write some tests for it and then I'll send it off for review.
msg357215 - (view) Author: Brandt Bucher (brandtbucher) * (Python committer) Date: 2019-11-21 20:42
I'm not sure what you mean when you say "modulefinder currently will detect modules imported by a script dynamically by running the script"... modulefinder is completely static, no?

I also think that changing sys.path like this is a bad idea (I tried this in an earlier PR and was told to remove it and rework my solution). We lose thread-safety, and it has effects far outside of the scope of modulefinder (i.e. the hooks we call could cache it or something, and break later imports). If changing sys.path is the only way to accomplish this (still not exactly sure what), I doubt it will be accepted. It seems like this PR does a lot of unnecessary refactoring too, on first glance.

With that said, adding support for hooks should be fairly straightforward by just manually walking down sys.path/meta_path/path_hooks in _find_module... maybe.
msg357216 - (view) Author: Brandt Bucher (brandtbucher) * (Python committer) Date: 2019-11-21 20:44
See prior discussion on this here:

https://github.com/python/cpython/pull/11787#discussion_r256442282
msg357219 - (view) Author: Dominic Littlewood (plokmijnuhby) * Date: 2019-11-21 21:28
You are correct about modulefinder being static. I read "run_script" and thought it was running a script, but it turns out I was being silly.

I shall correct the sys.path issue.
msg357340 - (view) Author: Dominic Littlewood (plokmijnuhby) * Date: 2019-11-22 23:37
In regards to the "unnecessary refactoring", here's a full list of what I've changed.

1. Previously, the module revolved around passing files and file descriptors around. Now, it uses specs. This means all the functions that used file descriptors have to be updated.

2. Due to the previous issue, several functions had to change their call signatures. The functions that called them had to change to react.

3. Packages work differently now. They do not have to have an __init__ submodule any more, so there's no need for a load_package function - it's integrated with load_module.

4. determine_parent was having trouble finding where the relative imports were relative to. I changed it to use _calc___package__, same as the import system. Half of determine_parent was deleted, because I couldn't understand how it worked and eventually worked out that there was literally no way it could ever run, so it didn't matter.

5. Modules with no __path__ attribute are not packages. Modules with a __path__ attribute of None *are* packages. Code had to be changed to reflect that. Looking back, this is possibly a pedantic enough point that these changes could be removed and hopefully nothing will go wrong.

6. fix_sys was used to update sys.path (although I have since changed it so it does not) and sys.modules. Code had to be changed to make use of it.

7. The only other thing I can think of is that I changed which modules were imported in the first few lines of modulefinder, since some had become unnecessary. (In fact types and warnings were already unnecessary, but never mind about that.)
History
Date User Action Args
2022-04-11 14:59:22adminsetgithub: 82902
2019-11-22 23:37:04plokmijnuhbysetmessages: + msg357340
2019-11-21 21:28:22plokmijnuhbysetmessages: + msg357219
2019-11-21 20:44:15brandtbuchersetmessages: + msg357216
2019-11-21 20:42:01brandtbuchersetnosy: + brandtbucher
messages: + msg357215
2019-11-21 19:55:38plokmijnuhbysetkeywords: + patch
stage: patch review
pull_requests: + pull_request16813
2019-11-18 20:37:42plokmijnuhbysetmessages: + msg356908
2019-11-16 09:20:49plokmijnuhbysetmessages: + msg356746
2019-11-13 19:14:18plokmijnuhbysettype: security -> enhancement
messages: + msg356547
title: modulefinder should use io.open_code() instead of open() -> modulefinder should use import hooks properly
2019-11-06 16:05:22plokmijnuhbycreate