New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace __import__ w/ importlib.__import__ #46630
Comments
Python/import.c should be replaced by the implementation under |
OK, so first step is to simply replace __import__ w/ importlib.__import__ using builtins.__import__ in order to make sure that all tests pass as expected. Can probably do this by doing the switch in Py_Initialize() somewhere. Next step after that will be seeing if _io can be used by importlib w/ the import of os postponed until after importlib is bootstrapped. |
Just some notes on the branch: The FAILING file contains tests known to fail. There is a comment explaining the failure and possible fixes. If something makes it here and stays for a while it probably means upstream changes are needed (eg. test_pydoc needs ImportError to grow an attribute as to what import failed). Python/pythonrun.c contains XXX markers of what is left to be done (although currently the zipimport comment is out-of-date and there needs to be one for exposing some APIs created in importlib.__init__). If you are looking for something to do, just search for XXX and try to tackle one of the comments. |
AFAICT, bpo-1559549 is the ImportError attribute ticket. |
OK, I'm down to a single bug to be solved to call this work a "success" (there are other test failures, but they are not overtly difficult to solve). At this point the bootstrapping is failing in the face of sub-interpreters. Specifically, when I try to load the frozen importlib code in Py_NewInterpreter() it leads to an assertion failure in the GC code when handling references in a GC generation. I have zero experience with sub-interpreters and my GC experience is rusty, so if anyone can have a look at the code I would appreciate it to see if they can figure out why loading a frozen module (while in marshal) is leading to a GC assertion error. |
Thanks to Benjamin, the test_capi assert failure is fixed. At this point the only failures are listed in the FAILING file and are (probably) minor. |
Is there a benchmark for import? How slow is importlib? :) |
On Mon, Feb 6, 2012 at 17:06, STINNER Victor <report@bugs.python.org> wrote:
importlib.test.benchmark |
bench_startup.py: short script to compute the best startup time (I wrote the original script for the hash collision issue, bpo-13703). Result on my PC:
So importlib adds an overhead of 25.7% in startup time. |
Just a quick update. I have refactored importlib in the cpython repo to allow for implementing bits of importlib.__import__() and importlib._gcd_import() in C. This means that the built-in __import__() is now calling importlib underneath the covers. Eventually what is in importlib.__import__() and _gcd_import() directly will be translated into the equivalent C code. This will speed up accessing sys.modules. After that whatever is deemed on the critical path and worth rewriting in C can be done function by function. That, though, should wait for Python-level profiling and optimization before wasting one's time. |
Add Nick since the refactoring of importlib.__import__() into functions was his idea. |
So refactoring the Python code into C code has been done, but it's crashing. =) As usual, manual memory management sucks. I also think that their is still too much C code as it makes the whole thing somewhat brittle to any refactoring of importlib. I am seriously thinking of tossing the C code I have written and writing in C only the bare minimum needed to get to the sys.modules check, and otherwise punting to importlib for everything else in a single call or two. So that would mean performing the sanity check on the arguments, calculating the absolute name of the module, grabbing the import lock, checking sys.modules, failing that going to importlib, and then figuring out what to return regardless of where the module came from thanks to fromlist. So about four functions calls. Compare that to the 8 I'm making now along with the need to muck with other things and you can see why starting from scratch where I only care about the sys.modules fast path starts to look mighty attractive. |
Makes sense to me (your new proposal sounds pretty close to the sketch I posted on the mailing list). It's really only the performance of in-function imports that concerned me and those are almost always going to get hits in sys.modules. |
A few thoughts from my outsider viewpoint:
|
On Wed, Feb 22, 2012 at 19:35, Éric Araujo <report@bugs.python.org> wrote: I have already done that and pushed the initial change. The ultimate goal I also have some failing tests, but that all has to do with Antoine's Ha! I wish. I have to get within reasonable striking distance of current It's possible as long as it is exposed through imp.
=) Thanks |
That last comment from me looks odd because somewhere along the way my replies to Éric's comment got stripped out. First two paragraphs are in response to 1) Rest makes sense since the inline reply survived. |
OK, so I have now streamlined the case so that only C is called when a module is in sys.modules, level == 0 and not fromlist. Using the normal_startup benchmark, bootstrapped importlib is 1.12x slower. A fast run of startup_nosite puts it at 1.2x slower. Using importlib.test.benchmark, the results are: Comparing new vs. old sys.modules : 331,097 vs. 383,180 (86.407694%) Where does that leave us? Well, obviously on medium-sized modules and up, the I/O involved along with parsing and execution outweighs importlib's costs to the point of being negligible. sys.modules is also now fast enough (I don't care what people say; being able to import from sys.modules at 0.0026 ms vs. 0.003 ms is not important in the grand scheme of things). Built-in modules and such could be faster, but (a) there are only so many and they end up in sys.modules quickly anyway, and (b) even at their current speed they are still fast. So the real hold-up is small modules and whether they matter. The tabnanny benchmark is the median size of the stdlib, so half of modules will see no slowdown while half will see something of some level. The most critical thing is to not use _gcd_import() in situations where the import function is currently passed in but instead use builtins.__import__() so as to get any C speed-ups (which I measured to make sure there are in all cases). After that I think unrolling the mentioned functions would help speed things up, but otherwise it becomes time to profile the Python code to see where the inefficiencies lie. Some more C unrolling could be done. The situation of not fromlist could be written in C. Both _calc___packages__() and _resolve_name() could probably be written in C if someone cared. |
So I finished rewriting _return_module() so that if fromlist is not defined then it's all in C (and subsequently naming the function _handle_fromlist()). That leaves rewriting in C _calc___package__() and _resolve_name(). After that I don't think there is anything left to do in C for __import__() itself (beyond maybe some reorganizing to avoid stupid things). After that the only thing I can think of writing in C would be BuiltinImporter (and FrozenImporter if someone truly cared). And once all that is said and done, that only leaves optimizing the Python code to try to appease the performance gods. |
OK, I have now done as much C code as I'm going to do for the __import__() function. It has gotten bootstrapped importlib within 10% of normal_startup against default. That leaves (possibly) rewriting BuiltinImporter in C and then just good old fashioned optimizations of the Python code. I'm going to continue to use the normal_startup benchmark as my gold standard benchmark and importlib.test.benchmark just for iterative testing that a change made an impact (until there is another Python 3 benchmark that measures startup time using a real-world app). With that in mind and taking a PyPy style approach, that means I should focus on top-level, non-package imports first (28 of them), followed by builtins (14), submodules (4), packages (2), and then extensions (2). And if you don't want to do the math, there are 50 imports when starting Python, of which only 9 seem to follow (and include) site. |
Are you looking for reviews at this point? |
For the record, compilation fails here: Python/import.c: In function ‘PyImport_ImportModuleLevelObject’: |
Fixed the 'for' loop variable declaration. Surprised clang didn't warn me about that. As for reviews, I'm totally happy to get them, but I also don't know if I have hit the performance point well enough for people to allow me to merge the code into default. I guess the real question is whether *you're* happy with the 10% slowdown in raw startup, Antoine, since you were the most vocal critic from a performance standpoint? |
Well, it seems current performance is acceptable now :) |
Then if you want to give it a review that would be great! I still need to solve the test_pydoc failure (either with Brian's patch to add a name attribute to ImportError or implement importlib.find_module()). Otherwise all other failures at the moment are because of mtime race conditions (as you know =) or exist in the default branch. And I still need to integrate rebuilding importlib.h into the Makefile, but that should be easy since it will probably be a separate command so you don't accidentally break import by busting importlib. |
I have replied to Antoine's review and so generated a new patch. At this point my bootstrap_importlib branch is 5% slower in a standard build in the normal_startup benchmark (11% if you use a debug build). This is still w/o profiling the Python code to look for inefficiencies (of which I'm sure there are some considering how long I have been banging away at this code). |
I think that's fine. |
Might or might not. You should try, there's a fallback stderr at
I don't think I have understood anything :) It probably doesn't help, |
So after looking at import.c and how it handles stuff, I have decided how I already implemented __import__() triggering __import__() itself is fine. So Antoine can just stay confused by my comment as it is now a moot point. =) |
Looks like Windows as a PC/import_nt.c:_PyWin_FindRegisteredModule() function used to find modules in the registry. Probably should care about that at some point (I guess). |
I'm guessing the most important part to review is the binding of importlib into being the main import implementation? (It's been in the tree a while, so I assume the kinks of the actual import implementation have been mostly ironed out.) |
Yes, that's the important bit. Thomas has given it a once over at this point so you should be able to see his review comments as well to prevent duplicate work. |
Okay; I added review. |
Thanks to everyone for the reviews. I'm moving house tomorrow and I suspect Andrea wants me to take a little Python break so I might not get to the reviews before a2, but I will definitely get to them an merged by a3. |
There are also reference leaks: $ ./python -m test -R 3:2 test_bz2
[1/1] test_bz2
beginning 5 repetitions
12345
.....
test_bz2 leaked [-1, -1] references, sum=-2
$ ./python -m test -R 3:2 test_hashlib
[1/1] test_hashlib
beginning 5 repetitions
12345
.....
test_hashlib leaked [-1, -1] references, sum=-2
1 test failed:
test_hashlib
$ ./python -m test -R 3:2 test_capi
[1/1] test_capi
beginning 5 repetitions
12345
.....
test_capi leaked [78, 78] references, sum=156
1 test failed:
test_capi |
OK, everyone's code review comments have been addressed. That means I have test_pydoc still failing (and that won't get fixed until ImportError grows a module_name attribute; issue bpo-1559549) and test_trace (it doesn't like frozen modules). I also have not plugged the memory leaks that Antoine pointed out (but then again I'm not sure where they might be considering how many people have gone over this code and not spotted a leak that I have not fixed). I also have not dealt with python -v or Windows registry imports. But once everything but the memory leaks are done I will generate a massive patch and commit to default. |
OK, -v/PYTHONVERBOSE is as done as it is going to be by me. Next up is (attempting) Windows registry stuff. After that I will push to default with test_trace and test_pydoc skipped so others can help me with those. |
Skipped? How so? I think it would be better if you tried to debug them. |
On Fri, Apr 6, 2012 at 16:05, Antoine Pitrou <report@bugs.python.org> wrote:
By raising unittest.SkipTest. I already know how to fix pydoc, but I need to get module names attached to |
OK, I have fixed test_trace by tweaking the trace module in default. I have decided to follow Antoine's suggestion-by-question and get test_pydoc working before I merge by getting ImportError spruced up in default but pydoc changed in bootstrap_importlib. I am *not* going to try to get registry-based imports working because I don't have a Windows machine and could quite easily break the build. I would rather have a Windows expert add the support. IOW:
|
New changeset 2dd046be2c88 by Brett Cannon in branch 'default': |
While the code has been committed, I'm leaving the issue open until I have checked that the language spec is up-to-date and I have written the "What's New" entry. I am holding off on both, though, unti any tweaks I make to the import process is in for Python 3.3. |
Notes on what to mention: importlib.invalidate_caches()
doctests and ImportError now spitting out the full module name
ImportError's new attributes |
More notes: 5% startup loss according to normal_startup; within realm of compiler optimizations. |
Brett, your latest commit breaks IDLE. Here's the error message: Failed to import extension: FormatParagraph
Failed to load extension 'FormatParagraph'
Traceback (most recent call last):
File "./idlelib/EditorWindow.py", line 998, in load_standard_extensions
self.load_extension(name)
File "./idlelib/EditorWindow.py", line 1008, in load_extension
mod = __import__(name, globals(), locals(), [])
File "<frozen importlib._bootstrap>", line 974, in _find_and_load
ImportError: No module named 'FormatParagraph' Same error occurs for ZoomHeight, ScriptBinding, CallTips, ParenMatch, and AutoComplete. I reverted to e730da0cd489, recompiled, and these errors went away. Just to be sure, I updated to tip, recompiled, and the errors reappeared. |
Another thing to note: index does not default to -1 anymore but to 0; bug that should have gone away in Python 2.7. |
So IDLE broke because it was relying on buggy behaviour accidentally left in Python 2.7 and carried forward (plus it was not updated to use best practices like importlib.import_module()). Roger, can you try the patch I have uploaded and see if that fixes things? |
I tested update_idle.diff and it corrects the issue. While IDLE's use of __import__ may be "buggy", I don't see anything in the documentation about deprecation or other warnings about this usage. This is a backwards-incompatible change. |
I caused a segmentation fault with the following (on Linux): $ mkdir crash
$ touch crash/mod.py
$ echo "__import__('mod', globals(), locals(), [], 1)" > crash/__init__.py
$ ./python3 -m crash |
I committed the fix. Thanks for testing, Roger. As for the change in semantics, I'm fully aware it is not backwards-compatible. Unfortunately the incorrect usage was not even discovered until I started my bootstrap work because the import statement does the right thing but __import__() itself was never updated so it is only noticeable if you never updated your code to use importlib.import_module() which has been the preferred way to programmatically import code since Python 2.7/3.1. Plus the correct semantics are documented in PEP-328 (http://python.org/dev/peps/pep-0328/) and referenced in the language spec (http://docs.python.org/py3k/reference/simple_stmts.html#the-import-statement) so I'm not going to change it back since this brings the function more in line with expectations. And since the fix is as simple as a try/except and two import calls then it falls within the realm of having to fix code for any new Python release. And as for the crash, I will have a look. |
Brett, I see your point. The docs for __import__ should be updated to include your two-import fix as well as reference PEP-328. |
OK, crasher is fixed (as is importlib as it failed on the test case as well thanks to the slicing of [:-0] returning the empty string instead of the entire string). And I will update the docs to be a bit more clear about things (at least those docs have the right default value for level). |
Note: __import__('sys', level=1) no longer works; raises KeyError instead. |
To note: _frozen_importlib is a CPython implementation detail |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: