Created on 2017-04-09 11:11 by Victor.Varvariuc, last changed 2017-04-23 14:51 by serhiy.storchaka.
|test.zip||Victor.Varvariuc, 2017-04-09 16:36|
|PR 1264||open||serhiy.storchaka, 2017-04-23 14:47|
|msg291376 - (view)||Author: Victor Varvariuc (Victor.Varvariuc)||Date: 2017-04-09 11:11|
https://mail.python.org/pipermail/python-ideas/2017-April/045405.html Hi there. I asked a question <http://stackoverflow.com/questions/41845671/import-as-in-python-3> on Stackoverflow: > (Pdb) import brain.utils.mail > (Pdb) import brain.utils.mail as mail_utils > *** AttributeError: module 'brain.utils' has no attribute 'mail' > > I always thought that import a.b.c as m is roughly equivalent to m = sys.modules['a.b.c']. Why AttributeError? Python 3.6 I was pointed out <http://stackoverflow.com/a/24968941/248296> that this is a somewhat weird behavior of Python: > The statement is not quite true, as evidenced by the corner case you met, namely if the required modules already exist in sys.modules but are yet uninitialized. The import ... as requires that the module foo.bar is injected in foo namespace as the attribute bar, in addition to being in sys.modules, whereas the from ... import ... as looks for foo.bar in sys.modules. Why would `import a.b.c` work when `a.b.c` is not yet fully imported, but `import a.b.c as my_c` would not? I though it would be vice versa. Using `import a.b.c as my_c` allows avoiding a number of circular import issues. Right now I have to use `from a.b import c as my_c` as a workaround, but this doesn't look right. The enhancement is to treat `import x.y.z as m` as `m = importlib.import_module('x.y.z')`. I don't see how this will break anything.
|msg291377 - (view)||Author: Nick Coghlan (ncoghlan) *||Date: 2017-04-09 12:11|
The background here is the change in http://bugs.python.org/issue17636 that allows IMPORT_FROM to fall back to sys.modules when written as "from a.b import c as m", while the plain LOAD_ATTR generated for "import a.b.c as m" fails. One potential resolution to the discrepancy that occurred to me in the python-ideas discussion was to add a new IMPORT_ATTR opcode specifically for use in the latter case. The main downside I can see is the usual one with adding any new opcode for a niche use case like this one: it risks slowing the eval loop down in general for a marginal improvement in language consistency. However, I don't think we should reject the possibility out of hand on that basis.
|msg291380 - (view)||Author: Guido van Rossum (gvanrossum) *||Date: 2017-04-09 15:48|
Hm, the stackoverflow point to another stackoverflow (http://stackoverflow.com/questions/24807434/imports-in-init-py-and-import-as-statement/24968941#24968941) which is due to an import cycle. Is there also an import cycle here? Because I cannot seem repro this with just empty modules (and I'm too lazy to find out if brain is a real package).
|msg291384 - (view)||Author: Victor Varvariuc (Victor.Varvariuc)||Date: 2017-04-09 16:36|
# mytest/mod2.py: import sys import mytest.mod1 assert 'mytest.mod1' in sys.modules import mytest.mod1 as mod # this fails, though the prev. lines work perfectly
|msg291400 - (view)||Author: Guido van Rossum (gvanrossum) *||Date: 2017-04-10 00:32|
So to save others the bother to check what's in the zip file, the contents of mytest/mod1.py is as follows: import mytest.mod2 as mod def func(): print('mod1.func called') mod.func() There's no __init__.py (so mytest is a namespace package, PEP 420) but adding an empty __init__.py makes no difference. The problem occurs with both Python 2 and 3. The root cause is the import cycle. I played around with dis and found the following (I suspect others have already found this but the thread was hard to follow for me initially): >>> dis.dis('import a.b') 1 0 LOAD_CONST 0 (0) 2 LOAD_CONST 1 (None) 4 IMPORT_NAME 0 (a.b) 6 STORE_NAME 1 (a) 8 LOAD_CONST 1 (None) 10 RETURN_VALUE >>> compared to >>> dis.dis('import a.b as c') 1 0 LOAD_CONST 0 (0) 2 LOAD_CONST 1 (None) 4 IMPORT_NAME 0 (a.b) 6 LOAD_ATTR 1 (b) <-- error here 8 STORE_NAME 2 (c) 10 LOAD_CONST 1 (None) 12 RETURN_VALUE >>> What this shows is that the implementation of "import a.b" and "import a.b as c" are different. The former calls __import__('a.b', ...) which returns the module 'a' and stores that in the variable 'a'. In the OP's case, because of the import cycle, while sys.modules['a.b'] exists, module 'a' does not yet have the attribute 'b'. That's the reason that in the latter example, the LOAD_ATTR opcode fails. I'm inclined not to attempt to fix this. The reason that we don't pull 'a.b' out of sys.modules at this point is that if later in the execution of a/b.py we get an exception, the import will be cancelled, and sys.modules['a.b'] will be *deleted*, and then the variable 'c' would have a reference to a half-loaded module. The semantics of imports in the case of cycles are somewhat complex but clearly defined and there are only a few rules to consider, and from these rules it is possible to reason out whether any particular case is valid or not. I would prefer to keep it this way rather than add more special cases. There's a good reason why, *in general* (regardless of import cycles), "import a.b as c" is implemented as a getattr operation on a, not as an index operation on sys.modules (it is possible for module a to override its attribute b without updating sys.modules) and I'd rather keep those semantics than give them up for this particular edge case. Cyclic imports are hard. If they don't work for you, avoid them.
|msg291411 - (view)||Author: Victor Varvariuc (Victor.Varvariuc)||Date: 2017-04-10 05:29|
> I'm inclined not to attempt to fix this. The reason that we don't pull 'a.b' out of sys.modules at this point is that if later in the execution of a/b.py we get an exception, the import will be cancelled, and sys.modules['a.b'] will be *deleted*, and then the variable 'c' would have a reference to a half-loaded module. I think this is solvable -- track and delete the references too. It's your decision in the end, but I think it's not good to make the language inconsistent because of technical limitations -- the interface should be the primary source for decisions.
|msg291414 - (view)||Author: Victor Varvariuc (Victor.Varvariuc)||Date: 2017-04-10 05:58|
> it is possible for module a to override its attribute b without updating sys.modules This sounds like a really strange application. Even if someone overrides the attribute, when you do in other place `import a.b.c as m` you expect `a.b.c` to be path to a module, otherwise it would be hard to debug. The docs https://docs.python.org/3/reference/simple_stmts.html#import say `import_stmt ::= "import" module ["as" name]` meaning everywhere that `module` is a path to a module. https://docs.python.org/3/reference/import.html#searching > To begin the search, Python needs the fully qualified name of the module (or package, but for the purposes of this discussion, the difference is immaterial) being imported. This name may come from various arguments to the import statement, or from the parameters to the importlib.import_module() or __import__() functions. So if you treat `import a.b.c as m` as `import a; m = a.b.c` -- it go go in some cases against the docs.
|msg291416 - (view)||Author: Nick Coghlan (ncoghlan) *||Date: 2017-04-10 06:25|
Right, this problem only arises with import cycles, and that's why we resisted making eager submodule resolution work *at all* for so long (issue 992389 was filed way back in 2004). We only conceded the point (with issue 17636 being implemented for 3.5) specifically to address a barrier to adoption for explicit relative imports, as it turned out that "from . import bar" could fail in cases where "import foo.bar" previously worked. The best explanation I could find for that rationale in the related python-dev thread is PJE's post here: https://mail.python.org/pipermail/python-dev/2013-April/125121.html What Victor's python-ideas thread pointed out is that there are actually *3* flavours of import where this particular circular reference problem can come up: # Has worked as long as Python has had packages, # as long as you only lazily resolve foo.bar in # function and method implementations import foo.bar # Has worked since 3.5 due to the IMPORT_FROM # change that falls back to a sys.modules lookup from foo import bar # Still gives AttributeError since it # eagerly resolves the attribute lookup import foo.bar as bar While I think the architectural case for allowing this kind of circular dependency between different top level namespaces is *much* weaker than that for allowing it within packages, I do think there's a reasonable consistency argument to be made in favour of ensuring that `from foo import bar` and `import foo.bar as bar` are functionally equivalent when `bar` is a submodule of `foo`, especially since the latter form makes it clearer to the reader that `bar` *is* a submodule, rather than any arbitrary attribute. I don't think it's a big problem in practice (so I wouldn't spend any time on implementing it myself), but the notion of an IMPORT_ATTR opcode for the "import x.y.z as m" case that parallels IMPORT_FROM seems architecturally clean to me in a way that the proposed resolutions to issue 992389 weren't.
|msg291431 - (view)||Author: Guido van Rossum (gvanrossum) *||Date: 2017-04-10 17:16|
OK I won't close it but I'm withdrawing from the issue. If there are devs who want to implement this or mentor some contributor into implementing it feel free, but it sounds like Nick doesn't think it's important enough to fix (i.e. not worth his time) and I feel the same.
|msg291437 - (view)||Author: Serhiy Storchaka (serhiy.storchaka) *||Date: 2017-04-10 17:59|
It is easy to implement this (just replace LOAD_ATTR with IMPORT_FROM in the compiler). The hardest part is writing tests.
|msg291486 - (view)||Author: Nick Coghlan (ncoghlan) *||Date: 2017-04-11 12:09|
Huh, interesting - I'd missed that the only part of the "from a.b import c" that IMPORT_FROM implements is the LOAD_ATTR variant that falls back to sys.modules. The prior adjustment to get IMPORT_NAME to return "a.b" instead of "a" happens inside that opcode based on the fact that a non-empty from_list was passed in. So indeed, there's no new opcode needed - as Serhiy points out, the compiler just needs to emit IMPORT_FROM instead of LOAD_ATTR for this case.
|msg291926 - (view)||Author: Pavol Lisy (Pavol Lisy)||Date: 2017-04-20 00:41|
Maybe I am wrong but don't we get another weird behavior? import sys.path # this is error now ModuleNotFoundError: No module named 'sys.path'; 'sys' is not a package So we probably expect same behavior here: import sys.path as foo But if we just replace LOAD_ATTR with IMPORT_FROM in the compiler then wouldn't it works like: from sys import path as foo ?
|msg291947 - (view)||Author: Serhiy Storchaka (serhiy.storchaka) *||Date: 2017-04-20 05:02|
No, because the compiler generates different arguments for the IMPORT_NAME opcode.
|msg292111 - (view)||Author: Nick Coghlan (ncoghlan) *||Date: 2017-04-22 07:52|
Expanding on Serhiy's answer: the ModuleNotFoundError for "import sys.path" comes from IMPORT_NAME, not from any of the subsequent attribute lookups. That difference is visible in the bytecode: >>> dis("import sys.path as path") 1 0 LOAD_CONST 0 (0) 3 LOAD_CONST 1 (None) 6 IMPORT_NAME 0 (sys.path) 9 LOAD_ATTR 1 (path) 12 STORE_NAME 1 (path) 15 LOAD_CONST 1 (None) 18 RETURN_VALUE For "import sys.path as path", the given module name is "sys.path", and the "from list" entry on the stack is None. This fails before it even reaches the `LOAD_ATTR` line, since "sys.path" isn't an importable module. Changing LOAD_ATTR to IMPORT_FROM would thus have no effect on its behaviour. >>> dis("from sys import path") 1 0 LOAD_CONST 0 (0) 3 LOAD_CONST 1 (('path',)) 6 IMPORT_NAME 0 (sys) 9 IMPORT_FROM 1 (path) 12 STORE_NAME 1 (path) 15 POP_TOP 16 LOAD_CONST 2 (None) 19 RETURN_VALUE For "from sys import path", the given module name is "sys", and the "from list" entry on the stack is a tuple containing the string "path". This works, since "sys" is importable *and* it has a "path" attribute. Hence why Serhiy's suggestion is such an elegant fix, since the only cases where the LOAD_ATTR vs IMPORT_FROM distinction matters will be those where: 1. The submodule exists, so both "import a.b" and "from a import b" will work for the IMPORT_NAME step 2. It's a circular import, so even though "a.b" exists in sys.modules, it won't be accessible as an attribute of "a" yet Getting the tests and any related documentation updates right will actually be harder than the code change.
|msg292119 - (view)||Author: Victor Varvariuc (Victor.Varvariuc)||Date: 2017-04-22 13:03|
How can I facilitate/contribute to get the changes suggested by Serhiy into Python?
|msg292167 - (view)||Author: Serhiy Storchaka (serhiy.storchaka) *||Date: 2017-04-23 14:51|
PR 1264 implements this idea. Seems this is a duplicate of issue23203 which contains the same idea.
stage: needs patch -> patch review
|2017-04-23 14:47:00||serhiy.storchaka||set||pull_requests: + pull_request1377|
|2017-04-22 13:03:01||Victor.Varvariuc||set||messages: + msg292119|
|2017-04-22 07:52:42||ncoghlan||set||messages: + msg292111|
|2017-04-20 05:02:06||serhiy.storchaka||set||messages: + msg291947|
|2017-04-20 00:41:32||Pavol Lisy||set||nosy:
+ Pavol Lisy|
messages: + msg291926
|2017-04-11 12:09:13||ncoghlan||set||messages: + msg291486|
messages: + msg291437
keywords: + easy (C)
stage: needs patch
|2017-04-10 17:16:06||gvanrossum||set||messages: + msg291431|
|2017-04-10 06:25:26||ncoghlan||set||messages: + msg291416|
|2017-04-10 05:58:03||Victor.Varvariuc||set||messages: + msg291414|
|2017-04-10 05:29:39||Victor.Varvariuc||set||messages: + msg291411|
|2017-04-10 00:32:43||gvanrossum||set||messages: + msg291400|
messages: + msg291384
messages: + msg291380
+ brett.cannon, ncoghlan, eric.snow|
messages: + msg291377