classification
Title: From ... import fails when parent package failed but child module succeeded, yet works in std import case
Type: behavior Stage: resolved
Components: Interpreter Core Versions: Python 3.4
process
Status: closed Resolution: fixed
Dependencies: 17636 Superseder:
Assigned To: Nosy List: Pascal.Chambon, brett.cannon, eric.snow, ncoghlan
Priority: normal Keywords:

Created on 2013-04-13 16:04 by Pascal.Chambon, last changed 2020-03-06 20:14 by brett.cannon. This issue is now closed.

Files
File name Uploaded Description Edit
ImportFailPy2.zip Pascal.Chambon, 2013-04-13 16:04
Messages (10)
msg186738 - (view) Author: Pascal Chambon (Pascal.Chambon) Date: 2013-04-13 16:04
Hello,

we've encountered several times a very nasty bug on our framework, several times tests or even production code (served by mod_wsgi) ended up in a broken state, where imports like "from . import processing_exceptions", which were NOT in circular imports and were 100% existing submodules, raised exceptions like "ImportError: cannot import name processing_exceptions". Restarting the test/server fixed it, and we never knew what happened.

I've crossed several forum threads on similar issues, only recently did I find one which gave a way to reproduce the bug:
http://stackoverflow.com/questions/12830901/why-does-import-error-change-to-cannot-import-name-on-the-second-import

So here attached is a python2 sample (python3 has the same pb), showing the bug (just run their test_import.py)

What happens here, is that a package "mypkg" fails to get imported due to an exception (eg. temporarily failuure of DB), but only AFTER successfully importing a submodule mypkg.module_a.
Thus, "mypkg.module_a" IS loaded and stays in sys.modules, but "mypkg" is erased from sys.modules (like the doc on python imports describes it).

The next time we try, from within the same application, to import "mypkg", and we cross "from mypkg import module_a" in the mypkg's __init__.py code, it SEEMS that the import system checks sys.modules, and seeing "mypkg.module_a" in it, it THINKS that necessarily mypkg is already initialized and contains a name "module_a" in its global namespace. Thus the "cannot import name processing_exceptions" error.

Importing "module_a" as an absolute or relative import changes nothing, however doing "import mypkg.module_a" solves the problem (dunno why).

Another workaround is to cleanup sys.modules in mypkg/__init__.py, to ensure that a previously failed attempt at importing the package modules doesn't hinder us.
    
    # on top of "mypkg/__init__.py"
    exceeding_modules = [k for k in sys.modules.keys() if k.startswith("mypkg.")]
    for k in exceeding_modules:
        del sys.modules[k]
        
Anyway, I don't know enough python's import internals to understand why, exactly, on second import attempt, the system tries a kind of faulty getattr(mypkg, "module_a"), instead of simply returning sys.modules["mypkg.module_a"] which exists.
Could anyone help with that ? 
That's a very damaging issue, imo, since webserver workers can reach a completely broken state because of that.

PS: more generally, I guess python users lack insight on the behaviour of "from xxx import yyy", especially when yyy is both a real submodule of xxx and a variable initialized in xxx/__init__.py (it seems the real module overrides the variable), or when the __all__ list of xxx could prevent the import of a submodule of xxx by not including it.
Provided I better understand the workflow of all these stuffs - that have quite moved recently I heard - I'd be willing to summarize it for the python docs.
msg186760 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2013-04-13 17:00
TL;DR: don't do anything involving side-effects as part of an import that can fail like connecting to a database. Another bug that proposes tweaking the IMPORT_FROM bytecode semantics will more than likely solve your problem regardless.


First question: what versions of Python did you test this against? You didn't specify the version when you filed the bug.

Second, if you want to see how it all works, import is implemented in pure Python in Python 3.3 so you can actually read the code if you want: http://hg.python.org/cpython/file/185ae0c95e5b/Lib/importlib/_bootstrap.py. I also gave a talk at PyCon US 2013 (and PyCon Argentina) on how import works if you are curious: http://pyvideo.org/video/1707/how-import-works.

Third, performing operations as a side-effect of import which could cause a failure is a dodgy practice, as you have found out. I would advise looking for a way to not trying to connect to your database as part of an import so that the import will succeed consistently and you won't have this problem.

Fourth, everything is working as intended. We can't roll back imports because of possible side-effects you do during import, so we can't have pkg.module_a successfully import but then toss it aside just because pkg succeeds. And the from ... import format fails as it does on the second import because as you pointed out, pkg.module_a is not an attribute on pkg. The attempted import succeeds internally since pkg.module_a is just sitting there in sys.modules, but it isn't explicitly re-attached to pkg as that's not expected if the first import failed thanks to pkg failing but you choose to attempt to ignore that fact (so it should honestly fail). The bytecode uses getattr() on the module so there is no specific reason for it to think it should work. But when you do a straight import that succeeds as that will do the necessary binding to pkg.

Fifth, http://bugs.python.org/issue17636 will probably make all of this a moot point. =) That issue is tracking a proposed change which will cause from ... import to pull from sys.modules if a module is in sys.modules but not attached on a module to deal with circular imports. But I think as a side-effect it will also help deal with your problem based on what you have described.

Sixth, if you want to help clarify the docs for Python 3.3 and 2.7 that would be great! Just open a new bug with a patch for the doc changes and I would be happy to review it.
msg186890 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-04-14 08:21
The interpreter level problem covered by the current issue is that the difference between "import mypkg.module_a" and "from mypkg import module_a" is masking the fact that it is the original "import mypkg" that failed, and may still fail on the second and subsequent imports. That is, the user level code is at best dubious, but the interpreter is spitting out misleading error messages that obscure the real cause of the failure.

In the tests for #17636, we should ensure that all of the following consistently raise RuntimeError when the import fails (on both the initial and subsequent import attempts):

    import badpkg.goodmod
    raise RuntimeError("Always fails")

    from badpkg import goodmod
    raise RuntimeError("Always fails")

    from . import goodmod
    raise RuntimeError("Always fails")

Where that is the __init__.py code in the following layout:

    badpkg/
        __init__.py
        goodmod.py
msg186892 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-04-14 08:29
More generally, I think we may have to revisit the question of what we remove from sys.modules on failure if, as a side effect of the import, a child module was imported successfully.

In this situation, the possibilities are:

1. Remove the parent module, and all child modules. We don't currently do this because importing the child modules may have had side effects. However, I'm not sure this reasoning is sound, as the section of __init__.py before the failure may have had side effects too, and we don't let that stop us from removing the parent module.

2. Remove just the parent module. That's what we currently do, and it's a problem because we're knowingly breaking one of the import state invariants (i.e. if a non top-level module is present in sys.modules, then all parent modules in the chain can be assumed to also be in sys.modules)

3. Leave the partially initialised parent module in sys.modules as well. This would be a bad idea, since it would lead to very inconsistent behaviour as to whether or not the parent module was left in sys.modules based on the contents of __init__.py

To be honest, I think purging the entire subtree (option 1 above) would be better than what we do now - breaking state invariants is bad, because it can lead to surprising errors a long way from the actual source of the problem.
msg186947 - (view) Author: Pascal Chambon (Pascal.Chambon) Date: 2013-04-14 19:37
Thanks for the feedback, I'm gonna read those docs and related issues asap, and check that planned evolutions will actually fix this.

just as a side note in the meantime: I dont think that the problem here is the "purge " of sys.modules, the failure is actually located in the semantic difference between the two forms of import statements, that should basically behave the same but do not (hance the interest of these related issues noted above).
msg186986 - (view) Author: Pascal Chambon (Pascal.Chambon) Date: 2013-04-15 13:35
(sorry for the long post, but it's a complex issue I guess)

I forgot to precise that I have this behaviour with the latest python2.7, as well as python3.3 (I guess other versions behave the same).

I agree that having side effects in script imports looks dangerous, but on the other hand it's incredibly handy to use the "script" behaviour of module so that each one initializes/checks himself, rather than relying on the calling of initialization methods from somewhere else (many web frameworks don't even plan such setup scripts actually, I have a django ticket running on that subject just at the moment).

Loads of python modules perform such inits (registration of atexit handlers, setup of loggers, of working/temp directories, or even modifying process-level settings.....), so even though we're currently adding protection via exception handlers (and checking the idempotency of our imports, crucial points!), I could not guarantee that none of the modules/packages we use won't have such temporary failures (failures that can't be fixed by the web server, because module trees become forever unimportable).

With the video and the importlib code, I'm beginning to have a better understanding on the from..import, and I noticed that actually both "import mypkg.module_a" and "from mypkg import module_a" get broken when mypkg raised an exception after successfully loading module_a. 
It's just that the second form breaks loudly, whereas the first one remains silently corrupted (i.e the variable mypkg.module_a does NOT exist in both cases, so theer are pending AttributeErrors anyway).

All comes from the fact that - to talk with "importlib/_bootstrap.py" terms - _gcd_import() assumes everything is loaded and bound when a chain of modules (eg. "mypkg.module_a") is in sys.modules, whereas intermediary bindings (setattr(mypkg, "module_a", module_a)) might have been lost due to an import failure (and the removal of the mypkg module).
Hum I wonder, could we just recheck all bindings inside that _gcd_import() ? I guess there would be annoying corner cases with circular imports, i.e we could end up creating these bindings whereas they are just "pending to be done" in parent frames...

Issue 17636 might provide a workaround for some cases, but it doesn't fix the root problem of the "rolled back" import (eg. here the absence of binding between mypkg and module_a, whatever the import form that was used). Imagine a tree "mypkg/mypkg2/module.py", if module.py gets well loaded but mypkg and mypkg2 fail, then later, somewhere else in the code, it seems an "import mypkg.mypkg2.module" will SUCCEED even though the module tree is broken, and AttributeErrors are pending.

I guess Nick was right (and me wrong), the cleanest solution seems to enforce an invariant saying that a submodule can NOT fully be in sys.modules if his parent is not either loaded or "in the process" of loading it (thus if a binding between parent and child is missing, we're simply in the case of circular dependencies). Said another way, the import system should delete all children modules from sys.modules when aborting the import of a parent package. 
What do you think about it ?
msg187025 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2013-04-15 20:05
Have to think about the whole rollback situation in terms of a failure to import a parent. Whenever you want to change the semantics of import you will break someone's code, it's just a question of how wide the breakage would be and how much of an improvement it will lead to. Probably cheapest way to tell is to implement it and see if the stdlib's test suite still passes.

In terms of implementation, I guess you would need to check if the import failed on a package (if possible), and then do a prefix search through all the keys in sys.modules to see if any children made it through, and then drop them and mention their import failure as well (probably through some exception chaining). It obviously makes failed exceptions more expensive, but I don't think people treat them as a cheap form of EAFP like other uses of exceptions.
msg189765 - (view) Author: Pascal Chambon (Pascal.Chambon) Date: 2013-05-21 16:01
Well, since it's a tough decision to make (erasing all children modules when rolling back parent, or instead reconnecting with children on 2nd import of parent), I guess it should be discussed on python-dev first, shouldn't it ?
msg189799 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-05-22 08:17
import-sig is probably a better place to start
msg363540 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2020-03-06 20:14
Import now has proper messaging of the failure in the package and not an error about being unable to import the submodule.
History
Date User Action Args
2020-03-06 20:14:11brett.cannonsetstatus: open -> closed
resolution: fixed
messages: + msg363540

stage: resolved
2013-05-22 08:17:21ncoghlansetmessages: + msg189799
2013-05-21 16:01:03Pascal.Chambonsetmessages: + msg189765
2013-04-15 20:05:49brett.cannonsetmessages: + msg187025
2013-04-15 13:35:29Pascal.Chambonsetmessages: + msg186986
2013-04-14 19:37:13Pascal.Chambonsetmessages: + msg186947
2013-04-14 08:29:37ncoghlansetmessages: + msg186892
versions: - Python 3.1, Python 2.7, Python 3.2, Python 3.3, Python 3.5
2013-04-14 08:21:10ncoghlansetmessages: + msg186890
2013-04-13 17:00:57brett.cannonsetdependencies: + Modify IMPORT_FROM to fallback on sys.modules
messages: + msg186760
title: IMPORTANT - Process corruption on partly failed imports -> From ... import fails when parent package failed but child module succeeded, yet works in std import case
2013-04-13 16:09:50pitrousetnosy: + brett.cannon, ncoghlan, eric.snow
2013-04-13 16:04:56Pascal.Chamboncreate