
Author Pascal.Chambon
Recipients Pascal.Chambon, brett.cannon, eric.snow, ncoghlan
Date 2013-04-15.13:35:27
Message-id <1366032929.16.0.491115950213.issue17716@psf.upfronthosting.co.za>
In-reply-to
Content
(sorry for the long post, but it's a complex issue I guess)

I forgot to mention that I see this behaviour with the latest Python 2.7 as well as Python 3.3 (I guess other versions behave the same).

I agree that having side effects in script imports looks dangerous, but on the other hand it's incredibly handy to use the "script" behaviour of modules so that each one initializes/checks itself, rather than relying on initialization functions being called from somewhere else (many web frameworks don't even provide for such setup scripts; I have a Django ticket open on that subject at the moment).

Loads of Python modules perform such inits (registering atexit handlers, setting up loggers or working/temp directories, or even modifying process-level settings), so even though we're currently adding protection via exception handlers (and checking the idempotency of our imports, crucial points!), I cannot guarantee that none of the modules/packages we use will ever hit such temporary failures (failures that the web server can't recover from, because module trees become forever unimportable).

With the video and the importlib code, I'm beginning to have a better understanding of from..import, and I noticed that actually both "import mypkg.module_a" and "from mypkg import module_a" get broken when mypkg raises an exception after successfully loading module_a.
It's just that the second form breaks loudly, whereas the first one remains silently corrupted (i.e. the attribute mypkg.module_a does NOT exist in either case, so there are pending AttributeErrors anyway).
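
To make the scenario concrete, here is a minimal sketch that rebuilds it with a throwaway package on disk. The names mypkg and module_a are the ones used above; the MYPKG_FAIL environment switch and the VALUE constant are hypothetical stand-ins I made up to simulate a temporary failure. The exact outcome may depend on the interpreter version, so take the printed flags as something to check rather than a guarantee.

```python
import importlib
import os
import sys
import tempfile
import textwrap

# Throwaway package: mypkg/__init__.py loads module_a, then may fail.
# MYPKG_FAIL is a hypothetical switch simulating a temporary failure.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "mypkg")
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write(textwrap.dedent("""\
        import os
        import mypkg.module_a
        if os.environ.get("MYPKG_FAIL") == "1":
            raise RuntimeError("temporary failure")
    """))
with open(os.path.join(pkg_dir, "module_a.py"), "w") as f:
    f.write("VALUE = 42\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

# First attempt: module_a loads fine, then the package itself raises.
os.environ["MYPKG_FAIL"] = "1"
try:
    import mypkg.module_a
except RuntimeError:
    first_attempt_failed = True
else:
    first_attempt_failed = False

# State left behind by the aborted import.
parent_cached = "mypkg" in sys.modules
child_cached = "mypkg.module_a" in sys.modules

# Second attempt: the failure is gone and the import statement succeeds,
# but module_a is served from sys.modules without being re-bound.
os.environ["MYPKG_FAIL"] = "0"
import mypkg.module_a
binding_present = hasattr(mypkg, "module_a")

print("first attempt failed:", first_attempt_failed)
print("parent left in sys.modules:", parent_cached)
print("child left in sys.modules:", child_cached)
print("mypkg.module_a bound after retry:", binding_present)
```

If the last flag is False, any later access to mypkg.module_a.VALUE raises AttributeError even though the import statement itself went through.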

It all comes from the fact that, in the terms of "importlib/_bootstrap.py", _gcd_import() assumes everything is loaded and bound when a dotted module name (e.g. "mypkg.module_a") is in sys.modules, whereas intermediate bindings (setattr(mypkg, "module_a", module_a)) might have been lost due to an import failure (and the removal of the mypkg module).
Hmm, I wonder: could we just recheck all bindings inside _gcd_import()? I guess there would be annoying corner cases with circular imports, i.e. we could end up creating bindings that are just "pending to be done" in parent frames...
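
As a rough illustration of what "rechecking the bindings" could mean, here is a sketch written as user-level code rather than as a patch to _gcd_import(). The helper name rebind_chain and its modules parameter are hypothetical; it only restores attributes that are missing, and it deliberately ignores the circular-import corner case mentioned above.

```python
import sys
import types


def rebind_chain(dotted_name, modules=None):
    """Restore missing parent -> child attributes for a dotted module name.

    Hypothetical helper: nothing is imported, only entries already in the
    module cache are linked. Returns the names whose binding was restored.
    """
    modules = sys.modules if modules is None else modules
    restored = []
    parts = dotted_name.split(".")
    for i in range(1, len(parts)):
        parent = modules.get(".".join(parts[:i]))
        child_name = ".".join(parts[:i + 1])
        child = modules.get(child_name)
        if parent is None or child is None:
            continue  # one side is not cached, nothing to link
        if not hasattr(parent, parts[i]):
            setattr(parent, parts[i], child)
            restored.append(child_name)
    return restored


# Demo on a private cache so the real sys.modules is left alone.
cache = {
    "mypkg": types.ModuleType("mypkg"),
    "mypkg.module_a": types.ModuleType("mypkg.module_a"),
}
first_pass = rebind_chain("mypkg.module_a", cache)
second_pass = rebind_chain("mypkg.module_a", cache)
print(first_pass)   # ['mypkg.module_a']
print(second_pass)  # []
```

Skipping attributes that already exist avoids clobbering a name the package set on purpose, but it cannot tell a lost binding from one that a parent frame is about to create.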

Issue 17636 might provide a workaround for some cases, but it doesn't fix the root problem of the "rolled back" import (e.g. here the missing binding between mypkg and module_a, whichever import form was used). Imagine a tree "mypkg/mypkg2/module.py": if module.py loads successfully but mypkg and mypkg2 fail, then later, somewhere else in the code, it seems an "import mypkg.mypkg2.module" will SUCCEED even though the module tree is broken and AttributeErrors are pending.

I guess Nick was right (and I was wrong): the cleanest solution seems to be enforcing an invariant saying that a submodule can NOT fully be in sys.modules unless its parent is either loaded or "in the process" of loading it (thus, if a binding between parent and child is missing, we're simply in the case of circular dependencies). Said another way, the import system should delete all child modules from sys.modules when aborting the import of a parent package.
What do you think about it?
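
For discussion, the invariant could be approximated in user code along these lines. This is only a sketch of the idea, not a proposed patch: import_with_rollback, demo_pkg and its child module are hypothetical names, and the helper evicts every cached child of the package, including any that were cached before the failed attempt.

```python
import importlib
import os
import sys
import tempfile


def import_with_rollback(package_name):
    """Import a package; on failure, evict its children from sys.modules.

    Hypothetical helper approximating the invariant: no submodule stays
    cached once the import of its parent package has been aborted.
    """
    try:
        return importlib.import_module(package_name)
    except BaseException:
        prefix = package_name + "."
        for name in [n for n in sys.modules if n.startswith(prefix)]:
            del sys.modules[name]
        sys.modules.pop(package_name, None)
        raise


# Throwaway package whose __init__ loads a child and then always fails.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg")
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("import demo_pkg.child\nraise RuntimeError('init failed')\n")
with open(os.path.join(pkg_dir, "child.py"), "w") as f:
    f.write("VALUE = 1\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

try:
    import_with_rollback("demo_pkg")
except RuntimeError:
    rolled_back = True
else:
    rolled_back = False

leftovers = sorted(n for n in sys.modules if n.startswith("demo_pkg"))
print(rolled_back, leftovers)
```

With the children gone, a later retry re-executes them and re-creates the bindings, at the cost of running their import side effects a second time (hence the importance of idempotent imports).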
History
Date User Action Args
2013-04-15 13:35:29 Pascal.Chambon set recipients: + Pascal.Chambon, brett.cannon, ncoghlan, eric.snow
2013-04-15 13:35:29 Pascal.Chambon set messageid: <1366032929.16.0.491115950213.issue17716@psf.upfronthosting.co.za>
2013-04-15 13:35:29 Pascal.Chambon link issue17716 messages
2013-04-15 13:35:27 Pascal.Chambon create