Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Treat import a.b.c as m as m = sys.modules['a.b.c'] #74210

Closed
VictorVarvariuc mannequin opened this issue Apr 9, 2017 · 21 comments
Closed

Treat import a.b.c as m as m = sys.modules['a.b.c'] #74210

VictorVarvariuc mannequin opened this issue Apr 9, 2017 · 21 comments
Assignees
Labels
3.7 (EOL) end of life easy interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement

Comments

@VictorVarvariuc
Copy link
Mannequin

VictorVarvariuc mannequin commented Apr 9, 2017

BPO 30024
Nosy @warsaw, @brettcannon, @ncoghlan, @ericsnowcurrently, @serhiy-storchaka, @zhangyangyu
PRs
  • bpo-30024: Circular imports involving absolute imports with binding #1264
  • bpo-31286: Fixed stack usage in absolute imports with binding a submo… #3217
  • Files
  • test.zip
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2017-05-09.19:32:51.174>
    created_at = <Date 2017-04-09.11:11:54.526>
    labels = ['interpreter-core', 'easy', 'type-feature', '3.7']
    title = "Treat `import a.b.c as m` as `m = sys.modules['a.b.c']`"
    updated_at = <Date 2017-08-29.12:47:46.539>
    user = 'https://bugs.python.org/VictorVarvariuc'

    bugs.python.org fields:

    activity = <Date 2017-08-29.12:47:46.539>
    actor = 'serhiy.storchaka'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2017-05-09.19:32:51.174>
    closer = 'serhiy.storchaka'
    components = ['Interpreter Core']
    creation = <Date 2017-04-09.11:11:54.526>
    creator = 'Victor.Varvariuc'
    dependencies = []
    files = ['46794']
    hgrepos = []
    issue_num = 30024
    keywords = ['easy (C)']
    message_count = 21.0
    messages = ['291376', '291377', '291380', '291384', '291400', '291411', '291414', '291416', '291431', '291437', '291486', '291926', '291947', '292111', '292119', '292167', '293329', '293336', '293338', '293342', '300983']
    nosy_count = 9.0
    nosy_names = ['barry', 'brett.cannon', 'ncoghlan', 'THRlWiTi', 'eric.snow', 'serhiy.storchaka', 'Victor.Varvariuc', 'xiang.zhang', 'Pavol Lisy']
    pr_nums = ['1264', '3217']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue30024'
    versions = ['Python 3.7']

    @VictorVarvariuc
    Copy link
    Mannequin Author

    VictorVarvariuc mannequin commented Apr 9, 2017

    https://mail.python.org/pipermail/python-ideas/2017-April/045405.html

    Hi there.

    I asked a question <http://stackoverflow.com/questions/41845671/import-as-in-python-3\> on Stackoverflow:

    (Pdb) import brain.utils.mail
    (Pdb) import brain.utils.mail as mail_utils
    *** AttributeError: module 'brain.utils' has no attribute 'mail'

    I always thought that import a.b.c as m is roughly equivalent to m = sys.modules['a.b.c']. Why AttributeError? Python 3.6

    I was pointed out <http://stackoverflow.com/a/24968941/248296\> that this is a somewhat weird behavior of Python:

    The statement is not quite true, as evidenced by the corner case you met, namely if the required modules already exist in sys.modules but are yet uninitialized. The import ... as requires that the module foo.bar is injected in foo namespace as the attribute bar, in addition to being in sys.modules, whereas the from ... import ... as looks for foo.bar in sys.modules.

    Why would import a.b.c work when a.b.c is not yet fully imported, but import a.b.c as my_c would not? I though it would be vice versa.

    Using import a.b.c as my_c allows avoiding a number of circular import issues. Right now I have to use from a.b import c as my_c as a workaround, but this doesn't look right.

    The enhancement is to treat import x.y.z as m as m = importlib.import_module('x.y.z'). I don't see how this will break anything.

    @VictorVarvariuc VictorVarvariuc mannequin added 3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement labels Apr 9, 2017
    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Apr 9, 2017

    The background here is the change in http://bugs.python.org/issue17636 that allows IMPORT_FROM to fall back to sys.modules when written as "from a.b import c as m", while the plain LOAD_ATTR generated for "import a.b.c as m" fails.

    One potential resolution to the discrepancy that occurred to me in the python-ideas discussion was to add a new IMPORT_ATTR opcode specifically for use in the latter case.

    The main downside I can see is the usual one with adding any new opcode for a niche use case like this one: it risks slowing the eval loop down in general for a marginal improvement in language consistency.

    However, I don't think we should reject the possibility out of hand on that basis.

    @gvanrossum
    Copy link
    Member

    Hm, the stackoverflow point to another stackoverflow (http://stackoverflow.com/questions/24807434/imports-in-init-py-and-import-as-statement/24968941#24968941) which is due to an import cycle. Is there also an import cycle here? Because I cannot seem repro this with just empty modules (and I'm too lazy to find out if brain is a real package).

    @VictorVarvariuc
    Copy link
    Mannequin Author

    VictorVarvariuc mannequin commented Apr 9, 2017

    # mytest/mod2.py:

    import sys
    
    import mytest.mod1
    assert 'mytest.mod1' in sys.modules
    import mytest.mod1 as mod  # this fails, though the prev. lines work perfectly

    @gvanrossum
    Copy link
    Member

    So to save others the bother to check what's in the zip file, the contents of mytest/mod1.py is as follows:

    import mytest.mod2 as mod
    
    
    def func():
        print('mod1.func called')
        mod.func()

    There's no __init__.py (so mytest is a namespace package, PEP-420) but adding an empty __init__.py makes no difference. The problem occurs with both Python 2 and 3.

    The root cause is the import cycle. I played around with dis and found the following (I suspect others have already found this but the thread was hard to follow for me initially):

    >>> dis.dis('import a.b')
      1           0 LOAD_CONST               0 (0)
                  2 LOAD_CONST               1 (None)
                  4 IMPORT_NAME              0 (a.b)
                  6 STORE_NAME               1 (a)
                  8 LOAD_CONST               1 (None)
                 10 RETURN_VALUE
    >>>

    compared to

    >>> dis.dis('import a.b as c')
      1           0 LOAD_CONST               0 (0)
                  2 LOAD_CONST               1 (None)
                  4 IMPORT_NAME              0 (a.b)
                  6 LOAD_ATTR                1 (b)      <-- error here
                  8 STORE_NAME               2 (c)
                 10 LOAD_CONST               1 (None)
                 12 RETURN_VALUE
    >>>

    What this shows is that the implementation of "import a.b" and "import a.b as c" are different. The former calls __import__('a.b', ...) which returns the module 'a' and stores that in the variable 'a'. In the OP's case, because of the import cycle, while sys.modules['a.b'] exists, module 'a' does not yet have the attribute 'b'. That's the reason that in the latter example, the LOAD_ATTR opcode fails.

    I'm inclined not to attempt to fix this. The reason that we don't pull 'a.b' out of sys.modules at this point is that if later in the execution of a/b.py we get an exception, the import will be cancelled, and sys.modules['a.b'] will be *deleted*, and then the variable 'c' would have a reference to a half-loaded module.

    The semantics of imports in the case of cycles are somewhat complex but clearly defined and there are only a few rules to consider, and from these rules it is possible to reason out whether any particular case is valid or not. I would prefer to keep it this way rather than add more special cases. There's a good reason why, *in general* (regardless of import cycles), "import a.b as c" is implemented as a getattr operation on a, not as an index operation on sys.modules (it is possible for module a to override its attribute b without updating sys.modules) and I'd rather keep those semantics than give them up for this particular edge case. Cyclic imports are hard. If they don't work for you, avoid them.

    @VictorVarvariuc
    Copy link
    Mannequin Author

    VictorVarvariuc mannequin commented Apr 10, 2017

    I'm inclined not to attempt to fix this. The reason that we don't pull 'a.b' out of sys.modules at this point is that if later in the execution of a/b.py we get an exception, the import will be cancelled, and sys.modules['a.b'] will be *deleted*, and then the variable 'c' would have a reference to a half-loaded module.

    I think this is solvable -- track and delete the references too.

    It's your decision in the end, but I think it's not good to make the language inconsistent because of technical limitations -- the interface should be the primary source for decisions.

    @VictorVarvariuc
    Copy link
    Mannequin Author

    VictorVarvariuc mannequin commented Apr 10, 2017

    it is possible for module a to override its attribute b without updating sys.modules

    This sounds like a really strange application.

    Even if someone overrides the attribute, when you do in other place import a.b.c as m you expect a.b.c to be path to a module, otherwise it would be hard to debug.

    The docs https://docs.python.org/3/reference/simple_stmts.html#import say import_stmt ::= "import" module ["as" name] meaning everywhere that module is a path to a module.

    https://docs.python.org/3/reference/import.html#searching

    To begin the search, Python needs the fully qualified name of the module (or package, but for the purposes of this discussion, the difference is immaterial) being imported. This name may come from various arguments to the import statement, or from the parameters to the importlib.import_module() or __import__() functions.

    So if you treat import a.b.c as m as import a; m = a.b.c -- it go go in some cases against the docs.

    @ncoghlan
    Copy link
    Contributor

    Right, this problem only arises with import cycles, and that's why we resisted making eager submodule resolution work *at all* for so long (bpo-992389 was filed way back in 2004).

    We only conceded the point (with bpo-17636 being implemented for 3.5) specifically to address a barrier to adoption for explicit relative imports, as it turned out that "from . import bar" could fail in cases where "import foo.bar" previously worked.

    The best explanation I could find for that rationale in the related python-dev thread is PJE's post here: https://mail.python.org/pipermail/python-dev/2013-April/125121.html

    What Victor's python-ideas thread pointed out is that there are actually *3* flavours of import where this particular circular reference problem can come up:

        # Has worked as long as Python has had packages,
        # as long as you only lazily resolve foo.bar in
        # function and method implementations
        import foo.bar
    
        # Has worked since 3.5 due to the IMPORT_FROM
        # change that falls back to a sys.modules lookup
        from foo import bar
    
        # Still gives AttributeError since it
        # eagerly resolves the attribute lookup
        import foo.bar as bar

    While I think the architectural case for allowing this kind of circular dependency between different top level namespaces is much weaker than that for allowing it within packages, I do think there's a reasonable consistency argument to be made in favour of ensuring that from foo import bar and import foo.bar as bar are functionally equivalent when bar is a submodule of foo, especially since the latter form makes it clearer to the reader that bar is a submodule, rather than any arbitrary attribute.

    I don't think it's a big problem in practice (so I wouldn't spend any time on implementing it myself), but the notion of an IMPORT_ATTR opcode for the "import x.y.z as m" case that parallels IMPORT_FROM seems architecturally clean to me in a way that the proposed resolutions to bpo-992389 weren't.

    @gvanrossum
    Copy link
    Member

    OK I won't close it but I'm withdrawing from the issue. If there are devs who want to implement this or mentor some contributor into implementing it feel free, but it sounds like Nick doesn't think it's important enough to fix (i.e. not worth his time) and I feel the same.

    @serhiy-storchaka
    Copy link
    Member

    It is easy to implement this (just replace LOAD_ATTR with IMPORT_FROM in the compiler). The hardest part is writing tests.

    @ncoghlan
    Copy link
    Contributor

    Huh, interesting - I'd missed that the only part of the "from a.b import c" that IMPORT_FROM implements is the LOAD_ATTR variant that falls back to sys.modules. The prior adjustment to get IMPORT_NAME to return "a.b" instead of "a" happens inside that opcode based on the fact that a non-empty from_list was passed in.

    So indeed, there's no new opcode needed - as Serhiy points out, the compiler just needs to emit IMPORT_FROM instead of LOAD_ATTR for this case.

    @PavolLisy
    Copy link
    Mannequin

    PavolLisy mannequin commented Apr 20, 2017

    Maybe I am wrong but don't we get another weird behavior?

       import sys.path  # this is error now
       ModuleNotFoundError: No module named 'sys.path'; 'sys' is not a package

    So we probably expect same behavior here:

       import sys.path as foo

    But if we just replace LOAD_ATTR with IMPORT_FROM in the compiler then wouldn't it works like:

       from sys import path as foo

    ?

    @serhiy-storchaka
    Copy link
    Member

    No, because the compiler generates different arguments for the IMPORT_NAME opcode.

    @ncoghlan
    Copy link
    Contributor

    Expanding on Serhiy's answer: the ModuleNotFoundError for "import sys.path" comes from IMPORT_NAME, not from any of the subsequent attribute lookups.

    That difference is visible in the bytecode:

    >>> dis("import sys.path as path")
      1           0 LOAD_CONST               0 (0)
                  3 LOAD_CONST               1 (None)
                  6 IMPORT_NAME              0 (sys.path)
                  9 LOAD_ATTR                1 (path)
                 12 STORE_NAME               1 (path)
                 15 LOAD_CONST               1 (None)
                 18 RETURN_VALUE

    For "import sys.path as path", the given module name is "sys.path", and the "from list" entry on the stack is None. This fails before it even reaches the LOAD_ATTR line, since "sys.path" isn't an importable module. Changing LOAD_ATTR to IMPORT_FROM would thus have no effect on its behaviour.

    >>> dis("from sys import path")
      1           0 LOAD_CONST               0 (0)
                  3 LOAD_CONST               1 (('path',))
                  6 IMPORT_NAME              0 (sys)
                  9 IMPORT_FROM              1 (path)
                 12 STORE_NAME               1 (path)
                 15 POP_TOP
                 16 LOAD_CONST               2 (None)
                 19 RETURN_VALUE

    For "from sys import path", the given module name is "sys", and the "from list" entry on the stack is a tuple containing the string "path". This works, since "sys" is importable *and* it has a "path" attribute.

    Hence why Serhiy's suggestion is such an elegant fix, since the only cases where the LOAD_ATTR vs IMPORT_FROM distinction matters will be those where:

    1. The submodule exists, so both "import a.b" and "from a import b" will work for the IMPORT_NAME step
    2. It's a circular import, so even though "a.b" exists in sys.modules, it won't be accessible as an attribute of "a" yet

    Getting the tests and any related documentation updates right will actually be harder than the code change.

    @VictorVarvariuc
    Copy link
    Mannequin Author

    VictorVarvariuc mannequin commented Apr 22, 2017

    How can I facilitate/contribute to get the changes suggested by Serhiy into Python?

    @serhiy-storchaka
    Copy link
    Member

    PR 1264 implements this idea.

    Seems this is a duplicate of bpo-23203 which contains the same idea.

    @serhiy-storchaka serhiy-storchaka self-assigned this May 4, 2017
    @serhiy-storchaka
    Copy link
    Member

    Could anyone please make a review?

    @brettcannon
    Copy link
    Member

    I've added your PR to my review queue, Serhiy, so I will get to it, I just can't make any promises as to when (hopefully this week).

    @serhiy-storchaka
    Copy link
    Member

    Thank you Brett.

    @serhiy-storchaka
    Copy link
    Member

    New changeset f93234b by Serhiy Storchaka in branch 'master':
    bpo-30024: Circular imports involving absolute imports with binding (bpo-1264)
    f93234b

    @serhiy-storchaka
    Copy link
    Member

    New changeset 265fcc5 by Serhiy Storchaka in branch 'master':
    bpo-31286, bpo-30024: Fixed stack usage in absolute imports with (bpo-3217)
    265fcc5

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life easy interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants