classification
Title: Fix docs about module search order
Type: behavior Stage:
Components: Documentation Versions: Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: docs@python Nosy List: brett.cannon, dmugtasimov, docs@python, ncoghlan, r.david.murray
Priority: normal Keywords:

Created on 2013-01-08 08:14 by dmugtasimov, last changed 2013-01-09 07:55 by dmugtasimov.

Messages (8)
msg179321 - (view) Author: Dmitry Mugtasimov (dmugtasimov) Date: 2013-01-08 08:14
http://docs.python.org/2/tutorial/modules.html should be rewritten.
AS IS
6.1.2. The Module Search Path

When a module named spam is imported, the interpreter first searches for a built-in module with that name. If not found, it then searches for a file named spam.py in a list of directories given by the variable sys.path. sys.path is initialized from these locations:

TO BE
6.1.2. The Module Search Path

When a module named spam is imported, the interpreter first searches for a built-in module with that name. If not found, it looks in the containing package (the package of which the current module is a submodule). If not found, it then searches for a file named spam.py in a list of directories given by the variable sys.path. sys.path is initialized from these locations:

------
Note that "6.1.2. The Module Search Path" and "6.4.2. Intra-package References" currently contradict each other: 6.4.2 says "In fact, such references are so common that the import statement first looks in the containing package before looking in the standard module search path.", but this is not reflected in 6.1.2.
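
The built-in-first step described in 6.1.2 can be observed directly. The following is a minimal sketch, assuming Python 3: importlib.util.find_spec does not exist in 2.7, and Python 3 has no implicit relative imports, so the containing-package step discussed in this report does not show up here. The helper name describe is made up for illustration.

```python
import importlib.util
import sys


def describe(name):
    """Report where a top-level module name would be loaded from."""
    # Built-in modules are compiled into the interpreter and are checked
    # before any directory on sys.path is consulted.
    if name in sys.builtin_module_names:
        return "built-in"
    # Otherwise ask the import machinery, which walks sys.path.
    spec = importlib.util.find_spec(name)
    if spec is None:
        return "not found"
    return spec.origin


if __name__ == "__main__":
    for candidate in ("sys", "json", "no_such_module_xyz123"):
        print(candidate, "->", describe(candidate))
```

For a name such as json the result is a file path under one of the sys.path entries; the exact path depends on the installation.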

------
EXAMPLE (for more information see  http://stackoverflow.com/questions/14183541/why-python-finds-module-instead-of-package-if-they-have-the-same-name#comment19687166_14183541 ):
/home/dmugtasimov/tmp/name-res3/xyz
    __init__.py
    a.py
    b.py
    t.py
    xyz.py

Files __init__.py, b.py and xyz.py are empty.
File a.py:

import os, sys
ROOT_DIRECTORY = os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))
if not sys.path or ROOT_DIRECTORY not in sys.path:
    print 'sys.path is modified in a.py'
    sys.path.insert(0, ROOT_DIRECTORY)
else:
    print 'sys.path is NOT modified in a.py'

print 'sys.path:', sys.path
print 'BEFORE import xyz.b'
import xyz.b
print 'AFTER import xyz.b'

File t.py:

import os, sys
ROOT_DIRECTORY = os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))
if not sys.path or ROOT_DIRECTORY not in sys.path:
    print 'sys.path is modified in t.py'
    sys.path.insert(0, ROOT_DIRECTORY)
else:
    print 'sys.path is NOT modified in t.py'

import xyz.a

Run:

python a.py

Output:

    sys.path is modified in a.py
    sys.path: ['/home/dmugtasimov/tmp/name-res3', '/home/dmugtasimov/tmp/name-res3/xyz',
     '/usr/local/lib/python2.7/dist-packages/tornado-2.3-py2.7.egg',
     '/home/dmugtasimov/tmp/name-res3/xyz', '/usr/lib/python2.7',
     '/usr/lib/python2.7/plat-linux2', '/usr/lib/python2.7/lib-tk',
     '/usr/lib/python2.7/lib-old', '/usr/lib/python2.7/lib-dynload',
     '/usr/local/lib/python2.7/dist-packages',
     '/usr/local/lib/python2.7/dist-packages/setuptools-0.6c11-py2.7.egg-info',
     '/usr/lib/python2.7/dist-packages',
     '/usr/lib/python2.7/dist-packages/PIL',
     '/usr/lib/python2.7/dist-packages/gst-0.10',
     '/usr/lib/python2.7/dist-packages/gtk-2.0',
     '/usr/lib/python2.7/dist-packages/ubuntu-sso-client']
    BEFORE import xyz.b
    AFTER import xyz.b

Run:

python -vv a.py

Output:

    import xyz # directory /home/dmugtasimov/tmp/name-res3/xyz
    # trying /home/dmugtasimov/tmp/name-res3/xyz/__init__.so
    # trying /home/dmugtasimov/tmp/name-res3/xyz/__init__module.so
    # trying /home/dmugtasimov/tmp/name-res3/xyz/__init__.py
    # /home/dmugtasimov/tmp/name-res3/xyz/__init__.pyc matches /home/dmugtasimov/tmp/name-res3/xyz/__init__.py
    import xyz # precompiled from /home/dmugtasimov/tmp/name-res3/xyz/__init__.pyc
    # trying /home/dmugtasimov/tmp/name-res3/xyz/b.so
    # trying /home/dmugtasimov/tmp/name-res3/xyz/bmodule.so
    # trying /home/dmugtasimov/tmp/name-res3/xyz/b.py
    # /home/dmugtasimov/tmp/name-res3/xyz/b.pyc matches /home/dmugtasimov/tmp/name-res3/xyz/b.py
    import xyz.b # precompiled from /home/dmugtasimov/tmp/name-res3/xyz/b.pyc

Run:

python t.py

Output:

    sys.path is modified in t.py
    sys.path is NOT modified in a.py
    sys.path: ['/home/dmugtasimov/tmp/name-res3', '/home/dmugtasimov/tmp/name-res3/xyz',
     '/usr/local/lib/python2.7/dist-packages/tornado-2.3-py2.7.egg',
     '/home/dmugtasimov/tmp/name-res3/xyz', '/usr/lib/python2.7',
     '/usr/lib/python2.7/plat-linux2', '/usr/lib/python2.7/lib-tk',
     '/usr/lib/python2.7/lib-old', '/usr/lib/python2.7/lib-dynload',
     '/usr/local/lib/python2.7/dist-packages',
     '/usr/local/lib/python2.7/dist-packages/setuptools-0.6c11-py2.7.egg-info',
     '/usr/lib/python2.7/dist-packages',
     '/usr/lib/python2.7/dist-packages/PIL',
     '/usr/lib/python2.7/dist-packages/gst-0.10',
     '/usr/lib/python2.7/dist-packages/gtk-2.0',
     '/usr/lib/python2.7/dist-packages/ubuntu-sso-client']
    BEFORE import xyz.b
    Traceback (most recent call last):
      File "t.py", line 9, in <module>
        import xyz.a
      File "/home/dmugtasimov/tmp/name-res3/xyz/a.py", line 11, in <module>
        import xyz.b
    ImportError: No module named b

Run:

python -vv t.py

Output:

    import xyz # directory /home/dmugtasimov/tmp/name-res3/xyz
    # trying /home/dmugtasimov/tmp/name-res3/xyz/__init__.so
    # trying /home/dmugtasimov/tmp/name-res3/xyz/__init__module.so
    # trying /home/dmugtasimov/tmp/name-res3/xyz/__init__.py
    # /home/dmugtasimov/tmp/name-res3/xyz/__init__.pyc matches /home/dmugtasimov/tmp/name-res3/xyz/__init__.py
    import xyz # precompiled from /home/dmugtasimov/tmp/name-res3/xyz/__init__.pyc
    # trying /home/dmugtasimov/tmp/name-res3/xyz/a.so
    # trying /home/dmugtasimov/tmp/name-res3/xyz/amodule.so
    # trying /home/dmugtasimov/tmp/name-res3/xyz/a.py
    # /home/dmugtasimov/tmp/name-res3/xyz/a.pyc matches /home/dmugtasimov/tmp/name-res3/xyz/a.py
    import xyz.a # precompiled from /home/dmugtasimov/tmp/name-res3/xyz/a.pyc
    # trying /home/dmugtasimov/tmp/name-res3/xyz/os.so
    # trying /home/dmugtasimov/tmp/name-res3/xyz/osmodule.so
    # trying /home/dmugtasimov/tmp/name-res3/xyz/os.py
    # trying /home/dmugtasimov/tmp/name-res3/xyz/os.pyc
    # trying /home/dmugtasimov/tmp/name-res3/xyz/sys.so
    # trying /home/dmugtasimov/tmp/name-res3/xyz/sysmodule.so
    # trying /home/dmugtasimov/tmp/name-res3/xyz/sys.py
    # trying /home/dmugtasimov/tmp/name-res3/xyz/sys.pyc
    # trying /home/dmugtasimov/tmp/name-res3/xyz/xyz.so
    # trying /home/dmugtasimov/tmp/name-res3/xyz/xyzmodule.so
    # trying /home/dmugtasimov/tmp/name-res3/xyz/xyz.py
    # /home/dmugtasimov/tmp/name-res3/xyz/xyz.pyc matches /home/dmugtasimov/tmp/name-res3/xyz/xyz.py
    import xyz.xyz # precompiled from /home/dmugtasimov/tmp/name-res3/xyz/xyz.pyc
    #   clear[2] __file__
    #   clear[2] __package__
    #   clear[2] sys
    #   clear[2] ROOT_DIRECTORY
    #   clear[2] __name__
    #   clear[2] os
    sys.path is modified in t.py
    sys.path is NOT modified in a.py
    sys.path: ['/home/dmugtasimov/tmp/name-res3', '/home/dmugtasimov/tmp/name-res3/xyz',
     '/usr/local/lib/python2.7/dist-packages/tornado-2.3-py2.7.egg',
     '/home/dmugtasimov/tmp/name-res3/xyz', '/usr/lib/python2.7',
     '/usr/lib/python2.7/plat-linux2', '/usr/lib/python2.7/lib-tk',
     '/usr/lib/python2.7/lib-old', '/usr/lib/python2.7/lib-dynload',
     '/usr/local/lib/python2.7/dist-packages',
     '/usr/local/lib/python2.7/dist-packages/setuptools-0.6c11-py2.7.egg-info',
     '/usr/lib/python2.7/dist-packages',
     '/usr/lib/python2.7/dist-packages/PIL',
     '/usr/lib/python2.7/dist-packages/gst-0.10',
     '/usr/lib/python2.7/dist-packages/gtk-2.0',
     '/usr/lib/python2.7/dist-packages/ubuntu-sso-client']
    BEFORE import xyz.b
    Traceback (most recent call last):
      File "t.py", line 9, in <module>
        import xyz.a
      File "/home/dmugtasimov/tmp/name-res3/xyz/a.py", line 11, in <module>
        import xyz.b
    ImportError: No module named b

As you can see, sys.path is the same in both cases:

sys.path: ['/home/dmugtasimov/tmp/name-res3', '/home/dmugtasimov/tmp/name-res3/xyz', '/usr/local/lib/python2.7/dist-packages/tornado-2.3-py2.7.egg', '/home/dmugtasimov/tmp/name-res3/xyz', '/usr/lib/python2.7', '/usr/lib/python2.7/plat-linux2', '/usr/lib/python2.7/lib-tk', '/usr/lib/python2.7/lib-old', '/usr/lib/python2.7/lib-dynload', '/usr/local/lib/python2.7/dist-packages', '/usr/local/lib/python2.7/dist-packages/setuptools-0.6c11-py2.7.egg-info', '/usr/lib/python2.7/dist-packages', '/usr/lib/python2.7/dist-packages/PIL', '/usr/lib/python2.7/dist-packages/gst-0.10', '/usr/lib/python2.7/dist-packages/gtk-2.0', '/usr/lib/python2.7/dist-packages/ubuntu-sso-client']

But the behaviour is different. For a.py, Python searches for package xyz first, and then for module b in it:

    import xyz # directory /home/dmugtasimov/tmp/name-res3/xyz
    # trying /home/dmugtasimov/tmp/name-res3/xyz/__init__.so
    # trying /home/dmugtasimov/tmp/name-res3/xyz/__init__module.so
    # trying /home/dmugtasimov/tmp/name-res3/xyz/__init__.py
    # /home/dmugtasimov/tmp/name-res3/xyz/__init__.pyc matches /home/dmugtasimov/tmp/name-res3/xyz/__init__.py
    import xyz # precompiled from /home/dmugtasimov/tmp/name-res3/xyz/__init__.pyc
    # trying /home/dmugtasimov/tmp/name-res3/xyz/b.so
    # trying /home/dmugtasimov/tmp/name-res3/xyz/bmodule.so
    # trying /home/dmugtasimov/tmp/name-res3/xyz/b.py
    # /home/dmugtasimov/tmp/name-res3/xyz/b.pyc matches /home/dmugtasimov/tmp/name-res3/xyz/b.py
    import xyz.b # precompiled from /home/dmugtasimov/tmp/name-res3/xyz/b.pyc

In other words:

    Search PACKAGE xyz in directory sys.path[0] -> FOUND
    Search module b in PACKAGE xyz -> FOUND
    Continue execution

For t.py, it searches for module xyz in the same directory as a.py itself and then fails to find module b in module xyz:

    # trying /home/dmugtasimov/tmp/name-res3/xyz/xyz.so
    # trying /home/dmugtasimov/tmp/name-res3/xyz/xyzmodule.so
    # trying /home/dmugtasimov/tmp/name-res3/xyz/xyz.py
    # /home/dmugtasimov/tmp/name-res3/xyz/xyz.pyc matches /home/dmugtasimov/tmp/name-res3/xyz/xyz.py
    import xyz.xyz # precompiled from /home/dmugtasimov/tmp/name-res3/xyz/xyz.pyc

In other words:

    Search MODULE xyz in the same directory as a.py (or sys.path[1] ?) -> FOUND
    Search MODULE b in MODULE xyz -> NOT FOUND
    ImportError

So it looks as if "import xyz.b" behaves differently depending on whether a.py was initially run as a script or imported from another module.
msg179322 - (view) Author: Dmitry Mugtasimov (dmugtasimov) Date: 2013-01-08 08:17
UPDATE:
CHANGE
http://stackoverflow.com/questions/14183541/why-python-finds-module-instead-of-package-if-they-have-the-same-name#comment19687166_14183541

TO
http://stackoverflow.com/questions/14183541/why-python-finds-module-instead-of-package-if-they-have-the-same-name

Because the whole question and replies are important.
msg179353 - (view) Author: Dmitry Mugtasimov (dmugtasimov) Date: 2013-01-08 14:12
After investigating a little more closely, it seems to me that this is not a documentation issue but an implementation issue.

http://docs.python.org/2/reference/simple_stmts.html#import
"A package can contain other packages and modules while modules cannot contain other modules or packages."

The only way to import a name from a module is
from xyz import b
where "b" is a name defined inside the xyz module.

Issuing
import xyz.b
means that "b" is a module or package.

Therefore xyz cannot be a module, since "...modules cannot contain other modules or packages." This means that xyz is a package.

The problem is that it is treated as a module in this case:
python t.py
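
The distinction between "import xyz.b" and "from xyz import b" can be illustrated with standard library modules. This is a sketch under the assumption of Python 3, where the error message names the problem explicitly; try_import is a hypothetical helper, and string and json merely stand in for a plain module and a package.

```python
import importlib


def try_import(dotted_name):
    """Import a dotted name and report success or the ImportError text."""
    try:
        importlib.import_module(dotted_name)
    except ImportError as exc:
        return str(exc)
    return "ok"


if __name__ == "__main__":
    # json is a package, so a dotted import of one of its submodules works.
    print(try_import("json.decoder"))
    # string is a plain module without __path__, so nothing can be
    # imported "from inside" it with dotted syntax.
    print(try_import("string.b"))
```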
msg179355 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-01-08 14:43
> So it looks as if "import xyz.b" behaves differently depending on whether
> a.py was initially run as a script or imported from another module.

There are several differences between importing a module and running a script, one of which is what is on sys.path.  You constructed your example to mask the path difference.

You are getting hit by the difference between absolute and relative imports.  How implicit relative imports behave is another thing that differs between running a file as a script and importing it.  This kind of confusion is one of the reasons implicit relative imports were dropped in Python 3.

If you add

  from __future__ import absolute_import

to the top of your a and t files, t will no longer produce an import error.
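
The effect of absolute imports on the layout from the report can be tried out with the sketch below. It assumes Python 3, where absolute imports are already the default and the __future__ line is accepted but has no effect, so it shows the outcome of the suggested change rather than the original 2.7 failure. The helpers build_layout and run_script are invented for this illustration, the files are written to a temporary directory, and the sys.path check from the report is simplified to an unconditional insert.

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Shared preamble for a.py and t.py, simplified from the report.
BOOTSTRAP = textwrap.dedent("""\
    from __future__ import absolute_import
    import os, sys
    ROOT_DIRECTORY = os.path.abspath(
        os.path.join(os.path.dirname(__file__), '..'))
    sys.path.insert(0, ROOT_DIRECTORY)
""")


def build_layout(root):
    """Create the xyz package (with its same-named xyz.py) under root."""
    pkg = os.path.join(root, "xyz")
    os.mkdir(pkg)
    for name in ("__init__.py", "b.py", "xyz.py"):
        open(os.path.join(pkg, name), "w").close()
    with open(os.path.join(pkg, "a.py"), "w") as f:
        f.write(BOOTSTRAP + "import xyz.b\nprint('AFTER import xyz.b')\n")
    with open(os.path.join(pkg, "t.py"), "w") as f:
        f.write(BOOTSTRAP + "import xyz.a\n")
    return pkg


def run_script(pkg, script):
    """Run a script from inside the package directory, as in the report."""
    result = subprocess.run(
        [sys.executable, script], cwd=pkg, capture_output=True, text=True
    )
    return result.returncode, result.stdout.strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        pkg = build_layout(root)
        for script in ("a.py", "t.py"):
            print(script, run_script(pkg, script))
```

With absolute imports, "import xyz.b" inside a.py is always resolved from sys.path, so both scripts reach the final print regardless of whether a.py is run directly or imported through t.py.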

It may be that the documentation could be improved, but I'm not sure it is worth the effort for 2.7.  If there are statements in the 3.x docs that are incorrect now that implicit relative imports are gone, those would definitely be worth fixing.
msg179357 - (view) Author: Dmitry Mugtasimov (dmugtasimov) Date: 2013-01-08 15:04
A lot of people are still using Python 2.7, even 2.6. For me this would be a worthwhile docs fix, since I spent plenty of time trying to figure out what was going on.

In my previous comment I also pointed out that the implementation probably should be fixed too, since the interpreter tries to import a module from a module, which should not be possible according to the docs. It may be an issue for Python 3 too, but I did not check.
msg179399 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-01-08 23:40
The docs sometimes try to draw a sharp distinction between modules and
packages, but it's essentially a lie - a package is really just a module
with a __path__ attribute, and if you know what you are doing, you can make
even an ordinary module behave like a package (e.g. os.path).

Until 3.3, the import system had too many internal inconsistencies for sane
documentation. That's fixed in 3.3, but no comprehensive docs will be
happening for earlier versions.
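
The "module with a __path__ attribute" point can be checked in a few lines. This is only a sketch, assuming Python 3 and a standard CPython library layout; is_package is a hypothetical helper, and json and os are picked as convenient examples.

```python
import json
import os
import sys
import types


def is_package(module):
    """A module counts as a package if it carries a __path__ attribute."""
    return hasattr(module, "__path__")


if __name__ == "__main__":
    # json lives in a directory with an __init__.py, so it has __path__.
    print("json is a package:", is_package(json))
    # os is a single-file module and has no __path__ ...
    print("os is a package:", is_package(os))
    # ... yet "import os.path" works, because os registers the submodule
    # name in sys.modules itself.
    print("os.path registered:", sys.modules["os.path"] is os.path)
    # Any module object starts to look like a package once __path__ is set.
    fake = types.ModuleType("fake_pkg")
    fake.__path__ = []
    print("fake_pkg is a package:", is_package(fake))
```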
msg179405 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-01-09 00:59
That said, the originally proposed docs change looks like a solid improvement to me.

It's still a lie, but an appropriate one for the tutorial.
msg179422 - (view) Author: Dmitry Mugtasimov (dmugtasimov) Date: 2013-01-09 07:55
Further investigation led me to the conclusion that "TO BE" should look like this:

6.1.2. The Module Search Path

When a module named spam is imported, the interpreter first searches in the containing package (the package of which the current module is a submodule), if applicable (a module is not required to be a submodule of a package). If the module is not found there, or the current module is not part of any package, it searches for a built-in module with that name. If not found, it then searches for a file named spam.py in a list of directories given by the variable sys.path. sys.path is initialized from these locations:
History
Date                 User            Action  Args
2013-01-09 07:55:45  dmugtasimov     set     messages: + msg179422
2013-01-09 00:59:03  ncoghlan        set     messages: + msg179405
2013-01-08 23:40:01  ncoghlan        set     messages: + msg179399
2013-01-08 15:04:16  dmugtasimov     set     messages: + msg179357
2013-01-08 14:43:17  r.david.murray  set     nosy: + r.david.murray; messages: + msg179355
2013-01-08 14:12:43  dmugtasimov     set     messages: + msg179353
2013-01-08 12:58:35  eric.araujo     set     nosy: + brett.cannon, ncoghlan
2013-01-08 08:17:13  dmugtasimov     set     messages: + msg179322
2013-01-08 08:14:31  dmugtasimov     create