
Author eric.snow
Recipients barry, brett.cannon, christian.heimes, eric.snow, pitrou, r.david.murray, vstinner
Date 2013-10-11.00:53:45
For interpreter startup, no stats are involved for builtin and frozen modules[1].  Stats are tied to imports that involve traversing sys.path (a.k.a. PathFinder).  Most of them happen in FileFinder.find_loader; the remainder are for source (.py) files (a.k.a. SourceFileLoader).
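A quick way to see which finder claims a module is to ask each finder on sys.meta_path in turn. A minimal sketch (the helper name `which_finder` is hypothetical; it relies only on the standard `find_spec` protocol): BuiltinImporter and FrozenImporter answer from in-memory tables, while PathFinder is the one that walks sys.path and hits the filesystem.

```python
import sys


def which_finder(name):
    """Return the name of the first finder on sys.meta_path that claims `name`.

    find_spec() only locates a module; it does not import or execute it.
    """
    for finder in sys.meta_path:
        find_spec = getattr(finder, "find_spec", None)
        if find_spec is None:
            continue  # legacy finder without find_spec(); skip it
        if find_spec(name, None) is not None:
            # BuiltinImporter, FrozenImporter and PathFinder are classes;
            # third-party finders are often instances.
            return getattr(finder, "__name__", type(finder).__name__)
    return None


if __name__ == "__main__":
    for name in ("sys", "json"):
        print(name, "->", which_finder(name))
```

On a standard CPython build this typically reports BuiltinImporter for `sys` and PathFinder for `json`; only the second lookup touches the filesystem.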

Here's a rough sketch of what currently happens, in terms of stats (and other FS access), during the typical import of a path-based module[2]:

(lines with FS access start with *)

def load_module(fullname):
    suffixes = ['', '', '.so', '.py', '.pyc']
    tailname = fullname.rpartition('.')[2]
    for entry in sys.path:
*       mtime = os.stat(entry).st_mtime
        if mtime != cached_mtime:
*           cached_listdir = os.listdir(entry)
        if tailname in cached_listdir:
            basename = entry/tailname
*           if os.stat(basename).st_mode implies directory:  # superfluous?
                # package?
                for suffix in suffixes:
                    full_path = basename/'__init__' + suffix
*                   if os.stat(full_path).st_mode implies file:
                        if is_extension:
*                           <dlopen>(full_path)
                        elif is_sourceless:
*                           open(full_path).read()
                        else:
                            load_from_source(full_path)
                        return
        # ...non-package module?
        for suffix in suffixes:
            full_path = entry/tailname + suffix
            if tailname + suffix in cached_listdir:
*               if os.stat(full_path).st_mode implies file:  # superfluous?
                    if is_extension:
*                       <dlopen>(full_path)
                    elif is_sourceless:
*                       open(full_path).read()
                    else:
                        load_from_source(full_path)
                    return

def load_from_source(sourcepath):
*   st = os.stat(sourcepath)
*   data = open(bytecodepath).read()
    if data is valid for st.st_mtime:
        return data  # cached bytecode: done
*   open(sourcepath).read()
*   os.stat(sourcepath).st_mode
    for parent in ancestor_dirs(bytecodepath):  # stops at the first existing one
*       os.stat(parent).st_mode  ->  missing_parents
    for parent in missing_parents:
*       os.mkdir(parent)
*   open(tempname).write()
*   os.replace(tempname, bytecodepath)

Obviously there are some Unix-isms in there, though Windows ends up not much different.
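To make the per-entry stat counts concrete, here is a runnable approximation of the path-entry lookup in load_module that counts its own stat calls. It is a simplified sketch, not importlib's actual code: the three-entry suffix list, `CountingFS`, `find_in_entry` and `demo_counts` are all hypothetical, and the actual loading step is left out.

```python
import os
import stat
import tempfile

# Hypothetical, simplified suffix list; extension modules are tried first,
# as in the sketch above.
SUFFIXES = [".so", ".py", ".pyc"]


class CountingFS:
    """Thin wrapper around os.stat/os.listdir that counts every call."""

    def __init__(self):
        self.stats = 0
        self.listdirs = 0

    def stat_path(self, path):
        self.stats += 1  # counted even when the stat fails
        return os.stat(path)

    def list_dir(self, path):
        self.listdirs += 1
        return os.listdir(path)

    def is_mode(self, path, predicate):
        """Stat `path` and test its mode; a failed stat counts as False."""
        try:
            return predicate(self.stat_path(path).st_mode)
        except OSError:
            return False


def find_in_entry(entry, tailname, fs, cache):
    """Return the file that would be loaded for `tailname` from `entry`, or None."""
    try:
        mtime = fs.stat_path(entry).st_mtime
    except OSError:
        return None
    if cache.get("mtime") != mtime:  # stale (or empty) cache: refill the listing
        cache["listing"] = set(fs.list_dir(entry))
        cache["mtime"] = mtime
    listing = cache["listing"]

    if tailname in listing:
        base = os.path.join(entry, tailname)
        if fs.is_mode(base, stat.S_ISDIR):
            # package? one stat per suffix; the listing is not consulted here
            for suffix in SUFFIXES:
                init = os.path.join(base, "__init__" + suffix)
                if fs.is_mode(init, stat.S_ISREG):
                    return init
    for suffix in SUFFIXES:  # non-package module?
        if tailname + suffix in listing:
            full_path = os.path.join(entry, tailname + suffix)
            if fs.is_mode(full_path, stat.S_ISREG):
                return full_path
    return None


def demo_counts():
    """Stat counts for a missing name, a module and a package, each with a cold cache."""
    with tempfile.TemporaryDirectory() as entry:
        open(os.path.join(entry, "mod.py"), "w").close()
        os.mkdir(os.path.join(entry, "pkg"))
        open(os.path.join(entry, "pkg", "__init__.py"), "w").close()
        counts = {}
        for name in ("missing", "mod", "pkg"):
            fs = CountingFS()
            find_in_entry(entry, name, fs, cache={})
            counts[name] = fs.stats
        return counts


if __name__ == "__main__":
    print(demo_counts())  # {'missing': 1, 'mod': 2, 'pkg': 4}
```

A miss costs one stat and a module whose first listed suffix matches costs two, as in the counts below. The package costs four here only because `__init__.so` is stat'd (and fails) before `__init__.py` is found; every extra suffix tried adds one more stat.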

stat/FS count

load_module (*per path entry*):
    (add 1 listdir to each if the cache is stale)
    (N/M-P+ means N when the module is an extension or sourceless file, M-P+ when it goes through load_from_source)
    not found: 1 stat
    non-package dir (no __init__): 7 (num_suffixes + 2 stats)

    package (best): 4/5-9+ (3 stats, 1 read or load_from_source)
    package (worst): 8/9-13+ (num_suffixes + 2 stats, 1 read or load_from_source)
    non-package module (best): 3/4-8+ (2 stats, 1 read or load_from_source)
    non-package module (worst): 7/8-12+ (num_suffixes + 1 stats, 1 read or load_from_source)
    non-package module + dir (best): 10/11-15+ (num_suffixes + 4 stats, 1 read or load_from_source)
    non-package module + dir (worst): 14/15-19+ (num_suffixes * 2 + 3 stats, 1 read or load_from_source)

load_from_source:
    cached: 2 (1 stat, 1 read)
    uncached, no parents: 4 (2 stats, 1 write, 1 replace)
    uncached, no missing parents: 5+ (num_parents + 2 stats, 1 write, 1 replace)
    uncached, missing parents: 6+ (num_parents + 2 stats, num_missing mkdirs, 1 write, 1 replace)
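Each per-entry total above is the load_module stat count plus either one read (or dlopen) or a load_from_source cost. A small sketch that recomputes the figures, taking num_suffixes = 5 and a load_from_source range of 2 (cached) to 6 (smallest uncached case with a missing parent) from the list; the names `STATS` and `totals` are illustrative only.

```python
NUM_SUFFIXES = 5

# load_from_source cost taken from the list: 2 when the bytecode is cached,
# 6 for the smallest "uncached, missing parents" case (the "+" means it can grow).
SOURCE_MIN, SOURCE_MAX = 2, 6

# Stats spent in load_module before the module is read, as listed per case.
STATS = {
    "package (best)": 3,
    "package (worst)": NUM_SUFFIXES + 2,
    "non-package module (best)": 2,
    "non-package module (worst)": NUM_SUFFIXES + 1,
    "non-package module + dir (best)": NUM_SUFFIXES + 4,
    "non-package module + dir (worst)": NUM_SUFFIXES * 2 + 3,
}


def totals(stats):
    """Return the three figures of an 'N/M-P+' entry for a given stat count."""
    return stats + 1, stats + SOURCE_MIN, stats + SOURCE_MAX


if __name__ == "__main__":
    for case, stats in STATS.items():
        ext, low, high = totals(stats)
        print(f"{case}: {ext}/{low}-{high}+")
```

Running it prints the same figures as the list, e.g. `package (worst): 8/9-13+`.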


* the common case is not fast, all for the sake of the slight possibility that files change between imports (a possibility that matters even less during interpreter startup).
* up to 5 different suffixes with a separate stat for each (with extension module suffixes tried first).
* the size and ordering of sys.path have a decided impact on the number of stats.
* if a module is cached, a lot less FS access happens.
* the more deeply nested a module, the more FS access happens (see [2]).
* namespace packages don't have much impact on performance.
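Most of the gap between a cached and an uncached module comes from the write-back path in load_from_source. Below is a rough, runnable approximation of that path which logs each FS operation. It is a sketch under stated assumptions, not importlib's code: the cache location (`__pycache__/<name>.sketch`) and the 8-byte mtime header are hypothetical simplifications, not the real .pyc format.

```python
import marshal
import os
import struct
import tempfile

HEADER = struct.Struct("<q")  # hypothetical header: source mtime in nanoseconds


def bytecode_path_for(sourcepath):
    """Hypothetical cache location: <dir>/__pycache__/<name>.sketch."""
    head, tail = os.path.split(sourcepath)
    return os.path.join(head, "__pycache__", tail + ".sketch")


def load_from_source(sourcepath, ops):
    """Return a code object for `sourcepath`, appending each FS operation to `ops`."""
    bytecodepath = bytecode_path_for(sourcepath)
    ops.append("stat")
    st = os.stat(sourcepath)
    try:
        ops.append("read")
        with open(bytecodepath, "rb") as f:
            data = f.read()
        (mtime_ns,) = HEADER.unpack_from(data)
        if mtime_ns == st.st_mtime_ns:
            return marshal.loads(data[HEADER.size:])  # cached: 1 stat + 1 read
    except (OSError, struct.error, ValueError, EOFError):
        pass  # missing or unusable cache: fall through and recompile

    ops.append("read")
    with open(sourcepath, "rb") as f:
        code = compile(f.read(), sourcepath, "exec")
    ops.append("stat")
    mode = (os.stat(sourcepath).st_mode | 0o200) & 0o666  # keep the cache writable

    # Stat upward from the cache directory until an existing directory is found.
    missing = []
    parent = os.path.dirname(bytecodepath)
    while parent:
        ops.append("stat")
        if os.path.isdir(parent):
            break
        missing.append(parent)
        parent = os.path.dirname(parent)
    for directory in reversed(missing):
        ops.append("mkdir")
        os.mkdir(directory)

    # Atomic write: write a temp file, then rename it over the final name.
    tempname = "{}.{}".format(bytecodepath, os.getpid())
    flags = os.O_EXCL | os.O_CREAT | os.O_WRONLY | getattr(os, "O_BINARY", 0)
    ops.append("write")
    fd = os.open(tempname, flags, mode)
    with open(fd, "wb") as f:
        f.write(HEADER.pack(st.st_mtime_ns) + marshal.dumps(code))
    ops.append("replace")
    os.replace(tempname, bytecodepath)
    return code


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "mod.py")
        with open(src, "w") as f:
            f.write("x = 1\n")
        for label in ("uncached", "cached"):
            ops = []
            load_from_source(src, ops)
            print(label, ops)
```

The first call (no `__pycache__` yet) logs four stats, two reads, a mkdir, a write and a replace; the second logs only a stat and a read, which is the "cached: 2" row.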

Possible improvements:

* provide an internal mechanism to turn caching of all stats on/off (ignoring staleness), and maybe expose it via a context manager or API (not unlike what Christian put in his patch).
* at least do some temporally local caching where the risk of staleness is particularly small.
* Move .py ahead of extension modules (or just behind
* non-packages are more common than packages (?), so look for those first (hard to make effective without breaking key import semantics).
* remove the 2 possibly superfluous stats (marked "superfluous?" in the sketch)?
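The first two ideas boil down to memoizing stat results for a bounded window. A minimal sketch of such a window as a context manager; `stat_cache` is a hypothetical helper that a finder would have to call in place of os.stat, since it does not hook into importlib itself.

```python
import contextlib
import os


@contextlib.contextmanager
def stat_cache(stat=os.stat):
    """Yield a memoizing wrapper around `stat` that lives only inside the block.

    Results (including failures) are reused without any staleness check,
    which is the trade-off accepted for a short window.
    """
    results = {}

    def cached_stat(path):
        if path not in results:
            try:
                results[path] = stat(path)
            except OSError as exc:
                results[path] = exc  # remember misses too
        value = results[path]
        if isinstance(value, OSError):
            raise value
        return value

    try:
        yield cached_stat
    finally:
        results.clear()  # leaving the block discards everything


if __name__ == "__main__":
    calls = []

    def counting_stat(path):
        calls.append(path)
        return os.stat(path)

    missing = os.path.join(os.getcwd(), "no-such-file-19216")
    with stat_cache(counting_stat) as cached:
        for _ in range(3):
            cached(os.curdir)
            try:
                cached(missing)
            except FileNotFoundError:
                pass
    print(len(calls))  # 2: one real stat per distinct path
```

Caching misses matters as much as caching hits: most stats during an import are for suffixes that do not exist.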

[1] Maybe we should freeze the stdlib. <0.5 wink>
[2] importing a submodule involves importing its parent package, that package's parent, and so forth.  Each of those incurs the same stat hits all over again (though a package usually has only 1 path entry to traverse).  The stdlib is pretty flat (particularly among the modules imported during startup), so this is less of an issue for this ticket.
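The parent chain described in [2] is easy to observe. A small sketch, assuming a throwaway package tree built in a temporary directory (`demo19216` and `import_nested` are made-up names): importing the innermost module also imports both parent packages, and each package's `__path__` holds a single entry, which is what the submodule search then walks.

```python
import importlib
import os
import sys
import tempfile


def import_nested(root, dotted="demo19216.sub.mod"):
    """Create a package for every parent of `dotted` under `root`, then import it."""
    parts = dotted.split(".")
    pkg_dir = root
    for part in parts[:-1]:
        pkg_dir = os.path.join(pkg_dir, part)
        os.mkdir(pkg_dir)
        open(os.path.join(pkg_dir, "__init__.py"), "w").close()
    open(os.path.join(pkg_dir, parts[-1] + ".py"), "w").close()

    sys.path.insert(0, root)
    importlib.invalidate_caches()  # make the new files visible to path finders
    try:
        return importlib.import_module(dotted)
    finally:
        sys.path.remove(root)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        import_nested(root)
    for name in ("demo19216", "demo19216.sub"):
        print(name, name in sys.modules, len(sys.modules[name].__path__))
```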