Message199430
For interpreter startup, stats are not involved for builtin and frozen modules[1]. They are tied to imports that involve traversing sys.path (a.k.a. PathFinder). Most stats happen in FileFinder.find_loader. The remainder are for source (.py) files (a.k.a. SourceFileLoader).
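As a quick way to see which category a given module falls into, here is a minimal sketch. It assumes CPython, where a spec's origin is "built-in" or "frozen" for those two kinds of module. The helper name classify is made up for illustration, and the answer for a particular stdlib module may differ between Python versions.

```python
import importlib.util


def classify(name):
    """Classify a top-level module name by how its spec says it is found.

    Hypothetical helper: relies on CPython setting spec.origin to
    "built-in" or "frozen" for modules that never touch sys.path.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return "not found"
    if spec.origin == "built-in":
        return "builtin"
    if spec.origin == "frozen":
        return "frozen"
    # Everything else went through a path-based finder (and its stats).
    return "path-based"
```

For example, classify("sys") reports "builtin" on CPython, while most pure-Python stdlib modules come back as "path-based".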
Here's a rough sketch of what typically happens currently during the import of a path-based module[2], as related to stats (and other FS access):
(lines with FS access start with *)
def load_module(fullname):
    suffixes = ['.cpython-34m.so', '.abi3.so', '.so', '.py', '.pyc']
    tailname = fullname.rpartition('.')[2]
    for entry in sys.path:
*       mtime = os.stat(entry).st_mtime
        if mtime != cached_mtime:
*           cached_listdir = os.listdir(entry)
        if tailname in cached_listdir:
            basename = entry/tailname
*           if os.stat(basename).st_mode implies directory:  # superfluous?
                # package?
                for suffix in suffixes:
                    full_path = basename + suffix
*                   if os.stat(full_path).st_mode implies file:
                        if is_extension:
*                           <dlopen>(full_path)
                        elif is_sourceless:
*                           open(full_path).read()
                        else:
                            load_from_source(full_path)
                        return
        # ...non-package module?
        for suffix in suffixes:
            full_path = entry/tailname + suffix
            if tailname + suffix in cached_listdir:
*               if os.stat(full_path).st_mode implies file:  # superfluous?
                    if is_extension:
*                       <dlopen>(full_path)
                    elif is_sourceless:
*                       open(full_path).read()
                    else:
                        load_from_source(full_path)

def load_from_source(sourcepath):
*   st = os.stat(sourcepath)
    if st:
*       open(bytecodepath).read()
    else:
*       open(sourcepath).read()
*       os.stat(sourcepath).st_mode
        for parent in ancestor_dirs(sourcepath):
*           os.stat(parent).st_mode  ->  missing_parents
        for parent in missing_parents:
*           os.mkdir(parent)
*       open(tempname).write()
*       os.replace(tempname, bytecodepath)
Obviously there are some unix-isms in there. Windows ends up not that different though.
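The per-entry directory cache in the sketch (one stat of the path entry, plus a listdir only when the mtime has changed) can be written as runnable code roughly like this. It is a simplified illustration, not importlib's actual implementation; the class name DirCache and the fs_calls counter are invented here.

```python
import os


class DirCache:
    """Hypothetical stand-in for the per-path-entry listing cache."""

    def __init__(self, entry):
        self.entry = entry
        self.cached_mtime = None
        self.cached_listdir = frozenset()
        self.fs_calls = 0  # stat + listdir calls made through this object

    def contents(self):
        """Return the directory listing, refreshing it only if mtime changed."""
        self.fs_calls += 1  # the stat below always happens
        try:
            mtime = os.stat(self.entry).st_mtime
        except OSError:
            # Missing or unreadable path entry: nothing can be found here.
            return frozenset()
        if mtime != self.cached_mtime:
            self.fs_calls += 1  # stale cache costs one extra listdir
            self.cached_listdir = frozenset(os.listdir(self.entry))
            self.cached_mtime = mtime
        return self.cached_listdir
```

The first call to contents() costs a stat and a listdir; later calls on an unchanged directory cost only the stat.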
stat/FS count
-------------

load_module (*per path entry*):
  (add 1 listdir to each if the cache is stale)
  not found: 1 stat
  non-package dir: 7 (num_suffixes + 2 stats)
  package (best): 4/5-9+ (3 stats, 1 read or load_from_source)
  package (worst): 8/9-13+ (num_suffixes + 2 stats, 1 read or load_from_source)
  non-package module (best): 3/4-8+ (2 stats, 1 read or load_from_source)
  non-package module (worst): 7/8-12+ (num_suffixes + 1 stats, 1 read or load_from_source)
  non-package module + dir (best): 10/11-15+ (num_suffixes + 4 stats, 1 read or load_from_source)
  non-package module + dir (worst): 14/15-19+ (num_suffixes * 2 + 3 stats, 1 read or load_from_source)

load_from_source:
  cached: 2 (1 stat, 1 read)
  uncached, no parents: 4 (2 stats, 1 write, 1 replace)
  uncached, no missing parents: 5+ (num_parents + 2 stats, 1 write, 1 replace)
  uncached, missing parents: 6+ (num_parents + 2 stats, num_missing mkdirs, 1 write, 1 replace)
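The totals above can be reproduced with a little arithmetic. This sketch assumes the "a/b-c+" notation means: a = stats plus 1 direct read (extension or sourceless module), and b-c = stats plus load_from_source, which itself costs between 2 (cached) and 6+ (uncached, missing parents). The function names are invented for illustration.

```python
NUM_SUFFIXES = 5  # length of the suffixes list in the sketch


def stat_counts(num_suffixes=NUM_SUFFIXES):
    """Stats per path entry for each case where a module is found."""
    return {
        "package (best)": 3,
        "package (worst)": num_suffixes + 2,
        "non-package module (best)": 2,
        "non-package module (worst)": num_suffixes + 1,
        "non-package module + dir (best)": num_suffixes + 4,
        "non-package module + dir (worst)": num_suffixes * 2 + 3,
    }


def fs_totals(stats, source_min=2, source_max=6):
    """Return (direct read, cheapest source load, costliest listed source load).

    source_min and source_max are the load_from_source totals from the
    table; the upper bound is open-ended ("+") in the original.
    """
    return (stats + 1, stats + source_min, stats + source_max)


for case, stats in stat_counts().items():
    direct, low, high = fs_totals(stats)
    print(f"{case}: {direct}/{low}-{high}+")
```

Changing NUM_SUFFIXES shows how directly the worst cases scale with the number of suffixes tried.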
Highlights:
* the common case is not fast (for the sake of the slight possibility that files may change between imports)--not as much an issue during interpreter startup.
* up to 5 different suffixes with a separate stat for each (with extension module suffixes tried first).
* the size and ordering of sys.path has a decided impact on the number of stats.
* if a module is cached, a lot less FS access happens.
* the more nested a module, the more FS access happens.
* namespace packages don't have much impact on performance.
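The first highlight (paying for staleness checks in case files change between imports) can be observed through the public importlib API. The sketch below builds a FileFinder for a temporary directory, looks up a module before and after its file is created, and calls invalidate_caches() in between so the result does not depend on directory mtime granularity. It uses find_spec rather than the find_loader named above, since that is what current Python versions expose; the function and module names are made up.

```python
import importlib.machinery
import os
import tempfile


def find_before_and_after(modname="demo_mod"):
    """Look up modname in a fresh directory before and after creating it."""
    with tempfile.TemporaryDirectory() as entry:
        finder = importlib.machinery.FileFinder(
            entry,
            (importlib.machinery.SourceFileLoader,
             importlib.machinery.SOURCE_SUFFIXES),
        )
        # First lookup fills the finder's directory cache; nothing is there yet.
        before = finder.find_spec(modname)
        with open(os.path.join(entry, modname + ".py"), "w") as f:
            f.write("x = 1\n")
        # Drop the cached listing so the new file is certain to be seen.
        finder.invalidate_caches()
        after = finder.find_spec(modname)
        return before, after
```

Here before is None and after is a spec whose origin points at the new .py file.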
Possible improvements:
* provide an internal mechanism to turn on/off caching all stats (don't worry about staleness) and maybe expose it via a context manager/API. (not unlike what Christian put in his patch.)
* at least do some temporally local caching where the risk of staleness is particularly small.
* Move .py ahead of extension modules (or just behind .cpython-34m.so)?
* non-packages are more common than packages (?) so look for those first (hard to make effective without breaking key import semantics).
* remove the 2 possibly superfluous stats (marked "superfluous?" in the sketch above)?
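To make the first suggestion concrete, here is one possible shape for an opt-in stat cache behind a context manager. This is purely a hypothetical sketch: StatCache and enabled() are not an existing importlib API, and this is not taken from Christian's patch. Failed stats raise as usual and are not cached.

```python
import contextlib
import os


class StatCache:
    """Hypothetical opt-in stat cache; ignores staleness while enabled."""

    def __init__(self, stat_func=os.stat):
        self._stat = stat_func
        self._cache = None  # None means caching is switched off

    def stat(self, path):
        if self._cache is None:
            return self._stat(path)
        try:
            return self._cache[path]
        except KeyError:
            result = self._cache[path] = self._stat(path)
            return result

    @contextlib.contextmanager
    def enabled(self):
        """Cache every successful stat until the with-block exits."""
        previous, self._cache = self._cache, {}
        try:
            yield self
        finally:
            self._cache = previous  # drop cached results on exit
```

A caller would wrap a burst of imports in "with cache.enabled():", so repeated stats of the same path hit the FS only once during that window.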
[1] Maybe we should freeze the stdlib. <0.5 wink>
[2] importing a module usually involves importing the module's parent and its parent and so forth. Each of those incurs the same stat hits all over again (though usually packages have only 1 path entry to traverse). The stdlib is pretty flat (particularly among modules involved during startup) so this is less of an issue for this ticket.
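For footnote [2], the chain of imports implied by a dotted name can be listed with a few lines. The helper name import_chain is made up for illustration.

```python
def import_chain(fullname):
    """Return the names imported, in order, when importing fullname."""
    parts = fullname.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


print(import_chain("a.b.c"))  # ['a', 'a.b', 'a.b.c']
```

Each name in the list goes through its own round of the stats sketched in load_module.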
Date | User | Action | Args
2013-10-11 00:53:46 | eric.snow | set | recipients: + eric.snow, barry, brett.cannon, pitrou, vstinner, christian.heimes, r.david.murray
2013-10-11 00:53:46 | eric.snow | set | messageid: <1381452826.73.0.894253711926.issue19216@psf.upfronthosting.co.za>
2013-10-11 00:53:46 | eric.snow | link | issue19216 messages
2013-10-11 00:53:45 | eric.snow | create |