This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: spurious stat() calls in importlib
Type: performance Stage:
Components: Library (Lib) Versions: Python 3.3
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: christian.heimes, eric.snow, neologix, pitrou
Priority: low Keywords:

Created on 2012-04-17 11:58 by pitrou, last changed 2022-04-11 14:57 by admin.

Messages (2)
msg158546 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-04-17 11:58
It seems importlib makes multiple stat() calls on .py files:

stat("/home/antoine/cpython/opt/Lib", {st_mode=S_IFDIR|0775, st_size=12288, ...}) = 0
stat("/home/antoine/cpython/opt/Lib/_sysconfigdata.py", {st_mode=S_IFREG|0664, st_size=16032, ...}) = 0
stat("/home/antoine/cpython/opt/Lib/_sysconfigdata.py", {st_mode=S_IFREG|0664, st_size=16032, ...}) = 0
open("/home/antoine/cpython/opt/Lib/__pycache__/_sysconfigdata.cpython-33.pyc", O_RDONLY) = 3


It also makes multiple stat() calls on some directories:

stat("/home/antoine/cpython/opt/build/lib.linux-x86_64-3.3", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
stat("/home/antoine/.local/lib/python3.3/site-packages", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
stat("/home/antoine/.local/lib/python3.3/site-packages", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
stat("/home/antoine/.local/lib/python3.3/site-packages", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
open("/home/antoine/.local/lib/python3.3/site-packages", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
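Repeated calls like the ones in these traces can be found mechanically. The following is a minimal sketch. It assumes strace output in the format shown above, with one `stat("path", ...)` call per line, captured for example with `strace -o trace.txt ./python -Sc pass`. On newer systems the same lookups may appear as `newfstatat` or `statx`, and the pattern would need adjusting. `repeated_stats` is a hypothetical helper.

```python
import re
from collections import Counter

# Matches lines of the form: stat("/some/path", {...}) = 0
STAT_RE = re.compile(r'^stat\("([^"]+)"')

def repeated_stats(trace_lines):
    """Return {path: count} for every path that stat() was called on more than once."""
    counts = Counter()
    for line in trace_lines:
        match = STAT_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return {path: n for path, n in counts.items() if n > 1}
```

Feeding it the second trace above, for instance, should report the site-packages directory with a count of 3.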


That said, the number of system calls issued by 3.3 at startup is now much lower than with 3.2:

$ strace ./python -Sc pass 2>&1 | wc -l
512
$ strace python3.2 -Sc pass 2>&1 | wc -l
1018
msg158910 - Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2012-04-21 04:50
OK, so a cursory look at importlib suggests that the possible sources of those stat calls (i.e., everything that has to examine the filesystem) are:

* os.listdir() for caching
* os.path.isdir() for directories if they are a package
* os.path.isfile() for __init__.py
* os.path.isfile() for a module (and all the possible extensions)
* os.stat() for details of bytecode
* reading any bytecode
* reading the source
* writing the bytecode
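The finder-side checks in the list above can be pictured with a simplified lookup that records each filesystem probe it makes. This is a minimal sketch and not importlib's actual finder. `probe` is a hypothetical function, and namespace packages (a directory without `__init__`) are deliberately not handled. It only approximates the order of checks described in the list.

```python
import os
from importlib.machinery import BYTECODE_SUFFIXES, EXTENSION_SUFFIXES, SOURCE_SUFFIXES

def probe(directory, name):
    """Hypothetical lookup mirroring the list above; returns (kind, path, checks)."""
    checks = []
    entries = set(os.listdir(directory))  # os.listdir() for caching
    checks.append("listdir " + directory)
    suffixes = EXTENSION_SUFFIXES + SOURCE_SUFFIXES + BYTECODE_SUFFIXES
    if name in entries:
        pkg_dir = os.path.join(directory, name)
        checks.append("isdir " + pkg_dir)  # os.path.isdir() for a possible package
        if os.path.isdir(pkg_dir):
            for suffix in suffixes:
                init = os.path.join(pkg_dir, "__init__" + suffix)
                checks.append("isfile " + init)  # os.path.isfile() for __init__
                if os.path.isfile(init):
                    return "package", init, checks
    for suffix in suffixes:
        # The cached listing lets us skip suffixes that cannot exist.
        if name + suffix in entries:
            path = os.path.join(directory, name + suffix)
            checks.append("isfile " + path)  # os.path.isfile() for the module
            if os.path.isfile(path):
                return "module", path, checks
    return None, None, checks
```

For a plain module, this records one `listdir` and one `isfile` probe. The loader then performs its own stat() of that file to validate bytecode.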

So, looking at that initial block of stat calls, I am willing to bet that Lib gets its stat call from the os.path.isdir() check in the finder. The two Lib/_sysconfigdata.py calls come from the finder checking that the file exists and then the loader stat'ing it for bytecode verification. Finally, the bytecode file is opened so it can be read and found to be usable.
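The loader-side verification described above can be sketched as follows: stat the source, read the .pyc header, and compare the recorded mtime and size. This is a minimal sketch. It assumes the 16-byte header layout introduced by PEP 552 in Python 3.7 (magic, flags, mtime, size). The 3.3 loader discussed here used a 12-byte header without the flags field. `is_bytecode_fresh` is a hypothetical helper, not an importlib API.

```python
import importlib.util
import os
import struct

def is_bytecode_fresh(source_path):
    """Return True if the cached timestamp-based .pyc matches the source's stat data."""
    pyc_path = importlib.util.cache_from_source(source_path)
    st = os.stat(source_path)  # corresponds to the loader's stat() of the .py file
    try:
        with open(pyc_path, "rb") as f:  # corresponds to the open() of the .pyc
            header = f.read(16)
    except FileNotFoundError:
        return False
    if len(header) < 16 or header[:4] != importlib.util.MAGIC_NUMBER:
        return False
    flags, mtime, size = struct.unpack("<III", header[4:16])
    if flags != 0:
        # Hash-based .pyc (PEP 552); not handled in this sketch.
        return False
    # Both fields are stored as 32-bit little-endian values.
    return mtime == (int(st.st_mtime) & 0xFFFFFFFF) and size == (st.st_size & 0xFFFFFFFF)
```

Because the comparison needs the source's current mtime and size, the loader cannot drop its own stat() unless that data reaches it some other way.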

As for the multiple stat calls on directories, those validate that the cache isn't out of date. I don't see how that can be avoided short of checking the system clock to see whether some amount of time has passed.
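The cache validation described above can be pictured as a directory listing that is revalidated against the directory's mtime on every lookup. Each lookup therefore costs one stat() even when nothing has changed. This is a minimal sketch, assuming importlib's path finder follows the same mtime-comparison idea. `DirListingCache` is a hypothetical class, not importlib's finder.

```python
import os

class DirListingCache:
    """Hypothetical cache of a directory listing, revalidated by one stat() per lookup."""

    def __init__(self, path):
        self.path = path
        self._mtime = None
        self._entries = frozenset()
        self.stat_calls = 0
        self.listdir_calls = 0

    def _refresh_if_stale(self):
        self.stat_calls += 1
        mtime = os.stat(self.path).st_mtime  # the repeated directory stat()
        if mtime != self._mtime:
            # The directory changed (or was never listed): re-read it.
            self.listdir_calls += 1
            self._entries = frozenset(os.listdir(self.path))
            self._mtime = mtime

    def __contains__(self, name):
        self._refresh_if_stale()
        return name in self._entries
```

The time-based alternative mentioned above would skip `_refresh_if_stale()` until some interval had elapsed. This would trade stat() calls for a window in which newly created files go unseen.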

As for the multiple stat calls between the finder and the loader, I don't see any way to cut them down without either a combined find + load API that makes the call immediately or some way to pass the stat details along. Otherwise, you have a race condition on the file's status before you check whether the bytecode is stale. If the stat calls on directories for cache validation are too frequent, then issue #14067 is probably your best bet.
History
Date                 User              Action  Args
2022-04-11 14:57:29  admin             set     github: 58809
2020-01-29 00:55:27  brett.cannon      set     nosy: - brett.cannon
2013-10-10 13:36:22  christian.heimes  set     nosy: + christian.heimes
2012-04-21 04:50:27  brett.cannon      set     messages: + msg158910
2012-04-17 15:24:32  eric.snow         set     nosy: + eric.snow
2012-04-17 11:58:51  pitrou            create