spurious stat() calls in importlib #58809

Closed
pitrou opened this issue Apr 17, 2012 · 4 comments
Labels
pending, performance, stdlib, topic-importlib

Comments

@pitrou
Member

pitrou commented Apr 17, 2012

BPO 14604
Nosy @pitrou, @tiran, @ericsnowcurrently

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2012-04-17.11:58:51.304>
labels = ['library', 'performance']
title = 'spurious stat() calls in importlib'
updated_at = <Date 2020-01-29.00:55:27.014>
user = 'https://github.com/pitrou'

bugs.python.org fields:

activity = <Date 2020-01-29.00:55:27.014>
actor = 'brett.cannon'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2012-04-17.11:58:51.304>
creator = 'pitrou'
dependencies = []
files = []
hgrepos = []
issue_num = 14604
keywords = []
message_count = 2.0
messages = ['158546', '158910']
nosy_count = 4.0
nosy_names = ['pitrou', 'christian.heimes', 'neologix', 'eric.snow']
pr_nums = []
priority = 'low'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'performance'
url = 'https://bugs.python.org/issue14604'
versions = ['Python 3.3']

@pitrou
Member Author

pitrou commented Apr 17, 2012

It seems importlib does multiple stat() calls on py files:

stat("/home/antoine/cpython/opt/Lib", {st_mode=S_IFDIR|0775, st_size=12288, ...}) = 0
stat("/home/antoine/cpython/opt/Lib/_sysconfigdata.py", {st_mode=S_IFREG|0664, st_size=16032, ...}) = 0
stat("/home/antoine/cpython/opt/Lib/_sysconfigdata.py", {st_mode=S_IFREG|0664, st_size=16032, ...}) = 0
open("/home/antoine/cpython/opt/Lib/__pycache__/_sysconfigdata.cpython-33.pyc", O_RDONLY) = 3

It also does multiple stat() calls on some directories:

stat("/home/antoine/cpython/opt/build/lib.linux-x86_64-3.3", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
stat("/home/antoine/.local/lib/python3.3/site-packages", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
stat("/home/antoine/.local/lib/python3.3/site-packages", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
stat("/home/antoine/.local/lib/python3.3/site-packages", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
open("/home/antoine/.local/lib/python3.3/site-packages", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3

That said, the number of system calls issued by 3.3 at startup is now much lower than with 3.2:

$ strace ./python -Sc pass 2>&1 | wc -l
512
$ strace python3.2 -Sc pass 2>&1 | wc -l
1018
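To spot duplicates like the ones above without reading the trace by eye, something along these lines could tally repeated stat() paths. This is only a rough sketch: `repeated_stats` and `STAT_RE` are names made up for illustration, and the regular expression assumes the plain `stat("path", ...)` form shown in this issue, so other stat-family calls would need their own pattern.

```python
import re
from collections import Counter

# Assumption: each line looks like the strace output quoted in this issue,
# i.e. it starts with stat("<path>", ...).
STAT_RE = re.compile(r'^stat\("([^"]+)"')


def repeated_stats(strace_lines):
    """Return {path: count} for every path that was stat()'ed more than once."""
    counts = Counter()
    for line in strace_lines:
        match = STAT_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return {path: n for path, n in counts.items() if n > 1}
```

Fed the lines of the first trace above, it would report Lib/_sysconfigdata.py with a count of 2 and leave Lib out, since that directory is only stat()'ed once there.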

@pitrou pitrou added the stdlib and performance labels Apr 17, 2012
@brettcannon
Member

OK, so a cursory look at importlib suggests that the possible sources of those stat calls (judging by what has to examine the filesystem) are:

  • os.listdir() for caching
  • os.path.isdir() for directories if they are a package
  • os.path.isfile() for __init__.py
  • os.path.isfile() for a module (and all the possible extensions)
  • os.stat() for details of bytecode
  • reading any bytecode
  • reading the source
  • writing the bytecode

So looking at that initial block of stat calls, I am willing to bet that the stat on Lib comes from the os.path.isdir() check in the finder, the two stats on Lib/_sysconfigdata.py come from the finder checking that the file exists and then the loader stat'ing it for bytecode verification, and the final open() is the bytecode being read to discover that it's usable.

As for the multiple stat calls on directories, those validate that the directory cache isn't out of date. I don't see how that can be avoided short of hitting the system clock to see if some amount of time has passed.
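To make the trade-off concrete, here is a minimal sketch of an mtime-validated directory cache. `DirectoryCache`, `contains`, and `refreshes` are hypothetical names for illustration and this is not importlib's actual code; the point is only that every lookup still pays for one stat() on the directory, even when the cached listing is reused.

```python
import os


class DirectoryCache:
    """Cache a directory listing and refresh it only when the mtime changes."""

    def __init__(self, path):
        self.path = path
        self._mtime = -1       # sentinel: nothing cached yet
        self._entries = set()
        self.refreshes = 0     # how many times os.listdir() actually ran

    def contains(self, name):
        # One stat() per lookup: the price of noticing a changed directory.
        mtime = os.stat(self.path).st_mtime_ns
        if mtime != self._mtime:
            self._entries = set(os.listdir(self.path))
            self._mtime = mtime
            self.refreshes += 1
        return name in self._entries
```

Several lookups against the same unchanged directory would produce several stat() calls but only one os.listdir(), which roughly matches the repeated stats followed by a single open() of site-packages in the trace. A scheme like this can also miss a change that lands within the filesystem's timestamp granularity.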

As for the multiple stat calls between the finder and the loader, I don't see any way to cut that down without coming up with a find + load API which makes the call immediately, or some way to pass in stat details; otherwise you have race conditions on the status of the file before you check if the bytecode is stale. If the stat calls on the directories for cache validation are too frequent, then issue bpo-14067 is probably your best bet.
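The "pass in stat details" idea could look roughly like the sketch below. Both function names are hypothetical and neither is an existing importlib API; the assumption is simply that the finder keeps the stat result from its existence check and hands it to the loader's staleness check, so the source file is stat()'ed once.

```python
import os
import stat


def find_source(path):
    """Finder step: return the stat result if path is a regular file, else None."""
    try:
        st = os.stat(path)
    except OSError:
        return None
    return st if stat.S_ISREG(st.st_mode) else None


def bytecode_is_fresh(source_stat, recorded_mtime, recorded_size):
    """Loader step: compare recorded metadata with an already obtained stat result.

    recorded_mtime and recorded_size stand in for whatever the bytecode file
    recorded about its source; no second stat() is issued here.
    """
    return (int(source_stat.st_mtime) == recorded_mtime
            and source_stat.st_size == recorded_size)
```

Reusing the result removes the duplicate call, but the staleness check then relies on metadata captured at find time, so the file could change between the two steps without being noticed.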

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
@iritkatriel iritkatriel added the pending label Sep 11, 2022
@iritkatriel
Member

This issue has been idle for over a decade. I will close it unless someone indicates that they still want to go back to it.

@brettcannon
Member

I'm going to preemptively close this as any perf improvements around this have either already been done or are not important enough to be a concern.
