
classification
Title: Provide a way to disable bytecode staleness checks
Type: enhancement
Stage: resolved
Components: Interpreter Core
Versions: Python 3.5
process
Status: closed
Resolution: out of date
Dependencies:
Superseder:
Assigned To: brett.cannon
Nosy List: brett.cannon, gregory.p.smith, pitrou, raulcd, serhiy.storchaka, twouters
Priority: low
Keywords:

Created on 2015-03-20 18:14 by brett.cannon, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (11)
msg238704 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2015-03-20 18:14
In environments where bytecode generation is thoroughly controlled, it would be nice if there was a way to specify that the bytecode file is externally guaranteed to be up-to-date, and thus skip any stat call involving bytecode verification. Could be represented with a timestamp of either all zeroes or ones in the bytecode file header.
msg238705 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-03-20 18:15
What is the benefit?
msg238724 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2015-03-20 21:22
This would avoid the need to modify an interpreter to have this optimization.  In this mode the potentially expensive stat() call is avoided.  No need to ensure that the pyc file's embedded timestamp matches the py file's timestamp.  The only use of this mode would be when that is guaranteed by a build system so loading modules continues to be fast without reverting to loading the py when the pyc is already known good.

Our specific example is a build system that generates pyc's for some py files at build time, but timestamps of files are not maintained at all through the build/packaging/distribution process because they are seen as an irrelevant detail and not even kept track of by the build.  Python is fairly unique in wanting to depend upon file timestamp metadata as a form of input data going from py -> pyc.

Right now we work around the problem by not having py files available at all in this situation.  Using .pycs greatly speeds up the program's load time, but not having source code around makes for worse tracebacks and causes problems with other tools which need to use the source.
msg238801 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2015-03-21 13:55
What Greg said. =) Basically it would allow those who know what they are doing to cut out a stat call per load. I suspect anyone deploying to a server is in a similar situation where they are not actively editing the code once deployed, and so would benefit from saving on the startup of each new process (probably most beneficial in a CI situation).

I might also simply refactor the importlib loader code to make this at least possible for someone to implement without doing the work for them.
msg238807 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-03-21 14:12
Can you please provide timing numbers?
msg238809 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-03-21 14:34
Wouldn't zipimport provide better performance? If bytecode generation is thoroughly controlled, could you collect your .pyc files in a ZIP file?
msg238823 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2015-03-21 18:55
We already use zipimport for most production deployments.  It works well.  We've modified our own zipimport to ignore timestamps, as keeping the pyc timestamps in sync with the py files' timestamps stored in the zip file is painful.  Unfortunately the stdlib zipimport actually checks pyc timestamps against py files in the .zip file in 3.4 and 2.7 (see https://hg.python.org/cpython/file/e8878579eb68/Modules/zipimport.c#l1187 where the mtime is checked, despite a comment in there in 3.4 suggesting it is probably pointless).  Changing that is a separate issue (I'll go open one).

Where this hurts us the most is in our build system when not building a final production zipped-up binary (building one would take as long as loading all of the py and pyc files would, and would prevent iterative development).  Our py files and pyc files are located on a read-only build artifact object store.  As a mounted filesystem it does not have a POSIX concept of file mtime at all (and never will).

When you're using a read-only filesystem of build-time-generated .py code without the concept of an mtime, you really, really want to tell Python to trust the build system and assume the pyc files it finds match the corresponding py files.  Otherwise your large application or test startup time really suffers.

In our use case, it is on the order of a 30% startup time improvement to use precompiled pyc files for our generated code py files (a ton of protobuf python modules) on a large application.

Most people are likely not in this situation because they are just lowly individuals operating on a simple writable POSIX filesystem in front of them. But when it matters, it really matters. People should be able to tell Python "trust me, I know what I'm doing" when it comes to compiled code loading.  It is easy enough to modify compile to write a "never verify this" magic timestamp into a pyc.  (I'd get more creative and use a value other than all 0s or 1s; pick the release date of the first version of Python as your magic timestamp, for example; nothing is likely to accidentally end up with that date in it.)

That's all this issue is asking for.
msg238826 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-03-21 19:12
That sounds kind of reasonable, but how are we supposed to document this? Or is this only a "secret backdoor" for people in the know?
msg238906 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2015-03-22 14:39
I haven't thought about how to implement it, let alone document it. As I said, I might simply refactor importlib so that others can at least implement a loader which can do this without having to directly muck with importlib itself. It really depends on how far one would want to go with this.

Otherwise I would add a note in importlib in the appropriate loader(s) that if such-and-such a datetime is specified in the bytecode header then all stat-related staleness checks against the original source are disabled and the only validation is the magic number (since it's cheap and a nice safety check).
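
Editorial sketch of the sentinel idea described above. It is not code from CPython or from this issue. It assumes the 12-byte header layout used by Python 3.3 through 3.6 (magic number, source mtime, source size); later versions use a different header. The names TRUSTED_MTIME, make_header and needs_stat_check are hypothetical, and the all-ones value is simply one of the candidates mentioned in the first message.

```python
import importlib.util
import struct

# Hypothetical sentinel: nothing in CPython treats this value specially.
TRUSTED_MTIME = 0xFFFFFFFF


def make_header(mtime, source_size):
    """Build a header in the assumed 3.3-3.6 layout: magic, mtime, size."""
    return importlib.util.MAGIC_NUMBER + struct.pack(
        "<II", mtime & 0xFFFFFFFF, source_size & 0xFFFFFFFF
    )


def needs_stat_check(header):
    """Return False when the header carries the 'trust me' sentinel.

    The magic number is always validated, since that check is cheap.
    """
    if header[:4] != importlib.util.MAGIC_NUMBER:
        raise ImportError("bad magic number in bytecode header")
    (mtime,) = struct.unpack("<I", header[4:8])
    return mtime != TRUSTED_MTIME
```

A loader built along these lines would call something like needs_stat_check() first and only stat the source file when it returns True.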
msg241979 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2015-04-24 20:08
Skipping the source stat makes no difference in startup time even if you import django.http as part of the work. This would definitely be mostly for people who launch so many processes that they actually gain from collecting microseconds worth of benefit from each Python process.
msg261994 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2016-03-18 20:31
I realized that importlib.abc.SourceLoader.path_stats() provides a way to override stat collection and basically hard-code the stat number for all files from an import perspective. So if people simply generated their bytecode with a static timestamp and overrode this method then they would get the effect they want.
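
Editorial sketch of the path_stats() override described above, under stated assumptions. FIXED_MTIME and FixedStatLoader are hypothetical names. The source size is assumed to be supplied by the build system, so that the loader never has to stat the source file. The bytecode would have to be generated with the same mtime and size for the loader to accept it. Registering the loader with the import system (for example through a path hook) is not shown.

```python
import importlib.machinery

# Hypothetical static timestamp that the build system would also write
# into every bytecode file it generates.
FIXED_MTIME = 1


class FixedStatLoader(importlib.machinery.SourceFileLoader):
    """Source loader that reports build-provided stats instead of calling stat()."""

    def __init__(self, fullname, path, source_size):
        super().__init__(fullname, path)
        # Assumption: the build system recorded the source size in bytes.
        self._source_size = source_size

    def path_stats(self, path):
        # No filesystem access: every source file appears to have the same
        # mtime, so bytecode compiled with FIXED_MTIME is never seen as stale.
        return {"mtime": FIXED_MTIME, "size": self._source_size}
```

How the rest of the import machinery uses these values differs between Python versions, so this should be checked against the target interpreter before relying on it.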
History
Date                 User              Action  Args
2022-04-11 14:58:14  admin             set     github: 67911
2016-03-18 20:31:37  brett.cannon      set     status: open -> closed; resolution: out of date; messages: + msg261994; stage: resolved
2015-04-24 20:08:35  brett.cannon      set     messages: + msg241979
2015-04-13 21:41:17  raulcd            set     nosy: + raulcd
2015-03-22 14:39:12  brett.cannon      set     messages: + msg238906
2015-03-21 19:12:54  pitrou            set     messages: + msg238826
2015-03-21 18:55:59  gregory.p.smith   set     messages: + msg238823
2015-03-21 14:34:06  serhiy.storchaka  set     nosy: + serhiy.storchaka; messages: + msg238809
2015-03-21 14:12:02  pitrou            set     messages: + msg238807
2015-03-21 13:55:28  brett.cannon      set     messages: + msg238801
2015-03-20 21:22:42  gregory.p.smith   set     messages: + msg238724
2015-03-20 18:15:58  pitrou            set     nosy: + pitrou; messages: + msg238705
2015-03-20 18:14:42  brett.cannon      create