This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Author eschwartz
Recipients Alexandru Ardelean, Ray Donnelly, barry, benjamin.peterson, bmwiedemann, brett.cannon, dstufft, eric.araujo, eric.smith, eschwartz, vstinner, yan12125
Date 2018-01-14.00:57:41
Message-id <1515891463.34.0.467229070634.issue29708@psf.upfronthosting.co.za>
In-reply-to
Content
So, a couple of things.

It seems to me that properly supporting SOURCE_DATE_EPOCH means using exactly that value and nothing else. To that end, I'm not entirely sure why options like --clamp-mtime even exist: the original timestamp of a source file doesn't seem to have much utility, and it is better to be entirely predictable. But I'm not going to argue that, except inasmuch as it seems, IMHO, a better fit for Python to keep things simple and override the timestamp with the value of SOURCE_DATE_EPOCH.
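For reference, here is a minimal sketch of how each policy, modeled on GNU tar's --mtime and --clamp-mtime options, would pick the timestamp to record in a .pyc. The helper names are hypothetical (this is not py_compile's API), and it assumes SOURCE_DATE_EPOCH holds an integer number of seconds.

```python
import os
from typing import Mapping, Optional


def mtime_policy(source_mtime: int, epoch: int) -> int:
    # Like tar --mtime: ignore the file's own timestamp entirely.
    return epoch


def clamp_mtime_policy(source_mtime: int, epoch: int) -> int:
    # Like tar --mtime=... --clamp-mtime: keep the file's timestamp
    # unless it is newer than SOURCE_DATE_EPOCH.
    return min(source_mtime, epoch)


def embedded_timestamp(source_mtime: int, clamp: bool = False,
                       environ: Optional[Mapping[str, str]] = None) -> int:
    """Hypothetical helper: the timestamp a .pyc would record."""
    env = os.environ if environ is None else environ
    raw = env.get("SOURCE_DATE_EPOCH")
    if raw is None:
        # Not a reproducible build: use the real source mtime.
        return source_mtime
    epoch = int(raw)
    policy = clamp_mtime_policy if clamp else mtime_policy
    return policy(source_mtime, epoch)


env = {"SOURCE_DATE_EPOCH": "1500000000"}
print(embedded_timestamp(1400000000, clamp=False, environ=env))  # 1500000000
print(embedded_timestamp(1400000000, clamp=True, environ=env))   # 1400000000
```

The "just use SOURCE_DATE_EPOCH" approach argued for above corresponds to `mtime_policy`: the output never depends on the file's own timestamp.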

That being said, I see two problems with Python implementing something analogous to --clamp-mtime rather than just --mtime.

1) Source files are extracted by some build process and remain untouched. Python generates bytecode pinned to the original source timestamp rather than to SOURCE_DATE_EPOCH. Later, the build process packages those files using --mtime, not --clamp-mtime, so the packaged sources are stamped with SOURCE_DATE_EPOCH. Because Python and the packaging software disagree about which timestamp to use, the bytecode no longer matches its source and is rejected as stale.

2) Source files are extracted, and the build process even tosses all timestamps aside by explicitly `touch`ing every file to the date of SOURCE_DATE_EPOCH, just in case. Then, for whatever reason (distro patches, 2to3, the use of `cp`), the timestamps get updated to the current time. But SOURCE_DATE_EPOCH is in the future, so this actually moves the timestamps backward. Python generates bytecode by emulating --clamp-mtime, which leaves those earlier timestamps as they are. The build process then uses --mtime to package the files, stamping them with SOURCE_DATE_EPOCH. Again, because Python and the packaging software disagree about which timestamp to use, the bytecode is rejected as stale.
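Both failure modes can be reproduced with a toy model. This sketch assumes py_compile emulates --clamp-mtime, the packager uses --mtime, and the import system requires an exact timestamp match; the function names and the timestamps (arbitrary small integers standing in for epoch seconds) are illustrative only.

```python
def clamp_mtime(source_mtime: int, epoch: int) -> int:
    # --clamp-mtime: never record a time later than SOURCE_DATE_EPOCH.
    return min(source_mtime, epoch)


def strict_check(pyc_mtime: int, source_mtime: int) -> bool:
    # Exact-match rule: the .pyc timestamp must equal the source mtime.
    return pyc_mtime == source_mtime


def simulate(source_mtime_at_build: int, epoch: int) -> bool:
    """Return True if the packaged bytecode would still be accepted."""
    pyc_mtime = clamp_mtime(source_mtime_at_build, epoch)  # what py_compile records
    packaged_source_mtime = epoch                          # packager uses --mtime
    return strict_check(pyc_mtime, packaged_source_mtime)


# Case 1: untouched sources, older than SOURCE_DATE_EPOCH.
print(simulate(source_mtime_at_build=1_000, epoch=2_000))  # False
# Case 2: SOURCE_DATE_EPOCH in the future; `cp` reset mtimes to "now" (2_000).
print(simulate(source_mtime_at_build=2_000, epoch=3_000))  # False
```

In this model the bytecode only survives when the source mtime at build time is already at or after SOURCE_DATE_EPOCH, which is exactly the case where clamping and --mtime agree.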

Of course, in both those cases, blindly respecting SOURCE_DATE_EPOCH will seemingly break everything for people who use --clamp-mtime instead. I'm not happy with reproducible-builds.org for allowing either one.

I don't think Python should rely on --mtime users manually overriding the filesystem metadata of the source files outside of py_compile; that is a hack I think we'd like to remove if possible. That said, on second thought, Arch Linux will not be adversely affected even if py_compile tries to be clever and emulates --clamp-mtime to decide on its own whether to respect SOURCE_DATE_EPOCH.

Likewise, I don't really expect people to try to reproduce builds using a future date for SOURCE_DATE_EPOCH. On the other hand, the reproducible builds spec doesn't forbid it AFAICT.

But... neither of those mitigations seems "clean" to me, for the reasons stated above.

There is something that would solve all these issues, though. From reading the importlib code (I haven't actually smoke-tested real imports), it appears that Python 2 accepts any bytecode dated at or later than the timestamp of its source .py, while Python 3 requires the timestamps to match exactly. It seems bizarre for the two to behave differently, especially since, until @bmwiedemann mentioned it on the GitHub PR, I blindly assumed that Python would not care if your bytecode is somehow dated later than your sources. If the user is playing monkey games with mismatched source and bytecode, backdating the source code to *trick* the interpreter into loading it... let them? They can break their stuff if they want to!
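Why an "at or later" rule would help can be checked with a small model. Assume py_compile simply records SOURCE_DATE_EPOCH (the --mtime approach) and the packager applies either --mtime or --clamp-mtime to the sources. The sketch compares an exact-match check (the rule described above for Python 3) with an "at or later" check (the rule described above for Python 2). The function names and timestamps are hypothetical; this models the two rules only and is not CPython's importlib code.

```python
from typing import Callable, Dict, Tuple


def mtime(source_mtime: int, epoch: int) -> int:
    # Packager uses --mtime: every file gets SOURCE_DATE_EPOCH.
    return epoch


def clamp_mtime(source_mtime: int, epoch: int) -> int:
    # Packager uses --clamp-mtime: only newer files are pulled back.
    return min(source_mtime, epoch)


def strict_check(pyc_mtime: int, source_mtime: int) -> bool:
    return pyc_mtime == source_mtime   # exact match required


def lenient_check(pyc_mtime: int, source_mtime: int) -> bool:
    return pyc_mtime >= source_mtime   # bytecode at or after source is fine


def results(source_mtime: int, epoch: int) -> Dict[str, Tuple[bool, bool]]:
    """Map packager option -> (strict accepts?, lenient accepts?)."""
    pyc_mtime = epoch  # py_compile just records SOURCE_DATE_EPOCH
    packagers: Dict[str, Callable[[int, int], int]] = {
        "--mtime": mtime,
        "--clamp-mtime": clamp_mtime,
    }
    out = {}
    for name, packager in packagers.items():
        packaged = packager(source_mtime, epoch)
        out[name] = (strict_check(pyc_mtime, packaged),
                     lenient_check(pyc_mtime, packaged))
    return out


print(results(source_mtime=1_000, epoch=2_000))
# {'--mtime': (True, True), '--clamp-mtime': (False, True)}
```

Under the lenient rule the bytecode is accepted whichever option the packager picked, because a clamped source timestamp can never be later than SOURCE_DATE_EPOCH.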

Looking through the commit logs, it seems that Python 3 used to do the same, until https://github.com/python/cpython/commit/61b14251d3a653548f70350acb250cf23b696372 refactored the general vicinity and changed this behavior without warning, in a commit that seems designed to do something else entirely. This really should have been two separate commits, and making the import code check the timestamp more strictly should have come with a justification. I cannot think of a good reason for this behavior, and the commit gives me no way to understand it either. As it is, I am completely confused, and have no idea whether the change was even deliberate.
In hindsight it is certainly preventing nice solutions to supporting SOURCE_DATE_EPOCH.
History
Date User Action Args
2018-01-14 00:57:44  eschwartz  set  recipients: + eschwartz, barry, brett.cannon, vstinner, eric.smith, benjamin.peterson, eric.araujo, dstufft, yan12125, bmwiedemann, Alexandru Ardelean, Ray Donnelly
2018-01-14 00:57:43  eschwartz  set  messageid: <1515891463.34.0.467229070634.issue29708@psf.upfronthosting.co.za>
2018-01-14 00:57:43  eschwartz  link  issue29708 messages
2018-01-14 00:57:41  eschwartz  create