classification
Title: import.c sometimes generates incorrect timestamps on Windows + NTFS
Type: behavior Stage: needs patch
Components: Interpreter Core, Windows Versions: Python 3.2, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: mark.dickinson Nosy List: brian.curtin, eric.snow, loewis, mackeith, mark.dickinson, pitrou, python-dev, qwertyface, tim.golden, vstinner
Priority: high Keywords: patch

Created on 2012-01-25 19:28 by mark.dickinson, last changed 2012-12-24 16:35 by mark.dickinson. This issue is now closed.

Files
File name Uploaded Description Edit
pyc_mtime.patch mark.dickinson, 2012-12-16 17:35 review
pyc_mtime2.patch mark.dickinson, 2012-12-16 19:21 review
pyc_mtime3.patch mark.dickinson, 2012-12-16 21:51 review
pyc_mtime4.patch mark.dickinson, 2012-12-21 20:10 review
Messages (15)
msg151969 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-01-25 19:28
(Marking this as 'Interpreter Core' because the issue really seems to come from Python/import.c rather than from the py_compile module.)

On a Windows 7 VM (64-bit, NTFS) running Python 2.7 (also reproduced on a non-VM Windows 7 installation), I'm seeing the following surprising behaviour:


C:\Python27\Lib>dir decimal.py*
 Volume in drive C is Local
 Volume Serial Number is 5083-D43D

 Directory of C:\Python27\Lib

06/10/2011  08:46 PM           219,383 decimal.py
01/25/2012  07:05 PM           165,322 decimal.pyc
01/25/2012  04:54 PM           169,386 decimal.pyo
               3 File(s)        554,091 bytes
               0 Dir(s)     966,266,880 bytes free

C:\Python27\Lib>python
Enthought Python Distribution -- www.enthought.com
Version: 7.2-2 (64-bit)

Python 2.7.2 |EPD 7.2-2 (64-bit)| (default, Sep 14 2011, 11:25:00) [MSC v.1500 64 bit (AMD64)] on win32
Type "packages", "demo" or "enthought" for more information.
>>> import py_compile; py_compile.compile('decimal.py')
>>> import os; os.stat('decimal.pyc').st_mtime
1327518430.306176
>>> import decimal
>>> os.stat('decimal.pyc').st_mtime
1327518443.9094632

Notice that in spite of calling py_compile.compile on decimal.py, the .pyc file is still regenerated on import.
Relevant details: note that the decimal.py timestamp is in the summer, and that it's currently winter time.  Also, my Windows 7 timezone setting is UTC, with 'automatically adjust for DST' set to true.

This bit me today when working with a 'post-install' script for an MSI installer for a large Python application, where the post-install script did a compileall.compile_path to make sure that all the .pyc files were up to date, and thereby avoid a *long* (> 90 second) application first startup time.  It turned out that on application startup some of the .pyc files got regenerated while others didn't, and it took me far too long to notice that it was the .py files with summer timestamps that lead to .pyc regeneration, and that those with winter timestamps were okay.

When I set my timezone to plain UTC with no DST adjustments, the above issue disappears.  Also, when I reset my timezone to that of Saudi Arabia (no DST), the issue also disappears.


The cause of the issue seems to be that:

(1) import.c uses an 'fstat' call to get mtime for a .py file.
(2) on my Windows installation, the result of fstat on a file varies (a) with time of year, and (b) with computer DST and timezone settings.  (No such problems on OS X.)
(3) in contrast, py_compile uses os.stat(...).st_mtime, which appears to be invariant under time changes.


The difference between the 'fstat' result and the os.stat result can be seen directly:

Python 2.7.2 |EPD 7.2-2 (64-bit)| (default, Sep 14 2011, 11:25:00) [MSC v.1500 64 bit (AMD64)] on win32
Type "packages", "demo" or "enthought" for more information.
>>> import decimal  # make sure .pyc file is up to date
>>> import os; os.stat('decimal.py').st_mtime  # actual mtime of .py file
1307738784.0
>>> import struct; struct.unpack('LL', open('decimal.pyc', 'rb').read(8))[1]  # stored mtime from .pyc; uses fstat
1307735184



This presumably also means that in this part of the world, .pyc files will be regenerated on import after any DST change, since the reported 'fstat' result used by import.c will no longer match the stored mtime timestamp in the .pyc file.
msg151972 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-01-25 20:07
Issue reproduced on stock Python 2.7 and Python 3.2 from www.python.org.  I don't have a working 3.3 install on Windows, but looking at the source, it seems likely that it's an issue there, too.
msg152001 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-01-26 10:00
Also seen on Windows Vista;  seems to be a general Windows + NTFS problem.

Changing title to make it clearer that this is a core language issue.

It seems as though the correct fix would be to use something like GetFileInformationByHandle in place of the fstat calls in import.c.
msg152009 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-01-26 14:15
Hmm, interesting. This is exactly what happened recently when debugging pyc timestamp issues under Windows:
http://www.python.org/dev/buildbot/all/builders/x86%20Windows7%202.7/builds/1204/steps/test/logs/stdio

Some decoding of the above crash:
- the test would set the .py file's timestamps to 2**33
- this is truncated (module 2**32) and therefore should become 0 in the .pyc file's embedded timestamp
- in reality, the .pyc file's embedded timestamps is equal to 4294963696. Which is 2**32 - 3600.

As a sidenote, we don't have any tests that the pyc file is re-used when it is fresh enough. Perhaps by running an interpreter in a subprocess with "-v" we could examine the verbose messages printed out in import.c.

> It seems as though the correct fix would be to use something like GetFileInformationByHandle in place of the fstat calls in import.c.

We must probably also replace the stat() call (through _Py_stat) with GetFileAttributesEx, or make _Py_stat re-use GetFileAttributesEx.
msg153878 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-02-21 16:44
For some reason that I'm unable to follow right now, fstat in the crt does this dance to obtain st_mtime:
- obtain time stamps using GetFileInformationByHandle
- convert ftLastWriteTime using FileTimeToLocalFileTime to LocalFTime
- convert LocalFTime using FileTimeToSystemTime to SystemTime
- convert SystemTime using __loctotime32_t to st_mtime

Now, SystemTime is broken-down local time, and loctotime32 converts that to "UTC" using an explicit formula taking the *current* time zone into account. This should work correctly, though, as FileTimeToLocalFileTime also takes into account the current time zone.

Anybody analysing this further might want to write a C program that passes explicit time values into this procedure (see Vc/crt/src/fstat.c and dtoxtime.c for the exact source code), to find out what step exactly goes wrong.
msg165313 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-07-12 15:42
It shouldn't affect 3.3 anymore (importlib using the os module rather than direct C calls to msvcrt). Could someone check?
msg171482 - (view) Author: Peter Russell (qwertyface) Date: 2012-09-28 14:49
I can confirm that the current equivalent to Mark's original test case works as expected on default.

I recommend closing this issue.
msg175487 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2012-11-13 06:31
actually still valid for previous to 3.3
msg177615 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-12-16 17:35
Here's a possible fix for Python 2.7.  Reviews from Windows experts would be very much appreciated;  I don't really know what I'm doing here.

I'm still trying to work out how to test this effectively.
msg177622 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-12-16 19:21
Updated patch, with a test.
msg177632 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-12-16 21:51
Updated patch to address comments from Serhiy on Rietveld:

 - use '& 0xffffffff' in the Python code that gets the mtime;  only the
   bottom 32 bits of the mtime are used.
 - simplify line that reads mtime from .pyc file.

Also improved an error check in the C code: compare with INVALID_HANDLE_VALUE instead of comparing with -1 directly.
msg177893 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-12-21 20:10
Whoops.  int.from_bytes doesn't exist in Python 2.7.
msg178014 - (view) Author: Brian Curtin (brian.curtin) * (Python committer) Date: 2012-12-23 20:01
Latest patch looks ok to me and the tests pass.
msg178071 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-12-24 16:33
New changeset 002d5f3b240d by Mark Dickinson in branch '2.7':
Issue #13863: fix incorrect .pyc timestamps on Windows / NTFS (apparently due to buggy fstat)
http://hg.python.org/cpython/rev/002d5f3b240d
msg178072 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-12-24 16:35
Fixed for 2.7.  I don't think this is worth fixing to 3.2, given that it's not an issue for 3.3.

Thanks Brian and Serhiy for reviewing!
History
Date User Action Args
2012-12-24 16:35:40mark.dickinsonsetstatus: open -> closed
resolution: fixed
messages: + msg178072
2012-12-24 16:33:48python-devsetnosy: + python-dev
messages: + msg178071
2012-12-23 20:01:16brian.curtinsetmessages: + msg178014
2012-12-21 20:10:52mark.dickinsonsetfiles: + pyc_mtime4.patch

messages: + msg177893
2012-12-20 09:18:06mark.dickinsonsetassignee: mark.dickinson
2012-12-16 21:51:06mark.dickinsonsetfiles: + pyc_mtime3.patch

messages: + msg177632
2012-12-16 19:21:42mark.dickinsonsetfiles: + pyc_mtime2.patch

messages: + msg177622
2012-12-16 17:35:30mark.dickinsonsetfiles: + pyc_mtime.patch
keywords: + patch
2012-12-16 17:35:16mark.dickinsonsetmessages: + msg177615
2012-11-13 06:31:51eric.snowsetstatus: closed -> open

nosy: + eric.snow
messages: + msg175487

resolution: works for me -> (no value)
2012-11-13 06:30:56eric.snowsetstatus: open -> closed
resolution: works for me
2012-09-28 14:49:41qwertyfacesetnosy: + qwertyface
messages: + msg171482
2012-07-12 15:42:12pitrousetnosy: + mackeith

messages: + msg165313
versions: - Python 3.3
2012-07-12 15:40:55pitroulinkissue15333 superseder
2012-02-21 16:44:40loewissetnosy: + loewis
messages: + msg153878
2012-01-26 14:17:03pitrousetnosy: + vstinner
2012-01-26 14:15:43pitrousetpriority: normal -> high

nosy: + tim.golden, brian.curtin, pitrou
messages: + msg152009

components: + Windows
stage: needs patch
2012-01-26 10:00:59mark.dickinsonsetmessages: + msg152001
title: Using py_compile does not prevent recompilation due to 'bad mtime'. -> import.c sometimes generates incorrect timestamps on Windows + NTFS
2012-01-25 20:07:57mark.dickinsonsetmessages: + msg151972
versions: + Python 3.2, Python 3.3
2012-01-25 19:28:01mark.dickinsoncreate