classification
Title: newer() function in dep_util.py mixes up new vs. old files due stat.st_mtime vs stat
Type: behavior Stage: resolved
Components: Distutils Versions: Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: eric.araujo Nosy List: eric.araujo, jsjgruber, python-dev, tarek
Priority: normal Keywords: patch

Created on 2011-04-27 04:29 by jsjgruber, last changed 2011-08-02 19:36 by eric.araujo. This issue is now closed.

Files
File name Uploaded Description Edit
setuptest jsjgruber, 2011-04-27 15:09
mtime.py jsjgruber, 2011-04-27 19:47
fix-newer-mtime.diff eric.araujo, 2011-06-03 16:39
Messages (14)
msg134526 - (view) Author: John S. Gruber (jsjgruber) Date: 2011-04-27 04:29
In researching a bug I was surprised that a newly created file was being replaced when being processed a second time (it shouldn't have been processed a second time). 

I tracked the surprise to diff Lib/distutils/dep_util.py @ 57642:9211a5d7d0b4 where a comparison with a resolution of 1 second was replaced by a floating point comparison, though the new file was copied by file_util.py which tried to preserve the time using a method accurate to only 1 second (truncating the fraction). Since a new file with a truncated modification time looks older than an older file with a full precision stamp the problem resulted.

(For convenience--stat.st_mtime is floating point, stat[ST_MTIME] is integer seconds unless os.stat_float_times has been called to change the floating point behavior.

Using all floating point doesn't resolve the issue as OS timestamped files can have more resolution than python floating point may hold, again causing truncation and confusion. For reference see Python issue 10148.

In my humble opinion time setting and comparison should all be done consistently, and, if sub-second comparisons are desired, some fuzz should be used such that the comparison makes sense for the platform with the least amount of precision available with its floating point implementation.

I briefly looked at branches other than 2.7 and they don't seem to have the above change.

Probably of minor importance in most cases.

Thanks all for the good work, all. I've been learning python and I love it!
msg134536 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-04-27 08:39
Hi!  Welcome to the joy of Python and thanks for the bug report and analysis.

> In researching a bug
Is it public?  I’d be curious to look at it.

> I was surprised that a newly created file was being replaced when
> being processed a second time
What created the file?  What king of “processing” happened after?

> I tracked the surprise to diff Lib/distutils/dep_util.py @ 57642:9211a5d7d0b4
> where a comparison with a resolution of 1 second was replaced by a
> floating point comparison
Indeed.  The whole distutils codebase should use ST_MTIME or st_mtime consistently.

> Using all floating point doesn't resolve the issue as OS timestamped
> files can have more resolution than python floating point may hold,
If I understand this bug correctly, it’s a platform-specific limitation.  We can try changing all distutils files to use st_mtime and see if tests break.  Would you like to work on a additional test case to reproduce your bug?
msg134576 - (view) Author: John S. Gruber (jsjgruber) Date: 2011-04-27 15:09
Thanks for considering my report so quickly. Attached is a simple test to reproduce the bug, as you suggested. 

Please note that I am not suggesting the code base use stat.st_mtime.

Running the attached with ext4, which keeps sub-second file timestamps:

gruber@gruber-Satellite-L355D:~$ ~/setuptest
running build
running build_py
creating build
creating build/lib.linux-i686-2.7
copying charley.py -> build/lib.linux-i686-2.7
running build
running build_py
copying charley.py -> build/lib.linux-i686-2.7

Under an ext3 filesystem which only accounts to the nearest second:

gruber@gruber-Satellite-L355D:~$ pushd /media/EXT3/
/media/EXT3 ~
gruber@gruber-Satellite-L355D:/media/EXT3$ ~/setuptest
running build
running build_py
copying charley.py -> build/lib.linux-i686-2.7
running build
running build_py

You can see under ext3 the newer() function of dep_util.py works correctly and the redundant copy of charley.py to the build directory, when setup.py is run a second time, is skipped. 

Above, with ext4, the mtime comparison was confused.

As I said in the report, not a big problem--but it obviously isn't what was intended so I hoped a problem report would be worthwhile to the distutils developers.

Thanks.
msg134579 - (view) Author: John S. Gruber (jsjgruber) Date: 2011-04-27 15:25
The original bug report is at: https://bugs.launchpad.net/ubuntu/+source/python-distutils-extra/+bug/770566

As you can see it had to do with a symbolic link created by distutils-extra before distutils was called upon to copy anything. Since this was done behind distutils back, LP: 770566 really was a distutils-extra problem and Martin's fix means this issue no longer arises there.

To recreate the problem in that environment distutils-extra had to be modified to coerce processing to happen in a particular order so setup.py would complete, and then results compared between an ext4 run and the build farm's presumably ext3 run. In the ext4 case the symbolic link was predictably replaced by the file itself. 

The test case I posted above should be easier to follow and doesn't involve any smoke and mirrors being done behind distutils back!

Please let me know if I provide any further information.
msg134581 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-04-27 15:31
> Please note that I am not suggesting the code base use stat.st_mtime.
Yep, I did :)  Do you have another idea for a fix?  We have to walk on eggshells with the distutils codebase: it has often happened that changes made to fix bugs were causing problems in third-party code relying on undocumented behavior (and/or bugs), so we want to fix bugs without causing undesired effects.  (In the next-generation version, distutils2, we can break compatibility for the better.)

I think using only ST_MTIME or only st_mtime could prevent the bug you’re reporting without doing harm otherwise.

> Running the attached with ext4, which keeps sub-second file timestamps:
I think filesystems with this precision are becoming widespread.  Python 3 doesn’t have the bug AFAICT but Python 2.7 will be used for years, so it’s a worthwhile bug to fix.

> You can see under ext3 the newer() function of dep_util.py works
> correctly and the redundant copy of charley.py to the build
> directory, when setup.py is run a second time, is skipped. 
This would be particularly annoying with larger projects involving C extensions.  Would you be willing to turn your test file to a patch, and then fix the code?  We have some basic unit tests in test_dep_util.py and test_build_py.py, you could see how they work and add a test for this bug.  If you’re inclined to do so, please read the developpers guide at http://docs.python.org/devguide .  It not, thanks for what you’ve already done, and I’ll take up from here.
msg134603 - (view) Author: John S. Gruber (jsjgruber) Date: 2011-04-27 17:36
I can understand what you are saying about side effects.

I was trying to suggest that the move to stat.st_mtime in dep_util.py in the hg commit I mentioned be reverted back to use stat[ST_MTIME], thereby matching the other python releases and the old behavior (and matching file_util.py). It seems to me that would be the safest course when it comes to side-effects.

If it were desired to determine which file was newer using sub-second values, perhaps that would make a reasonable change in distutils2, but files created with a few microseconds would have to be considered equivalent due to the reduced precision available in python floats (53 bits on my platform, if I understand correctly) as compared to the 64 bit precision used by some operating systems and file systems.

However, I think I'd prefer to turn the decision and further process over to you, if you don't mind. I thank you and your colleagues for your excellent work with python, and with making it platform independent--a very difficult part of the work indeed.
msg134610 - (view) Author: John S. Gruber (jsjgruber) Date: 2011-04-27 19:47
As I thought about it, I suppose I should demonstrate the problem with stat.st_mtime. Here's an example and its output on an ext4 file system:

gruber@gruber-Satellite-L355D:~$ ./mtime.py
(1303933305.5525582, 1303933305.5525582) (1303933305.552558, 1303933305.552558) (1303933305.552557, 1303933305.552557) (1303933305.552556, 1303933305.552556)
(1303933305.5525581837 1303933305.5525581837) (1303933305.5525579453 1303933305.5525579453) (1303933305.5525569916 1303933305.5525569916) (1303933305.5525560379 1303933305.5525560379)
total 0
-rw-r--r-- 1 gruber gruber 0 2011-04-27 15:41:45.552558258 -0400 one
-rw-r--r-- 1 gruber gruber 0 2011-04-27 15:41:45.552558000 -0400 target1
-rw-r--r-- 1 gruber gruber 0 2011-04-27 15:41:45.552557000 -0400 target2
-rw-r--r-- 1 gruber gruber 0 2011-04-27 15:41:45.552556000 -0400 target3

Traceback (most recent call last):
  File "./mtime.py", line 28, in <module>
    assert timetargetthree == timetargettwo
AssertionError
gruber@gruber-Satellite-L355D:~$ ./mtime.py
(1303933306.3885624, 1303933306.3885624) (1303933306.388562, 1303933306.388562) (1303933306.388561, 1303933306.388561) (1303933306.388561, 1303933306.388561)
(1303933306.3885624409 1303933306.3885624409) (1303933306.3885619640 1303933306.3885619640) (1303933306.3885610104 1303933306.3885610104) (1303933306.3885610104 1303933306.3885610104)
total 0
-rw-r--r-- 1 gruber gruber 0 2011-04-27 15:41:46.388562342 -0400 one
-rw-r--r-- 1 gruber gruber 0 2011-04-27 15:41:46.388562000 -0400 target1
-rw-r--r-- 1 gruber gruber 0 2011-04-27 15:41:46.388561000 -0400 target2
-rw-r--r-- 1 gruber gruber 0 2011-04-27 15:41:46.388561000 -0400 target3

Traceback (most recent call last):
  File "./mtime.py", line 29, in <module>
    assert timetargettwo == timetargetone
AssertionError
gruber@gruber-Satellite-L355D:~$ ./mtime.py
(1303933307.0525656, 1303933307.0525656) (1303933307.052565, 1303933307.052565) (1303933307.052565, 1303933307.052565) (1303933307.052565, 1303933307.052565)
(1303933307.0525655746 1303933307.0525655746) (1303933307.0525650978 1303933307.0525650978) (1303933307.0525650978 1303933307.0525650978) (1303933307.0525650978 1303933307.0525650978)
total 0
-rw-r--r-- 1 gruber gruber 0 2011-04-27 15:41:47.052565592 -0400 one
-rw-r--r-- 1 gruber gruber 0 2011-04-27 15:41:47.052565000 -0400 target1
-rw-r--r-- 1 gruber gruber 0 2011-04-27 15:41:47.052565000 -0400 target2
-rw-r--r-- 1 gruber gruber 0 2011-04-27 15:41:47.052565000 -0400 target3

Traceback (most recent call last):
  File "./mtime.py", line 30, in <module>
    assert timetargetone == timeone
AssertionError
gruber@gruber-Satellite-L355D:~$ ./mtime.py
(1303933308.2285714, 1303933308.2285714) (1303933308.228571, 1303933308.228571) (1303933308.22857, 1303933308.22857) (1303933308.228569, 1303933308.228569)
(1303933308.2285714149 1303933308.2285714149) (1303933308.2285709381 1303933308.2285709381) (1303933308.2285699844 1303933308.2285699844) (1303933308.2285690308 1303933308.2285690308)
total 0
-rw-r--r-- 1 gruber gruber 0 2011-04-27 15:41:48.228571344 -0400 one
-rw-r--r-- 1 gruber gruber 0 2011-04-27 15:41:48.228571000 -0400 target1
-rw-r--r-- 1 gruber gruber 0 2011-04-27 15:41:48.228570000 -0400 target2
-rw-r--r-- 1 gruber gruber 0 2011-04-27 15:41:48.228569000 -0400 target3

Traceback (most recent call last):
  File "./mtime.py", line 28, in <module>
    assert timetargetthree == timetargettwo
AssertionError
gruber@gruber-Satellite-L355D:~$ ./mtime.py
(1303933309.0685754, 1303933309.0685754) (1303933309.068575, 1303933309.068575) (1303933309.068574, 1303933309.068574) (1303933309.068573, 1303933309.068573)
(1303933309.0685753822 1303933309.0685753822) (1303933309.0685749054 1303933309.0685749054) (1303933309.0685739517 1303933309.0685739517) (1303933309.0685729980 1303933309.0685729980)
total 0
-rw-r--r-- 1 gruber gruber 0 2011-04-27 15:41:49.068575446 -0400 one
-rw-r--r-- 1 gruber gruber 0 2011-04-27 15:41:49.068575000 -0400 target1
-rw-r--r-- 1 gruber gruber 0 2011-04-27 15:41:49.068574000 -0400 target2
-rw-r--r-- 1 gruber gruber 0 2011-04-27 15:41:49.068573000 -0400 target3

Traceback (most recent call last):
  File "./mtime.py", line 28, in <module>
    assert timetargetthree == timetargettwo
AssertionError
gruber@gruber-Satellite-L355D:~$ ./mtime.py
(1303933309.6765785, 1303933309.6765785) (1303933309.676578, 1303933309.676578) (1303933309.676578, 1303933309.676578) (1303933309.676578, 1303933309.676578)
(1303933309.6765785217 1303933309.6765785217) (1303933309.6765780449 1303933309.6765780449) (1303933309.6765780449 1303933309.6765780449) (1303933309.6765780449 1303933309.6765780449)
total 0
-rw-r--r-- 1 gruber gruber 0 2011-04-27 15:41:49.676578416 -0400 one
-rw-r--r-- 1 gruber gruber 0 2011-04-27 15:41:49.676578000 -0400 target1
-rw-r--r-- 1 gruber gruber 0 2011-04-27 15:41:49.676578000 -0400 target2
-rw-r--r-- 1 gruber gruber 0 2011-04-27 15:41:49.676578000 -0400 target3

Traceback (most recent call last):
  File "./mtime.py", line 30, in <module>
    assert timetargetone == timeone
AssertionError
gruber@gruber-Satellite-L355D:~$ ./mtime.py
(1303933310.396582, 1303933310.396582) (1303933310.396581, 1303933310.396581) (1303933310.39658, 1303933310.39658) (1303933310.396579, 1303933310.396579)
(1303933310.3965818882 1303933310.3965818882) (1303933310.3965809345 1303933310.3965809345) (1303933310.3965799809 1303933310.3965799809) (1303933310.3965790272 1303933310.3965790272)
total 0
-rw-r--r-- 1 gruber gruber 0 2011-04-27 15:41:50.396581938 -0400 one
-rw-r--r-- 1 gruber gruber 0 2011-04-27 15:41:50.396581000 -0400 target1
-rw-r--r-- 1 gruber gruber 0 2011-04-27 15:41:50.396580000 -0400 target2
-rw-r--r-- 1 gruber gruber 0 2011-04-27 15:41:50.396579000 -0400 target3

Traceback (most recent call last):
  File "./mtime.py", line 28, in <module>
    assert timetargetthree == timetargettwo
AssertionError
gruber@gruber-Satellite-L355D:~$ ./mtime.py
(1303933311.056585, 1303933311.056585) (1303933311.056585, 1303933311.056585) (1303933311.056585, 1303933311.056585) (1303933311.056585, 1303933311.056585)
(1303933311.0565850735 1303933311.0565850735) (1303933311.0565850735 1303933311.0565850735) (1303933311.0565850735 1303933311.0565850735) (1303933311.0565850735 1303933311.0565850735)
total 0
-rw-r--r-- 1 gruber gruber 0 2011-04-27 15:41:51.056585165 -0400 one
-rw-r--r-- 1 gruber gruber 0 2011-04-27 15:41:51.056585000 -0400 target1
-rw-r--r-- 1 gruber gruber 0 2011-04-27 15:41:51.056585000 -0400 target2
-rw-r--r-- 1 gruber gruber 0 2011-04-27 15:41:51.056585000 -0400 target3

gruber@gruber-Satellite-L355D:~$

Please excuse my total lack of Python style.
msg134634 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-04-28 01:39
> I was trying to suggest that the move to stat.st_mtime in dep_util.py
> in the hg commit I mentioned be reverted back to use stat[ST_MTIME],
> thereby matching the other python releases and the old behavior (and
> matching file_util.py). It seems to me that would be the safest
> course when it comes to side-effects.
Agreed.

> If it were desired to determine which file was newer using sub-second
> values, perhaps that would make a reasonable change in distutils2,
> but files created with a few microseconds would have to be considered
> equivalent due to the reduced precision available in python floats
> (53 bits on my platform, if I understand correctly) as compared to
> the 64 bit precision used by some operating systems and file systems.
Interesting.  Can you open another bug, using type “feature request” and component “distutils2” so that we discuss that?  Alternatively, send an email to the-fellowship-of-the-packaging Google group (more people (including distutils wizards) follow the group than the bug tracker).

> However, I think I'd prefer to turn the decision and further process
> over to you, if you don't mind.
No problem.  Thanks to your patch and diagnosis, it should be easy to write a test and fix.  I hope you’ll be available to run the test on your machine.

> I thank you and your colleagues for your excellent work with python,
Thank you for doing your part as a user!  (It’s weird to see “colleagues” used with something that’s not a job.  We do this for fun, you know :)

> and with making it platform independent--a very difficult part of the
> work indeed.
Heh, the more I learn about Python, the more I hate hardware and operating systems :)
msg135394 - (view) Author: John S. Gruber (jsjgruber) Date: 2011-05-07 01:01
As Éric suggested I opened issue 11993 a couple of days ago.
msg137552 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-06-03 16:39
Can you test this patch?
msg138046 - (view) Author: John S. Gruber (jsjgruber) Date: 2011-06-10 04:19
The patch worked--it eliminated the redundant copy caused by this issue.

Thank you very much, Éric.

On Fri, Jun 3, 2011 at 12:39 PM, Éric Araujo <report@bugs.python.org> wrote:
>
> Éric Araujo <merwok@netwok.org> added the comment:
>
> Can you test this patch?

> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue11933>
> _______________________________________
>
msg138931 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-06-24 12:15
Thanks, I’ll commit shortly.
msg141574 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-08-02 18:02
New changeset aebe3243bb2c by Éric Araujo in branch '2.7':
Fix incorrect mtime comparison in distutils (#11933).
http://hg.python.org/cpython/rev/aebe3243bb2c
msg141579 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-08-02 19:36
Thanks again!
History
Date User Action Args
2011-08-02 19:36:13eric.araujosetstatus: open -> closed
resolution: fixed
messages: + msg141579

stage: needs patch -> resolved
2011-08-02 18:02:43python-devsetnosy: + python-dev
messages: + msg141574
2011-06-24 12:15:39eric.araujosetassignee: tarek -> eric.araujo
messages: + msg138931
2011-06-10 04:19:37jsjgrubersetmessages: + msg138046
title: newer() function in dep_util.py mixes up new vs. old files due stat.st_mtime vs stat[ST_MTIME] -> newer() function in dep_util.py mixes up new vs. old files due stat.st_mtime vs stat
2011-06-03 16:39:16eric.araujosetfiles: + fix-newer-mtime.diff
keywords: + patch
messages: + msg137552
2011-05-07 01:01:58jsjgrubersetmessages: + msg135394
2011-04-28 01:39:58eric.araujosetmessages: + msg134634
2011-04-27 19:47:47jsjgrubersetfiles: + mtime.py

messages: + msg134610
2011-04-27 17:36:06jsjgrubersetmessages: + msg134603
2011-04-27 15:31:24eric.araujosetmessages: + msg134581
stage: needs patch
2011-04-27 15:25:28jsjgrubersetmessages: + msg134579
2011-04-27 15:09:53jsjgrubersetfiles: + setuptest

messages: + msg134576
2011-04-27 08:39:21eric.araujosetmessages: + msg134536
2011-04-27 04:29:50jsjgrubercreate