This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author Steven.Barker
Recipients Steven.Barker, chris.jerdonek, fdrake, georg.brandl, goatsofmendez, pvrijlandt, rhettinger, sandro.tosi
Date 2014-05-06.03:52:28
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1399348353.21.0.464826829196.issue1234674@psf.upfronthosting.co.za>
In-reply-to
Content
Here's a patch against the default branch that fixes filecmp.cmp's behavior when "shallow" is True, including an update to the module's docs (correcting the previous ambiguity discussed in the 2011 python-dev thread mentioned by Sandro Tosi) and a couple of new tests of files where shallow comparisons should be expected get the answer wrong.

The changed behavior is very simple. The lines from Lib/filecmp.py in the original code:

    if shallow and s1 == s2:
        return True

are changed to:

    if shallow:
        return s1 == s2

This presumes answers to the questions asked by Georg Brandl back in 2005 of "No, shallow comparisons never read the file contents" and "No, non-shallow comparisons don't care about mtimes (except for avoiding out of date cache entries)".

If we only applied the test changes (not the behavior change) from my patch, one of the new tests would fail, as the current filecmp.cmp code never gives false negatives on comparisons in shallow mode (only false positives).

We probably don't want to commit the patch exactly as it stands, because the changed behavior causes several of the previously existing tests to fail. Almost all the dircmp tests fail for me, and the "test_matching" filecmp.cmp test does so intermittently (due to my system sometimes being fast enough at creating the files to beat my filesystem's mtime resolution).

The test failures are all because of shallow comparisons between files with the same contents, but (slightly) different mtimes. The new shallow comparison behavior is to say those files are unequal while the old code would fall back on testing the file contents despite the shallow parameter, and so it would return True. The failing tests can probably be updated to either explicitly set the file mtimes with sys.utime (as I do with the files used in the new tests), or we could rewrite some of the setup code to use something like shutil.copy2 to make copies of files while preserving their mtimes.

I'm not very familiar with the best practices when it comes to writing unit tests, so I pretty much copied the style of the existing tests in Lib/test/test_filecmp.py (though I've split mine up a bit more).

I understand from reading other filecmp bug reports that that test style is considered to be pretty poor, so there's probably room to improve my code as well. I thought I'd post this patch before making an effort at fixing the older tests, in order to get some feedback. I'll be happy to incorporate any suggested improvements!

This issue has been open for almost 9 years. Hopefully my patch can help get it moving again towards being fixed!
History
Date User Action Args
2014-05-06 03:52:33Steven.Barkersetrecipients: + Steven.Barker, fdrake, georg.brandl, rhettinger, pvrijlandt, goatsofmendez, sandro.tosi, chris.jerdonek
2014-05-06 03:52:33Steven.Barkersetmessageid: <1399348353.21.0.464826829196.issue1234674@psf.upfronthosting.co.za>
2014-05-06 03:52:33Steven.Barkerlinkissue1234674 messages
2014-05-06 03:52:30Steven.Barkercreate