Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

filecmp.cmp() incorrect results when previously compared file is modified within modification time resolution #62349

Closed
fbm mannequin opened this issue Jun 6, 2013 · 10 comments
Labels
easy stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@fbm
Copy link
Mannequin

fbm mannequin commented Jun 6, 2013

BPO 18149
Nosy @rhettinger, @ned-deily
Files
  • 18149.patch: Patch for issue 18149 - Adds clear_cache() method
  • 18149-2.patch: Patch for issue 18149 - Adds clear_cache method with updated Docs and tests
  • 18149-3.patch: Patch for issue 18149 - Updated to add version added directive to docs
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2013-06-14.22:22:19.457>
    created_at = <Date 2013-06-06.14:07:35.909>
    labels = ['easy', 'type-bug', 'library']
    title = 'filecmp.cmp() incorrect results when previously compared file is modified within modification time resolution'
    updated_at = <Date 2013-06-14.22:22:19.455>
    user = 'https://bugs.python.org/fbm'

    bugs.python.org fields:

    activity = <Date 2013-06-14.22:22:19.455>
    actor = 'ned.deily'
    assignee = 'none'
    closed = True
    closed_date = <Date 2013-06-14.22:22:19.457>
    closer = 'ned.deily'
    components = ['Library (Lib)']
    creation = <Date 2013-06-06.14:07:35.909>
    creator = 'fbm'
    dependencies = []
    files = ['30557', '30565', '30571']
    hgrepos = []
    issue_num = 18149
    keywords = ['patch', 'easy']
    message_count = 10.0
    messages = ['190715', '190774', '190790', '191026', '191048', '191051', '191060', '191067', '191162', '191163']
    nosy_count = 6.0
    nosy_names = ['rhettinger', 'nadeem.vawda', 'ned.deily', 'python-dev', 'fbm', 'melevittfl']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue18149'
    versions = ['Python 3.4']

    @fbm
    Copy link
    Mannequin Author

    fbm mannequin commented Jun 6, 2013

    Example:

      with open('file1', 'w') as f:
        f.write('a')
    
      with open('file2', 'w') as f:
        f.write('a')
        
      print filecmp.cmp('file1', 'file2', shallow=False) # true
    
      with open('file2', 'w') as f:
        f.write('b')

    print filecmp.cmp('file1', 'file2', shallow=False) # true

    Because of the caching, both calls to filecmp.cmp() return true on my system.

    When retrieving value from cache, the function filecmp.cmp() checks the signatures of the files:

      s1 = _sig(os.stat(f1))
      s2 = _sig(os.stat(f2))
      ...
      outcome = _cache.get((f1, f2, s1, s2))

    But the signatures in cache are the same, if the file sizes and times of modification (os.stat().st_mtime) haven't changed from the last call, even if the content has changed.

    The buffer is mentioned in the documentation, but there isn't any documented way to clear it. It also isn't nice IMO, that one has to worry about the file system's resolution of the file modification time when calling a simple file comparison.

    @fbm fbm mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Jun 6, 2013
    @ned-deily
    Copy link
    Member

    It seems like this would be a fairly rare situation and, as you note, dependent on the underlying file system. But it would be easy to add a new function to the module to clear its cache in cases where it is known this might be a problem. In fact, in bpo-11802 a clear_cache function was proposed to solve the problem of the cache growing without bounds but that problem was solved by the simpler solution of discarding the cache when it gets above 100 entries.

    @ned-deily ned-deily added the easy label Jun 7, 2013
    @ned-deily ned-deily changed the title filecmp.cmp() - cache invalidation fails when file modification times haven't changed filecmp.cmp() incorrect results when previously compared file is modified within modification time resolution Jun 7, 2013
    @rhettinger
    Copy link
    Contributor

    +1 for a cache clearing function like the one in re.py

    @melevittfl
    Copy link
    Mannequin

    melevittfl mannequin commented Jun 12, 2013

    I've added a "clear_cache()" method to filecmp.py. Patch attached.

    I had thought about implementing an optional parameter to only invalidate the cache of a specific file object, but figured I'd keep it simple for now.

    First time submitting a patch, so apologies if I've done something the wrong way.

    @ned-deily
    Copy link
    Member

    Thanks for the patch, Mark. I've left some review comments via Rietveld (the review link next to the patch). Also, if you haven't already, please fill out the contributor form as described in the Developer's Guide (http://docs.python.org/devguide/patch.html#licensing).

    @melevittfl
    Copy link
    Mannequin

    melevittfl mannequin commented Jun 12, 2013

    Ned,

    Thanks for taking the time to review. I've updated the docs, added a unit test, signed the contributor form, and made the changes/corrections from your review.

    Updated patch attached.

    @ned-deily
    Copy link
    Member

    Looks good to me, other than that the doc change should include a version added directive (which can be added by the committer):

    .. function:: clear_cache()

    + .. versionadded:: 3.4
    +
    Clear the filecmp cache. This may be useful if a file is compared so quickly

    @melevittfl
    Copy link
    Mannequin

    melevittfl mannequin commented Jun 13, 2013

    Cool. I've gone ahead and generated a new patch with the version added directive included.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jun 14, 2013

    New changeset bfd53dcb02ff by Ned Deily in branch 'default':
    Issue bpo-18149: Add filecmp.clear_cache() to manually clear the filecmp cache.
    http://hg.python.org/cpython/rev/bfd53dcb02ff

    @ned-deily
    Copy link
    Member

    Committed for release in 3.4.0. Thanks, Mark.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    easy stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants