classification
Title: filecmp.cmp(shallow=True) isn't actually shallow when only mtime differs
Type: behavior Stage: resolved
Components: Documentation Versions: Python 3.11, Python 3.10, Python 3.9
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: docs@python Nosy List: AlexVndnblcke, andrei.avk, chanke, docs@python, josh.r, lukasz.langa, miss-islington
Priority: normal Keywords: patch

Created on 2021-01-18 20:24 by AlexVndnblcke, last changed 2021-08-04 20:11 by lukasz.langa. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 24246 closed AlexVndnblcke, 2021-01-18 20:31
PR 27166 merged andrei.avk, 2021-07-15 15:21
PR 27607 merged miss-islington, 2021-08-04 19:39
PR 27608 merged miss-islington, 2021-08-04 19:39
Messages (14)
msg385225 - (view) Author: Alexander Vandenbulcke (AlexVndnblcke) * Date: 2021-01-18 20:24
Consider the case where 2 files are shallow compared where only the mtime differs (i.e. the mode and size is identical).

With filecmp.cmp(f1, f2, shallow=True) a deep compare would be performed behind the scenes since the guard clauses fell through. This discrepancy is either a problem in the docstring or a problem in the comparison itself.

Two options remain:
- the documentation is altered and describes that, in case only the mtime differs (i.e. mode and size are equal) a deep compare is performed
- the behaviour is restricted in that effectively only a shallow compare is performed
- a shallow compare becomes more restrictive (do not consider the mtime during the compare)

My preference is to adjust the function to safeguard the meaning of 'shallow' in this context.
msg385304 - (view) Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2021-01-19 22:40
This is a problem with the docstring. The actual docs for it are a bit more clear, https://docs.python.org/3/library/filecmp.html#filecmp.cmp :

"If shallow is true, files with identical os.stat() signatures are taken to be equal. Otherwise, the contents of the files are compared."

Your patch can't be used because it changes longstanding documented behavior. If you'd like to submit a patch to fix the docstring, that's fine, but we're not going to break existing code to make the function less accurate.

The patch should probably make the documentation more clear while it's at it.

1. The original wording could be misinterpreted as having the "Otherwise" apply to shallow=False only, not to the case where shallow=T rue but os.stat doesn't match.
2. The existing wording isn't clear on what an os.stat() "signature" is, which can leave the impression that the entirety of os.stat is compared (which would only match for hardlinks of the same file), when in fact it's just the file type (stat.S_IFMT(st.st_mode), file vs. directory vs. symlink, etc.), size and mtime.

Proposed rewording of main docs would be:

"If shallow is true, files with identical os.stat() signatures (file type, size, and modification time) are taken to be equal. When shallow is false, or the file signatures are identical, the contents of the files are compared."

A similar wording (or at least, a shorter version of the above, rather than a strictly wrong description of the shallow parameter) could be applied to the docstring.
msg397320 - (view) Author: Christof Hanke (chanke) * Date: 2021-07-12 15:05
Hi,

this is also discussed in https://bugs.python.org/issue41354.

Ciao,
  Christof
msg397340 - (view) Author: Andrei Kulakov (andrei.avk) * (Python triager) Date: 2021-07-12 17:35
Christof and Alexander, as you both said your preference is to have some way of enforcing "only shallow" compare, I want to clarify -- it seems if you do that, you will get the answer of True if the files are most likely the same (sig is equal), but if the sig is not equal, I'm not sure what answer you expect or why. The sig may be different because of mtime, and then the files may be different or the same, it's anyone's guess.

I wonder if both of you expect the same behavior, and if so, for the same use case or not?
msg397356 - (view) Author: Andrei Kulakov (andrei.avk) * (Python triager) Date: 2021-07-12 21:04
Also see this 16 years old issue: https://bugs.python.org/issue1234674
msg397363 - (view) Author: Christof Hanke (chanke) * Date: 2021-07-12 22:30
Hi Andrei,

I would follow rsync. 
From the man page:
"""
[...]
 -c, --checksum
              This changes the way rsync checks if the files have been changed and are in need of a transfer.   Without  this  option,  rsync
              uses  a  "quick  check" that (by default) checks if each file’s size and time of last modification match between the sender and
              receiver. 
[...]
"""

so, yes you can have false positives with a shallow comparison of size + mtime only. But that's usually ok for e.g. incremental backups. 


Wow, the bug is that old...
msg397364 - (view) Author: Andrei Kulakov (andrei.avk) * (Python triager) Date: 2021-07-12 22:49
But rsync is a quite more specific tool. For example, unix cmp command does not guess based on any part of `stat` sig. That's a much closer command in scope to 'filecmp'.
msg397381 - (view) Author: Christof Hanke (chanke) * Date: 2021-07-13 05:44
Andrei,
cmp is the deep-compare part of filecmp. I thought we were taking about the shallow one.

Thus,
- shallow like rsync "quick": size + mtime.
- deep like cmp
msg397560 - (view) Author: Andrei Kulakov (andrei.avk) * (Python triager) Date: 2021-07-15 15:37
I've put up the doc update PR here:
https://github.com/python/cpython/pull/27166

I've also reviewed a few dozen SO questions about filecmp.cmp and shallow arg, many of them find the docs confusing or lacking, so I thought updating docs is definitely very useful.

I haven't seen any of them asking for "force shallow" behavior, but there's many of them and I haven't read all; searching for "force shallow" doesn't find any.

I think it might be useful to also add a new 'force shallow' arg, but I hope more people ask for it or compile a list of SO questions asking for it, otherwise it might not be worth additional complexity.
msg397561 - (view) Author: Andrei Kulakov (andrei.avk) * (Python triager) Date: 2021-07-15 15:45
Alexander: sorry, I didn't notice your PR because I was first working on a different issue, and then found this as a duplicate, and only noticed there's already an open PR here. If you prefer, we can close my PR and work on updating yours.
msg398944 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2021-08-04 19:39
New changeset a8dc4893d2b28827e82447326ea47759c161a722 by andrei kulakov in branch 'main':
bpo-42958: Improve description of shallow= in filecmp.cmp docs (GH-27166)
https://github.com/python/cpython/commit/a8dc4893d2b28827e82447326ea47759c161a722
msg398949 - (view) Author: miss-islington (miss-islington) Date: 2021-08-04 20:03
New changeset c2593b4d06712cefcdeae93b32f88faa4772bc3a by Miss Islington (bot) in branch '3.10':
bpo-42958: Improve description of shallow= in filecmp.cmp docs (GH-27166)
https://github.com/python/cpython/commit/c2593b4d06712cefcdeae93b32f88faa4772bc3a
msg398950 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2021-08-04 20:09
New changeset 7dad0337518a0d599caf8f803a5bf45db67cbe9b by Miss Islington (bot) in branch '3.9':
bpo-42958: Improve description of shallow= in filecmp.cmp docs (GH-27166) (GH-27608)
https://github.com/python/cpython/commit/7dad0337518a0d599caf8f803a5bf45db67cbe9b
msg398951 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2021-08-04 20:11
Thanks all for your effort on improving this! ✨ 🍰 ✨
History
Date User Action Args
2021-08-16 23:30:26ned.deilylinkissue33896 superseder
2021-08-04 20:11:39lukasz.langasetstatus: open -> closed

assignee: docs@python
components: + Documentation, - Library (Lib)
versions: + Python 3.9, Python 3.10, Python 3.11
nosy: + docs@python

messages: + msg398951
resolution: fixed
stage: patch review -> resolved
2021-08-04 20:09:59lukasz.langasetmessages: + msg398950
2021-08-04 20:03:40miss-islingtonsetmessages: + msg398949
2021-08-04 19:39:58miss-islingtonsetpull_requests: + pull_request26103
2021-08-04 19:39:54miss-islingtonsetnosy: + miss-islington
pull_requests: + pull_request26102
2021-08-04 19:39:54lukasz.langasetnosy: + lukasz.langa
messages: + msg398944
2021-08-04 18:51:38lukasz.langalinkissue41354 superseder
2021-07-15 15:45:58andrei.avksetmessages: + msg397561
2021-07-15 15:37:24andrei.avksetmessages: + msg397560
2021-07-15 15:21:52andrei.avksetpull_requests: + pull_request25702
2021-07-13 05:44:46chankesetmessages: + msg397381
2021-07-12 22:49:21andrei.avksetmessages: + msg397364
2021-07-12 22:30:15chankesetmessages: + msg397363
2021-07-12 21:04:32andrei.avksetmessages: + msg397356
2021-07-12 17:35:27andrei.avksetnosy: + andrei.avk
messages: + msg397340
2021-07-12 15:05:08chankesetnosy: + chanke
messages: + msg397320
2021-01-19 22:40:57josh.rsetnosy: + josh.r
messages: + msg385304
2021-01-18 20:31:56AlexVndnblckesetkeywords: + patch
stage: patch review
pull_requests: + pull_request23067
2021-01-18 20:24:48AlexVndnblckecreate