classification
Title: Change default filecmp.cmp shallow option
Type: behavior Stage:
Components: Library (Lib) Versions:
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: ghackebeil, rhettinger
Priority: normal Keywords:

Created on 2016-06-27 01:57 by ghackebeil, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (4)
msg269336 - Author: Gabriel Hackebeil (ghackebeil) Date: 2016-06-27 01:57
I would like to propose changing the default setting for the shallow compare option in filecmp.cmp to False (or providing access to an exact comparison function that does not use the various performance optimizations).

I think many users will turn to this module as a replacement for the “diff” command on Unix systems, and it is far too easy to fall into the trap of thinking a full comparison takes place when calling filecmp.cmp.

I agree that the shallow compare option is a useful feature, but I think it should be something to opt into as it is more of a performance optimization (the same applies to the caching behavior, but that is for another time).

I understand that the documentation explains the default behavior, but the reality is that many users probably do not understand the consequences of this setting (or simply did not pay close enough attention to the documentation), making it easy for people to use this module incorrectly and write code that does not work as advertised. I admit to falling into this trap myself.
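The trap described above can be reproduced with a short script. This is only a sketch: the helper name shallow_vs_deep and the file names are made up for illustration, and it assumes the filesystem stores the copied modification time faithfully. It writes two files with the same size but different contents, gives them the same mtime, and compares them both ways.

```python
import filecmp
import os
import tempfile


def shallow_vs_deep(dirpath):
    """Return (shallow_result, deep_result) for two same-size files.

    Hypothetical helper for illustration; not part of filecmp.
    """
    a = os.path.join(dirpath, "a.txt")
    b = os.path.join(dirpath, "b.txt")
    with open(a, "w") as f:
        f.write("spam")
    with open(b, "w") as f:
        f.write("eggs")  # same length as "spam", different content

    # Give b the same access and modification times as a, so that the
    # os.stat() signatures (file type, size, mtime) match.
    st = os.stat(a)
    os.utime(b, ns=(st.st_atime_ns, st.st_mtime_ns))

    filecmp.clear_cache()  # start from a clean comparison cache
    shallow = filecmp.cmp(a, b)               # default: shallow=True
    deep = filecmp.cmp(a, b, shallow=False)   # actually reads the files
    return shallow, deep


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(shallow_vs_deep(d))
```

With matching signatures, the default call reports the files as equal even though their contents differ; only shallow=False reads the bytes.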
msg269345 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2016-06-27 04:48
Changing an existing API is painful, especially so for cases like this that have existed for a very, very long time.  Such a change would likely have a detrimental effect on long-standing code relying on the existing behavior.

The module itself is all about "comparing files efficiently" and most of its code is about bypassing direct file reads.  The original purpose of the module seems to be about providing shallow compares, so your suggested change goes directly against the grain of the module and its original intention of being "fast by default".  The docs are very clear about there being trade-offs between correctness and time.

If the module were just being released, you might have a good case (in general, the safest options should be the default); however, the time for this decision was a very long time ago.   This ship has already sailed.
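For readers unfamiliar with what "bypassing direct file reads" means in practice: the filecmp documentation describes the shallow check as a comparison of os.stat() signatures (file type, size, and modification time). A rough sketch of that idea follows. The function names stat_signature and looks_equal_shallow are hypothetical, and this is not the module's actual source.

```python
import os
import stat


def stat_signature(path):
    """Build a (file type, size, mtime) tuple from os.stat().

    Hypothetical helper that mirrors the signature described in the
    filecmp documentation; no file contents are read.
    """
    st = os.stat(path)
    return (stat.S_IFMT(st.st_mode), st.st_size, st.st_mtime)


def looks_equal_shallow(path1, path2):
    """Return True if both paths have identical stat signatures."""
    return stat_signature(path1) == stat_signature(path2)
```

Because only metadata is consulted, such a check is cheap, but two files with different contents can still share a signature.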
msg269351 - Author: Gabriel Hackebeil (ghackebeil) Date: 2016-06-27 05:24
All good points. Perhaps further emphasis on this in the documentation would be helpful too. As it stands, this module is a dangerous one for a naive user (like me) to stumble across.

Maybe introducing an “exact” or “slow” diff function to the module would help distinguish that behavior from the cmp function. One could then deprecate the shallow keyword for the cmp function.

Gabe
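An "exact" comparison of the kind suggested above can also be written in user code without touching the module. The sketch below is one possible shape, not an existing filecmp function: the name exact_cmp and the chunk_size parameter are hypothetical. It always reads both files and uses neither stat signatures nor a cache.

```python
def exact_cmp(path1, path2, chunk_size=64 * 1024):
    """Compare two regular files byte for byte.

    Hypothetical helper; always reads the contents of both files.
    """
    with open(path1, "rb") as f1, open(path2, "rb") as f2:
        while True:
            chunk1 = f1.read(chunk_size)
            chunk2 = f2.read(chunk_size)
            if chunk1 != chunk2:
                return False  # contents (or lengths) differ
            if not chunk1:
                return True   # both files reached EOF together
```

Calling filecmp.cmp(f1, f2, shallow=False) is the existing way to request a content comparison; the function above only illustrates what a separately named entry point might do.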

msg269352 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2016-06-27 05:54
The docs are already pretty clear.  In general, there isn't a defense against a naive use of anything in the computing world.  We can write docs but can't make a person read them and realize that their pre-existing mental model is off base.

Sorry, but I'm going to close this. The module is doing what it was designed for and what it was documented to do.  It has mostly had a successful history and I don't see anything worth churning the API.

I think you will just have to chalk this up to experience :-)
History
Date                 User        Action  Args
2022-04-11 14:58:33  admin       set     github: 71583
2016-06-27 05:54:33  rhettinger  set     status: open -> closed
                                         resolution: not a bug
                                         messages: + msg269352
2016-06-27 05:24:47  ghackebeil  set     messages: + msg269351
2016-06-27 04:48:16  rhettinger  set     nosy: + rhettinger
                                         messages: + msg269345
2016-06-27 01:57:36  ghackebeil  create