This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

classification
Title: difflib lacks a way to check if results are empty
Type: enhancement
Stage: resolved
Components: Library (Lib)
Versions: Python 3.9

process
Status: closed
Resolution: rejected
Dependencies:
Superseder:
Assigned To: tim.peters
Nosy List: rhettinger, simon_, tim.peters
Priority: normal
Keywords:

Created on 2019-11-13 15:59 by simon_, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (6)
msg356536 - Author: Simon Friedberger (simon_) Date: 2019-11-13 15:59
It seems there is no easy way to use difflib to show a diff but only when there actually are differences. The SequenceMatcher has ratio() but that isn't really available through Differ or any of the convenience functions. Vice versa, when using SequenceMatcher the pretty display is not available.
msg356624 - Author: Tim Peters (tim.peters) * (Python committer) Date: 2019-11-14 18:39
Please be explicit:  exactly which functions are you talking about, and exactly what do you want them to do instead?  Since, best I can tell, this is the first complaint of its kind, it's a pretty safe bet people can't guess what you want ;-)

Note that, e.g., Differ(...).compare(...) returns a generator-iterator.  There is no general way in Python to know whether a generator will yield a non-empty sequence of results without running the generator.  This is common to all generators, not unique to those difflib returns.

So, of course, the same kinds of idioms can be used as for any other generator.  For example:

foundone = False
for line in difflib.Differ(...).compare(...):
    if not foundone:
        # there is at least one result, and this is the first
        # maybe print a header line here, or whatever
        foundone = True
    process(line)
if not foundone:
    # the generator produced nothing
    pass

Simpler to code is to force the results into a list instead, but then you lose the possible memory-saving advantages of iterating over a generator:

    result = list(difflib.Differ(...).compare(...))
    if result:
        # there are results to deal with
        pass
    else:
        # the generator produced nothing
        pass
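
A third idiom keeps the generator lazy while still answering "is there anything?" up front. The following is only a sketch: `peek` is a hypothetical helper, not part of difflib or itertools, and the sample inputs are made up for illustration.

```python
import difflib
from itertools import chain


def peek(iterable):
    """Return (has_items, iterator) after pulling at most one item."""
    it = iter(iterable)
    try:
        first = next(it)
    except StopIteration:
        # Nothing was produced; hand back an empty iterator.
        return False, iter(())
    # Put the first item back in front of the remaining ones.
    return True, chain([first], it)


# Hypothetical sample inputs.
a = ["one\n", "two\n"]
b = ["one\n", "three\n"]

has_diff, lines = peek(difflib.unified_diff(a, b))
if has_diff:
    print("".join(lines), end="")
else:
    print("no differences")
```

Only the first result is computed before the decision is made, so the memory advantage of the generator is mostly preserved.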
msg356850 - Author: Simon Friedberger (simon_) Date: 2019-11-18 08:30
Hi Tim!

Sorry if my explanation wasn't clear.

For some of the iterators - like the one produced by ndiff - the iterator will always return data, even if there is no difference in the files.

My current solution is to run difflib.unified_diff and check if the iterator is non-empty and then run difflib.ndiff again to get the output that I want.
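
The two-pass workaround described above might look roughly like this. It is a sketch only: `ndiff_if_changed` and the sample inputs are hypothetical names chosen for illustration.

```python
import difflib


def ndiff_if_changed(a, b):
    """Return ndiff lines for a and b, or None when unified_diff is empty."""
    # First pass: unified_diff yields nothing at all for equal inputs.
    if next(difflib.unified_diff(a, b), None) is None:
        return None
    # Second pass: ndiff gives the full line-by-line display.
    return list(difflib.ndiff(a, b))


# Hypothetical sample inputs.
same = ["alpha\n", "beta\n"]
changed = ["alpha\n", "gamma\n"]

# ndiff reports every line, prefixing unchanged ones with two spaces.
print(list(difflib.ndiff(same, same)))
print(ndiff_if_changed(same, same))
print(ndiff_if_changed(same, changed))
```

Note that this runs the matching work twice whenever the inputs differ.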
msg356851 - Author: Simon Friedberger (simon_) Date: 2019-11-18 08:32
And, just to state this explicitly, I think you are right that there are general idioms for checking if a generator can produce an item, but I think it would be nicer if iterators which could do this in a cheap way (like in this case) would allow it explicitly.

I don't know, maybe I'm wrong. Just seems nice. :)
msg356911 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-11-18 21:17
Simon, I think the conversation is starting to drift and would best be continued on python-ideas or StackOverflow.  Ideas like peekable generators have been discussed before but there was almost no uptake.

Several thoughts:

* For equal inputs, ndiff() is supposed to generate non-empty output.  It does not just give differences.

* To the extent that you care about empty results from some other iterator, the easiest thing to do is follow Tim's advice and just list() the iterator.

* The special case of equal inputs is easily handled before running the diff:

     if a == b:
         do_something_for_the_equal_case(a, b)
     else:
         d = ndiff(a, b)
         do_something_for_the_non_equal_case(a, b, d)

I recommend closing this issue because it hasn't elicited anything that is both actionable and desirable.
msg356942 - Author: Tim Peters (tim.peters) * (Python committer) Date: 2019-11-19 03:33
I'm taking Raymond's advice to close this for now.  The issue tracker isn't the right place to work out ideas - python-ideas is far better for that (StackOverflow isn't a good place for that either - StackOverflow is best for when you have a very specific use case and get stuck).

While the issues with generators are common to all generators, in the context of difflib something else Raymond said should be taken to heart:  using any difflib facility is an extremely expensive way to find out that two things are equal.  That's a value of use cases!  That is, if you had asked about this on StackOverflow and asked for help instead of proposing "a (vague) solution", they would have told you at once to check whether `a == b` before dragging difflib into it.

Indeed, that's probably why what you're asking about never came up before.  People generally use difflib only when they know in advance (via a cheap equality test) that there _are_ differences to be found.
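
As a self-contained sketch of that "cheap equality test first" habit (the name `render_diff` and the sample data are hypothetical, chosen only for illustration):

```python
import difflib


def render_diff(a, b):
    """Return ndiff text for two lists of lines, or "" when they are equal."""
    # Cheap equality test first; difflib only runs when there is work to do.
    if a == b:
        return ""
    return "".join(difflib.ndiff(a, b))


# Hypothetical sample inputs.
old = ["spam\n", "eggs\n"]
new = ["spam\n", "ham\n"]

text = render_diff(old, new)
if text:
    print(text, end="")
else:
    print("inputs are identical")
```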

In any case, if a "specific & actionable" suggestion comes out of pursuing this, feel encouraged to open this report again!
History
Date                 User        Action  Args
2022-04-11 14:59:23  admin       set     github: 82970
2019-11-19 03:33:21  tim.peters  set     status: open -> closed; resolution: rejected; messages: + msg356942; stage: resolved
2019-11-18 21:17:46  rhettinger  set     nosy: + rhettinger; messages: + msg356911
2019-11-18 08:32:50  simon_      set     messages: + msg356851
2019-11-18 08:30:12  simon_      set     messages: + msg356850
2019-11-14 18:39:58  tim.peters  set     messages: + msg356624
2019-11-14 06:37:53  rhettinger  set     assignee: tim.peters; nosy: + tim.peters; versions: + Python 3.9, - Python 3.7
2019-11-13 15:59:24  simon_      set     type: enhancement
2019-11-13 15:59:15  simon_      create