classification
Title: Difflib should provide the option of overriding the SequenceMatcher
Type: enhancement Stage: patch review
Components: Library (Lib) Versions: Python 3.5
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: tim.peters Nosy List: offby1, r.david.murray, rhettinger, terry.reedy, tim.peters
Priority: normal Keywords: patch

Created on 2014-02-23 22:16 by offby1, last changed 2014-09-10 07:01 by Claudiu.Popa.

Files
File name Uploaded Description Edit
difflib-sm.patch offby1, 2014-02-23 22:16 patch for difflib review
difflib-sm.patch offby1, 2014-02-23 22:34 Updated patch with docs review
difflib-sm.patch offby1, 2014-04-15 13:38
difflib-sm.patch offby1, 2014-04-15 14:07
difflib-sm.patch offby1, 2014-04-15 14:15 review
Messages (13)
msg212036 - (view) Author: Chris Rose (offby1) * Date: 2014-02-23 22:16
It should be possible to provide subclasses or implementations of the SequenceMatcher interface to difflib's various methods that do formatting (for example, unified_diff/context_diff et al).

I've included a patch for it that satisfies my personal itch on the subject (attached). I'd like to have this reviewed for inclusion in 3.5.
msg212037 - (view) Author: Chris Rose (offby1) * Date: 2014-02-23 22:34
I've regenerated the patch with doc additions and Misc/ACKS changed as well.
msg212602 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2014-03-02 23:26
Uncle Timmy, what say you?
msg216289 - (view) Author: Chris Rose (offby1) * Date: 2014-04-15 13:38
Updated according to review.
msg216290 - (view) Author: PCManticore (Claudiu.Popa) * (Python triager) Date: 2014-04-15 13:51
Hi. I added a couple of comments for your previous patch, the new one doesn't seem to have a review link.
msg216291 - (view) Author: Chris Rose (offby1) * Date: 2014-04-15 14:07
Addressed comments regarding documentation and assertion formats in the test.
msg216293 - (view) Author: Chris Rose (offby1) * Date: 2014-04-15 14:15
Patch against tip
msg216308 - (view) Author: Chris Rose (offby1) * Date: 2014-04-15 15:32
As a historical record, it should be noted that this is driven by an actual use case: I was experimenting with using Bazaar's patience diff implementation, and I saw that in order for them to use a custom sequence matcher, they had to essentially copy-paste and modify the stdlib diff methods in order to inject their own sequence matchers. That struck me as a bad thing, and that's pretty much what led to this.

I welcome a discussion of the API itself; there's definitely a bit of an odd challenge in describing the usage of the matcher variants when both are used (in line_matcher and char_matcher roles).

A possible approach would be to consider matcher factories to take _just_ a junk function, nothing else, and use the SequenceMatcher API's set_seqs method to actually provide the sequences in all cases. This fits the character use case, which reuses the matcher, and the line use case which does not.
msg217967 - (view) Author: Chris Rose (offby1) * Date: 2014-05-06 04:42
Ping? I'd like to know if the proposed solution passes muster, so that I can get this into ... well, whatever the right revision would be.
msg217982 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-05-06 13:44
Well, it would be 3.5, so there's plenty of time yet.  We are hoping for an answer from Tim Peters, but if we don't get one someone else will review it as time allows.  Pinging the issue like this was good; do it again in another month if there's no further action before then.  (NB: we're working on improving our workflow so that issues like this don't get lost, but it will take time for those improvements to show up...)
msg219657 - (view) Author: Chris Rose (offby1) * Date: 2014-06-03 03:24
As suggested: SYN
msg226625 - (view) Author: Chris Rose (offby1) * Date: 2014-09-09 06:38
What's the word on this change?
msg226657 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2014-09-09 20:09
Current status: Tim is currently not actively involved in Python development and no one who is active is enthusiastic about the proposal.

Your proposal is to add dependency injection to the initializer for the 4 diff classes by adding 6 new parameters. I will comment on the proposal itself below.  But first...

Your initial message in support of the proposal was about as weak as possible: "it would be possible to ...".  Lots of things are possible. For instance, we could add a Socket class replacement parameter to all stdlib functions and classes that use socket.Socket (and this has been requested in at least one case).  However, only a few changes actually get done.  Recognizing this ...

Your later message gave one use case: Bazaar's experiments with an alternate implementation.  To me, this example is still inadequate justification for a change. We agree that cut-and-paste is bad (though not terrible for one-off experiments).  That is why Python allows dynamic patching ('monkeypatching') of modules and Python-coded classes as an alternative (though subclassing is often better for classes).

import difflib
# either
class MyDiff: pass  # not really ;-)
# MyDiff might or might not subclass SequenceMatcher
difflib.SequenceMatcher = MyDiff
# or
def f(self, *args): pass  # again, not really
difflib.SequenceMatcher.somemethod = f
# either way, use difflib functions with alternate matcher.

I consider subclassing + dynamic patching an adequate solution* for the one use case you mention and likely other uses.  So I am inclined to reject the proposal as 'unnecessary'.  (If multiple people were routinely monkeypatching difflib like this, I might change my mind.)

* A context manager can be used to make the monkeypatching temporary and therefore local in effect. The following works.

@contextlib.contextmanager
def mp(ob, attr, new):
    old = getattr(ob, attr)
    setattr(ob, attr, new)
    yield
    setattr(ob, attr, old)


As for the proposal itself: you mention possible API alternatives.  I am more concerned about expanding our responsibility and corresponding test requirements.  If you monkeypatch the module, then you are responsible for the results.  If we add what amount to monkeypatching parameters, then we become responsible.  We would have to make sure that the requirements for replacements are adequately defined and design at least one non-trivial replacement and run all the appropriate tests with that replacement.  Your patch does not do this.  I would not want to do this. We would also have to be prepared to maintain the expanded definition of the classes.

Extending the mandate of the four classes from "work reasonably well with the builtin diff engine" to "work reasonably well with any 'similar' diff engine passed in by a user" is not as trivial as implied by the patch.

For a simple function like filter, the requirement for the function parameter is simple.  It must be a predicate with respect to the items yielded by the iterable arguments.  That means returning False or True and never raising when applied to each item.  It is easy to create alternatives such as lambda x: False, lambda x: True, lambda x: bool(x), and lambda x: x < 0, and test with appropriate iterables.

I hope this explains a bit why function parameters are routine in Python while class parameters are much less common.
History
Date User Action Args
2014-09-10 07:01:35Claudiu.Popasetnosy: - Claudiu.Popa
2014-09-09 20:09:12terry.reedysetmessages: + msg226657
2014-09-09 06:38:11offby1setmessages: + msg226625
2014-08-25 08:52:22Claudiu.Popasetstage: patch review
2014-06-03 03:24:16offby1setmessages: + msg219657
2014-05-06 13:44:37r.david.murraysetmessages: + msg217982
2014-05-06 04:42:33offby1setmessages: + msg217967
2014-04-16 11:04:58offby1setnosy: + r.david.murray
2014-04-15 15:32:01offby1setmessages: + msg216308
2014-04-15 14:15:08offby1setfiles: + difflib-sm.patch

messages: + msg216293
2014-04-15 14:07:55offby1setfiles: + difflib-sm.patch

messages: + msg216291
2014-04-15 13:51:36Claudiu.Popasetnosy: + Claudiu.Popa
messages: + msg216290
2014-04-15 13:38:25offby1setfiles: + difflib-sm.patch

messages: + msg216289
2014-03-02 23:26:46rhettingersetassignee: tim.peters

messages: + msg212602
nosy: + rhettinger, tim.peters
2014-03-02 10:55:21terry.reedysetnosy: + terry.reedy
2014-02-23 22:34:44offby1setfiles: + difflib-sm.patch

messages: + msg212037
2014-02-23 22:16:56offby1create