Issue 2986: difflib.SequenceMatcher not matching long sequences

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/47235

classification

Title:	difflib.SequenceMatcher not matching long sequences
Type:	behavior	Stage:	resolved
Components:	Documentation, Library (Lib)	Versions:	Python 3.1, Python 3.2, Python 2.7, Python 2.6

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:	difflib.SequenceMatcher: expose junk sets, deprecate undocumented isb... functions. View: 10534
Assigned To:	terry.reedy	Nosy List:	LambertDW, barry, eli.bendersky, georg.brandl, ggenellina, gjb1002, hagna, hodgestar, janpf, jcea, jimjjewett, mrotondo, pitrou, r.david.murray, rtvd, sjmachin, terry.reedy, tim.peters, vbr
Priority:	high	Keywords:	patch

Created on 2008-05-27 20:29 by hagna, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
difflib_test_inq.py	vbr, 2010-04-19 23:25	test file for difflib.SequenceMatcher comparing strings with minimal differences
issue2986.docs26.1.patch	eli.bendersky, 2010-07-24 04:59		review
issue2986.fix27.4.patch	eli.bendersky, 2010-09-03 04:45
issue2986.docs31.1.patch	eli.bendersky, 2010-11-08 05:10		review
issue2986.fix27.5.patch	eli.bendersky, 2010-11-11 08:24
issue2986.fix32.5.patch	hodgestar, 2010-11-20 15:18	Version of issue2986.fix27.5.patch that applies and passes tests in Python 3.2a.

Pull Requests
URL	Status	Linked	Edit
PR 17082	closed	python-dev, 2019-11-07 16:24

Messages (37)
msg67428 - (view)	Author: Nate (hagna)	Date: 2008-05-27 20:29
The following code shows no matches though the strings clearly match. from difflib import * a = '''3904320338155955662857322172779218727992471109386112515279452352973279311752006856588512503244702012502812653160306927721351031250270279878152125021081471125246894603319162986283456469448293252335442814953964029718671705515246437056879456095915444174665464026255415736754542680178373675412998898571410483714801783736754144828361714801783736754133068408714801783736754140859665714801783736754153851004471480178373675415715864371410690714801783736754147488890714801783736205957668017837367545448801783104170539154677705102536314736754477780178373675415217103227148017837367541737811137714801783736754172791151671480178373675417692995271480178373675417575983571480178373675417398965871480178310417055026467770551235573705687945609591544562532964082675415736300610425832914520311514810301595721999571547897879113780178373618951021983280377781981989237498913678981414213198924949892679989164882577810944751102884217048258978791137801783104170511836542073627327981801279360326159714801783736171798080178310415420736447510213871790638471586131412631592131012571210126718031314200414571314893700123874777987006697747115770067074789312578013869801783104120529166337056879456095918495136604565251349544838956219513495753741344870733943253617458316356794745831634651172458316348316144586052838244151360641656349118903581890331689038658903263218549028909605134957536316060''' b = 
'''4634320338155955662857322172779218727992471109386112515279452352973279311752006856588512503244702012502812653160306927721351031250270279878152125021081471125246894603319162986283456469448293252335442814953964029718671705515246437056879456095915444174665464026255415736754542680178373675412998898571410483714801783736754144828361714801783736754133068408714801783736754140859665714801783736754153851004471480178373675415715864371410690714801783736754147488890714801783736205957668017837367545448801783104170539154677705102536314736754477780178373675413182108117148017837367541737811137714801783736754172791151671480178373675417692995271480178373675417575983571480178373675417398965871480178310417055026467770551235573705687945609591544562532964082675415736300610425832914520311514810301595721999571547897879113780178373618951021983280377781981989237498913678981414213198924949892679989164882577810944751102884217048258978791137801783104170511836542073627327981801279360326159714801783736171798080178310415420736447510213871790638471412131420041457131485122165131466702097131466731723131466741536131466751581131466771649131466761975131467212090131467261974131467231858131467201556131467212538131467221553131467221943131467231748131466711452131467271787131412578013869801783104154307361718482280178373638585436251621338931320893185072980138084820801545115716861861152948618615002682261422349251058108327767521397977810837298017831041205291663370568794560959184951366045652513495448389562195134957537413448707339432536174583163''' lst = [(a,b)] for a, b in lst: print "---------------------------" s = SequenceMatcher(None, a, b) print "length of a is %d" % len(a) print "length of b is %d" % len(b) print s.find_longest_match(0, len(a), 0, len(b)) print s.ratio() for block in s.get_matching_blocks(): m = a[block[0]:block[0]+block[2]] print "a[%d] and b[%d] match for %d elements and it is \"%s\"" % (block[0], block[1], block[2], m)
msg84387 - (view)	Author: Georg Brandl (georg.brandl) *	Date: 2009-03-29 12:59
Tim, I think you've had some enlightening comments about difflib issues in the past.
msg84446 - (view)	Author: Mike Rotondo (mrotondo)	Date: 2009-03-30 00:40
From the source, it seems that undocumented behavior in SequenceMatcher is causing this error. If b is 200 elements or longer, any element x that makes up more than 1% of its contents is considered "popular", and thus junk. In this case, difflib treats each individual digit as an element of your sequences, and each digit takes up more than 1% of the complete sequence b. Therefore, each one is "popular" and is ignored.

A snippet which demonstrates this:

    from difflib import SequenceMatcher

    for i in range(1, 202)[::10]:
        a = "a" * i
        b = "b" + "a" * i
        s = SequenceMatcher(None, a, b)
        print(s.find_longest_match(0, len(a), 0, len(b)))

Up to i=191 the strings match, but at i=201 (where len(b) passes 200) they do not, because "a" is "popular". Strangely, if you get rid of the "b" at the beginning of b, they continue to match at lengths greater than 200. This may be a bug; I'll keep looking into it, but someone who knows more should probably take a look too.

The comments from difflib.py say some interesting things:

    # b2j also does not contain entries for "popular" elements, meaning
    # elements that account for more than 1% of the total elements, and
    # when the sequence is reasonably large (>= 200 elements); this can
    # be viewed as an adaptive notion of semi-junk, and yields an enormous
    # speedup when, e.g., comparing program files with hundreds of
    # instances of "return NULL;"

This seems to mean that you won't actually get an accurate diff in certain cases, which seems odd. At the very least, this behavior should probably be documented. Do people think it should be changed to get rid of the "popularity" heuristic?
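For readers on a current interpreter, the effect described in this message can be reproduced with a short sketch. It assumes Python 2.7.1 / 3.2 or later, where the `autojunk` constructor argument that eventually came out of this issue is available; the variable names are illustrative only.

```python
from difflib import SequenceMatcher

# 201 copies of "a", and the same run preceded by a single "b".
a = "a" * 201
b = "b" + "a" * 201  # len(b) == 202, so the popularity heuristic applies

# Default behaviour: "a" is popular in b and is dropped from the index.
with_heuristic = SequenceMatcher(None, a, b)
default_match = with_heuristic.find_longest_match(0, len(a), 0, len(b))

# Heuristic disabled: every "a" in b is indexed and can be matched.
without_heuristic = SequenceMatcher(None, a, b, autojunk=False)
full_match = without_heuristic.find_longest_match(0, len(a), 0, len(b))

print(default_match.size)  # 0
print(full_match.size)     # 201
```

The leading "b" matters because the only way a popular element can still be matched is by extending an existing match, and here the two sequences already differ at position 0.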
msg84449 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2009-03-30 01:33
On Mon, 30 Mar 2009 at 00:40, Mike Rotondo wrote:
> This seems to mean that you won't actually get an accurate diff in
> certain cases, which seems odd. At the very least, this behavior should
> probably be documented. Do people think it should be changed to get rid
> of the "popularity" heuristic?

A better way, I think, would be to provide a way to turn it off (and then document it, of course).
msg93438 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2009-10-02 10:58
The popularity heuristic could be tuned to depend on the number N of distinct elements in the sequence, and kick in if an element appears say more than 1/(N**0.5) of the time.
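One possible reading of this suggestion is sketched below. The function name `popular_by_sqrt_rule` and the sample data are hypothetical, and nothing like this was ever added to difflib; the sketch only shows how a cutoff of 1/sqrt(N) would treat a small alphabet differently from a sequence of code lines.

```python
import math
from collections import Counter

def popular_by_sqrt_rule(seq):
    """Return elements whose share of seq exceeds 1/sqrt(N), N = distinct elements."""
    counts = Counter(seq)
    if not counts:
        return set()
    # An element is popular when count / len(seq) > 1 / sqrt(N).
    cutoff = len(seq) / math.sqrt(len(counts))
    return {elt for elt, count in counts.items() if count > cutoff}

# Four-letter alphabet: each base is 25% of the sequence, the cutoff is 50%.
dna = "ACGT" * 100
print(popular_by_sqrt_rule(dna))  # set()

# 301 distinct lines: the cutoff is about 5.8%, the repeated line is 25%.
code_lines = ["return NULL;"] * 100 + ["line %d" % k for k in range(300)]
print(popular_by_sqrt_rule(code_lines))  # {'return NULL;'}
```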
msg103660 - (view)	Author: Vlastimil Brom (vbr)	Date: 2010-04-19 23:25
I just stumbled on some seemingly different unexpected behaviour of difflib.SequenceMatcher, but it turns out that it may have the same cause, i.e. the "popular" heuristics. I hopefully managed to replicate it on an illustrative sample text, as included in the attached file. (I also mentioned this issue on the python-list http://mail.python.org/pipermail/python-list/2010-April/1241951.html but as there were no replies, I eventually found that this might be a more appropriate place.)

Both strings differ in a minimal way, each having one extra character in a "strategic" position, which probably hits some pathological case for difflib. Instead of just reporting the insertion and deletion of these single characters (which works well for most cases, with most other positions of the differing characters), SequenceMatcher decides to delete a large part of the string in between the differences and to insert almost the same text after that.

The attached code simply prints the results of the comparison with the respective tags and substrings. No junk function is used. I get the same results on Python 2.5.4, 2.6.5 and 3.1.1 on Windows XP SP3.

I didn't find any plausible mentions of such cases in the documentation, but after some searching I found several reports in the bug tracker mentioning the erroneous output of SequenceMatcher on longer repetitive sequences. Besides this http://bugs.python.org/issue2986 there are e.g.
http://bugs.python.org/issue1711800
http://bugs.python.org/issue4622
http://bugs.python.org/issue1528074

In my case, disabling the "popular" heuristics as mentioned by John Machin in http://bugs.python.org/issue1528074#msg29269 seems to have solved the problem; with a modified version of difflib containing:

    if 0:  # disable popular heuristics
        if n >= 200 and len(indices) * 100 > n:
            populardict[elt] = 1
            del indices[:]

the comparison catches the differences in the test strings as expected, i.e. one character addition and deletion only.

It is likely that some other use cases for difflib rely on the "popular" heuristics, but it also seems useful to have some control over this behaviour, which might not be appropriate in all cases. (The issue seems to be the same in Python 2.5, 2.6 and 3.1.)

regards,
vbr
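The attached test file is not reproduced here, but the same kind of result can be sketched with made-up strings. The example assumes Python 2.7.1 / 3.2 or later, where `autojunk=False` plays the role of the `if 0:` patch above; the strings `common`, `a` and `b` are invented for illustration and are not the data from the attachment.

```python
from difflib import SequenceMatcher

common = "abcde" * 60   # 300 characters from a five-letter alphabet
a = "X" + common        # one extra character at the start
b = common + "Y"        # one extra character at the end

# Default: every letter of the alphabet is "popular" in b, so nothing matches.
default_ops = SequenceMatcher(None, a, b).get_opcodes()

# Heuristic off: the shared 300 characters are found as one equal block.
exact_ops = SequenceMatcher(None, a, b, autojunk=False).get_opcodes()

print(default_ops)  # [('replace', 0, 301, 0, 301)]
print(exact_ops)    # delete 1 char, keep 300, insert 1 char
```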
msg108636 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2010-06-25 21:56
This appears to be one of at least three duplicate issues: #1528074, #2986, and #4622. I am closing two, leaving 2986 open, and merging the nearly disjoint nosy lists. (If no longer interested, you can delete yourself from 2986.) #1711800 appears to be slightly different (if not, it could be closed also.) Whether or not a new feature is ever added (earliest, now, 3.2), it appears that the docs need improvement to at least explain the current behavior. If someone who understands the issue could open a separate doc issue (for 2.6/7/3.1/2) with a suggested addition, that would be great.
msg108856 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2010-06-28 19:35
The discussion on #1528074 references two other closed tracker issues:
#1678339 Test case that currently fails
#1678345 Patch to change behavior - rejected because crippled behavior is supposedly intentional and removing the change would slow things down.

The patch simply removes the internal heuristic. I think a better patch would be to make it optional, with a tunable popularity threshold.

I say 'supposedly intentional' because the code comments only justify the popularity hack for code line comparison and give no indication of awareness that it disables SequenceMatcher for general purpose use, and in particular, for non-toy finite character set comparisons of the type (ascii) used in all the examples.
msg109090 - (view)	Author: Eli Bendersky (eli.bendersky) *	Date: 2010-07-02 07:16
The new "junk heuristic" was added to difflib.py in SVN revision 26661 in 2002 (which is, incidentally, the last revision to modify difflib.py). Its commit log says:
---------------------------------------------
Mostly in SequenceMatcher.{__chain_b, find_longest_match}:
This now does a dynamic analysis of which elements are so frequently repeated as to constitute noise. The primary benefit is an enormous speedup in find_longest_match, as the innermost loop can have factors of 100s less potential matches to worry about, in cases where the sequences have many duplicate elements. In effect, this zooms in on sequences of non-ubiquitous elements now.

While I like what I've seen of the effects so far, I still consider this experimental. Please give it a try!
---------------------------------------------
msg109442 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2010-07-06 23:18
[Also posted to pydev for additional input, with Subject line
Issue 2986: difflib.SequenceMatcher is partly broken
Developed with input from Eli Bendersky, who will write patchfile(s) for whichever change option is chosen.]

Summary: difflib.SequenceMatcher was developed, documented, and originally operated as "a flexible class for comparing pairs of sequences of any [hashable] type". An "experimental" heuristic was added in 2.3a1 to speed up its application to sequences of code lines, which are selected from an unbounded set of possibilities. As explained below, this heuristic partly to completely disables SequenceMatcher for realistic-length sequences from a small finite alphabet. The regression is easy to fix. The docs were never changed to reflect the effect of the heuristic, but should be, with whatever additional change is made.

In the commit message for revision 26661, which added the heuristic, Tim Peters wrote "While I like what I've seen of the effects so far, I still consider this experimental. Please give it a try!" Several people who have tried it discovered the problem with small alphabets and posted to the tracker. Issues #1528074, #1678339, #1678345, and #4622 are now-closed duplicates of #2986. The heuristic needs revision.

Open questions (discussed after the examples): what exactly to do, which versions to do it to, and who will do it.
---
Some minimal difference examples:

    from difflib import SequenceMatcher as SM

    # base example
    print(SM(None, 'x' + 'y'*199, 'y'*199).ratio())
    # should be and is 0.9975 (rounded)

    # make 'y' junk
    print(SM(lambda c:c=='y', 'x' + 'y'*199, 'y'*199).ratio())
    # should be and is 0.0

    # Increment b by 1 char
    print(SM(None, 'x' + 'y'*199, 'y'*200).ratio())
    # should be .995, but now is 0.0 because y is treated as junk

    # Reverse a and b, which increments b
    print(SM(None, 'y'*199, 'x' + 'y'*199).ratio())
    # should be .9975, as before, but now is 0.0 because y is junked

The reason for the bug is the heuristic: if the second sequence is at least 200 items long, then any item occurring more than one percent of the time in the second sequence is treated as junk. This was aimed at recurring code lines like 'else:' and 'return', but can be fatal for small alphabets where common items are necessary content.

A more realistic example than the above is comparing DNA gene sequences. Without the heuristic, SequenceMatcher.get_opcodes() reports an appropriate sequence of matches and edits, and .ratio works as documented and expected. For 1000/2000/6000 bases, the times on an old Athlon 2800 machine are <1/2/12 seconds. Since 6000 is longer than most genes, this is a realistic and practical use.

With the heuristic, everything is junk and there is only one match, ''=='' augmented by the initial prefix of matching bases. This is followed by one edit: replace the rest of the first sequence with the rest of the second sequence. A much faster way to find the first mismatch would be

    i = 0
    while first[i] == second[i]:
        i += 1

The match ratio, based on the initial matching prefix only, is spuriously low.

---
Questions:

1. What change should be made?

Proposed fix: Disentangle the heuristic from the calculation of the internal b2j dict that maps items to indexes in the second sequence b. Only apply the heuristic (or not) afterward.
Version A: Modify the heuristic to only eliminate common items when there are more than, say, 100 items (when len(b2j) > 100, where b2j is first calculated without popularity deletions). This would leave DNA, protein, and printable ascii+[\n\r\t] sequences alone. On the other hand, realistic sequences of more than 200 code lines should have at least 100 different lines, and so the heuristic should continue to be applied when it (mostly?) 'should' be. This change leaves the API unchanged and does not require a user decision.

Version B: Add a parameter to .__init__ to make the heuristic optional. If the default were True ('use it'), then the code would run the same as now (even when bad). With the heuristic turned off, users would be able to get the .ratio they may expect and need. On the other hand, users would have to understand the heuristic to know when and when not to use it.

Version C: A more radical alternative would be to make one or more of the tuning parameters user settable, with one setting turning it off.

2. What type of issue is this, and which versions get changed?

I see the proposal as a partial reversion of a change that sometimes causes a regression, in order to fix the regression. Such would usually be called a bugfix. Other tracker reviewers claim this issue is a feature request, not a bugfix. Either way, 3.2 gets the fix. The practical issue is whether at least 2.7(.1) should get the fix, or whether the bug should forever continue in 2.x.

3. Who will make the change?

Eli will write a patch and I will check it. However, Georg Brandl assigned the issue to Tim Peters, with a request for comment, but Tim never responded. Is there an active committer who will grab the issue and do a commit review when a patch is ready?
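The two regressed examples from this message can be re-run against the behaviour that was eventually adopted (essentially Version B). This is a sketch assuming Python 2.7.1 / 3.2 or later, where the opt-out is spelled `autojunk=False`; the `cases` and `results` names are illustrative.

```python
from difflib import SequenceMatcher

cases = [
    ("x" + "y" * 199, "y" * 200),   # b incremented to 200 items
    ("y" * 199, "x" + "y" * 199),   # a and b reversed, so b has 200 items
]

results = []
for a, b in cases:
    # Default: 'y' is popular in b and is ignored, so nothing matches.
    default_ratio = SequenceMatcher(None, a, b).ratio()
    # Heuristic off: the 199 shared 'y' items are matched.
    exact_ratio = SequenceMatcher(None, a, b, autojunk=False).ratio()
    results.append((default_ratio, round(exact_ratio, 4)))
    print(len(a), len(b), default_ratio, round(exact_ratio, 4))
# 200 200 0.0 0.995
# 199 200 0.0 0.9975
```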
msg109507 - (view)	Author: Vlastimil Brom (vbr)	Date: 2010-07-07 23:17
I guess I am not supposed to post to python-dev, not being a python developer; hopefully it is appropriate to add a comment here, based only on my current usage of (a modified) difflib.SequenceMatcher.

It seems the mentions of text comparison in that thread, e.g. http://mail.python.org/pipermail/python-dev/2010-July/101515.html etc., rather imply line-by-line comparison, and possibly character comparison of matched lines. For me the direct character-wise comparison is more useful in most cases. With the popular heuristics disabled, the results look pretty good. (The script only involves changing the background colour of the compared texts, based on SequenceMatcher.get_opcodes().)

Just now, I only need to disable the popular check; currently I use a monkey-patched subclass of SequenceMatcher with an extended signature and a modified __chain_b function, cf. http://mail.python.org/pipermail/python-list/2010-June/1247907.html

I would vote for extending the SequenceMatcher API to enable adjustments (leaving the default values as the current ones): enable/disable the popular check, set the thresholds for string length and "popular" frequency (and eventually other parameters, which might be added).

Are there some restrictions on API changes in a library due to a moratorium, even if the default behaviour remains unchanged? Otherwise, what might be the disadvantages of this approach? If the current behaviour is considered appropriate for the original use cases, other uses would also be made possible/easier, only at the cost of learning the meaning of the added parameters (from the enhanced docs, of course).

vbr
msg109636 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2010-07-08 22:47
Anyone can post on Python-dev, but non-developers should do so judiciously and with respect for the purpose of the list. It is also polite to introduce oneself with the first post. In any case, Tim Peters has approved making some change. The remaining question is exactly what. There is no problem with extending the API in 3.2. The debate there is over 2.7. My fourth proposal, detailed on pydev, is to introduce a fourth parameter, 'common', to set the frequency threshold to None or int 1-99.
msg109639 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2010-07-08 22:52
> There is no problem with extending the API in 3.2. The debate there is
> over 2.7.

We could extend the API as long as it stays backwards-compatible (that is, the default value for the new argument produces the same behaviour as before).
msg109654 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2010-07-09 01:12
My proposal F, to expose the common frequency threshold as a fourth positional parameter with default 1, would do that: repeat current behavior. We should, and Eli and I would, add some of the anomalous cases to the test suite and verify that the default is to reproduce the current anomalies, and that passing None changes the result.

Any opinions, anyone, on 'common', 'thresh', 'threshold', or anything else as the new parameter name?

We will have to explain in the doc patch that the parameter is new in 2.7.1 to fix a partial bug and that giving any explicit value will make code not run with 2.7 (.0).

Exposing the set of common values as an instance attribute, as I proposed on pydev, would be a new feature not needed to fix the bug. So it should be limited to 3.2.
msg110251 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2010-07-14 01:45
[copied from pydev post]

Summary: adding an autojunk heuristic to difflib without also adding a way to turn it off was a bug because it disabled running code.

2.6 and 3.1 each have, most likely, one final version. Don't fix for these but add something to the docs explaining the problem and future fix.

2.7 will have several more versions over several years and will be used by newcomers who might encounter the problem but not know to diagnose it and patch a private copy of the module. So it should have a fix. Solutions thought of so far:

1. Modify the heuristic to somewhat fix the problem. Bad (unacceptable) because this would silently change behavior and could break tests.

2. Add a parameter that defaults to using the heuristic but allows turning it off. Perhaps better, but code that used the new API would crash if run on 2.7.0.

3. Tim Peters
> Think the most pressing thing is to give people a way to turn the damn
> thing off. An ugly way would be to trigger on an unlikely
> input-output behavior of the existing isjunk argument. For example,
> if
>
>     isjunk("what's the airspeed velocity of an unladen swallow?")
>
> returned
>
>     "don't use auto junk!"
>
> and 2.7.1 recognized that as meaning "don't use auto junk", code could
> be written under 2.7.1 that didn't blow up under 2.7. It could
> _behave_ differently, although that's true of any way of disabling the
> auto-junk heuristics.

Ugly, but perhaps crazy brilliant. Use of such a hack would obviously be temporary. Perhaps its use could be made to issue a -3 warning if such were enabled.

I would simplify the suggestion to something like

    isjunk("disable!heuristic") == True

so one could pass

    lambda s: s == "disable!heuristic"

It should be something easy to document and write. This issue is the only place such a string should appear, so it should be safe.

Tim and Antoine: if you two can agree on what to do for 2.7, Eli and I will code it.

Tim's suggestion amounts to decoupling the fix for 2.7 from a better fix for 3.2. I agree. The latter can be discussed once 2.7 is settled.
msg110261 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2010-07-14 09:06
On Wednesday, 14 July 2010 at 01:45 +0000, Terry J. Reedy wrote:
>
> 2. Add a parameter that defaults to using the heuristic but allows
> turning it off. Perhaps better, but code that used the new API would
> crash if run on 2.7.0

Yes, but this is an exceptional situation. We normally don't add new APIs in bugfix versions. We'll have to live with it.

> 3.
> [...]
> Ugly, but perhaps crazy brilliant. Use of such a hack would obviously
> be temporary. Perhaps its use could be made to issue a -3 warning if
> such were enabled.

It's still incredibly ugly. Besides, code written for 2.7.1 might not "blow up" with 2.7, but it will still have different behaviour. If you are using the new parameter, it's because you need it, hence different behaviour will be unacceptable; therefore, better to raise an error as the API change proposal does.
msg111372 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2010-07-23 18:31
For 2.6 and 3.1, this is a documentation-only issue.
For 2.7, this is a doc + behavior issue.
For 3.2, this is a doc + behavior + new feature issue.

For 2.6.6 (release candidate due Aug 2, 10 days), I propose to add the following paragraph after the current 'Timing:' paragraph in the SequenceMatcher entry ('Heuristic:' should be bold-faced, like 'Timing:'):

    Heuristic: To speed matching, items that appear more than 1% of the time in sequences of at least 200 items are treated as junk. This has the unfortunate side-effect of giving bad results for sequences constructed from a small set of items. An option to turn off the heuristic will be added to a future version.

I would have said 'to 2.7.1' but that has not happened yet.

I thought about putting the heuristic paragraph first, but I think it fits better after the discussion of quadratic run time. I think it should be a separate paragraph and not tacked on the end of the previous paragraph so people will be more likely to take notice.

I have marked this a release blocker because at least 6 issues have been filed for this bug, and so I think it important that the explanation be added to the next released doc. I plan to temporarily reassign this to docs@python in a few days.
msg111425 - (view)	Author: Eli Bendersky (eli.bendersky) *	Date: 2010-07-24 04:59
Here's a patch for Doc/library/difflib.rst of the 2.6 branch, following Terry's suggested addition to the docs of the SequenceMatcher class. Tested 'make html'.
msg112116 - (view)	Author: Georg Brandl (georg.brandl) *	Date: 2010-07-31 07:06
Deferring to after 3.2a1.
msg112120 - (view)	Author: Georg Brandl (georg.brandl) *	Date: 2010-07-31 08:00
Committed 2.6 patch in r83314.
msg112490 - (view)	Author: Barry A. Warsaw (barry) *	Date: 2010-08-02 16:17
Georg committed this patch to the 2.6 tree, and besides, this doesn't seem like a blocking issue, so I'm kicking 2.6 off the list and knocking the priority down.
msg115335 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2010-09-01 21:32
While refactoring the code for 2.7, I discovered that the description of the heuristic for 2.6 and in the code comments is off by 1. "items that appear more than 1% of the time" should actually be "items whose duplicates (after the first) appear more than 1% of the time". The discrepancy arises because in the following code

    for i, elt in enumerate(b):
        if elt in b2j:
            indices = b2j[elt]
            if n >= 200 and len(indices) * 100 > n:
                populardict[elt] = 1
                del indices[:]
            else:
                indices.append(i)
        else:
            b2j[elt] = [i]

len(indices) is retrieved before the index i of the current elt is added. Whatever one might think the heuristic 'should' have been (and by the nature of heuristics, there is no right answer), the default behavior must remain as it is, so we adjusted the code and doc to match that.
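A worked case makes the off-by-one concrete. The sketch below wraps the quoted loop in a hypothetical helper, `legacy_popular`, that records which elements the loop marks popular; the sample lists are invented. With n = 200, an item occurring 3 times (1.5% of the items) is not marked, because only its 2 earlier occurrences are counted when the third is seen; a fourth occurrence is needed.

```python
def legacy_popular(b):
    """Re-run the quoted loop and return the elements it marks as popular."""
    n = len(b)
    b2j = {}
    popular = set()
    for i, elt in enumerate(b):
        if elt in b2j:
            indices = b2j[elt]
            # len(indices) counts only the earlier occurrences of elt.
            if n >= 200 and len(indices) * 100 > n:
                popular.add(elt)
                del indices[:]
            else:
                indices.append(i)
        else:
            b2j[elt] = [i]
    return popular

filler = ["line %d" % k for k in range(197)]
three = ["x"] * 3 + filler        # 200 items, "x" is 1.5% of them
four = ["x"] * 4 + filler[:196]   # 200 items, "x" is 2% of them

print(legacy_popular(three))  # set()
print(legacy_popular(four))   # {'x'}
```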
msg115419 - (view)	Author: Eli Bendersky (eli.bendersky) *	Date: 2010-09-03 04:45
Attaching a patch (developed jointly with Terry Reedy) for 2.7 that adds an 'autojunk' parameter to SequenceMatcher's constructor. The parameter is True by default, which retains the current behavior of 2.6 and earlier, but can be set by the user to False to disable the popularity heuristic. The patch also fixes some documentation inconsistencies that Terry raised in this message.

Notes:
1. Tests run successfully. Added a new test class in test_difflib for testing with the new autojunk parameter set to False.
2. Patch generated vs. Hg mirror.
msg115787 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2010-09-07 16:02
The patch changes the internal function that constructs the dict mapping b items to indexes to read as follows:

    create b2j mapping
    if isjunk function, move junk items to junk set
    if autojunk, move popular items to popular set

I helped write and test the 2.7 patch and verify that default behavior remains unchanged. I believe it is ready to commit. 3.1 and 3.2 patches will follow.
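The three steps can be sketched as a standalone function. This is not the patch itself: the name `chain_b`, its return value, and the exact form of the popularity test are assumptions chosen to mirror the outline above and the "duplicates after the first exceed 1%" rule described earlier in the thread.

```python
def chain_b(b, isjunk=None, autojunk=True):
    """Index b in three separate passes; returns (b2j, junk, popular)."""
    # Step 1: map every element of b to the list of positions where it occurs.
    b2j = {}
    for i, elt in enumerate(b):
        b2j.setdefault(elt, []).append(i)

    # Step 2: move elements the caller declared junk out of the index.
    junk = set()
    if isjunk is not None:
        junk = {elt for elt in b2j if isjunk(elt)}
        for elt in junk:
            del b2j[elt]

    # Step 3: optionally move "popular" elements out of the index as well.
    popular = set()
    n = len(b)
    if autojunk and n >= 200:
        popular = {elt for elt, idxs in b2j.items()
                   if (len(idxs) - 1) * 100 > n}
        for elt in popular:
            del b2j[elt]

    return b2j, junk, popular

b = "b" + "a" * 201
print(chain_b(b))                           # ({'b': [0]}, set(), {'a'})
print(len(chain_b(b, autojunk=False)[0]))   # 2: both 'a' and 'b' stay indexed
```

Keeping the passes separate is what makes the heuristic easy to switch off: the index is complete after step 1, and steps 2 and 3 only remove entries from it.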
msg120713 - (view)	Author: Eli Bendersky (eli.bendersky) *	Date: 2010-11-08 05:10
Adding a documentation patch for 3.1 which is similar to the 2.6 documentation patch that's been committed by Georg into 2.6
msg120927 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2010-11-10 18:20
Tim told me to continue with this as he has no time.

rev86401 - apply 3.1 doc fix

I cannot apply the 2.7 patch. It has different header lines. In particular, TortoiseSVN cannot fetch nonexistent revision "Mon Aug 30 06:37:52 2010 +0300". Please regenerate against current 2.7 with the method used for 2.6/3.1.
msg120939 - (view)	Author: Eli Bendersky (eli.bendersky) *	Date: 2010-11-11 08:24
Attaching a new patch for 2.7 freshly generated vs. current 2.7 maintenance branch from SVN.
msg120992 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2010-11-12 00:22
issue2986.fix27.5.patch applied, with version note added to doc, as rev86418.
The only thing left is the patch for 3.2, which Eli and I will produce.
msg121079 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2010-11-12 21:10
r86437 - correct and replicate version-added message
msg121596 - (view)	Author: Eli Bendersky (eli.bendersky) *	Date: 2010-11-20 05:45
Terry, when is the deadline for producing the patch for 3.2? Perhaps we should at least submit the 2.7 patch for now so that it goes in for sure?
msg121662 - (view)	Author: Simon Cross (hodgestar)	Date: 2010-11-20 15:18
I made the minor changes needed to get Eli Bendersky's patch to apply against 3.2. Diff attached.
msg121697 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2010-11-20 17:55
Deadline is probably next Fri. However I will apply this or slight revision thereof in a couple of days to make sure this much is in. I have to fixup some work stuff today.
msg121902 - (view)	Author: Eli Bendersky (eli.bendersky) *	Date: 2010-11-21 11:04
Simon's patch fix for 3.2 looks good to me - applies cleanly to py3k and tests pass.
msg122335 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2010-11-25 06:27
Since I am not sure I will be able to do any more before the 3.2b1 feature freeze, I went ahead with the minimal patch after checking the differences from the 2.7 version and redoing the Misc/News entry. (I suspect putting a new entry immediately after the appropriate heading, instead of between other headings, is probably least likely to fatally conflict with intervening changes.)
r86745
Thank you Eli and Simon. Leaving this open for possible further changes.
msg122337 - (view)	Author: Simon Cross (hodgestar)	Date: 2010-11-25 06:48
My vote is that this bug be closed and a new feature request be opened. Failing that, it would be good to have a concise description of what else we would like done (and the priority should be downgraded, I guess).
msg122338 - (view)	Author: Eli Bendersky (eli.bendersky) *	Date: 2010-11-25 06:59
Terry, I agree with Simon re closing and opening a new feature request. This issue has too much baggage in it, and we can always link to it. A new feature request should be opened strictly for 3.2. If you want, I can close this issue and open a new one, but I'm waiting for your approval.
msg122401 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2010-11-25 20:23
Agreed. #10534. This is really a 'follow-on' rather than 'superseder', but the forward reference should be easy for anyone to find.

History
Date	User	Action	Args
2022-04-11 14:56:35	admin	set	github: 47235
2019-11-07 16:24:52	python-dev	set	pull_requests: + pull_request16592
2011-01-09 03:22:36	terry.reedy	set	nosy: tim.peters, barry, georg.brandl, terry.reedy, jcea, jimjjewett, sjmachin, gjb1002, ggenellina, pitrou, rtvd, vbr, LambertDW, hodgestar, hagna, r.david.murray, eli.bendersky, janpf, mrotondo stage: needs patch -> resolved
2011-01-09 02:31:29	jcea	set	nosy: + jcea
2010-11-25 20:23:36	terry.reedy	set	status: open -> closed versions: + Python 2.6, Python 3.1, Python 2.7 resolution: fixed messages: + msg122401 superseder: difflib.SequenceMatcher: expose junk sets, deprecate undocumented isb... functions. type: enhancement -> behavior
2010-11-25 06:59:23	eli.bendersky	set	messages: + msg122338
2010-11-25 06:48:02	hodgestar	set	messages: + msg122337
2010-11-25 06:27:28	terry.reedy	set	type: behavior -> enhancement messages: + msg122335
2010-11-21 11:04:43	eli.bendersky	set	messages: + msg121902
2010-11-20 17:55:13	terry.reedy	set	messages: + msg121697
2010-11-20 15:18:19	hodgestar	set	files: + issue2986.fix32.5.patch nosy: + hodgestar messages: + msg121662
2010-11-20 05:45:30	eli.bendersky	set	messages: + msg121596
2010-11-12 21:10:57	terry.reedy	set	messages: + msg121079
2010-11-12 00:22:41	terry.reedy	set	stage: commit review -> needs patch messages: + msg120992 versions: - Python 2.7
2010-11-11 08:24:06	eli.bendersky	set	files: + issue2986.fix27.5.patch messages: + msg120939
2010-11-10 19:55:37	terry.reedy	set	versions: - Python 3.1
2010-11-10 19:54:59	terry.reedy	set	messages: - msg120925
2010-11-10 18:20:56	terry.reedy	set	messages: + msg120927
2010-11-10 18:13:04	terry.reedy	set	assignee: tim.peters -> terry.reedy messages: + msg120925
2010-11-08 05:10:24	eli.bendersky	set	files: + issue2986.docs31.1.patch messages: + msg120713
2010-09-07 16:02:28	terry.reedy	set	messages: + msg115787 stage: test needed -> commit review
2010-09-03 04:46:06	eli.bendersky	set	files: + issue2986.fix27.4.patch messages: + msg115419
2010-09-01 21:32:58	terry.reedy	set	messages: + msg115335
2010-08-02 16:17:13	barry	set	priority: release blocker -> high messages: + msg112490 versions: - Python 2.6
2010-07-31 18:24:27	georg.brandl	set	priority: deferred blocker -> release blocker
2010-07-31 08:00:47	georg.brandl	set	messages: + msg112120
2010-07-31 07:06:02	georg.brandl	set	priority: release blocker -> deferred blocker messages: + msg112116
2010-07-24 04:59:19	eli.bendersky	set	files: + issue2986.docs26.1.patch keywords: + patch messages: + msg111425
2010-07-23 18:31:32	terry.reedy	set	priority: normal -> release blocker versions: + Python 2.6, Python 3.1, Python 2.7 nosy: + barry messages: + msg111372 type: enhancement -> behavior
2010-07-14 09:06:44	pitrou	set	messages: + msg110261
2010-07-14 01:45:22	terry.reedy	set	messages: + msg110251
2010-07-09 01:12:03	terry.reedy	set	messages: + msg109654
2010-07-08 22:52:45	pitrou	set	messages: + msg109639
2010-07-08 22:47:55	terry.reedy	set	messages: + msg109636
2010-07-08 02:18:21	terry.reedy	set	messages: - msg109450
2010-07-08 02:18:04	terry.reedy	set	messages: - msg109449
2010-07-07 23:17:01	vbr	set	messages: + msg109507
2010-07-07 02:46:20	eli.bendersky	set	messages: + msg109450
2010-07-07 02:43:07	eli.bendersky	set	files: - unnamed
2010-07-07 02:40:11	eli.bendersky	set	files: + unnamed messages: + msg109449
2010-07-06 23:18:23	terry.reedy	set	messages: + msg109442
2010-07-02 07:16:09	eli.bendersky	set	messages: + msg109090
2010-06-28 19:35:37	terry.reedy	set	messages: + msg108856
2010-06-25 21:56:19	terry.reedy	set	nosy: + LambertDW, jimjjewett, terry.reedy, rtvd, janpf, ggenellina, sjmachin, eli.bendersky messages: + msg108636 versions: - Python 2.7
2010-06-25 21:55:55	terry.reedy	link	issue4622 superseder
2010-06-25 21:55:07	terry.reedy	link	issue1528074 superseder
2010-04-19 23:25:07	vbr	set	files: + difflib_test_inq.py nosy: + vbr messages: + msg103660
2009-10-02 10:58:47	pitrou	set	nosy: + pitrou messages: + msg93438
2009-10-01 12:24:49	gjb1002	set	nosy: + gjb1002
2009-05-28 01:09:05	r.david.murray	set	versions: + Python 3.2, - Python 2.5 nosy: tim.peters, georg.brandl, hagna, r.david.murray, mrotondo priority: normal components: + Documentation, Library (Lib), - Extension Modules type: enhancement stage: test needed
2009-03-30 01:33:30	r.david.murray	set	nosy: + r.david.murray messages: + msg84449
2009-03-30 00:40:16	mrotondo	set	nosy: + mrotondo messages: + msg84446 versions: + Python 2.7
2009-03-29 12:59:19	georg.brandl	set	assignee: tim.peters messages: + msg84387 nosy: + georg.brandl, tim.peters
2008-05-27 20:29:56	hagna	create