Author chipx86
Date 2007-05-03.10:24:47
Content
difflib.SequenceMatcher fails to distinguish a "replace" block from an "insert" or "delete" block when the insert or delete immediately follows the replace. It lumps both changes together into one big "replace" block.

This happens because of how get_opcodes() works. get_opcodes() loops through the matching blocks and turns the gaps between them into tags and ranges. If a block of text is changed and new text is added immediately afterwards, get_opcodes() can't see that these are two separate changes. All it knows is that the next matching block starts after the added text, so everything in the gap becomes a single "replace".
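To make the mechanism concrete, here is a rough sketch of how opcodes can be derived from the gaps between matching blocks. It is modeled on the behavior described above and is not the library's actual source; the helper name opcodes_from_matching_blocks is made up for illustration. Only difflib.SequenceMatcher and its get_matching_blocks() method are real API here.

```python
import difflib


def opcodes_from_matching_blocks(a, b):
    """Hypothetical helper: rebuild opcodes from the matching blocks.

    Each gap between two consecutive matching blocks yields exactly one
    tag, so a change followed directly by an insertion (or deletion)
    cannot be split into two opcodes.
    """
    matcher = difflib.SequenceMatcher(None, a, b)
    opcodes = []
    i = j = 0  # end of the previous matching block in a and in b
    for ai, bj, size in matcher.get_matching_blocks():
        tag = ''
        if i < ai and j < bj:
            tag = 'replace'   # unmatched items on both sides of the gap
        elif i < ai:
            tag = 'delete'    # unmatched items only in a
        elif j < bj:
            tag = 'insert'    # unmatched items only in b
        if tag:
            opcodes.append((tag, i, ai, j, bj))
        i, j = ai + size, bj + size
        if size:
            # The final block is a zero-length sentinel and adds nothing.
            opcodes.append(('equal', ai, i, bj, j))
    return opcodes
```

Because the tag is chosen only from whether each side of the gap is non-empty, there is no point in this loop where a "replace" could be followed by a separate "insert" or "delete" without a matching block in between.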

As an example, consider these strings:

"ABC"

"ABCD
EFG."

A typical line-based diffing program will show that the first line was replaced and the second was inserted. SequenceMatcher, however, reports just one replace and includes both lines in its range.
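A minimal reproduction of the example, assuming the two strings are compared as lists of lines (the variable names are illustrative and this is not the attached testcase):

```python
import difflib

old_lines = ["ABC"]
new_lines = ["ABCD", "EFG."]

matcher = difflib.SequenceMatcher(None, old_lines, new_lines)
opcodes = matcher.get_opcodes()

# Print each opcode together with the lines it covers on both sides.
for tag, i1, i2, j1, j2 in opcodes:
    print(tag, old_lines[i1:i2], new_lines[j1:j2])
# prints: replace ['ABC'] ['ABCD', 'EFG.']

# Swapping the arguments gives the replace>delete case.
reverse_opcodes = difflib.SequenceMatcher(None, new_lines, old_lines).get_opcodes()
```

In both directions the result is a single "replace" opcode spanning all lines, rather than a replace followed by an insert or a delete.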

I've attached a testcase that reproduces this for both replace>insert and replace>delete blocks.
History
Date                 User   Action  Args
2007-08-23 14:53:33  admin  link    issue1711800 messages
2007-08-23 14:53:33  admin  create