
classification
Title: difflib.unified_diff loses context
Type: behavior
Stage:
Components: Library (Lib)
Versions: Python 3.3, Python 3.4, Python 2.7
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Brice.Videau, folder4ben, mal, steve.newcomb, terry.reedy
Priority: normal
Keywords:

Created on 2011-03-22 10:09 by Brice.Videau, last changed 2022-04-11 14:57 by admin.

Files
File name   Uploaded                         Description
diff.tgz    Brice.Videau, 2011-03-22 10:09
test.zip    mal, 2013-04-29 07:25
450.zip     folder4ben, 2013-09-20 09:01
Messages (5)
msg131735 - Author: Brice Videau (Brice.Videau) Date: 2011-03-22 10:09
unified_diff seems to lose the context when comparing the two files contained in the attached archive using this script:

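import difflib
import sys

# Read both files as lists of lines, keeping the line endings.
with open("out1.short", "r") as f1:
    b1 = f1.read().splitlines(True)
with open("out2.short", "r") as f2:
    b2 = f2.read().splitlines(True)

compare = difflib.unified_diff(b1, b2)
# Diff lines carry their own line endings, so write them unchanged.
sys.stdout.writelines(compare)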

A big chunk of lines is reported as removed, only to be added back immediately afterwards (around line 16).

Comparing out2.short against out1.short does not produce this behavior. The output of
compare = difflib.unified_diff(b2,b1)
is "correct".

Other diff tools such as diff or vimdiff do not exhibit this problem.
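To check whether the two directions really disagree, one could count the removed and added lines in each direction. The following is only a sketch: `count_changes` is a hypothetical helper, not part of difflib, and the two short lists stand in for the contents of out1.short and out2.short. It uses `SequenceMatcher` directly, since that is the class `unified_diff` builds on.

```python
import difflib

def count_changes(a, b):
    """Return (removed, added) line counts for turning a into b.

    Hypothetical helper: sums the sizes of the non-equal opcodes
    reported by difflib.SequenceMatcher.
    """
    removed = added = 0
    matcher = difflib.SequenceMatcher(None, a, b)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed += i2 - i1
        if tag in ("replace", "insert"):
            added += j2 - j1
    return removed, added

# Stand-in data; replace with the lines read from the two files.
b1 = ["a\n", "b\n", "c\n"]
b2 = ["a\n", "x\n", "c\n"]

print(count_changes(b1, b2))
print(count_changes(b2, b1))
```

If the counts for the two directions differ substantially on the real files, that would be consistent with the asymmetry described above.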
msg132368 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-03-27 21:56
2.6 only gets security fixes now.
msg188043 - Author: Miguel Latorre (mal) Date: 2013-04-29 07:25
This bug is still present in Python 2.7.4 and Python 3.3.1.
I attach another example; the result differs depending on the number of lines processed (see test.py).
msg198134 - Author: Benoît D Vages (folder4ben) Date: 2013-09-20 09:01
Another example, if necessary (Python 2.6 / 2.7).
I got the same behavior as mal using his script and my files.

It seems to occur when the chunk of lines between two differences is repeated many times in the file.
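One possible contributing factor, not confirmed anywhere in this thread, is the automatic junk heuristic of `difflib.SequenceMatcher`: when the second sequence has at least 200 items, items that occur in more than roughly 1% of it are treated as "popular" and are not used to anchor matches. `unified_diff` does not expose the `autojunk` parameter, so the sketch below uses `SequenceMatcher` directly. The input data is made up for illustration.

```python
import difflib

# Made-up input: one line repeated several times in a file of 200+ lines.
b = ["common\n"] * 10 + ["line %d\n" % i for i in range(200)]
a = list(b)

# autojunk defaults to True, which enables the heuristic.
with_heuristic = difflib.SequenceMatcher(None, a, b)
# Passing autojunk=False disables it.
without_heuristic = difflib.SequenceMatcher(None, a, b, autojunk=False)

# bpopular holds the items the heuristic has set aside.
print(with_heuristic.bpopular)
print(without_heuristic.bpopular)
```

Inspecting `bpopular` on the attached files might show whether the repeated chunk is affected by this heuristic; that is an assumption to verify, not a diagnosis.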
msg278892 - Author: Steve Newcomb (steve.newcomb) * Date: 2016-10-18 15:23
Context reporting is still buggy in Python 3.5.2:

>>> [ x for x in difflib.unified_diff( "'on'\n", "'on'\n\n\n")]
['--- \n', '+++ \n', '@@ -3,3 +3,5 @@\n', ' n', " '", ' \n', '+\n', '+\n']
>>> import sys
>>> sys.version
'3.5.2 (default, Sep 10 2016, 08:21:44) \n[GCC 5.4.0 20160609]'
>>> 
(compiled under Ubuntu 16.04.1 LTS)
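One reading of the output above is that `unified_diff` received two plain strings. It expects sequences of lines, and a string is iterated character by character, so each character is treated as a separate "line". The sketch below contrasts that call with one that passes lists of lines; the variable names are illustrative only.

```python
import difflib

a = "'on'\n"
b = "'on'\n\n\n"

# Bare strings: every character counts as one "line".
char_diff = list(difflib.unified_diff(a, b))

# Lists of lines, with line endings kept.
line_diff = list(
    difflib.unified_diff(a.splitlines(keepends=True),
                         b.splitlines(keepends=True))
)

print(char_diff)
print(line_diff)
```

With lists of lines, the hunk header becomes `@@ -1 +1,3 @@` and the single context line is `'on'`, followed by the two added empty lines.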
History
Date                 User           Action  Args
2022-04-11 14:57:15  admin          set     github: 55841
2016-10-18 15:23:47  steve.newcomb  set     nosy: + steve.newcomb
                                            messages: + msg278892
2013-09-20 09:01:11  folder4ben     set     files: + 450.zip
                                            nosy: + folder4ben
                                            messages: + msg198134
2013-04-29 21:30:24  terry.reedy    set     versions: + Python 3.4, - Python 3.1
2013-04-29 07:25:07  mal            set     files: + test.zip
                                            versions: + Python 2.7, Python 3.3
                                            nosy: + mal
                                            messages: + msg188043
2011-03-27 21:56:15  terry.reedy    set     nosy: + terry.reedy
                                            messages: + msg132368
                                            versions: - Python 2.6
2011-03-22 10:09:29  Brice.Videau   create