classification
Title: naive use of ''.join(difflib.unified_diff(...)) results in bogus diffs with inputs that don't end with end-of-line char (same with context_diff)
Type: behavior Stage: patch review
Components: Library (Lib) Versions: Python 3.1, Python 2.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: trentm (1)
Priority: normal Keywords: patch

Created on 2008-02-18 20:16 by trentm, last changed 2009-05-26 16:50 by trentm.

Files
File name | Uploaded | Description
python_difflib_unified_diff.patch | trentm, 2008-02-18 20:24 | patch against the Python 2.6 svn trunk to fix this
python_difflib_no_eol.patch | trentm, 2009-05-26 16:50 | patch (against trunk, currently 2.7) to fix and with test cases
Messages (4)
msg62543 - Author: Trent Mick (trentm) Date: 2008-02-18 20:16
When comparing content with difflib, if the resulting diff covers the
last line of one or both of the inputs, and that line doesn't end with
an end-of-line character, then the corresponding generated diff line
doesn't include an EOL either. Fair enough.
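Inspecting the individual lines the generator yields makes the mechanism visible. A minimal Python 3 sketch (the transcripts in this report use Python 2 syntax); it assumes the difflib behaviour described here, where the final lines are yielded with no terminator:

```python
import difflib

old = "one\ntwo\nthree".splitlines(keepends=True)
new = "one\ntwo\ntrois".splitlines(keepends=True)

lines = list(difflib.unified_diff(old, new))
for line in lines:
    # repr() shows whether each yielded line carries a trailing '\n'
    print(repr(line))

# '--- \n'
# '+++ \n'
# '@@ -1,3 +1,3 @@\n'
# ' one\n'
# ' two\n'
# '-three'
# '+trois'
```

Each yielded line mirrors its input line exactly, which is fine in isolation; the trouble starts only when the lines are concatenated.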

Naive (and I suspect typical) usage of difflib.unified_diff(...) is:

  diff = ''.join(difflib.unified_diff(...))

This results in an *incorrect* unified diff for the conditions described
above.

>>> from difflib import *
>>> gen = unified_diff("one\ntwo\nthree".splitlines(1),
...                    "one\ntwo\ntrois".splitlines(1))
>>> print ''.join(gen)
---
+++
@@ -1,3 +1,3 @@
 one
 two
-three+trois


The proper behaviour would be:

>>> gen = unified_diff("one\ntwo\nthree".splitlines(1),
...                    "one\ntwo\ntrois".splitlines(1))
>>> print ''.join(gen)
---
+++
@@ -1,3 +1,3 @@
 one
 two
-three
\ No newline at end of file
+trois
\ No newline at end of file
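The proper output above can be produced without touching difflib by post-processing its lines. A minimal Python 3 sketch, not the attached patch: add_no_eol_markers and NO_EOL_MARKER are hypothetical names (not difflib API), and it assumes the default lineterm='\n' and '\n' or '\r\n' line endings in the inputs:

```python
import difflib
from typing import Iterable, Iterator

# Marker line as emitted by "svn diff" for a line lacking a final newline.
NO_EOL_MARKER = "\\ No newline at end of file\n"


def add_no_eol_markers(diff_lines: Iterable[str]) -> Iterator[str]:
    """Terminate any unterminated diff line and follow it with a marker.

    Hypothetical helper. With the default lineterm='\n', every header and
    hunk line already ends in '\n', so only lines copied from an input's
    unterminated last line are affected.
    """
    for line in diff_lines:
        if line.endswith("\n"):
            yield line
        else:
            # Line came from an input whose last line had no EOL.
            yield line + "\n"
            yield NO_EOL_MARKER


old = "one\ntwo\nthree".splitlines(keepends=True)
new = "one\ntwo\ntrois".splitlines(keepends=True)
print("".join(add_no_eol_markers(difflib.unified_diff(old, new))), end="")
```

Because the wrapper only inspects line endings, it would also terminate the glued "! three" line in difflib.context_diff() output. The difflib documentation in current Python 3 releases suggests a different route for inputs without trailing newlines: set lineterm='' so the output is uniformly newline-free, then join with '\n'. That gives a well-formed diff, but one that can no longer record that the last line lacked a newline.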


I *believe* that "\ No newline at end of file" is the appropriate
marker, one that tools like "patch" know how to use. At least, this
is what "svn diff" generates.
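To illustrate what the marker tells a consumer, here is a toy Python 3 sketch that rebuilds the new ("+") side of a unified diff and honours "\ No newline at end of file" by dropping the newline of the line just before it. The helper new_side_text is hypothetical, and it assumes the hunks cover the whole file with no gaps; a real tool such as patch instead applies hunks against the original file and checks context.

```python
from typing import List, Optional


def new_side_text(diff_lines: List[str]) -> str:
    """Rebuild the new ('+') side from a unified diff covering a whole file.

    Hypothetical toy helper: assumes the first two lines are the
    '---'/'+++' headers and that the hunks span the entire file.
    """
    out: List[str] = []
    last_prefix: Optional[str] = None
    for line in diff_lines[2:]:  # skip the '---' and '+++' header lines
        if line.startswith("@@"):
            continue  # hunk header; ranges are ignored in this toy
        prefix, body = line[:1], line[1:]
        if prefix == "\\":
            # The marker qualifies the line just before it. Drop that
            # line's newline only if the line belongs to the new side.
            if last_prefix in (" ", "+") and out[-1].endswith("\n"):
                out[-1] = out[-1][:-1]
            continue
        if prefix in (" ", "+"):
            out.append(body)
        last_prefix = prefix
    return "".join(out)


diff = [
    "--- \n",
    "+++ \n",
    "@@ -1,3 +1,3 @@\n",
    " one\n",
    " two\n",
    "-three\n",
    "\\ No newline at end of file\n",
    "+trois\n",
    "\\ No newline at end of file\n",
]
print(repr(new_side_text(diff)))  # 'one\ntwo\ntrois'
```

The marker after "-three" is ignored here because it describes the old side; the one after "+trois" is what removes the final newline from the rebuilt text.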


I'll try to whip up a patch. 

Do others concur that this should be fixed?
msg62544 - Author: Trent Mick (trentm) Date: 2008-02-18 20:24
Attached is a patch against the Python 2.6 svn trunk for this.
msg62545 - Author: Trent Mick (trentm) Date: 2008-02-18 20:25
At a glance I suspect this patch will work back to Python 2.3 (when
difflib.unified_diff() was added). I haven't looked at the Py3k tree yet.


Note: This *may* also apply to difflib.context_diff(), but I am not sure.
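The context_diff() case can be checked the same way. A minimal Python 3 sketch using the same inputs as the unified_diff() example; assuming difflib without a fix, the unterminated "! three" line runs straight into the "--- 1,3 ----" header of the second half of the hunk:

```python
import difflib

old = "one\ntwo\nthree".splitlines(keepends=True)
new = "one\ntwo\ntrois".splitlines(keepends=True)

text = "".join(difflib.context_diff(old, new))
print(text)

# ***
# ---
# ***************
# *** 1,3 ****
#   one
#   two
# ! three--- 1,3 ----
#   one
#   two
# ! trois
```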
msg88375 - Author: Trent Mick (trentm) Date: 2009-05-26 16:50
Here is a new patch that also fixes the same issue in
difflib.context_diff() and adds a couple test cases.
History
Date | User | Action | Args
2009-05-26 16:50:14 | trentm | set | files: + python_difflib_no_eol.patch
  title: naive use of ''.join(difflib.unified_diff(...)) results in bogus diffs with inputs that don't end with end-of-line char -> naive use of ''.join(difflib.unified_diff(...)) results in bogus diffs with inputs that don't end with end-of-line char (same with context_diff)
  messages: + msg88375
  stage: test needed -> patch review
2009-05-12 14:09:32 | ajaksu2 | set | stage: test needed
  versions: + Python 3.1, - Python 2.5, Python 2.4, Python 2.3
2008-02-19 09:09:53 | christian.heimes | set | priority: normal
  keywords: + patch
2008-02-18 20:25:22 | trentm | set | messages: + msg62545
2008-02-18 20:24:08 | trentm | set | files: + python_difflib_unified_diff.patch
  messages: + msg62544
  versions: + Python 2.6, Python 2.4, Python 2.3
2008-02-18 20:16:55 | trentm | create |