classification
Title: difflib.unified_diff(...) produces invalid patches
Type: enhancement Stage: patch review
Components: Library (Lib) Versions: Python 3.5
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: trentm Nosy List: abacabadabacaba, benjamin.peterson, djc, eric.araujo, georg.brandl, mark.dickinson, meatballhat, puxlit, techtonik, thatiparthy, tim.peters, trentm, ysj.ray
Priority: normal Keywords: patch

Created on 2008-02-18 20:16 by trentm, last changed 2018-10-23 11:55 by thatiparthy.

Files
File name Uploaded Description Edit
python_difflib_unified_diff.patch trentm, 2008-02-18 20:24 patch against the Python 2.6 svn trunk to fix this
python_difflib_no_eol.patch trentm, 2009-05-26 16:50 patch (against trunk, currently 2.7) to fix and with test cases
issue_2142_trunk.diff ysj.ray, 2010-05-14 08:54 patch against the trunk.
issue_2142_py3k.diff ysj.ray, 2010-05-14 08:54 patch against py3k
issue_2142_py3k_updated.diff ysj.ray, 2010-07-20 04:22 updated patch against py3k
issue_2142.diff ysj.ray, 2011-03-10 07:03
issue_2142_2.diff ysj.ray, 2011-03-15 05:55 review
Messages (22)
msg62543 - (view) Author: Trent Mick (trentm) * (Python committer) Date: 2008-02-18 20:16
When comparing content with difflib, if the resulting diff covers the
last line of one or both of the inputs that that line doesn't end with
an end-of-line character(s), then the generated diff lines don't include
an EOL. Fair enough.

Naive (and I suspect typical) usage of difflib.unified_diff(...) is:

  diff = ''.join(difflib.unified_diff(...))

This results in an *incorrect* unified diff for the conditions described
above.

>>> from difflib import *
>>> gen = unified_diff("one\ntwo\nthree".splitlines(1),
...                    "one\ntwo\ntrois".splitlines(1))
>>> print ''.join(gen)
---
+++
@@ -1,3 +1,3 @@
 one
 two
-three+trois


The proper behaviour would be:

>>> gen = unified_diff("one\ntwo\nthree".splitlines(1),
...                    "one\ntwo\ntrois".splitlines(1))
>>> print ''.join(gen)
---
+++
@@ -1,3 +1,3 @@
 one
 two
-three
\ No newline at end of file
+trois
\ No newline at end of file


I *believe* that "\ No newline at end of file" are the appropriate
markers -- that tools like "patch" will know how to use. At least this
is what "svn diff" generates.


I'll try to whip up a patch. 

Do others concur that this should be fixed?
msg62544 - (view) Author: Trent Mick (trentm) * (Python committer) Date: 2008-02-18 20:24
Attached is a patch against the Python 2.6 svn trunk for this.
msg62545 - (view) Author: Trent Mick (trentm) * (Python committer) Date: 2008-02-18 20:25
At a glance I suspect this patch will work back to Python 2.3 (when
difflib.unified_diff() was added). I haven't looked at the Py3k tree yet.


Note: This *may* also applied to difflib.context_diff(), but I am not sure.
msg88375 - (view) Author: Trent Mick (trentm) * (Python committer) Date: 2009-05-26 16:50
Here is a new patch that also fixes the same issue in
difflib.context_diff() and adds a couple test cases.
msg105636 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-05-13 14:55
Closed 8702 as a duplicate of this one.  Combining nosy lists.
msg105638 - (view) Author: anatoly techtonik (techtonik) Date: 2010-05-13 16:01
You mean that this patch sits in here for 2 years already? This suxx.

And nobody added the tag 'easy', because people haven't explicitly requested tracker privileges? That suxx. 

And it won't be in 2.7 because of that.. I am disappointed.

Why this issue is not labeled as 'difflib' related? How are people supposed to find this one?
msg105639 - (view) Author: anatoly techtonik (techtonik) Date: 2010-05-13 16:03
Trent, the "\ New line..." seems to be feature specific to unified format only. http://en.wikipedia.org/wiki/Diff#Unified_format
msg105647 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-05-13 17:20
Hmm.  Not in my version of diff:  it seems to use '\ No newline at end of file' for both unified and 'normal' diffs.

$ diff --version
diff (GNU diffutils) 2.8.1
Copyright (C) 2002 Free Software Foundation, Inc.
...

$ diff 1.txt 2.txt
2c2
< two
\ No newline at end of file
---
> two
msg105692 - (view) Author: ysj.ray (ysj.ray) Date: 2010-05-14 08:54
This is really a bug, but why it's not fixed during such a long time?

Since trentm's python_difflib_no_eol.patch patch failed against the current trunk, I modified it to work again, also a patch against py3k.
msg105693 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-05-14 09:13
I think it's arguable whether this is a bug or not.  There's no official specification for the unified diff format that I can find anywhere;  the GNU description at

http://www.gnu.org/software/hello/manual/diff/Detailed-Unified.html#Detailed-Unified

doesn't mention this detail.  The '\ No newline at end of file' is actually unnecessary for these Python functions, since they operate on lists and produce a generator, so it would be needless complication.  And changing this might break existing Python code that manually parses the output of unified_diff or context_diff and doesn't know what to do with a leading '\' character.  (Does such Python code exist?  I don't know.)

I'd suggest adding a keyword argument to the unified_diff and context_diff  functions to enable this feature, leaving it disabled by default.
msg105762 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-05-14 21:08
Marking as a feature request:  I think this is somewhere on the line between bugfix and new feature, but either way 2.7 is too close to release to start messing with something that isn't really all that broken.

For 2.7 (and 2.6, 3.1), there's also the option of updating the documentation to point out this issue.  It would also be worth updating the command-line-interface example in the docs to do this post-processing (i.e., for each line that doesn't end in a newline, add '\n\ No newline at end of file') before calling writelines.
msg110778 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2010-07-19 16:43
The two latest patch files were missing TestDifflib from run_unittest.  Having shown that the tests failed if test_difflib only was patched, then patch difflib against the 3.1 maintainance and py3k branches, the former passed ok and the latter failed.  Could someone please take a look, at the same time considering Mark D's suggestion about using a keyword argument to the unified_diff and context_diff functions.
msg110864 - (view) Author: ysj.ray (ysj.ray) Date: 2010-07-20 04:22
Firstly, since this is marked as feature request, I updated the patch against py3k, to fix the problems found by BreamoreBoy.
msg110866 - (view) Author: ysj.ray (ysj.ray) Date: 2010-07-20 05:10
I feel adding a keyword argument to deal with such a case is not quiet worthy. I'd prefer not fix it for 2.x, and make its behavior consist with svn diff and other similar tools, as trentm said, "tools like "patch" will know how to use."
msg118125 - (view) Author: Trent Mick (trentm) * (Python committer) Date: 2010-10-07 17:43
c.f. some discussion on python-dev here:
http://mail.python.org/pipermail/python-dev/2010-October/104501.html
msg123163 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-12-03 02:42
To sum up the current consensus as I understand it, the missing end of line is clearly a bug, but the addition of “\No newline at end of file” is controversial.  In the current patch, both are changed, so backporting only the first fix to 3.1 and 2.7 will require either agreeing to add “\No newline etc” in those versions too, or make another patch for 3.1/2.7.

I uploaded the patch to http://codereview.appspot.com/3432041/ and made some minor comments.
msg130491 - (view) Author: ysj.ray (ysj.ray) Date: 2011-03-10 07:03
Updated patch which can apply to current py3k cleanly and with changes follow eric's review comments.
msg130677 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-03-12 14:49
Patch does not look bad.  I re-read the discussion on python-dev, where it was suggested to add a keyword argument to get the old behavior.  Have you considered it?
msg130683 - (view) Author: ysj.ray (ysj.ray) Date: 2011-03-12 15:28
> I re-read the discussion on python-dev, where it was suggested to add a keyword argument to get the old behavior.  Have you considered it?


IIUC, at the time of that discusstion, 3.2 is pre-beta so the suitable option is to add "\No newline etc" by default and add a keyword to get the old behavior, but now it is for 3.3. I wonder if 3.3 is a chance to totally get rid of the *OLD WRONG* behavior.
msg130801 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-03-14 10:51
I’m not sure whether the compatibility policy would prevent that.  Aren’t there some cases with iterators (not strings) that work fine today without the change?  If so, it should be possible to get the same behavior.
msg130959 - (view) Author: ysj.ray (ysj.ray) Date: 2011-03-15 05:55
Yes there may be.

Here is the updated patch:

Add a new keyword argument to context_diff() and unified_diff() named "warn_no_newline_at_end". If true, emit "\ No newline etc". It defaults to True, set it to false to get the old behavior. I'm not sure if this name is too long, but I haven't got a better name.
msg236541 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2015-02-24 20:26
Can we have a review on the updated patch please.
History
Date User Action Args
2018-10-23 11:55:15thatiparthysetnosy: + thatiparthy
2018-10-11 17:57:24BreamoreBoysetnosy: - BreamoreBoy
2018-10-11 13:04:53puxlitsetnosy: + puxlit
2015-11-04 08:54:37abacabadabacabasetnosy: + abacabadabacaba
2015-02-24 20:26:42BreamoreBoysetnosy: + BreamoreBoy

messages: + msg236541
versions: + Python 3.5, - Python 3.3
2011-03-15 05:55:06ysj.raysetfiles: + issue_2142_2.diff
nosy: tim.peters, georg.brandl, mark.dickinson, techtonik, benjamin.peterson, trentm, djc, eric.araujo, meatballhat, ysj.ray
messages: + msg130959
2011-03-14 10:51:08eric.araujosetnosy: tim.peters, georg.brandl, mark.dickinson, techtonik, benjamin.peterson, trentm, djc, eric.araujo, meatballhat, ysj.ray
messages: + msg130801
2011-03-12 15:28:19ysj.raysetnosy: tim.peters, georg.brandl, mark.dickinson, techtonik, benjamin.peterson, trentm, djc, eric.araujo, meatballhat, ysj.ray
messages: + msg130683
2011-03-12 14:49:48eric.araujosetnosy: tim.peters, georg.brandl, mark.dickinson, techtonik, benjamin.peterson, trentm, djc, eric.araujo, meatballhat, ysj.ray
messages: + msg130677
versions: + Python 3.3, - Python 3.2
2011-03-10 07:03:32ysj.raysetfiles: + issue_2142.diff
nosy: tim.peters, georg.brandl, mark.dickinson, techtonik, benjamin.peterson, trentm, djc, eric.araujo, meatballhat, ysj.ray
messages: + msg130491
2010-12-03 02:42:53eric.araujosetnosy: + georg.brandl, benjamin.peterson, djc, eric.araujo, - BreamoreBoy
messages: + msg123163
2010-10-07 17:43:11trentmsetmessages: + msg118125
2010-10-06 22:15:40trentmsetassignee: trentm
2010-07-20 05:10:47ysj.raysetmessages: + msg110866
2010-07-20 04:22:01ysj.raysetfiles: + issue_2142_py3k_updated.diff

messages: + msg110864
2010-07-19 16:43:05BreamoreBoysetnosy: + BreamoreBoy
messages: + msg110778
2010-05-14 21:08:47mark.dickinsonsettype: behavior -> enhancement
messages: + msg105762
versions: - Python 2.6, Python 3.1, Python 2.7
2010-05-14 09:13:55mark.dickinsonsetnosy: + tim.peters
messages: + msg105693
2010-05-14 08:54:51ysj.raysetfiles: + issue_2142_py3k.diff
2010-05-14 08:54:18ysj.raysetfiles: + issue_2142_trunk.diff
nosy: + ysj.ray
messages: + msg105692

2010-05-14 01:43:38meatballhatsetnosy: + meatballhat
2010-05-13 17:20:18mark.dickinsonsetmessages: + msg105647
2010-05-13 16:03:08techtoniksetmessages: + msg105639
2010-05-13 16:01:19techtoniksetmessages: + msg105638
title: naive use of ''.join(difflib.unified_diff(...)) results in bogus diffs with inputs that don't end with end-of-line char (same with context_diff) -> difflib.unified_diff(...) produces invalid patches
2010-05-13 14:55:34mark.dickinsonsetversions: + Python 2.7, Python 3.2
2010-05-13 14:55:19mark.dickinsonsetnosy: + mark.dickinson, techtonik
messages: + msg105636
2010-05-13 14:54:50mark.dickinsonlinkissue8702 superseder
2009-05-26 16:50:14trentmsetfiles: + python_difflib_no_eol.patch

title: naive use of ''.join(difflib.unified_diff(...)) results in bogus diffs with inputs that don't end with end-of-line char -> naive use of ''.join(difflib.unified_diff(...)) results in bogus diffs with inputs that don't end with end-of-line char (same with context_diff)
messages: + msg88375
stage: test needed -> patch review
2009-05-12 14:09:32ajaksu2setstage: test needed
versions: + Python 3.1, - Python 2.5, Python 2.4, Python 2.3
2008-02-19 09:09:53christian.heimessetpriority: normal
keywords: + patch
2008-02-18 20:25:22trentmsetmessages: + msg62545
2008-02-18 20:24:08trentmsetfiles: + python_difflib_unified_diff.patch
messages: + msg62544
versions: + Python 2.6, Python 2.4, Python 2.3
2008-02-18 20:16:55trentmcreate