msg62543 - (view) |
Author: Trent Mick (trentm) * |
Date: 2008-02-18 20:16 |
When comparing content with difflib, if the resulting diff covers the
last line of one or both of the inputs that that line doesn't end with
an end-of-line character(s), then the generated diff lines don't include
an EOL. Fair enough.
Naive (and I suspect typical) usage of difflib.unified_diff(...) is:
diff = ''.join(difflib.unified_diff(...))
This results in an *incorrect* unified diff for the conditions described
above.
>>> from difflib import *
>>> gen = unified_diff("one\ntwo\nthree".splitlines(1),
... "one\ntwo\ntrois".splitlines(1))
>>> print ''.join(gen)
---
+++
@@ -1,3 +1,3 @@
one
two
-three+trois
The proper behaviour would be:
>>> gen = unified_diff("one\ntwo\nthree".splitlines(1),
... "one\ntwo\ntrois".splitlines(1))
>>> print ''.join(gen)
---
+++
@@ -1,3 +1,3 @@
one
two
-three
\ No newline at end of file
+trois
\ No newline at end of file
I *believe* that "\ No newline at end of file" are the appropriate
markers -- that tools like "patch" will know how to use. At least this
is what "svn diff" generates.
I'll try to whip up a patch.
Do others concur that this should be fixed?
|
msg62544 - (view) |
Author: Trent Mick (trentm) * |
Date: 2008-02-18 20:24 |
Attached is a patch against the Python 2.6 svn trunk for this.
|
msg62545 - (view) |
Author: Trent Mick (trentm) * |
Date: 2008-02-18 20:25 |
At a glance I suspect this patch will work back to Python 2.3 (when
difflib.unified_diff() was added). I haven't looked at the Py3k tree yet.
Note: This *may* also applied to difflib.context_diff(), but I am not sure.
|
msg88375 - (view) |
Author: Trent Mick (trentm) * |
Date: 2009-05-26 16:50 |
Here is a new patch that also fixes the same issue in
difflib.context_diff() and adds a couple test cases.
|
msg105636 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2010-05-13 14:55 |
Closed 8702 as a duplicate of this one. Combining nosy lists.
|
msg105638 - (view) |
Author: anatoly techtonik (techtonik) |
Date: 2010-05-13 16:01 |
You mean that this patch sits in here for 2 years already? This suxx.
And nobody added the tag 'easy', because people haven't explicitly requested tracker privileges? That suxx.
And it won't be in 2.7 because of that.. I am disappointed.
Why this issue is not labeled as 'difflib' related? How are people supposed to find this one?
|
msg105639 - (view) |
Author: anatoly techtonik (techtonik) |
Date: 2010-05-13 16:03 |
Trent, the "\ New line..." seems to be feature specific to unified format only. http://en.wikipedia.org/wiki/Diff#Unified_format
|
msg105647 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2010-05-13 17:20 |
Hmm. Not in my version of diff: it seems to use '\ No newline at end of file' for both unified and 'normal' diffs.
$ diff --version
diff (GNU diffutils) 2.8.1
Copyright (C) 2002 Free Software Foundation, Inc.
...
$ diff 1.txt 2.txt
2c2
< two
\ No newline at end of file
---
> two
|
msg105692 - (view) |
Author: ysj.ray (ysj.ray) |
Date: 2010-05-14 08:54 |
This is really a bug, but why it's not fixed during such a long time?
Since trentm's python_difflib_no_eol.patch patch failed against the current trunk, I modified it to work again, also a patch against py3k.
|
msg105693 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2010-05-14 09:13 |
I think it's arguable whether this is a bug or not. There's no official specification for the unified diff format that I can find anywhere; the GNU description at
http://www.gnu.org/software/hello/manual/diff/Detailed-Unified.html#Detailed-Unified
doesn't mention this detail. The '\ No newline at end of file' is actually unnecessary for these Python functions, since they operate on lists and produce a generator, so it would be needless complication. And changing this might break existing Python code that manually parses the output of unified_diff or context_diff and doesn't know what to do with a leading '\' character. (Does such Python code exist? I don't know.)
I'd suggest adding a keyword argument to the unified_diff and context_diff functions to enable this feature, leaving it disabled by default.
|
msg105762 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2010-05-14 21:08 |
Marking as a feature request: I think this is somewhere on the line between bugfix and new feature, but either way 2.7 is too close to release to start messing with something that isn't really all that broken.
For 2.7 (and 2.6, 3.1), there's also the option of updating the documentation to point out this issue. It would also be worth updating the command-line-interface example in the docs to do this post-processing (i.e., for each line that doesn't end in a newline, add '\n\ No newline at end of file') before calling writelines.
|
msg110778 - (view) |
Author: Mark Lawrence (BreamoreBoy) * |
Date: 2010-07-19 16:43 |
The two latest patch files were missing TestDifflib from run_unittest. Having shown that the tests failed if test_difflib only was patched, then patch difflib against the 3.1 maintainance and py3k branches, the former passed ok and the latter failed. Could someone please take a look, at the same time considering Mark D's suggestion about using a keyword argument to the unified_diff and context_diff functions.
|
msg110864 - (view) |
Author: ysj.ray (ysj.ray) |
Date: 2010-07-20 04:22 |
Firstly, since this is marked as feature request, I updated the patch against py3k, to fix the problems found by BreamoreBoy.
|
msg110866 - (view) |
Author: ysj.ray (ysj.ray) |
Date: 2010-07-20 05:10 |
I feel adding a keyword argument to deal with such a case is not quiet worthy. I'd prefer not fix it for 2.x, and make its behavior consist with svn diff and other similar tools, as trentm said, "tools like "patch" will know how to use."
|
msg118125 - (view) |
Author: Trent Mick (trentm) * |
Date: 2010-10-07 17:43 |
c.f. some discussion on python-dev here:
http://mail.python.org/pipermail/python-dev/2010-October/104501.html
|
msg123163 - (view) |
Author: Éric Araujo (eric.araujo) * |
Date: 2010-12-03 02:42 |
To sum up the current consensus as I understand it, the missing end of line is clearly a bug, but the addition of “\No newline at end of file” is controversial. In the current patch, both are changed, so backporting only the first fix to 3.1 and 2.7 will require either agreeing to add “\No newline etc” in those versions too, or make another patch for 3.1/2.7.
I uploaded the patch to http://codereview.appspot.com/3432041/ and made some minor comments.
|
msg130491 - (view) |
Author: ysj.ray (ysj.ray) |
Date: 2011-03-10 07:03 |
Updated patch which can apply to current py3k cleanly and with changes follow eric's review comments.
|
msg130677 - (view) |
Author: Éric Araujo (eric.araujo) * |
Date: 2011-03-12 14:49 |
Patch does not look bad. I re-read the discussion on python-dev, where it was suggested to add a keyword argument to get the old behavior. Have you considered it?
|
msg130683 - (view) |
Author: ysj.ray (ysj.ray) |
Date: 2011-03-12 15:28 |
> I re-read the discussion on python-dev, where it was suggested to add a keyword argument to get the old behavior. Have you considered it?
IIUC, at the time of that discusstion, 3.2 is pre-beta so the suitable option is to add "\No newline etc" by default and add a keyword to get the old behavior, but now it is for 3.3. I wonder if 3.3 is a chance to totally get rid of the *OLD WRONG* behavior.
|
msg130801 - (view) |
Author: Éric Araujo (eric.araujo) * |
Date: 2011-03-14 10:51 |
I’m not sure whether the compatibility policy would prevent that. Aren’t there some cases with iterators (not strings) that work fine today without the change? If so, it should be possible to get the same behavior.
|
msg130959 - (view) |
Author: ysj.ray (ysj.ray) |
Date: 2011-03-15 05:55 |
Yes there may be.
Here is the updated patch:
Add a new keyword argument to context_diff() and unified_diff() named "warn_no_newline_at_end". If true, emit "\ No newline etc". It defaults to True, set it to false to get the old behavior. I'm not sure if this name is too long, but I haven't got a better name.
|
msg236541 - (view) |
Author: Mark Lawrence (BreamoreBoy) * |
Date: 2015-02-24 20:26 |
Can we have a review on the updated patch please.
|
msg384192 - (view) |
Author: Paul "TBBle" Hampson (TBBle) |
Date: 2021-01-01 17:46 |
I just bounced off this issue and proposed a work-around for it in Black (https://github.com/psf/black/pull/1897).
Since it wasn't mentioned here earlier, the `\` marker _is_ documented in the GNU Diffutils documentation, under "Incomplete Lines" (https://www.gnu.org/software/diffutils/manual/html_node/Incomplete-Lines.html#Incomplete-Lines), and applies to both Context and Normal diff output formats.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:56:30 | admin | set | github: 46395 |
2021-02-18 19:42:20 | djc | set | nosy:
- djc
|
2021-02-18 17:52:07 | mark.dickinson | set | nosy:
- mark.dickinson
|
2021-01-01 17:46:45 | TBBle | set | nosy:
+ TBBle messages:
+ msg384192
|
2020-10-28 11:35:02 | jwilk | set | nosy:
+ jwilk
|
2018-10-23 11:55:15 | thatiparthy | set | nosy:
+ thatiparthy
|
2018-10-11 17:57:24 | BreamoreBoy | set | nosy:
- BreamoreBoy
|
2018-10-11 13:04:53 | puxlit | set | nosy:
+ puxlit
|
2015-11-04 08:54:37 | abacabadabacaba | set | nosy:
+ abacabadabacaba
|
2015-02-24 20:26:42 | BreamoreBoy | set | nosy:
+ BreamoreBoy
messages:
+ msg236541 versions:
+ Python 3.5, - Python 3.3 |
2011-03-15 05:55:06 | ysj.ray | set | files:
+ issue_2142_2.diff nosy:
tim.peters, georg.brandl, mark.dickinson, techtonik, benjamin.peterson, trentm, djc, eric.araujo, meatballhat, ysj.ray messages:
+ msg130959
|
2011-03-14 10:51:08 | eric.araujo | set | nosy:
tim.peters, georg.brandl, mark.dickinson, techtonik, benjamin.peterson, trentm, djc, eric.araujo, meatballhat, ysj.ray messages:
+ msg130801 |
2011-03-12 15:28:19 | ysj.ray | set | nosy:
tim.peters, georg.brandl, mark.dickinson, techtonik, benjamin.peterson, trentm, djc, eric.araujo, meatballhat, ysj.ray messages:
+ msg130683 |
2011-03-12 14:49:48 | eric.araujo | set | nosy:
tim.peters, georg.brandl, mark.dickinson, techtonik, benjamin.peterson, trentm, djc, eric.araujo, meatballhat, ysj.ray messages:
+ msg130677 versions:
+ Python 3.3, - Python 3.2 |
2011-03-10 07:03:32 | ysj.ray | set | files:
+ issue_2142.diff nosy:
tim.peters, georg.brandl, mark.dickinson, techtonik, benjamin.peterson, trentm, djc, eric.araujo, meatballhat, ysj.ray messages:
+ msg130491
|
2010-12-03 02:42:53 | eric.araujo | set | nosy:
+ georg.brandl, benjamin.peterson, djc, eric.araujo, - BreamoreBoy messages:
+ msg123163
|
2010-10-07 17:43:11 | trentm | set | messages:
+ msg118125 |
2010-10-06 22:15:40 | trentm | set | assignee: trentm |
2010-07-20 05:10:47 | ysj.ray | set | messages:
+ msg110866 |
2010-07-20 04:22:01 | ysj.ray | set | files:
+ issue_2142_py3k_updated.diff
messages:
+ msg110864 |
2010-07-19 16:43:05 | BreamoreBoy | set | nosy:
+ BreamoreBoy messages:
+ msg110778
|
2010-05-14 21:08:47 | mark.dickinson | set | type: behavior -> enhancement messages:
+ msg105762 versions:
- Python 2.6, Python 3.1, Python 2.7 |
2010-05-14 09:13:55 | mark.dickinson | set | nosy:
+ tim.peters messages:
+ msg105693
|
2010-05-14 08:54:51 | ysj.ray | set | files:
+ issue_2142_py3k.diff |
2010-05-14 08:54:18 | ysj.ray | set | files:
+ issue_2142_trunk.diff nosy:
+ ysj.ray messages:
+ msg105692
|
2010-05-14 01:43:38 | meatballhat | set | nosy:
+ meatballhat
|
2010-05-13 17:20:18 | mark.dickinson | set | messages:
+ msg105647 |
2010-05-13 16:03:08 | techtonik | set | messages:
+ msg105639 |
2010-05-13 16:01:19 | techtonik | set | messages:
+ msg105638 title: naive use of ''.join(difflib.unified_diff(...)) results in bogus diffs with inputs that don't end with end-of-line char (same with context_diff) -> difflib.unified_diff(...) produces invalid patches |
2010-05-13 14:55:34 | mark.dickinson | set | versions:
+ Python 2.7, Python 3.2 |
2010-05-13 14:55:19 | mark.dickinson | set | nosy:
+ mark.dickinson, techtonik messages:
+ msg105636
|
2010-05-13 14:54:50 | mark.dickinson | link | issue8702 superseder |
2009-05-26 16:50:14 | trentm | set | files:
+ python_difflib_no_eol.patch
title: naive use of ''.join(difflib.unified_diff(...)) results in bogus diffs with inputs that don't end with end-of-line char -> naive use of ''.join(difflib.unified_diff(...)) results in bogus diffs with inputs that don't end with end-of-line char (same with context_diff) messages:
+ msg88375 stage: test needed -> patch review |
2009-05-12 14:09:32 | ajaksu2 | set | stage: test needed versions:
+ Python 3.1, - Python 2.5, Python 2.4, Python 2.3 |
2008-02-19 09:09:53 | christian.heimes | set | priority: normal keywords:
+ patch |
2008-02-18 20:25:22 | trentm | set | messages:
+ msg62545 |
2008-02-18 20:24:08 | trentm | set | files:
+ python_difflib_unified_diff.patch messages:
+ msg62544 versions:
+ Python 2.6, Python 2.4, Python 2.3 |
2008-02-18 20:16:55 | trentm | create | |