classification
Title: unified_diff function produces incorrect range information
Type: behavior Stage: patch review
Components: Library (Lib) Versions: Python 3.1, Python 3.2, Python 3.3, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: jan.koprowski, python-dev, rhettinger, terry.reedy, tim.peters, ysj.ray
Priority: high Keywords: patch

Created on 2011-04-03 04:57 by jan.koprowski, last changed 2011-04-12 22:52 by rhettinger. This issue is now closed.

Files
File name Uploaded Description Edit
issue_11747.diff ysj.ray, 2011-04-05 08:01 review
Diff_Format.pdf rhettinger, 2011-04-09 19:50 Single Unix Specification for diff output formats
Messages (11)
msg132832 - (view) Author: Jan Koprowski (jan.koprowski) Date: 2011-04-03 04:57
Python:
---------------------------
>>> import difflib
>>> dl = difflib.unified_diff([], ['a\n', 'b\n'])
>>> print ''.join(dl),
---
+++
@@ -1,0 +1,2 @@
+a
+b

Gnu diff:
---------------------------
$diff -uN a b
--- a   1970-01-01 01:00:00.000000000 +0100
+++ b   2011-04-03 06:56:28.330543000 +0200
@@ -0,0 +1,2 @@
+a
+b
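
The comparison above can be re-run on Python 3 with a small helper. This is only a sketch: `hunk_headers` is a hypothetical name, not part of difflib, and it simply filters the `@@` lines out of `difflib.unified_diff` output. On an interpreter that includes the fix for this issue, the header is expected to match GNU diff.

```python
import difflib


def hunk_headers(a, b):
    """Return the '@@' hunk header lines that difflib.unified_diff emits."""
    return [line.rstrip("\n")
            for line in difflib.unified_diff(a, b)
            if line.startswith("@@")]


# Same inputs as the report: an empty "file" compared with two lines.
print(hunk_headers([], ["a\n", "b\n"]))  # ['@@ -0,0 +1,2 @@'] once fixed
```

Interpreters that predate the fix would show `-1,0` in place of `-0,0`, as in the session above.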
msg133007 - (view) Author: ysj.ray (ysj.ray) Date: 2011-04-05 08:01
When one of the two files being compared is empty, GNU diff treats the differences as beginning at line 0 (there are no lines), whereas difflib treats them as beginning at line 1 (as if there were one empty line). I am not sure which is correct: in practice, diff output is fed to the "patch" program, which locates changes mostly by matching identical context lines rather than by the line numbers in the hunk headers, so it does not matter whether it is line 0 or line 1. It is still better to stay consistent with GNU diff.

In context_diff() the output is correct, since when a hunk has fewer than 2 differing lines, only the ending line number is displayed in the hunk header.

Here is a patch which fixes this.
msg133345 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-04-08 22:22
@Tim: was it your intention that difflib track gnu diff?

I am on the fence with this issue. Without input from Tim other than the doc, I am tempted to call this a feature request and retitle it "Make unified_diff match gnu diff for [] input". The docs do not reference external definitions for context-diff and unified-diff. The entry for unified-diff does not give a format for the @@ control lines. So the current behavior cannot be said to violate the doc spec.

On the other hand, putting random garbage on @@ lines would clearly be a bug, so the doc must be taken as referencing some external definition, even if vague.
 
https://secure.wikimedia.org/wikipedia/en/wiki/Diff#Unified_format
says "Unified context diffs were originally developed by Wayne Davison in August 1990 (in unidiff which appeared in Volume 14 of comp.sources.misc). Richard Stallman added unified diff support to the GNU Project's diff utility one month later,". So using gnu diff as a standard is not unreasonable. (But did it give 0,0 for null files from the beginning?)

Now looking practically. If nothing but patch programs ever read and act on the control blocks, then, as Ray says, a change would neither hurt nor be of any use. If any Python code does look, then a change could break code. I would in any case be reluctant to change 2.7.

But if this is treated as a feature request and only 3.3 were changed, then we would have to document the change:
"Version changed 3.3: for empty lists, the @@ block specification was changed from 1,0 to 0,0", which is pretty close to adding useless noise.
3.0 would have been the best time to make this change.
msg133373 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2011-04-09 03:20
Terry, I had no intention here at all - had nothing to do with unified_diff. Would have to look at the history to see who added it, and ask them.  That said, the very name "unified_diff" suggests someone did intend to mimic _some_ system's "unified diff" behavior ;-)
msg133416 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-04-09 19:50
[Uncle Timmy]
> Would have to look at the history to see who added it, and ask them. 

That would be me :-)

At the time, the goals were to:

1) make an easy-to-use, readable output format for file comparisons,

2) use the unmodified output of the existing SequenceMatcher(None,a,b).get_grouped_opcodes(n) method,

3) create output that works with patch and ed, and

4) comply with the output format spec in the Single Unix Specification found at http://www.unix.org/single_unix_specification/ . See the attached excerpt.

No effort was made to exactly reproduce the output of GNU diff.  It was just an alternate output format for the SequenceMatcher.
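
To illustrate goal 2, here is what the grouped opcodes look like for the inputs from the original report. This is a sketch of the raw data the diff formatters start from, not a description of how they render it; the variable names are illustrative.

```python
from difflib import SequenceMatcher

a = []                # the empty "from" file
b = ["a\n", "b\n"]    # the two-line "to" file

# Each opcode is (tag, i1, i2, j1, j2), with zero-based half-open slices
# a[i1:i2] and b[j1:j2]. The argument 3 is the number of context lines.
groups = [[tuple(op) for op in group]
          for group in SequenceMatcher(None, a, b).get_grouped_opcodes(3)]
print(groups)  # [[('insert', 0, 0, 0, 2)]]
```

The empty slice `a[0:0]` is the range whose header rendering (`1,0` versus `0,0`) this issue is about.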
msg133419 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-04-09 20:36
Thanks, Raymond. That file says (in the -u section) "If a range is empty, its beginning line number shall be the number of the line just before the range, or 0 if the empty range starts the file." The last clause says to me that gnu diff is right and that the 1,0 range of difflib.unified_diff is a bug.
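
The quoted rule can be sketched as a small function. `unified_range` is a hypothetical helper written for illustration, not the code that was committed; it assumes zero-based, half-open slice indices as input and always prints both the beginning line and the count, so any abbreviated forms a real diff tool may use are left out.

```python
def unified_range(start, stop):
    """Format the zero-based half-open slice [start, stop) as 'begin,count'."""
    count = stop - start
    if count:
        begin = start + 1   # one-based number of the first line in the range
    else:
        begin = start       # line just before the empty range, or 0 at file start
    return "{},{}".format(begin, count)


print(unified_range(0, 0))  # 0,0  (empty range at the start of the file)
print(unified_range(0, 2))  # 1,2  (two lines starting at line 1)
print(unified_range(3, 3))  # 3,0  (empty range after line 3)
```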

I think we should add this link to the difflib doc (I am still thinking about place and wording).

News entry might be
Issue # 11747: Correct difflib.unified_diff empty file range from 1,0 to 0,0 in conformance with Single Unix Specification for diff output formats. Patch by ysj.ray.
msg133490 - (view) Author: Roundup Robot (python-dev) Date: 2011-04-11 00:24
New changeset 36648097fcd4 by Raymond Hettinger in branch '3.2':
Cleanup and modernize code prior to working on Issue 11747.
http://hg.python.org/cpython/rev/36648097fcd4

New changeset 58a3bfcc70f7 by Raymond Hettinger in branch 'default':
Cleanup and modernize code prior to working on Issue 11747.
http://hg.python.org/cpython/rev/58a3bfcc70f7
msg133541 - (view) Author: Roundup Robot (python-dev) Date: 2011-04-11 20:11
New changeset a2ee967de44f by Raymond Hettinger in branch '3.2':
Issue #11747: Fix range formatting in context and unified diffs.
http://hg.python.org/cpython/rev/a2ee967de44f

New changeset 1e5e3bb3e1f1 by Raymond Hettinger in branch 'default':
Issue #11747: Fix range formatting in context and unified diffs.
http://hg.python.org/cpython/rev/1e5e3bb3e1f1
msg133546 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-04-11 21:53
Re-opening.  There are some problems with the fix.  Context diff ranges need to show the ending line number, not the length.  Also for unified diffs, GNU diff is showing (x,0) as just (x).
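
The difference between the two header styles can be seen by diffing the same inputs both ways on an interpreter that includes the final fix. The inputs and variable names below are made up for illustration; `n=1` keeps one line of context so that the hunk covers lines 3 to 5.

```python
import difflib

old = ["1\n", "2\n", "3\n", "4\n", "5\n"]
new = ["1\n", "2\n", "3\n", "X\n", "5\n"]

# Unified hunk headers give "beginning,length".
unified = [line.rstrip("\n")
           for line in difflib.unified_diff(old, new, n=1)
           if line.startswith("@@")]

# Context hunk headers give "beginning,ending"; the filter skips the
# file header lines and the row of asterisks.
context = [line.rstrip("\n")
           for line in difflib.context_diff(old, new, n=1)
           if line.startswith(("*** ", "--- "))
           and line.rstrip("\n").endswith(("****", "----"))]

print(unified)  # ['@@ -3,3 +3,3 @@']
print(context)  # ['*** 3,5 ****', '--- 3,5 ----']
```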
msg133609 - (view) Author: Roundup Robot (python-dev) Date: 2011-04-12 22:26
New changeset 707078ca0a77 by Raymond Hettinger in branch '3.1':
Issue 11747: Fix output format for context diffs.
http://hg.python.org/cpython/rev/707078ca0a77

New changeset e3387295a24f by Raymond Hettinger in branch '3.2':
Issue 11747: Fix output format for context diffs.
http://hg.python.org/cpython/rev/e3387295a24f

New changeset fbfd5435889c by Raymond Hettinger in branch 'default':
Issue 11747: Fix output format for context diffs.
http://hg.python.org/cpython/rev/fbfd5435889c
msg133610 - (view) Author: Roundup Robot (python-dev) Date: 2011-04-12 22:48
New changeset 09459397f807 by Raymond Hettinger in branch '2.7':
Issue 11747: Fix output format for context diffs.
http://hg.python.org/cpython/rev/09459397f807
History
Date User Action Args
2011-04-12 22:52:22 rhettinger set status: open -> closed; resolution: fixed
2011-04-12 22:48:35 python-dev set messages: + msg133610
2011-04-12 22:26:11 python-dev set messages: + msg133609
2011-04-11 21:53:39 rhettinger set status: closed -> open; priority: low -> high; resolution: fixed -> (no value); messages: + msg133546
2011-04-11 21:01:10 rhettinger set status: open -> closed; resolution: fixed
2011-04-11 20:11:17 python-dev set messages: + msg133541
2011-04-11 00:24:42 python-dev set nosy: + python-dev; messages: + msg133490
2011-04-09 20:36:02 terry.reedy set messages: + msg133419
2011-04-09 20:19:44 rhettinger set priority: normal -> low; assignee: rhettinger
2011-04-09 19:50:42 rhettinger set files: + Diff_Format.pdf; nosy: + rhettinger; messages: + msg133416
2011-04-09 03:20:47 tim.peters set messages: + msg133373
2011-04-08 22:22:41 terry.reedy set nosy: + terry.reedy, tim.peters; messages: + msg133345; stage: patch review
2011-04-05 08:01:17 ysj.ray set files: + issue_11747.diff; type: behavior; versions: + Python 3.1, Python 3.2, Python 3.3; keywords: + patch; nosy: + ysj.ray; messages: + msg133007
2011-04-03 04:57:15 jan.koprowski create