classification
Title: difflib: unified_diff produces wrong patches (again)
Type: enhancement Stage: test needed
Components: Library (Lib) Versions: Python 3.1, Python 3.2, Python 2.7, Python 2.6
process
Status: closed Resolution: duplicate
Dependencies: Superseder: difflib.unified_diff(...) produces invalid patches
View: 2142
Assigned To: Nosy List: mark.dickinson, techtonik
Priority: normal Keywords: easy

Created on 2010-05-13 08:05 by techtonik, last changed 2010-05-13 14:54 by mark.dickinson. This issue is now closed.

Messages (3)
msg105630 - (view) Author: anatoly techtonik (techtonik) Date: 2010-05-13 08:05
If the source or target file of a unified-format diff does not end with a newline, the diff should contain this marker:

\ No newline at end of file

Otherwise, information is lost when such a patch is applied.

http://en.wikipedia.org/wiki/Diff#Unified_format
msg105634 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-05-13 14:14
I think difflib is behaving as intended here; changing to feature request.

Could you please clarify where the information loss occurs? I'm not seeing it. As far as I can tell, the fact that unified_diff produces a list of lines rather than a single string (as GNU diff effectively does) means that all necessary information about newlines is preserved:

newton:py3k dickinsm$ echo -n "one
two" > 1.txt
newton:py3k dickinsm$ echo -n "one
two
" > 2.txt
newton:py3k dickinsm$ ./python.exe
Python 3.2a0 (py3k:81084:81085M, May 12 2010, 14:16:52) 
[GCC 4.2.1 (Apple Inc. build 5659)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from difflib import unified_diff
[47745 refs]
>>> list(unified_diff(list(open('1.txt')), list(open('2.txt'))))
['--- \n', '+++ \n', '@@ -1,2 +1,2 @@\n', ' one\n', '-two', '+two\n']
[53249 refs]

It looks to me as though the diff picks up the missing newline just fine.

The one problem with the above is that you can't do a ''.join() on it to give a meaningful diff, but I don't see that as a problem with the unified_diff function itself.
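The join problem can be reproduced without touching the filesystem. The following is a minimal sketch that uses in-memory line lists standing in for 1.txt and 2.txt above (the variable names are illustrative only):

```python
from difflib import unified_diff

# Stand-ins for list(open('1.txt')) and list(open('2.txt')).
a = ["one\n", "two"]      # last line has no trailing newline
b = ["one\n", "two\n"]    # last line ends with a newline

diff = list(unified_diff(a, b))

# As a list, the two versions of the last line are distinguishable.
print(diff[-2:])

# Joined into one string, the '-' and '+' lines run together,
# so the result is no longer a well-formed patch.
joined = "".join(diff)
print(repr(joined))
```

The list form keeps '-two' and '+two\n' as separate items; after the join they become the single line '-two+two\n'.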

I'd be -1 on adding the "\ No newline at end of file" by default, since it complicates the unified_diff format unnecessarily (and would also affect backwards compatibility).  I wouldn't have any objections to an extra option for this, though.
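One way such an opt-in behaviour could look, sketched as a wrapper rather than a change to difflib itself. The name unified_diff_with_marker is hypothetical and not part of the standard library. The sketch assumes the default lineterm of "\n" and inputs shaped like readlines() output, so that only the final line of a file can lack a newline:

```python
from difflib import unified_diff

NO_NEWLINE_MARKER = "\\ No newline at end of file\n"

def unified_diff_with_marker(a, b, **kwargs):
    """Hypothetical wrapper: yield unified_diff lines, adding the
    marker after any line that does not end with a newline.

    Assumes lineterm is left at its default ("\n"); with another
    lineterm the control lines would be misdetected.
    """
    for line in unified_diff(a, b, **kwargs):
        if line.endswith("\n"):
            yield line
        else:
            # Terminate the line, then flag that the original had no newline.
            yield line + "\n"
            yield NO_NEWLINE_MARKER

a = ["one\n", "two"]
b = ["one\n", "two\n"]
patch = "".join(unified_diff_with_marker(a, b, fromfile="1.txt", tofile="2.txt"))
print(patch)
```

Because every yielded line now ends with a newline, the result can be passed through ''.join() without the last lines running together, and callers of plain unified_diff are unaffected.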
msg105635 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-05-13 14:54
It turns out that this problem was already reported in issue 2142 (which has a patch);  closing as a duplicate.
History
Date                 User            Action  Args
2010-05-13 14:54:50  mark.dickinson  set     status: open -> closed
                                             resolution: duplicate
                                             superseder: difflib.unified_diff(...) produces invalid patches
                                             messages: + msg105635
2010-05-13 14:14:56  mark.dickinson  set     nosy: + mark.dickinson
                                             messages: + msg105634
                                             type: behavior -> enhancement
                                             stage: test needed
2010-05-13 12:25:18  r.david.murray  set     keywords: + easy
                                             type: behavior
                                             versions: - Python 3.3
2010-05-13 08:05:25  techtonik       create