Author patena
Recipients patena, tim.peters
Date 2012-03-16.05:52:04
Message-id <1331877125.66.0.103153393744.issue14332@psf.upfronthosting.co.za>
In-reply-to
Content
According to the difflib.ndiff help, the optional linejunk argument is "A function that should accept a single string argument, and return true iff the string is junk." Presumably the point is to ignore the junk lines in the comparison. But the function doesn't appear to do this: I haven't been able to make the linejunk argument change the output in any way.

Expected difflib.ndiff behavior with no linejunk argument given:
 >>> import difflib
 >>> test_lines_1 = ['# something\n', 'real data\n']
 >>> test_lines_2 = ['# something else\n', 'real data\n']
 >>> print ''.join(difflib.ndiff(test_lines_1, test_lines_2))
 - # something
 + # something else
 ?            +++++
   real data

Now I'm providing a linejunk function to ignore all lines starting with '#', but the output is still the same:
 >>> print ''.join(difflib.ndiff(test_lines_1, test_lines_2,
 ...                             linejunk=lambda line: line.startswith('#')))
 - # something
 + # something else
 ?            +++++
   real data

In fact if I make linejunk always return True (or False), nothing changes either:
 >>> print ''.join(difflib.ndiff(test_lines_1, test_lines_2,
 ...                             linejunk=lambda line: True))
 - # something
 + # something else
 ?            +++++
   real data
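
The three sessions above can be condensed into one script that compares the deltas programmatically instead of by eye. This is only a sketch of the reproduction: the helper name is_comment is made up for illustration, and print() is called with a single argument so the script should run on both Python 2.6 and Python 3.

```python
import difflib

test_lines_1 = ['# something\n', 'real data\n']
test_lines_2 = ['# something else\n', 'real data\n']


def is_comment(line):
    # Hypothetical helper (not part of difflib): treat '#' lines as junk.
    return line.startswith('#')


# Same inputs, three different linejunk settings.
default_out = list(difflib.ndiff(test_lines_1, test_lines_2))
comment_out = list(difflib.ndiff(test_lines_1, test_lines_2,
                                 linejunk=is_comment))
always_out = list(difflib.ndiff(test_lines_1, test_lines_2,
                                linejunk=lambda line: True))

# Prints True when all three deltas are identical, which is the
# behavior described in this report.
print(default_out == comment_out == always_out)
```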

It certainly looks like an error, although it's possible that I'm just misunderstanding how this should work.
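
For comparison, here is a sketch of an input where linejunk does seem to change the result. It rests on my reading that ndiff hands linejunk to difflib.SequenceMatcher as its isjunk argument, so junk lines are still printed but are not used to start a matching block. The data is the " abcd" versus "abcd abcd" example from the SequenceMatcher documentation with each character turned into a line; is_blank is a hypothetical helper, not a difflib function.

```python
import difflib


def is_blank(line):
    # Hypothetical junk test: a line holding only whitespace is junk.
    return line.strip() == ''


a = ['\n', 'a\n', 'b\n', 'c\n', 'd\n']
b = ['a\n', 'b\n', 'c\n', 'd\n', '\n', 'a\n', 'b\n', 'c\n', 'd\n']

# Without junk, the longest match includes the blank line, so the
# first four lines of b are reported as insertions.
plain = list(difflib.ndiff(a, b))

# With blank lines as junk, the match is anchored on the a/b/c/d run,
# so the blank line of a is reported as a deletion instead.
junked = list(difflib.ndiff(a, b, linejunk=is_blank))

print(''.join(plain))
print(''.join(junked))
```

If this reading is right, the blank line still shows up in both deltas; only the alignment around it differs. That would explain why the two-line example in this report shows no change, since its only candidate junk lines differ between the two inputs and cannot be matched either way.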

I'm using Python 2.6.5, on Ubuntu Linux 10.04.
History
Date User Action Args
2012-03-16 05:52:05 patena set recipients: + patena, tim.peters
2012-03-16 05:52:05 patena set messageid: <1331877125.66.0.103153393744.issue14332@psf.upfronthosting.co.za>
2012-03-16 05:52:04 patena link issue14332 messages
2012-03-16 05:52:04 patena create