Author ggenellina
Recipients amaury.forgeotdarc, ggenellina, pratik.potnis
Date 2009-01-13.06:38:03
Message-id <1231828687.13.0.319172613104.issue4889@psf.upfronthosting.co.za>
Content
You (as a human) most likely parse these lines:

hostname vaijain123
hostname CAVANC1001CR1

as "two words, the first one is the same, the second word changed".
But difflib compares them character by character, more or less as:
"22 characters, 10 of them also found in the first line, 12 different".
There are more differences than matches, so it makes sense to show the
changes as a complete replacement:

>>> import difflib
>>> d = difflib.ndiff(["hostname vaijain123\n"], ["hostname CAVANC1001CR1\n"])
>>> print ''.join(d)
- hostname vaijain123
+ hostname CAVANC1001CR1

It has nothing to do with upper or lower case letters as such ("A" and
"a" are simply different characters for difflib). If the names were
shorter, it might consider the lines a close match:

>>> d = difflib.ndiff(["hostname vai\n"], ["hostname CAV\n"])
>>> print ''.join(d)
- hostname vai
?          ^^^
+ hostname CAV
?          ^^^

Note how the ratio changes:

>>> difflib.SequenceMatcher(None, "hostname vaijain123", "hostname CAVANC1001CR1").ratio()
0.48780487804878048
>>> difflib.SequenceMatcher(None, "hostname vai", "hostname CAV").ratio()
0.75
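
Both ratios can be reproduced by hand. The following is only a sketch, assuming the documented definition of ratio() as 2.0*M/T, where M is the number of matched characters and T is the combined length of both strings. The variable names (pairs, results, pieces) are illustrative and not part of difflib:

```python
import difflib

pairs = [
    ("hostname vaijain123", "hostname CAVANC1001CR1"),
    ("hostname vai", "hostname CAV"),
]

results = []
for old, new in pairs:
    matcher = difflib.SequenceMatcher(None, old, new)
    # get_matching_blocks() ends with a zero-size sentinel block; skip it.
    blocks = [blk for blk in matcher.get_matching_blocks() if blk.size]
    matched = sum(blk.size for blk in blocks)   # M: characters in common
    total = len(old) + len(new)                 # T: length of both strings
    pieces = [old[blk.a:blk.a + blk.size] for blk in blocks]
    results.append((pieces, matched, total))
    print(pieces, matched, total, round(2.0 * matched / total, 4))

# ['hostname ', '1'] 10 41 0.4878
# ['hostname '] 9 24 0.75
```

In the long pair only "hostname " and a single "1" line up, so 20 of the 41 characters count as matched. In the short pair the same 9-character prefix already accounts for 18 of 24.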

The ratio must be 0.75 or higher for Differ (the class that ndiff uses)
to consider two lines "close enough" to show intra-line differences.
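
To check ahead of time whether a pair of lines is likely to get intra-line markers, something along these lines may help. It is a rough sketch and not how difflib is implemented internally: close_enough and has_intraline_hints are hypothetical helper names, the 0.75 cutoff is simply the value quoted above, and the first helper ignores the junk handling and trailing newlines that ndiff applies.

```python
import difflib

CUTOFF = 0.75  # threshold quoted above; treated here as an assumption


def close_enough(old_line, new_line, cutoff=CUTOFF):
    """Hypothetical helper: True if the similarity ratio reaches the cutoff."""
    ratio = difflib.SequenceMatcher(None, old_line, new_line).ratio()
    return ratio >= cutoff


def has_intraline_hints(old_line, new_line):
    """Hypothetical helper: True if ndiff emits any '? ' guide line."""
    diff = difflib.ndiff([old_line + "\n"], [new_line + "\n"])
    return any(line.startswith("? ") for line in diff)


for old, new in [
    ("hostname vaijain123", "hostname CAVANC1001CR1"),
    ("hostname vai", "hostname CAV"),
]:
    # Print the predicted answer next to what ndiff actually produced.
    print(old, "->", new, close_enough(old, new), has_intraline_hints(old, new))
```

For the two hostname examples in this message, the prediction and the actual ndiff output agree: no markers for the long names, markers for the short ones.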