Message 79721 - Python tracker


Author	ggenellina
Recipients	amaury.forgeotdarc, ggenellina, pratik.potnis
Date	2009-01-13.06:38:03
Message-id	<1231828687.13.0.319172613104.issue4889@psf.upfronthosting.co.za>

Content

You (as a human) most likely parse these lines:

hostname vaijain123
hostname CAVANC1001CR1

as "two words, the first one is the same, the second word changed".
But difflib sees them more or less as: "21 letters, 8 of them are the 
same, 13 are different". There are many more differences than matches, 
so it makes sense to show the changes as a complete replacement:

>>> d = difflib.ndiff(["hostname vaijain123\n"], ["hostname CAVANC1001CR1\n"])
>>> print ''.join(d)
- hostname vaijain123
+ hostname CAVANC1001CR1
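
The character counting described above can be inspected directly. The following is a minimal Python 3 sketch (the variable names are only illustrative) that lists the blocks `SequenceMatcher` matches and recomputes the documented ratio, 2.0*M/T, where M is the number of matching characters and T is the combined length of both strings. The figures quoted in the message are approximate; the sketch prints the exact ones.

```python
import difflib

a = "hostname vaijain123"
b = "hostname CAVANC1001CR1"

matcher = difflib.SequenceMatcher(None, a, b)

# Each block is (start in a, start in b, size); the final block is a
# zero-length sentinel, so it is filtered out here.
blocks = [blk for blk in matcher.get_matching_blocks() if blk.size > 0]

matched = sum(blk.size for blk in blocks)  # M: characters difflib paired up
total = len(a) + len(b)                    # T: characters in both strings

for blk in blocks:
    print(repr(a[blk.a:blk.a + blk.size]))

# Same formula that SequenceMatcher.ratio() is documented to use.
print(matched, total, 2.0 * matched / total)
```

Only the shared prefix and one stray digit are paired up, so less than half of the combined text counts as matching.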

It has nothing to do with upper or lower case letters ("A" and "a" are 
completely different things for difflib). If the names were shorter, it 
might consider a match:

>>> d = difflib.ndiff(["hostname vai\n"], ["hostname CAV\n"])
>>> print ''.join(d)
- hostname vai
?          ^^^
+ hostname CAV
?          ^^^
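
To see the case-sensitivity point in isolation, here is a small Python 3 sketch. The helper name `similarity` is made up for this example and is not part of difflib. It suggests that changing only the case of three letters costs exactly as much as replacing them with unrelated letters.

```python
import difflib

def similarity(a, b):
    """Hypothetical helper: plain SequenceMatcher ratio of two strings."""
    return difflib.SequenceMatcher(None, a, b).ratio()

# "CAV" and "cav" share no characters as far as difflib is concerned.
print(similarity("CAV", "cav"))

# A case-only change and a completely different suffix score the same.
print(similarity("hostname cav", "hostname CAV"))
print(similarity("hostname vai", "hostname CAV"))
```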

Note how the ratio changes:

>>> difflib.SequenceMatcher(None, "hostname vaijain123", "hostname CAVANC1001CR1").ratio()
0.48780487804878048
>>> difflib.SequenceMatcher(None, "hostname vai", "hostname CAV").ratio()
0.75

The ratio must be 0.75 or higher for a differ to consider two lines 
"close enough" to show intra-line differences.
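
The threshold can be checked against what `ndiff` actually prints. This is a rough Python 3 sketch, not difflib's internal logic: `CUTOFF` simply restates the 0.75 figure from this message, and `shows_intraline_hints` is a hypothetical helper that looks for the "? " guide lines in the output. The lines keep their trailing newline, as in the examples above, so the ratios differ slightly from the ones computed without it.

```python
import difflib

CUTOFF = 0.75  # threshold quoted in the message; restated here as an assumption

def shows_intraline_hints(old_line, new_line):
    """Return True if ndiff emits '? ' guide lines for this pair of lines."""
    diff = difflib.ndiff([old_line], [new_line])
    return any(line.startswith("? ") for line in diff)

pairs = [
    ("hostname vaijain123\n", "hostname CAVANC1001CR1\n"),
    ("hostname vai\n", "hostname CAV\n"),
]

for old, new in pairs:
    ratio = difflib.SequenceMatcher(None, old, new).ratio()
    # Columns: ratio, whether it reaches the cutoff, whether hints were shown.
    print(round(ratio, 3), ratio >= CUTOFF, shows_intraline_hints(old, new))
```

For these two pairs, the cutoff comparison and the presence of guide lines agree with each other.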
