
classification
Title: difflib
Type: behavior
Stage:
Components: Library (Lib)
Versions: Python 2.5
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: amaury.forgeotdarc, ggenellina, jackdied, pratik.potnis
Priority: normal
Keywords:

Created on 2009-01-09 05:49 by pratik.potnis, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name | Uploaded | Description
c1.ios | pratik.potnis, 2009-01-09 05:49 | This file contains the two strings mentioned above. Save them in two different files and then take the diff of the files.
Messages (4)
msg79455 - (view) Author: Pratik Potnis (pratik.potnis) Date: 2009-01-09 05:49
When using HtmlDiff() from the difflib library, the diff results are not
what I expect if the two strings differ in letter case.
The two strings I used, each in a separate file, are:
hostname vaijain123 (this string is in lowercase)
hostname CAVANC1001CR1 (this one is in uppercase)

Expected behavior after diffing: it should show the hostname as changed
(highlighted in yellow).

Instead, it shows the line as added in one file and deleted in the other
(highlighted in green and red respectively).

When I tried strings in the same case (both lowercase or both uppercase),
it showed the expected behavior (highlighting the strings in yellow). It
also works well with numbers.

I think it's an issue with letter case: difflib is not able to
differentiate between uppercase and lowercase letters.
msg79457 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2009-01-09 08:47
Can you be more precise?
I tried to reproduce your problem, but I only get added/deleted chunks,
nothing in yellow.

Please include a script that shows what you did, and the result you
expected.
msg79721 - (view) Author: Gabriel Genellina (ggenellina) Date: 2009-01-13 06:38
You (as a human) most likely parse these lines:

hostname vaijain123
hostname CAVANC1001CR1

as "two words, the first one is the same, the second word changed".
But difflib sees them more or less as: "21 letters, 8 of them are the 
same, 13 are different". There are many more differences than matches, 
so it makes sense to show the changes as a complete replacement:

>>> import difflib
>>> d = difflib.ndiff(["hostname vaijain123\n"], ["hostname CAVANC1001CR1\n"])
>>> print ''.join(d)
- hostname vaijain123
+ hostname CAVANC1001CR1

It has nothing to do with upper or lower case letters ("A" and "a" are 
completely different things for difflib). If the names were shorter, it 
might consider a match:

>>> d = difflib.ndiff(["hostname vai\n"], ["hostname CAV\n"])
>>> print ''.join(d)
- hostname vai
?          ^^^
+ hostname CAV
?          ^^^

Note how the ratio changes:

>>> difflib.SequenceMatcher(None, "hostname vaijain123", "hostname CAVANC1001CR1").ratio()
0.48780487804878048
>>> difflib.SequenceMatcher(None, "hostname vai", "hostname CAV").ratio()
0.75

The ratio must be 0.75 or higher for a differ to consider two lines 
"close enough" to show intra-line differences.
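
A minimal sketch of how the ratio and the intra-line "?" guide lines relate, written for Python 3 rather than the 2.5 session above. The helper names line_similarity and has_intraline_hints and the constant CLOSE_ENOUGH are made up for illustration; they are not part of difflib. The lines carry a trailing newline, as in the ndiff calls above, so the ratios differ slightly from those of the bare strings.

```python
import difflib

# Cutoff quoted in the explanation above. The name is ours (an assumption),
# not a constant exported by difflib.
CLOSE_ENOUGH = 0.75


def line_similarity(old, new):
    """Similarity ratio of two strings, between 0.0 and 1.0."""
    return difflib.SequenceMatcher(None, old, new).ratio()


def has_intraline_hints(old_line, new_line):
    """True if ndiff emits '? ' guide lines for this pair of lines."""
    diff = difflib.ndiff([old_line], [new_line])
    return any(entry.startswith("? ") for entry in diff)


pairs = [
    ("hostname vaijain123\n", "hostname CAVANC1001CR1\n"),
    ("hostname vai\n", "hostname CAV\n"),
]
for old_line, new_line in pairs:
    ratio = line_similarity(old_line, new_line)
    print(repr(old_line), "->", repr(new_line))
    print("  ratio: %.3f" % ratio)
    print("  at or above cutoff:", ratio >= CLOSE_ENOUGH)
    print("  intra-line hints shown:", has_intraline_hints(old_line, new_line))
```

With the two pairs from this report, only the shorter pair should get the "?" guide lines; the longer pair falls well below the cutoff and is reported as a plain delete plus add.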
msg84224 - (view) Author: Jack Diederich (jackdied) * (Python committer) Date: 2009-03-26 21:30
Closing; Gabriel's explanation is sufficient.
History
Date | User | Action | Args
2022-04-11 14:56:43 | admin | set | github: 49139
2009-03-26 21:30:56 | jackdied | set | status: open -> closed; nosy: + jackdied; messages: + msg84224; resolution: not a bug
2009-01-13 06:38:05 | ggenellina | set | nosy: + ggenellina; messages: + msg79721
2009-01-09 08:47:42 | amaury.forgeotdarc | set | nosy: + amaury.forgeotdarc; messages: + msg79457
2009-01-09 05:49:38 | pratik.potnis | create |