classification
Title: dreadful performance in difflib: ndiff and HtmlDiff
Type: performance Stage:
Components: Library (Lib) Versions: Python 2.6, Python 2.5
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: heidar.rafn (1)
Priority: Keywords

Created on 2009-09-17 14:54 by heidar.rafn, last changed 2009-09-17 15:01 by heidar.rafn.

Files
File name Uploaded Description
python.difflib.bug.tgz heidar.rafn, 2009-09-17 14:54 a gzipped tar with 6 testfiles - see comment above
Messages (1)
msg92768 - Author: Heiðar Rafn Harðarson (heidar.rafn) Date: 2009-09-17 14:54
A relatively small set of lines, with differences in most lines, can destroy
the performance of difflib.HtmlDiff.make_table and difflib.ndiff.
I am using it like this:
    ...
    htmldiffer = HtmlDiff()
    return htmldiffer.make_table(src_lines, dst_lines, 
        fromdesc="file1",
        todesc="file2",
        context=True)
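
For readers who want to try the call in isolation, here is a minimal self-contained sketch of the same make_table usage. The three-line inputs and the render_diff_table helper are hypothetical stand-ins, not the code or data from the report, and the sketch assumes Python 3.

```python
from difflib import HtmlDiff

# Hypothetical stand-ins for src_lines and dst_lines (not the attached test files)
src_lines = ["alpha one\n", "beta two\n", "gamma three\n"]
dst_lines = ["alpha 1\n", "beta two\n", "gamma 3\n"]


def render_diff_table(src, dst):
    """Return an HTML <table> string holding a side-by-side diff of two line lists."""
    htmldiffer = HtmlDiff()
    return htmldiffer.make_table(src, dst,
                                 fromdesc="file1",
                                 todesc="file2",
                                 context=True)


table_html = render_diff_table(src_lines, dst_lines)
```

On inputs this small the call returns immediately; the slowdown described here only shows up with a few hundred lines that nearly all differ.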

I have written src_lines and dst_lines to files and tried this with
the Tools/scripts/diff.py wrapper, with the same results when using the
switches -m or -n.
The performance is fine when using difflib.unified_diff or the -u switch
of diff.py.
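
A rough way to compare the two code paths without the attachments is to time difflib.ndiff against difflib.unified_diff on synthetic input in which every line differs slightly. This is only a sketch: make_inputs and time_diff are hypothetical helpers, the generated lines are not the attached test files, the line counts are kept small on purpose, and it assumes Python 3 (time.perf_counter).

```python
import difflib
import random
import time


def make_inputs(n, seed=0):
    """Build two hypothetical line lists of length n in which every line differs slightly."""
    rng = random.Random(seed)
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    src, dst = [], []
    for i in range(n):
        body = "".join(rng.choice(alphabet) for _ in range(40))
        pos = rng.randrange(len(body))
        # Uppercase one character so the two lines are similar but never equal
        changed = body[:pos] + body[pos].upper() + body[pos + 1:]
        src.append("line %d %s\n" % (i, body))
        dst.append("line %d %s\n" % (i, changed))
    return src, dst


def time_diff(diff_func, src, dst):
    """Consume the generator returned by diff_func; return (seconds, output line count)."""
    start = time.perf_counter()
    count = sum(1 for _ in diff_func(src, dst))
    return time.perf_counter() - start, count


if __name__ == "__main__":
    for n in (10, 20, 40):
        src, dst = make_inputs(n)
        t_ndiff, _ = time_diff(difflib.ndiff, src, dst)
        t_unified, _ = time_diff(difflib.unified_diff, src, dst)
        print("%4d lines: ndiff %.3fs, unified_diff %.3fs" % (n, t_ndiff, t_unified))
```

Absolute numbers depend on the Python version and the machine, so the useful signal is how the ndiff column grows relative to the unified_diff column as n increases.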

Attached are files that show this clearly.
left200.txt, right200.txt - 200 lines of text - duration 11 sec
left500.txt, right500.txt - 500 lines of text - duration 2 min 58 sec
left1000.txt, right1000.txt - 1000 lines of text - duration 29 min 4 sec
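
As a quick check on how these durations scale, the sketch below fits t ~ n**k between consecutive measurements. Only the three (line count, duration) pairs come from the figures above; the growth_exponents helper is hypothetical. Three data points are too few for a firm conclusion, but the implied exponents come out at about 3.0 and 3.3, which suggests roughly cubic growth on this input.

```python
import math

# (line count, seconds) taken from the durations listed above
timings = [(200, 11), (500, 2 * 60 + 58), (1000, 29 * 60 + 4)]


def growth_exponents(points):
    """Return the exponent k in t ~ n**k implied by each consecutive pair of points."""
    result = []
    for (n1, t1), (n2, t2) in zip(points, points[1:]):
        result.append(math.log(float(t2) / t1) / math.log(float(n2) / n1))
    return result


exponents = growth_exponents(timings)
```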

Tested on an Intel dual-core T2500 2 GHz with 2 GB of memory, Python 2.5.2 on
Ubuntu. Same problem with Python 2.6 on Fedora 11.
For reference, the kdiff3 utility performs beautifully on these files.
History
Date User Action Args
2009-09-17 15:01:54 heidar.rafn set title: awful performance in difflib: ndiff and HtmlDiff -> dreadful performance in difflib: ndiff and HtmlDiff
2009-09-17 14:54:57 heidar.rafn create