Author eli.bendersky
Recipients LambertDW, eli.bendersky, georg.brandl, ggenellina, gjb1002, hagna, janpf, jimjjewett, mrotondo, pitrou, r.david.murray, rtvd, sjmachin, terry.reedy, tim.peters, vbr
Date 2010-07-02.07:16:07
Message-id <1278054971.02.0.516847225419.issue2986@psf.upfronthosting.co.za>
Content
The new "junk heuristic" was added to difflib.py in SVN revision 26661, back in 2002 (incidentally, that is also the last revision to modify difflib.py). Its commit log reads:

---------------------------------------------
Mostly in SequenceMatcher.{__chain_b, find_longest_match}:
This now does a dynamic analysis of which elements are so frequently
repeated as to constitute noise.  The primary benefit is an enormous
speedup in find_longest_match, as the innermost loop can have factors
of 100s less potential matches to worry about, in cases where the
sequences have many duplicate elements.  In effect, this zooms in on
sequences of non-ubiquitous elements now.

While I like what I've seen of the effects so far, I still consider
this experimental.  Please give it a try!
---------------------------------------------
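To make the effect described in the commit log concrete, here is a minimal sketch of how the heuristic can be observed. It assumes a later Python release (3.2 or newer), where SequenceMatcher accepts an `autojunk` argument and exposes a `bpopular` attribute; neither existed when this message was written. The helper name `popular_elements` and the sample data are hypothetical, chosen only for illustration. Per the difflib documentation for those releases, the heuristic applies only when the second sequence has at least 200 elements, and it flags elements that account for more than roughly 1% of that sequence.

```python
from difflib import SequenceMatcher


def popular_elements(b, autojunk=True):
    """Return the elements of b that SequenceMatcher flags as "popular".

    Hypothetical helper for illustration. It relies on the `autojunk`
    parameter and the `bpopular` attribute of later Python releases.
    """
    # The first sequence is irrelevant here: the analysis runs on b only.
    matcher = SequenceMatcher(None, [], b, autojunk=autojunk)
    return matcher.bpopular


# 300 elements: one heavily repeated line plus 250 distinct lines.
long_b = ["common"] * 50 + ["unique %d" % i for i in range(250)]

# Same shape, but only 150 elements, which is under the 200-element cutoff.
short_b = ["common"] * 50 + ["unique %d" % i for i in range(100)]

print(popular_elements(long_b))                  # heuristic active
print(popular_elements(long_b, autojunk=False))  # heuristic switched off
print(popular_elements(short_b))                 # sequence too short
```

Elements flagged this way are dropped from the index that find_longest_match uses to start matches, which is where the speedup mentioned in the commit log comes from; it is also why results can change unexpectedly once a sequence grows past the cutoff.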