Author vbr
Recipients LambertDW, eli.bendersky, georg.brandl, ggenellina, gjb1002, hagna, janpf, jimjjewett, mrotondo, pitrou, r.david.murray, rtvd, sjmachin, terry.reedy, tim.peters, vbr
Date 2010-07-07.23:17:00
Message-id <1278544623.2.0.0638716174364.issue2986@psf.upfronthosting.co.za>
In-reply-to
Content
I guess I am not supposed to post to python-dev, not being a Python developer, so I hope it is appropriate to add a comment here, based only on my current usage of a modified difflib.SequenceMatcher.
It seems that the mentions of text comparison in that thread, e.g.
http://mail.python.org/pipermail/python-dev/2010-July/101515.html
etc., rather imply line-by-line comparison, possibly followed by character-wise comparison of the matched lines.
For me, direct character-wise comparison is more useful in most cases.
With the "popular" heuristic disabled, the results look quite good.
(The script only changes the background colour of the compared texts, based on SequenceMatcher.get_opcodes().)
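
To illustrate the character-wise approach, here is a minimal sketch, not the actual script: instead of changing background colours it wraps deleted text in {...} and added text in [...]. The function name mark_changes and the bracket markers are made up for this example; only SequenceMatcher and get_opcodes() come from difflib.

```python
from difflib import SequenceMatcher


def mark_changes(old, new):
    """Return a merged view of both strings.

    Text removed from `old` is wrapped in {...}; text added in `new`
    is wrapped in [...]. Unchanged text is copied as-is.
    """
    matcher = SequenceMatcher(None, old, new)
    parts = []
    # Each opcode describes how old[i1:i2] maps onto new[j1:j2].
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            parts.append(new[j1:j2])
        elif tag == "insert":
            parts.append("[" + new[j1:j2] + "]")
        elif tag == "delete":
            parts.append("{" + old[i1:i2] + "}")
        else:  # "replace": show the removed text, then the added text
            parts.append("{" + old[i1:i2] + "}[" + new[j1:j2] + "]")
    return "".join(parts)


print(mark_changes("abcd", "abxd"))  # prints ab{c}[x]d
```

A real highlighter would map the same opcode ranges to colour tags in the display widget rather than to brackets.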
For now, I only need to disable the "popular" check; currently I use a monkey-patched subclass of SequenceMatcher with an extended signature and a modified __chain_b method.
cf. http://mail.python.org/pipermail/python-list/2010-June/1247907.html

I would vote for extending the SequenceMatcher API to allow adjustments, keeping the current values as defaults: enabling or disabling the "popular" check and setting the thresholds for string length and "popular" frequency (and eventually other parameters that might be added).
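
As a rough sketch of what such a switch could look like in use: the example below assumes a Python version whose SequenceMatcher accepts an autojunk keyword (Python 3.2 and later); that keyword is an assumption here and is not part of the API discussed in this message. The test strings are made up. The second sequence is 200 characters long, so the "popular" heuristic applies to it, and its only character is far more frequent than the roughly 1% limit.

```python
from difflib import SequenceMatcher

# Hypothetical inputs: b is 200 characters long, all of them "y",
# so "y" is treated as "popular" when the heuristic is active.
a = "x" + "y" * 200
b = "y" * 200

# Default behaviour: the "popular" heuristic is enabled.
with_heuristic = SequenceMatcher(None, a, b).ratio()

# Assumed keyword (Python 3.2+): turn the heuristic off.
without_heuristic = SequenceMatcher(None, a, b, autojunk=False).ratio()

print("heuristic on: ", with_heuristic)
print("heuristic off:", without_heuristic)
```

With the heuristic off, the 200 shared characters are matched and the ratio is close to 1; with it on, the ratio is noticeably lower. Thresholds for length and frequency are not adjustable through this keyword; it only switches the check on or off.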

Are there any restrictions on API changes in a library due to the moratorium, even if the default behaviour remains unchanged?
Otherwise, what might be the disadvantages of this approach?
If the current behaviour is considered appropriate for the original use cases, other uses would also become possible or easier, at the sole cost of learning the meaning of the added parameters (from the enhanced docs, of course).

vbr