Author rbcollins
Recipients Ankur.Ankan, Elena.Oat, Jacek.Bzdak, Puneeth.Chaganti, ankurankan, ezio.melotti, michael.foord, nnja, pitrou, rbcollins, serhiy.storchaka, vstinner
Date 2014-10-23.07:13:22
Message-id <1414048403.77.0.615668859142.issue19217@psf.upfronthosting.co.za>
In-reply-to
Content
A few thoughts.

Adding a new public symbol seems inappropriate here: this is a performance issue that is quite predictable, and we should cater for it (given difflib's current performance).

I'll note in passing that both bzr and hg have much higher-performance diff algorithms that we could pick up and include as a replacement SequenceMatcher, which might significantly raise the threshold at which we need to default-cap things - but such a threshold will still exist.

I totally agree that _diffThreshold should apply to non-string sequences - anything where we're going to hit high-order complexity outputting the difference. That said, I speculate that perhaps we'd be better off outputting both objects in some structured fashion and letting a later process render them (for things like CI systems and test databases, where fidelity of reproduction is more important than having the output fit on one screen). This is a different issue, though, and something we should revisit later.

That suggests to me, though, that the largest diff we output should be chosen based on the textual representation of the diff, since we're doing it for human readability, whereas the threshold for calculating a diff at all should be based on performance. It can be very expensive to calculate a diff on large sequences, but the diff might be much, much larger than the sequence length indicates (because each item in the sequence may be very large). Perhaps that's overthinking it?
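To make the two-threshold idea concrete, here is a rough sketch, not a proposed patch. The function name bounded_diff and both limits (MAX_ITEMS, MAX_DIFF_CHARS) are made up for illustration; only difflib.unified_diff is a real API. One limit guards the cost of computing the diff, the other guards the size of the text we would show.

```python
import difflib

# Hypothetical limits, chosen only for illustration.
MAX_ITEMS = 1000        # performance cap: refuse to diff longer sequences
MAX_DIFF_CHARS = 4000   # readability cap: refuse to print larger diffs


def bounded_diff(seq1, seq2, max_items=MAX_ITEMS, max_chars=MAX_DIFF_CHARS):
    """Return a diff of two sequences, or a short notice if a cap is hit."""
    # Cap 1: based on sequence length, i.e. on the cost of diffing at all.
    if max(len(seq1), len(seq2)) > max_items:
        return "<diff skipped: sequences too long to compare cheaply>"

    # Compare one repr() line per item.
    lines1 = [repr(item) for item in seq1]
    lines2 = [repr(item) for item in seq2]
    text = "\n".join(difflib.unified_diff(lines1, lines2, lineterm=""))

    # Cap 2: based on the rendered text, since short sequences can still
    # produce a huge diff when individual items are large.
    if len(text) > max_chars:
        return "<diff omitted: %d characters>" % len(text)
    return text
```

With these assumptions, two one-item lists holding very long strings pass the first cap but trip the second, which is the case the sequence length alone would not catch.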

Anyhow, short term, surely just making the threshold apply to any sequence type is sufficient to fix the bug?
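Something along these lines is what I have in mind for the short-term fix. This is a minimal sketch, not the actual unittest code: sequence_diff and DIFF_THRESHOLD are hypothetical names, the cap value is an assumption, and the length check stands in for whatever measure the real threshold would use.

```python
import difflib
import pprint

# Hypothetical cap; the value is an assumption for this sketch.
DIFF_THRESHOLD = 2 ** 16


def sequence_diff(seq1, seq2, threshold=DIFF_THRESHOLD):
    """Return an ndiff of two sequences, or None if either is too long."""
    # Apply the cap to any sequence type, not just strings.
    if len(seq1) > threshold or len(seq2) > threshold:
        return None
    lines1 = pprint.pformat(seq1).splitlines()
    lines2 = pprint.pformat(seq2).splitlines()
    return "\n".join(difflib.ndiff(lines1, lines2))
```

A caller would fall back to a plain "sequences differ" message when None comes back, rather than spending time in difflib.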
History
Date User Action Args
2014-10-23 07:13:23 rbcollins set recipients: + rbcollins, pitrou, vstinner, ezio.melotti, michael.foord, serhiy.storchaka, Jacek.Bzdak, Ankur.Ankan, Elena.Oat, nnja, ankurankan, Puneeth.Chaganti
2014-10-23 07:13:23 rbcollins set messageid: <1414048403.77.0.615668859142.issue19217@psf.upfronthosting.co.za>
2014-10-23 07:13:23 rbcollins link issue19217 messages
2014-10-23 07:13:22 rbcollins create