Message 225134 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

Author	serhiy.storchaka
Recipients	Ankur.Ankan, Elena.Oat, Jacek.Bzdak, Puneeth.Chaganti, ankurankan, ezio.melotti, michael.foord, nnja, pitrou, serhiy.storchaka, vstinner
Date	2014-08-10.09:53:50
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<3659375.UCUqjatPJT@raxxla>
In-reply-to	<1407663002.65.0.0356318367991.issue19217@psf.upfronthosting.co.za>

Content

> 1) try to have a single threshold for all types, and use line-based counting
> for strings (so if the threshold is 32, this means 32 elements in a list,
> 32 items in a dict, 32 lines in a string);

You forgot about strings with few but very long lines. We should hide or
truncate overly long lines, and this is not a trivial issue. Actually we need
more control over difflib's machinery, and should use something like
_common_shorten_repr to appropriately truncate similar lines.
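A minimal sketch of the idea, in the spirit of (but not copied from) the private _common_shorten_repr helper mentioned above: when two similar lines are very long, collapse most of their shared prefix so that the point where they diverge stays visible. The names shorten_similar_lines and _shorten, and all length limits, are hypothetical choices for illustration.

```python
from os.path import commonprefix


def _shorten(s, head, tail):
    """Replace the middle of s with a '[N chars]' marker, keeping head/tail."""
    skipped = len(s) - head - tail
    marker = f"[{skipped} chars]"
    if skipped <= len(marker):
        return s  # shortening would not make the string any shorter
    return s[:head] + marker + s[len(s) - tail:]


def shorten_similar_lines(a, b, max_len=80, keep=5, max_rest=30):
    """Hypothetical sketch: shorten two similar lines while keeping the
    point where they start to differ visible."""
    if len(a) <= max_len and len(b) <= max_len:
        return a, b  # short enough, leave untouched
    prefix = commonprefix([a, b])
    # Collapse most of the shared prefix, keeping a few chars at each end.
    short_prefix = _shorten(prefix, keep, keep)
    # Keep only the beginning of each differing remainder.
    rest_a = _shorten(a[len(prefix):], max_rest, 0)
    rest_b = _shorten(b[len(prefix):], max_rest, 0)
    return short_prefix + rest_a, short_prefix + rest_b


a, b = shorten_similar_lines("x" * 100 + "abc", "x" * 100 + "abd")
print(a)  # xxxxx[92 chars]xxxabc
print(b)  # xxxxx[92 chars]xxxabd
```

The shortened pair could then be handed to difflib.ndiff, so that its character-level comparison of similar lines works on a few dozen characters instead of thousands. How lines get paired before shortening is left open here.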

> Option a) might be doable, and even if it introduces a change in behavior it
> might be acceptable since it affects the output of the messages in case of
> failure, and I don't think anyone is relying on an exact output (also
> because tests shouldn't be failing).  Moreover, the most common usage of
> maxDiff is setting it to None, and having the threshold to None means that
> the full diff will be computed and printed, leaving the behavior unchanged.

This is too much for a bug fix. We should fix this issue (do not calculate
diffs between overly long sequences) while preserving as much detail as
possible. Omitting the diff entirely in cases where the current code does
output it (albeit very slowly) would be a regression. It would be better to
output a truncated diff.
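One way to avoid diffing overly long sequences while still printing a truncated diff, sketched under assumptions: skip the common leading items with a cheap linear scan, then give only a bounded window of items to difflib.ndiff and note what was left out. truncated_sequence_diff and its window parameter are hypothetical; this is not how unittest currently builds its failure messages.

```python
import difflib
from itertools import takewhile


def truncated_sequence_diff(first, second, window=50):
    """Hypothetical sketch: bound the cost of diffing long sequences.

    Skips the common leading items (linear time), then diffs at most
    `window` items of each sequence starting at the first mismatch.
    """
    # Length of the common prefix; zip stops at the shorter sequence.
    start = sum(1 for _ in takewhile(lambda p: p[0] == p[1], zip(first, second)))
    # ndiff expects newline-terminated lines (its '?' hint lines end in '\n').
    a = [repr(x) + "\n" for x in first[start:start + window]]
    b = [repr(x) + "\n" for x in second[start:start + window]]
    lines = []
    if start:
        lines.append(f"  ... {start} identical leading items omitted\n")
    lines.extend(difflib.ndiff(a, b))
    rest = max(len(first), len(second)) - (start + window)
    if rest > 0:
        lines.append(f"  ... diff truncated, up to {rest} more items not compared\n")
    return "".join(lines)


second = list(range(1000))
second[500] = -1
print(truncated_sequence_diff(list(range(1000)), second, window=3))
```

The cost of the difflib step now depends on window rather than on the full input length, while the first mismatch, usually the most useful detail, is always shown.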

Then we can refactor and improve diff reporting in other issues.

History
Date	User	Action	Args
2014-08-10 09:53:51	serhiy.storchaka	set	recipients: + serhiy.storchaka, pitrou, vstinner, ezio.melotti, michael.foord, Jacek.Bzdak, Ankur.Ankan, Elena.Oat, nnja, ankurankan, Puneeth.Chaganti
2014-08-10 09:53:51	serhiy.storchaka	link	issue19217 messages
2014-08-10 09:53:50	serhiy.storchaka	create