classification
Title: assertEqual memory issues with large text inputs
Type: behavior Stage: patch review
Components: Library (Lib) Versions: Python 3.4, Python 3.3, Python 3.2, Python 2.7
process
Status: open Resolution: fixed
Dependencies: Superseder:
Assigned To: ezio.melotti Nosy List: ezio.melotti, michael.foord, pitrou, python-dev, rhettinger, sara.magliacane, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2011-04-04 20:16 by michael.foord, last changed 2013-10-14 00:57 by ezio.melotti.

Files
File name | Uploaded | Description
issue11763.diff | ezio.melotti, 2011-04-04 22:59 | Patch to add a _diffThreshold of 2**16 (2.7)
issue11763-2.diff | ezio.melotti, 2011-04-04 23:30 |
issue11763_safe_repr.diff | sara.magliacane, 2011-06-25 13:33 |
issue11763-3.diff | ezio.melotti, 2013-10-14 00:57 |
Messages (11)
msg132965 - Author: Michael Foord (michael.foord) (Python committer) Date: 2011-04-04 20:16
>>> s = "x" * (2**29)
>>> case.assertEqual(s + "a", s + "b")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/antoine/cpython/default/Lib/unittest/case.py", line 643, in assertEqual
    assertion_func(first, second, msg=msg)
  File "/home/antoine/cpython/default/Lib/unittest/case.py", line 984, in assertMultiLineEqual
    secondlines = [second + '\n']
MemoryError

assertEqual delegates string comparisons to assertMultiLineEqual, which uses difflib to build a diff of the two texts. difflib has performance problems (as well as memory problems) with very large inputs, so assertMultiLineEqual should fall back to a simple comparison (or a simpler diff generation technique) when the inputs are very large.
msg132986 - Author: Ezio Melotti (ezio.melotti) (Python committer) Date: 2011-04-04 22:59
The attached patch adds a _diffThreshold attribute of 2**16 and uses _baseAssertEqual whenever either of the two strings is longer than 2**16 characters.
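A minimal sketch of the threshold fallback described above, written as a standalone function rather than as the actual TestCase method. The names `DIFF_THRESHOLD` and `assert_multiline_equal` are hypothetical; only the 2**16 value comes from the patch. For simplicity, the fallback branch here reports just the two lengths instead of doing a `_baseAssertEqual`-style repr comparison.

```python
import difflib

# Hypothetical stand-in for TestCase._diffThreshold (2**16 in the patch).
DIFF_THRESHOLD = 2**16


def assert_multiline_equal(first, second, threshold=DIFF_THRESHOLD):
    """Compare two strings, building a difflib diff only for small inputs."""
    if first == second:
        return
    if len(first) > threshold or len(second) > threshold:
        # Too large for difflib: fail with a cheap message instead of a diff.
        raise AssertionError(
            'strings differ (lengths %d and %d, diff skipped)'
            % (len(first), len(second)))
    # Small enough: a line-by-line ndiff is affordable.
    diff = ''.join(difflib.ndiff(first.splitlines(keepends=True),
                                 second.splitlines(keepends=True)))
    raise AssertionError('strings differ:\n' + diff)
```

The equality check runs first, so equal inputs never pay for a diff at all; the threshold only decides how a mismatch is reported.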
msg132987 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2011-04-04 23:01
Rather than hardwiring `self.addCleanup(lambda: setattr(self, '_diffThreshold', 2**16))`, you should save the previous value and restore that.
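A sketch of the pattern suggested here, assuming the attribute is named `_diffThreshold` as in the patch (it is a private attribute of `unittest.TestCase`, not a public API). The test class and method names are hypothetical. The idea is to save whatever value is currently in effect and register a cleanup that restores that saved value, instead of restoring a hardwired 2**16.

```python
import unittest


class DiffThresholdTest(unittest.TestCase):
    def test_small_threshold(self):
        # Remember whatever threshold is currently in effect...
        old_threshold = self._diffThreshold
        # ...register a cleanup that puts exactly that value back...
        self.addCleanup(setattr, self, '_diffThreshold', old_threshold)
        # ...and only then lower it for the duration of this test.
        self._diffThreshold = 2**5
        self.assertEqual(self._diffThreshold, 32)
        # Strings longer than the lowered threshold still fail to compare equal.
        with self.assertRaises(self.failureException):
            self.assertMultiLineEqual('x' * 64 + 'a', 'x' * 64 + 'b')
```

Passing `setattr` and its arguments directly to `addCleanup` avoids the lambda, and the restored value stays correct even if a subclass or another test has changed the default.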
msg134529 - Author: Roundup Robot (python-dev) Date: 2011-04-27 07:21
New changeset 8dbf661c0a63 by Ezio Melotti in branch '2.7':
#11763: don't use difflib in TestCase.assertMultiLineEqual if the strings are too long.
http://hg.python.org/cpython/rev/8dbf661c0a63

New changeset 04e64f77c6c7 by Ezio Melotti in branch '3.1':
#11763: don't use difflib in TestCase.assertMultiLineEqual if the strings are too long.
http://hg.python.org/cpython/rev/04e64f77c6c7

New changeset b316019638df by Ezio Melotti in branch '3.2':
#11763: merge with 3.1.
http://hg.python.org/cpython/rev/b316019638df

New changeset df154c872b0c by Ezio Melotti in branch 'default':
#11763: merge with 3.2.
http://hg.python.org/cpython/rev/df154c872b0c
msg134530 - Author: Ezio Melotti (ezio.melotti) (Python committer) Date: 2011-04-27 07:30
I committed a slightly modified version of the patch, so the issue should be fixed now.

There are two related problems though:
1) difflib is used in other places, so those should probably be checked too;
2) _baseAssertEqual should check whether the length of the msg (or the sum of the lengths of the two args) is greater than maxDiff and, if so, use safe_repr(arg, short=True) to avoid overly long failure messages.
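One way the second point could look, sketched as a standalone function rather than a change to `_baseAssertEqual` itself. `base_assert_equal` and `shorten` are hypothetical names; `shorten` plays the role the message assigns to `safe_repr(arg, short=True)`, and `max_diff` defaults to 80*8, the default value of `TestCase.maxDiff`. As with `maxDiff`, passing None disables the limit.

```python
def shorten(text, limit):
    """Hypothetical helper: cut text to at most `limit` chars plus a marker."""
    if len(text) <= limit:
        return text
    return text[:limit] + ' [truncated]...'


def base_assert_equal(first, second, max_diff=80 * 8):
    """Fail with 'first != second', shortening the reprs when they are long."""
    if first == second:
        return
    left, right = repr(first), repr(second)
    # Compare the combined repr length against max_diff, as suggested above.
    if max_diff is not None and len(left) + len(right) > max_diff:
        half = max_diff // 2
        left, right = shorten(left, half), shorten(right, half)
    raise AssertionError('%s != %s' % (left, right))
```

Note that this sketch still calls repr() on both arguments, so it bounds the size of the failure message rather than the memory used while building it.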
msg139061 - Author: Sara Magliacane (sara.magliacane) Date: 2011-06-25 13:33
Here is a small patch addressing the second point of the last message, but the tests are still missing.
msg139080 - Author: Michael Foord (michael.foord) (Python committer) Date: 2011-06-25 15:04
The basic idea of the patch is good, but instead of introducing _MAX_LENGTH, maxDiff should be reused.
msg139081 - Author: Michael Foord (michael.foord) (Python committer) Date: 2011-06-25 15:07
Sorry, ignore that. I see that the patch already passes maxDiff to truncate_str.
msg181551 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2013-02-06 18:16
I left comments on Rietveld. Tests needed.
msg182423 - Author: Ezio Melotti (ezio.melotti) (Python committer) Date: 2013-02-19 20:06
There are some tests in the issue11763-2.diff patch.
msg199830 - Author: Ezio Melotti (ezio.melotti) (Python committer) Date: 2013-10-14 00:57
Attached an updated patch that addresses Serhiy's comments. The tests in the previous patch have already been committed, so this might need new tests.

The first problem I mentioned in msg134530 has been reported in #19217.
History
Date | User | Action | Args
2013-10-14 00:57:54 | ezio.melotti | set | files: + issue11763-3.diff; messages: + msg199830
2013-02-19 20:06:56 | ezio.melotti | set | messages: + msg182423
2013-02-06 18:16:37 | serhiy.storchaka | set | nosy: + serhiy.storchaka; messages: + msg181551; versions: + Python 3.4
2011-06-25 15:07:33 | michael.foord | set | messages: + msg139081
2011-06-25 15:04:35 | michael.foord | set | messages: + msg139080
2011-06-25 13:33:56 | sara.magliacane | set | files: + issue11763_safe_repr.diff; nosy: + sara.magliacane; messages: + msg139061
2011-04-27 07:30:50 | ezio.melotti | set | resolution: fixed; messages: + msg134530
2011-04-27 07:21:37 | python-dev | set | nosy: + python-dev; messages: + msg134529
2011-04-04 23:30:14 | ezio.melotti | set | files: + issue11763-2.diff; nosy: + rhettinger; assignee: michael.foord -> ezio.melotti; components: + Library (Lib); type: behavior; stage: patch review
2011-04-04 23:01:26 | pitrou | set | messages: + msg132987
2011-04-04 22:59:39 | ezio.melotti | set | files: + issue11763.diff; keywords: + patch; messages: + msg132986
2011-04-04 20:16:03 | michael.foord | create |