Classification
Title: repr() of large array objects takes quadratic time
Type:    Stage:
Components: Library (Lib)    Versions: Python 2.2

Process
Status: closed    Resolution: fixed
Dependencies:    Superseder:
Assigned To: rhettinger    Nosy List: jackjansen, jneb, logistix, rhettinger, tim.peters
Priority: normal    Keywords:

Created on 2003-02-05 11:00 by jneb, last changed 2003-04-29 21:51 by jackjansen. This issue is now closed.

Messages (10)
msg14447 - (view) Author: Jurjen N.E. Bos (jneb) * Date: 2003-02-05 11:00
This is a bug and a partial patch.

If I debug a program that contains a ridiculously large array (8M entries in my case), the debugger takes forever.
It happens on Mac OS X, Python 2.2, but the bug I found is in the repr module, so it is probably universal.
The thing is that even after the fix below, it still doesn't work!
Did I miss something trivial (like repr being a builtin, or something like that)? Would someone with Mac OS X experience help out here, please (Jack?).

Here's the diff to make repr.repr work with large arrays:
13a14
>         self.maxarray = 5
50a52,62
>     def repr_array(self, x, level):
>         n = len(x)
>         header = "array('"+x.typecode+"', ["
>         if n == 0: return header+"])"
>         if level <= 0: return header+"...])"
>         s = ''
>         for i in range(min(n, self.maxarray)):
>             if s: s = s + ', '
>             s = s + self.repr1(x[i], level-1)
>         if n > self.maxarray: s = s + ', ...'
>         return header + s + "])"
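For context, Repr dispatches by type name: repr1() looks up a method named repr_<typename>, so defining repr_array is enough to hook array objects into the truncation machinery. A variant of this patch eventually landed in the standard library (the repr module is named reprlib in Python 3, where maxarray is a standard attribute), so the intended behavior can be demonstrated there:

```python
import array
import reprlib  # Python 3 name of the repr module being patched here

# An array with millions of entries, like the one in the bug report.
a = array.array('i', range(8_000_000))

r = reprlib.Repr()
r.maxarray = 5  # show at most 5 elements, as the patch proposes
print(r.repr(a))  # truncated instead of formatting all 8M entries
```

This prints `array('i', [0, 1, 2, 3, 4, ...])` almost instantly, because only the first few elements are ever formatted.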
msg14448 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2003-02-05 18:36
Nice to see you, Jurjen!  I checked this into current CVS, 
and it works fine for me in isolation:

>>> len(a)
11055060
>>> repr.repr(a)
"array('i', [0, 1, 2, 3, 4, ...])"
>>>

That goes in an eyeblink.  So more detail is needed about 
what "it still doesn't work!" means.  Assigned to Jack, and he 
can use current CVS to try it.

Lib/repr.py; new revision: 1.15
Lib/test/test_repr.py; new revision: 1.16
Misc/NEWS; new revision: 1.642
msg14449 - (view) Author: Jack Jansen (jackjansen) * (Python committer) Date: 2003-02-05 22:40
The fix is fine (it works for me the same way as for Tim), but I think we're shooting past the problem here.

First, pdb doesn't use repr.repr(); it uses the normal builtin repr().

Second, I don't see any sluggishness in pdb with large arrays. I tried
debugging
def foo():
    a = range(8000000)
and there was no problem. Allocating the object took a bit of time, yes, and if you actually try to print it you'll stare at about 800K lines filled with digits scrolling over your screen, but that is to be expected.

Could it be your sluggishness is coming from something else? For example, MacOSX starts behaving *very* badly if your root disk is full, because then it can't allocate swap space, and due to its optimistic behaviour it comes to a grinding halt.
msg14450 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2003-02-05 23:08
pdb does import repr.py, but probably doesn't use it on whatever 
path Jurjen is taking to display his big array.

WRT that, note that Jurjen is using array.array objects, not 
lists.  The internal array.array tp_repr slot is quadratic-time in 
the size of the array, while list's tp_repr is linear time.
msg14451 - (view) Author: Jack Jansen (jackjansen) * (Python committer) Date: 2003-02-06 21:37
Okay, so the real bug is that tp_repr of array objects takes quadratic time.

I'm changing the summary of this report then, and assigning back to you (Tim), on the basis that you did more checkins on arraymodule than I did. Feel free to pass the potato on:-)
msg14452 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2003-02-07 01:40
I can't make time for this, so I unassigned it.  It would make 
a good, brief project for someone -- the list and dict 
tp_reprs are linear-time, and making tp_repr for array.array 
objects linear shouldn't be any harder than those were.
msg14453 - (view) Author: Grant Olson (logistix) Date: 2003-02-12 01:43
arraymodule's repr used "string += ',' + el" for each element in 
the array.  Lists and dicts did a string.join to build the repr.

Attached patch builds a tuple of array elements and then 
joins them.  (actually for some reason I can't attach now, I'll 
post the patch in patch manager)

This fixes the time issue, but I don't know enough about how 
you guys manage memory in each case to tell what impact 
that'll have on really, really big arrays (although I imagine it 
takes more memory).
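That pattern is the whole story: appending to a string recopies everything accumulated so far, so n appends cost O(n²), while joining a sequence of fragments copies each fragment only once. A pure-Python sketch of the two strategies (illustrative functions, not the actual C code in arraymodule.c; note that modern CPython can sometimes optimize string += in place, but the C-level pattern of the day really did recopy):

```python
def repr_by_concat(items):
    # O(n^2): each concatenation may copy the entire accumulated string
    s = ''
    for x in items:
        if s:
            s = s + ', '
        s = s + repr(x)
    return '[' + s + ']'

def repr_by_join(items):
    # O(n): collect fragments, then copy each exactly once in join
    return '[' + ', '.join(repr(x) for x in items) + ']'
```

Both produce identical output; only the asymptotics differ, which is why the problem was only noticed on multi-million-element arrays.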

msg14454 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2003-04-23 17:33
Fixed this up by converting the array to a list and then 
using the list object's efficient repr().

See Modules/arraymodule.c 2.87.

Since I categorize this as a performance issue and not a 
bug, I've applied the fix to Py2.3 but am not 
recommending it for backport.
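The committed fix is C, but its strategy is simple enough to render in Python. In this sketch, array_repr is an illustrative stand-in for the C-level function, and character typecodes (which the real repr formats as a string rather than a list) are ignored:

```python
import array

def array_repr(a):
    # Empty arrays repr without a value list.
    if len(a) == 0:
        return "array('%s')" % a.typecode
    # Convert to a list and reuse list's linear-time repr.
    return "array('%s', %s)" % (a.typecode, repr(a.tolist()))

print(array_repr(array.array('i', [1, 2, 3])))  # array('i', [1, 2, 3])
```

The conversion costs one extra list of the same length, which matches logistix's observation that the linear-time approach trades memory for speed.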
msg14455 - (view) Author: Jurjen N.E. Bos (jneb) * Date: 2003-04-25 09:32
The debugger I use is not pdb, but the Mac-only IDE debugger. I thought 
this was only a front end on pdb, but apparently it is not.
It seems that it is still slow in 2.3. (I can't check it at the moment; I am 
running a multiple-hour computation...)
Maybe it will automatically be fixed if Jack manages to get IDLE working on the 
Mac...
msg14456 - (view) Author: Jack Jansen (jackjansen) * (Python committer) Date: 2003-04-29 21:51
Jurjen,
could you submit a separate bug report for the MacPython IDE? It needs a 
different solution (and I'm not going to come around to it soon, so otherwise 
I may forget).
History
Date                 User  Action  Args
2003-02-05 11:00:07  jneb  create