This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author gregory.p.smith
Recipients gregory.p.smith
Date 2008-10-08.04:00:48
SpamBayes Score 4.5463633e-13
Marked as misclassified No
Message-id <1223438452.4.0.926419942993.issue4074@psf.upfronthosting.co.za>
In-reply-to
Content
The attached script simply loops building a list of tuples.  It has
horrible performance as the list gets larger compared to something
appending simple objects like ints to the list.

% python tuple_gc_hell.py                  ~
...
1000000 1.3615500927
2000000 2.95893096924
3000000 4.53531980515
4000000 5.62795209885
5000000 7.70974612236
...

The time it takes to execute grows linearly with the number of tuples
already appended to the list.

Turning on gc.debug(gc.DEBUG_STATS) shows why as does running python
under a profiler:

  %   cumulative   self              self     total          
 time   seconds   seconds    calls   s/call   s/call  name   
 27.85    115.84   115.84    14270     0.01     0.02  collect
 21.19    204.01    88.17 1115120314   0.00     0.00  tupletraverse
 13.14    258.65    54.64 1655624731   0.00     0.00  visit_reachable
 10.72    303.25    44.60 1655624731   0.00     0.00  visit_decref
  5.97    328.10    24.85      338     0.07     1.10  PyEval_EvalFrame


It appears the cyclic gc is rechecking all of these tuples repeatedly. 
disabling gc during the loop or using a very high gc.set_threshold hides
the problem.
History
Date User Action Args
2008-10-08 04:00:52gregory.p.smithsetrecipients: + gregory.p.smith
2008-10-08 04:00:52gregory.p.smithsetmessageid: <1223438452.4.0.926419942993.issue4074@psf.upfronthosting.co.za>
2008-10-08 04:00:51gregory.p.smithlinkissue4074 messages
2008-10-08 04:00:49gregory.p.smithcreate