classification
Title: Building a list of tuples has non-linear performance
Type: performance Stage: patch review
Components: Interpreter Core Versions: Python 3.1, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Rhamphoryncus, collinwinter, gregory.p.smith, jcea, loewis, pitrou
Priority: normal Keywords: patch

Created on 2008-10-08 04:00 by gregory.p.smith, last changed 2009-01-09 23:10 by loewis. This issue is now closed.

Files
File name Uploaded Description Edit
tuple_gc_hell.py pitrou, 2008-12-17 21:24
gctrigger5.patch pitrou, 2008-12-17 23:36
gctrigger6.patch pitrou, 2009-01-09 21:08
Messages (18)
msg74512 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008-10-08 04:00
The attached script simply loops building a list of tuples.  It has
horrible performance as the list gets larger compared to something
appending simple objects like ints to the list.

% python tuple_gc_hell.py                  ~
...
1000000 1.3615500927
2000000 2.95893096924
3000000 4.53531980515
4000000 5.62795209885
5000000 7.70974612236
...

The time it takes to execute grows linearly with the number of tuples
already appended to the list.

Turning on gc.debug(gc.DEBUG_STATS) shows why as does running python
under a profiler:

  %   cumulative   self              self     total          
 time   seconds   seconds    calls   s/call   s/call  name   
 27.85    115.84   115.84    14270     0.01     0.02  collect
 21.19    204.01    88.17 1115120314   0.00     0.00  tupletraverse
 13.14    258.65    54.64 1655624731   0.00     0.00  visit_reachable
 10.72    303.25    44.60 1655624731   0.00     0.00  visit_decref
  5.97    328.10    24.85      338     0.07     1.10  PyEval_EvalFrame


It appears the cyclic gc is rechecking all of these tuples repeatedly. 
disabling gc during the loop or using a very high gc.set_threshold hides
the problem.
msg74513 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-10-08 05:09
This is a known problem; see the GC discussions in June for an example, e.g.

http://mail.python.org/pipermail/python-dev/2008-June/080579.html
msg74598 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-10-09 20:19
Someone should try implementing Martin's suggestion one day :)
msg77802 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-12-14 16:11
Here is a simple patch implementing Martin's proposal with a few basic
tweaks.

Using Greg's script, we get:

-> without patch:

1000000 2.64134001732
2000000 3.60712885857
3000000 5.40855813026
4000000 6.46308898926
5000000 8.65147781372
6000000 10.3949871063
7000000 12.1376650333
8000000 12.7046239376
9000000 15.4652290344

-> with patch:

1000000 2.692289114
2000000 1.91455483437
3000000 2.02900099754
4000000 1.72369813919
5000000 1.87697696686
6000000 2.08952093124
7000000 1.08168196678
8000000 2.32068109512
9000000 1.1202070713

-> with GC disabled:

1000000 1.6810901165
2000000 0.955595970154
3000000 0.959649085999
4000000 0.933673858643
5000000 0.954123973846
6000000 0.929254055023
7000000 0.901160001755
8000000 0.921751022339
9000000 0.894830942154
msg77812 - (view) Author: Adam Olsen (Rhamphoryncus) Date: 2008-12-14 18:07
I didn't test it, but the patch looks okay to me.
msg77966 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-12-17 15:00
This new patch adds another improvement where tuples can be "optimized".
Optimized means that tuples which don't contain any GC-tracked object
become themselves untracked. Since tuples are immutable this
optimization is valid, and since it is common to store lots of tuples as
very simple containers of atomic objects this can be an interesting
optimization.
msg77967 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-12-17 15:21
This new patch also adds a function named gc.is_tracked() which returns
True if the object is tracked by the GC:

>>> import gc
>>> gc.is_tracked(1)
False
>>> gc.is_tracked([])
True
>>> gc.is_tracked(())
True
>>> gc.is_tracked((0,1))
False
>>> gc.is_tracked((0,"a"))
False
>>> gc.is_tracked((0,[]))
True
>>> gc.is_tracked((0,(1,2)))
False
>>> gc.is_tracked((0,(1,0.55)))
False
>>> gc.is_tracked((0,(1,{})))
True
>>> gc.is_tracked((None, True, False, "a", (1,2,u"z",("another",
"nested", u"tuple")), int))
False
>>> gc.is_tracked(gc)
True


(as you see the empty tuple is considered tracked, this is not important
since it is a singleton anyway)
msg77974 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-12-17 17:18
Here is a new patch adding tests for gc.is_tracked().
msg77975 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-12-17 17:20
FWIW, with the tuple optimization, the number of objects in
gc.get_objects() after the regression tests has fallen from 150000 to
140000.
msg77985 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-12-17 21:24
New version of Greg's script, with different choices (list of tuples,
list of lists, list of dicts, dict of tuples).
msg77987 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-12-17 21:36
Now an additional patch which applies over the basic gctrigger4.patch.
It adds dict optimizations so that dicts with only atomic or immutable
elements are also untracked (they get tracked later when other trackable
elements are added).

Since dicts are often used as mappings of simple objects (int, str,
tuple) over other simple objects, this can be very useful.
msg77994 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-12-17 23:36
Cleanup: this patch only has the algorithmic change. Tuple and dict opts
will go to a separate tracker issue.
msg77997 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-12-17 23:49
The optimization issue for tuples and dicts is #4688.
msg79445 - (view) Author: Collin Winter (collinwinter) * (Python committer) Date: 2009-01-08 21:52
Looking at gctrigger5.patch, my only thought is that it would be 
helpful for posterity to include more verbose comments about why this 
new behaviour was chosen; chunks of Martin's proposal from the 
referenced python-dev thread could be included verbatim.

Otherwise, LGTM.
msg79506 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-01-09 21:08
Patch with updated comments.
msg79510 - (view) Author: Collin Winter (collinwinter) * (Python committer) Date: 2009-01-09 21:26
LGTM. Go ahead and commit this.
msg79520 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-01-09 22:28
Ok, committed in trunk and py3k. Thanks!
msg79523 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-01-09 23:10
> Ok, committed in trunk and py3k. Thanks!

Thanks!
History
Date User Action Args
2009-01-09 23:10:37loewissetmessages: + msg79523
2009-01-09 22:28:31pitrousetstatus: open -> closed
resolution: fixed
messages: + msg79520
2009-01-09 21:26:58collinwintersetmessages: + msg79510
2009-01-09 21:08:25pitrousetfiles: + gctrigger6.patch
messages: + msg79506
2009-01-08 21:52:12collinwintersetmessages: + msg79445
2008-12-30 22:22:00jceasetnosy: + jcea
2008-12-17 23:49:27pitrousetmessages: + msg77997
2008-12-17 23:36:56pitrousetfiles: + gctrigger5.patch
messages: + msg77994
2008-12-17 23:36:07pitrousetfiles: - gctrigger4.patch
2008-12-17 23:36:04pitrousetfiles: - gctrigger3.patch
2008-12-17 23:36:01pitrousetfiles: - gctrigger2.patch
2008-12-17 23:35:58pitrousetfiles: - gctrigger.patch
2008-12-17 23:35:54pitrousetfiles: - dictopts.patch
2008-12-17 21:36:42pitrousetfiles: + dictopts.patch
messages: + msg77987
2008-12-17 21:24:13pitrousetfiles: + tuple_gc_hell.py
messages: + msg77985
2008-12-17 21:23:38pitrousetfiles: - tuple_gc_hell.py
2008-12-17 17:20:21pitrousetmessages: + msg77975
2008-12-17 17:18:53pitrousetfiles: + gctrigger4.patch
messages: + msg77974
2008-12-17 15:21:13pitrousetfiles: + gctrigger3.patch
messages: + msg77967
2008-12-17 15:00:34pitrousetfiles: + gctrigger2.patch
messages: + msg77966
2008-12-17 03:25:51collinwintersetnosy: + collinwinter
2008-12-14 18:07:19Rhamphoryncussetnosy: + Rhamphoryncus
messages: + msg77812
2008-12-14 16:11:53pitrousetfiles: + gctrigger.patch
keywords: + patch
messages: + msg77802
stage: patch review
2008-10-09 20:19:40pitrousetnosy: + pitrou
messages: + msg74598
versions: + Python 3.1, - Python 2.6, Python 2.5
2008-10-08 05:09:33loewissetnosy: + loewis
messages: + msg74513
2008-10-08 04:00:51gregory.p.smithcreate