Building a list of tuples has non-linear performance #48324

gpshead · 2008-10-08T04:00:52Z

BPO	4074
Nosy	@loewis, @gpshead, @jcea, @pitrou
Files	tuple_gc_hell.py gctrigger5.patch gctrigger6.patch

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2009-01-09.22:28:31.047>
created_at = <Date 2008-10-08.04:00:51.550>
labels = ['interpreter-core', 'performance']
title = 'Building a list of tuples has non-linear performance'
updated_at = <Date 2009-01-09.23:10:37.233>
user = 'https://github.com/gpshead'

bugs.python.org fields:

activity = <Date 2009-01-09.23:10:37.233>
actor = 'loewis'
assignee = 'none'
closed = True
closed_date = <Date 2009-01-09.22:28:31.047>
closer = 'pitrou'
components = ['Interpreter Core']
creation = <Date 2008-10-08.04:00:51.550>
creator = 'gregory.p.smith'
dependencies = []
files = ['12382', '12386', '12670']
hgrepos = []
issue_num = 4074
keywords = ['patch']
message_count = 18.0
messages = ['74512', '74513', '74598', '77802', '77812', '77966', '77967', '77974', '77975', '77985', '77987', '77994', '77997', '79445', '79506', '79510', '79520', '79523']
nosy_count = 6.0
nosy_names = ['loewis', 'collinwinter', 'gregory.p.smith', 'jcea', 'Rhamphoryncus', 'pitrou']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'patch review'
status = 'closed'
superseder = None
type = 'performance'
url = 'https://bugs.python.org/issue4074'
versions = ['Python 3.1', 'Python 2.7']

gpshead · 2008-10-08T04:00:49Z

The attached script simply loops building a list of tuples. It has
horrible performance as the list gets larger compared to something
appending simple objects like ints to the list.

% python tuple_gc_hell.py ~
...
1000000 1.3615500927
2000000 2.95893096924
3000000 4.53531980515
4000000 5.62795209885
5000000 7.70974612236
...

The time it takes to execute grows linearly with the number of tuples
already appended to the list.

Turning on gc.debug(gc.DEBUG_STATS) shows why as does running python
under a profiler:

% cumulative self self total
time seconds seconds calls s/call s/call name
27.85 115.84 115.84 14270 0.01 0.02 collect
21.19 204.01 88.17 1115120314 0.00 0.00 tupletraverse
13.14 258.65 54.64 1655624731 0.00 0.00 visit_reachable
10.72 303.25 44.60 1655624731 0.00 0.00 visit_decref
5.97 328.10 24.85 338 0.07 1.10 PyEval_EvalFrame

It appears the cyclic gc is rechecking all of these tuples repeatedly.
disabling gc during the loop or using a very high gc.set_threshold hides
the problem.

loewis · 2008-10-08T05:09:32Z

This is a known problem; see the GC discussions in June for an example, e.g.

http://mail.python.org/pipermail/python-dev/2008-June/080579.html

pitrou · 2008-10-09T20:19:40Z

Someone should try implementing Martin's suggestion one day :)

pitrou · 2008-12-14T16:11:52Z

Here is a simple patch implementing Martin's proposal with a few basic
tweaks.

Using Greg's script, we get:

-> without patch:

1000000 2.64134001732
2000000 3.60712885857
3000000 5.40855813026
4000000 6.46308898926
5000000 8.65147781372
6000000 10.3949871063
7000000 12.1376650333
8000000 12.7046239376
9000000 15.4652290344

-> with patch:

1000000 2.692289114
2000000 1.91455483437
3000000 2.02900099754
4000000 1.72369813919
5000000 1.87697696686
6000000 2.08952093124
7000000 1.08168196678
8000000 2.32068109512
9000000 1.1202070713

-> with GC disabled:

1000000 1.6810901165
2000000 0.955595970154
3000000 0.959649085999
4000000 0.933673858643
5000000 0.954123973846
6000000 0.929254055023
7000000 0.901160001755
8000000 0.921751022339
9000000 0.894830942154

Rhamphoryncus · 2008-12-14T18:07:19Z

I didn't test it, but the patch looks okay to me.

pitrou · 2008-12-17T15:00:30Z

This new patch adds another improvement where tuples can be "optimized".
Optimized means that tuples which don't contain any GC-tracked object
become themselves untracked. Since tuples are immutable this
optimization is valid, and since it is common to store lots of tuples as
very simple containers of atomic objects this can be an interesting
optimization.

pitrou · 2008-12-17T15:21:12Z

This new patch also adds a function named gc.is_tracked() which returns
True if the object is tracked by the GC:

>>> import gc
>>> gc.is_tracked(1)
False
>>> gc.is_tracked([])
True
>>> gc.is_tracked(())
True
>>> gc.is_tracked((0,1))
False
>>> gc.is_tracked((0,"a"))
False
>>> gc.is_tracked((0,[]))
True
>>> gc.is_tracked((0,(1,2)))
False
>>> gc.is_tracked((0,(1,0.55)))
False
>>> gc.is_tracked((0,(1,{})))
True
>>> gc.is_tracked((None, True, False, "a", (1,2,u"z",("another",
"nested", u"tuple")), int))
False
>>> gc.is_tracked(gc)
True

(as you see the empty tuple is considered tracked, this is not important
since it is a singleton anyway)

pitrou · 2008-12-17T17:18:52Z

Here is a new patch adding tests for gc.is_tracked().

pitrou · 2008-12-17T17:20:21Z

FWIW, with the tuple optimization, the number of objects in
gc.get_objects() after the regression tests has fallen from 150000 to
140000.

pitrou · 2008-12-17T21:24:13Z

New version of Greg's script, with different choices (list of tuples,
list of lists, list of dicts, dict of tuples).

pitrou · 2008-12-17T21:36:38Z

Now an additional patch which applies over the basic gctrigger4.patch.
It adds dict optimizations so that dicts with only atomic or immutable
elements are also untracked (they get tracked later when other trackable
elements are added).

Since dicts are often used as mappings of simple objects (int, str,
tuple) over other simple objects, this can be very useful.

pitrou · 2008-12-17T23:36:55Z

Cleanup: this patch only has the algorithmic change. Tuple and dict opts
will go to a separate tracker issue.

pitrou · 2008-12-17T23:49:27Z

The optimization issue for tuples and dicts is bpo-4688.

collinwinter · 2009-01-08T21:52:11Z

Looking at gctrigger5.patch, my only thought is that it would be
helpful for posterity to include more verbose comments about why this
new behaviour was chosen; chunks of Martin's proposal from the
referenced python-dev thread could be included verbatim.

Otherwise, LGTM.

pitrou · 2009-01-09T21:08:23Z

Patch with updated comments.

collinwinter · 2009-01-09T21:26:58Z

LGTM. Go ahead and commit this.

pitrou · 2009-01-09T22:28:31Z

Ok, committed in trunk and py3k. Thanks!

loewis · 2009-01-09T23:10:37Z

Ok, committed in trunk and py3k. Thanks!

Thanks!

gpshead added interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage labels Oct 8, 2008

pitrou closed this as completed Jan 9, 2009

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Building a list of tuples has non-linear performance #48324

Building a list of tuples has non-linear performance #48324

gpshead commented Oct 8, 2008

gpshead commented Oct 8, 2008

loewis mannequin commented Oct 8, 2008

pitrou commented Oct 9, 2008

pitrou commented Dec 14, 2008

Rhamphoryncus mannequin commented Dec 14, 2008

pitrou commented Dec 17, 2008

pitrou commented Dec 17, 2008

pitrou commented Dec 17, 2008

pitrou commented Dec 17, 2008

pitrou commented Dec 17, 2008

pitrou commented Dec 17, 2008

pitrou commented Dec 17, 2008

pitrou commented Dec 17, 2008

collinwinter mannequin commented Jan 8, 2009

pitrou commented Jan 9, 2009

collinwinter mannequin commented Jan 9, 2009

pitrou commented Jan 9, 2009

loewis mannequin commented Jan 9, 2009

Building a list of tuples has non-linear performance #48324

Building a list of tuples has non-linear performance #48324

Comments

gpshead commented Oct 8, 2008

gpshead commented Oct 8, 2008

loewis mannequin commented Oct 8, 2008

pitrou commented Oct 9, 2008

pitrou commented Dec 14, 2008

Rhamphoryncus mannequin commented Dec 14, 2008

pitrou commented Dec 17, 2008

pitrou commented Dec 17, 2008

pitrou commented Dec 17, 2008

pitrou commented Dec 17, 2008

pitrou commented Dec 17, 2008

pitrou commented Dec 17, 2008

pitrou commented Dec 17, 2008

pitrou commented Dec 17, 2008

collinwinter mannequin commented Jan 8, 2009

pitrou commented Jan 9, 2009

collinwinter mannequin commented Jan 9, 2009

pitrou commented Jan 9, 2009

loewis mannequin commented Jan 9, 2009