Issue4074
Created on 2008-10-08 04:00 by gregory.p.smith, last changed 2009-01-09 23:10 by loewis.
|
msg74512 - (view) |
Author: Gregory P. Smith (gregory.p.smith) |
Date: 2008-10-08 04:00 |
|
The attached script simply loops building a list of tuples. It has
horrible performance as the list gets larger compared to something
appending simple objects like ints to the list.
% python tuple_gc_hell.py ~
...
1000000 1.3615500927
2000000 2.95893096924
3000000 4.53531980515
4000000 5.62795209885
5000000 7.70974612236
...
The time it takes to execute grows linearly with the number of tuples
already appended to the list.
Turning on gc.debug(gc.DEBUG_STATS) shows why as does running python
under a profiler:
% cumulative self self total
time seconds seconds calls s/call s/call name
27.85 115.84 115.84 14270 0.01 0.02 collect
21.19 204.01 88.17 1115120314 0.00 0.00 tupletraverse
13.14 258.65 54.64 1655624731 0.00 0.00 visit_reachable
10.72 303.25 44.60 1655624731 0.00 0.00 visit_decref
5.97 328.10 24.85 338 0.07 1.10 PyEval_EvalFrame
It appears the cyclic gc is rechecking all of these tuples repeatedly.
disabling gc during the loop or using a very high gc.set_threshold hides
the problem.
|
|
msg74513 - (view) |
Author: Martin v. Löwis (loewis) |
Date: 2008-10-08 05:09 |
|
This is a known problem; see the GC discussions in June for an example, e.g.
http://mail.python.org/pipermail/python-dev/2008-June/080579.html
|
|
msg74598 - (view) |
Author: Antoine Pitrou (pitrou) |
Date: 2008-10-09 20:19 |
|
Someone should try implementing Martin's suggestion one day :)
|
|
msg77802 - (view) |
Author: Antoine Pitrou (pitrou) |
Date: 2008-12-14 16:11 |
|
Here is a simple patch implementing Martin's proposal with a few basic
tweaks.
Using Greg's script, we get:
-> without patch:
1000000 2.64134001732
2000000 3.60712885857
3000000 5.40855813026
4000000 6.46308898926
5000000 8.65147781372
6000000 10.3949871063
7000000 12.1376650333
8000000 12.7046239376
9000000 15.4652290344
-> with patch:
1000000 2.692289114
2000000 1.91455483437
3000000 2.02900099754
4000000 1.72369813919
5000000 1.87697696686
6000000 2.08952093124
7000000 1.08168196678
8000000 2.32068109512
9000000 1.1202070713
-> with GC disabled:
1000000 1.6810901165
2000000 0.955595970154
3000000 0.959649085999
4000000 0.933673858643
5000000 0.954123973846
6000000 0.929254055023
7000000 0.901160001755
8000000 0.921751022339
9000000 0.894830942154
|
|
msg77812 - (view) |
Author: Adam Olsen (Rhamphoryncus) |
Date: 2008-12-14 18:07 |
|
I didn't test it, but the patch looks okay to me.
|
|
msg77966 - (view) |
Author: Antoine Pitrou (pitrou) |
Date: 2008-12-17 15:00 |
|
This new patch adds another improvement where tuples can be "optimized".
Optimized means that tuples which don't contain any GC-tracked object
become themselves untracked. Since tuples are immutable this
optimization is valid, and since it is common to store lots of tuples as
very simple containers of atomic objects this can be an interesting
optimization.
|
|
msg77967 - (view) |
Author: Antoine Pitrou (pitrou) |
Date: 2008-12-17 15:21 |
|
This new patch also adds a function named gc.is_tracked() which returns
True if the object is tracked by the GC:
>>> import gc
>>> gc.is_tracked(1)
False
>>> gc.is_tracked([])
True
>>> gc.is_tracked(())
True
>>> gc.is_tracked((0,1))
False
>>> gc.is_tracked((0,"a"))
False
>>> gc.is_tracked((0,[]))
True
>>> gc.is_tracked((0,(1,2)))
False
>>> gc.is_tracked((0,(1,0.55)))
False
>>> gc.is_tracked((0,(1,{})))
True
>>> gc.is_tracked((None, True, False, "a", (1,2,u"z",("another",
"nested", u"tuple")), int))
False
>>> gc.is_tracked(gc)
True
(as you see the empty tuple is considered tracked, this is not important
since it is a singleton anyway)
|
|
msg77974 - (view) |
Author: Antoine Pitrou (pitrou) |
Date: 2008-12-17 17:18 |
|
Here is a new patch adding tests for gc.is_tracked().
|
|
msg77975 - (view) |
Author: Antoine Pitrou (pitrou) |
Date: 2008-12-17 17:20 |
|
FWIW, with the tuple optimization, the number of objects in
gc.get_objects() after the regression tests has fallen from 150000 to
140000.
|
|
msg77985 - (view) |
Author: Antoine Pitrou (pitrou) |
Date: 2008-12-17 21:24 |
|
New version of Greg's script, with different choices (list of tuples,
list of lists, list of dicts, dict of tuples).
|
|
msg77987 - (view) |
Author: Antoine Pitrou (pitrou) |
Date: 2008-12-17 21:36 |
|
Now an additional patch which applies over the basic gctrigger4.patch.
It adds dict optimizations so that dicts with only atomic or immutable
elements are also untracked (they get tracked later when other trackable
elements are added).
Since dicts are often used as mappings of simple objects (int, str,
tuple) over other simple objects, this can be very useful.
|
|
msg77994 - (view) |
Author: Antoine Pitrou (pitrou) |
Date: 2008-12-17 23:36 |
|
Cleanup: this patch only has the algorithmic change. Tuple and dict opts
will go to a separate tracker issue.
|
|
msg77997 - (view) |
Author: Antoine Pitrou (pitrou) |
Date: 2008-12-17 23:49 |
|
The optimization issue for tuples and dicts is #4688.
|
|
msg79445 - (view) |
Author: Collin Winter (collinwinter) |
Date: 2009-01-08 21:52 |
|
Looking at gctrigger5.patch, my only thought is that it would be
helpful for posterity to include more verbose comments about why this
new behaviour was chosen; chunks of Martin's proposal from the
referenced python-dev thread could be included verbatim.
Otherwise, LGTM.
|
|
msg79506 - (view) |
Author: Antoine Pitrou (pitrou) |
Date: 2009-01-09 21:08 |
|
Patch with updated comments.
|
|
msg79510 - (view) |
Author: Collin Winter (collinwinter) |
Date: 2009-01-09 21:26 |
|
LGTM. Go ahead and commit this.
|
|
msg79520 - (view) |
Author: Antoine Pitrou (pitrou) |
Date: 2009-01-09 22:28 |
|
Ok, committed in trunk and py3k. Thanks!
|
|
msg79523 - (view) |
Author: Martin v. Löwis (loewis) |
Date: 2009-01-09 23:10 |
|
> Ok, committed in trunk and py3k. Thanks!
Thanks!
|
|
| Date |
User |
Action |
Args |
| 2009-01-09 23:10:37 | loewis | set | messages:
+ msg79523 |
| 2009-01-09 22:28:31 | pitrou | set | status: open -> closed resolution: fixed messages:
+ msg79520 |
| 2009-01-09 21:26:58 | collinwinter | set | messages:
+ msg79510 |
| 2009-01-09 21:08:25 | pitrou | set | files:
+ gctrigger6.patch messages:
+ msg79506 |
| 2009-01-08 21:52:12 | collinwinter | set | messages:
+ msg79445 |
| 2008-12-30 22:22:00 | jcea | set | nosy:
+ jcea |
| 2008-12-17 23:49:27 | pitrou | set | messages:
+ msg77997 |
| 2008-12-17 23:36:56 | pitrou | set | files:
+ gctrigger5.patch messages:
+ msg77994 |
| 2008-12-17 23:36:07 | pitrou | set | files:
- gctrigger4.patch |
| 2008-12-17 23:36:04 | pitrou | set | files:
- gctrigger3.patch |
| 2008-12-17 23:36:01 | pitrou | set | files:
- gctrigger2.patch |
| 2008-12-17 23:35:58 | pitrou | set | files:
- gctrigger.patch |
| 2008-12-17 23:35:54 | pitrou | set | files:
- dictopts.patch |
| 2008-12-17 21:36:42 | pitrou | set | files:
+ dictopts.patch messages:
+ msg77987 |
| 2008-12-17 21:24:13 | pitrou | set | files:
+ tuple_gc_hell.py messages:
+ msg77985 |
| 2008-12-17 21:23:38 | pitrou | set | files:
- tuple_gc_hell.py |
| 2008-12-17 17:20:21 | pitrou | set | messages:
+ msg77975 |
| 2008-12-17 17:18:53 | pitrou | set | files:
+ gctrigger4.patch messages:
+ msg77974 |
| 2008-12-17 15:21:13 | pitrou | set | files:
+ gctrigger3.patch messages:
+ msg77967 |
| 2008-12-17 15:00:34 | pitrou | set | files:
+ gctrigger2.patch messages:
+ msg77966 |
| 2008-12-17 03:25:51 | collinwinter | set | nosy:
+ collinwinter |
| 2008-12-14 18:07:19 | Rhamphoryncus | set | nosy:
+ Rhamphoryncus messages:
+ msg77812 |
| 2008-12-14 16:11:53 | pitrou | set | files:
+ gctrigger.patch keywords:
+ patch messages:
+ msg77802 stage: patch review |
| 2008-10-09 20:19:40 | pitrou | set | nosy:
+ pitrou messages:
+ msg74598 versions:
+ Python 3.1, - Python 2.6, Python 2.5 |
| 2008-10-08 05:09:33 | loewis | set | nosy:
+ loewis messages:
+ msg74513 |
| 2008-10-08 04:00:51 | gregory.p.smith | create | |
|