This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: Speed up range() by caching and modifying long objects
Type: performance
Stage: resolved
Components: Interpreter Core
Versions: Python 3.5

process
Status: closed
Resolution: rejected
Dependencies:
Superseder:
Assigned To: larry
Nosy List: BreamoreBoy, ethan.furman, gvanrossum, larry, lemburg, mark.dickinson, pitrou, rhettinger, scoder, serhiy.storchaka
Priority: normal
Keywords: patch

Created on 2015-05-06 16:18 by larry, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name               Uploaded
larry.range.hack.1.txt  larry, 2015-05-06 16:18
int_free_list.patch     serhiy.storchaka, 2015-05-06 17:37
Messages (15)
msg242685 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2015-05-06 16:18
This probably shouldn't be checked in.  But it was an interesting experiment, and I did get it to work.

My brother forwarded me this question from Stack Overflow:
    http://stackoverflow.com/questions/23453133/is-there-a-reason-python-3-enumerates-slower-than-python-2

The debate brought up a good point: a lot of the overhead of range() is in creating and destroying the long objects.  I wondered, could I get rid of that?  Long story short, yes.

rangeiterobject is the special-case range iterator used when you're iterating over integer values that all fit inside C longs.  (Otherwise range() has to use the much slower, general-purpose longrangeiterobject.)  rangeiter_next is simple: it computes the new value, then returns PyLong_FromLong of that value.

First thought: cache a reference to the previous value.  If its reference count is 1, we have the only reference.  Overwrite its value and return it.  But that doesn't help in the general case, because in "for x in range(1000)" x is holding a reference at the time __next__ is called on the iterator.

The trick: cache *two* old yielded objects.  In the general case, by the second iteration, everyone else has dropped their references to the older cached object and we can modify it safely.

The second trick: if the value you're yielding is one of the interned "small ints", you *have* to return the interned small int.  (Otherwise you break 0 == 0, I kid you not.)
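
Roughly, the idea looks like the sketch below.  This is purely illustrative, not the attached patch: the two-slot "cached" field and the exact guard conditions are assumptions, spelled out only to make the trick concrete.

    /* Illustrative sketch only -- not the attached patch.  The two-slot
       "cached" field is assumed; everything else follows the description
       above.  The slot written two iterations ago is usually no longer
       referenced by the loop variable, so its refcount is back to 1 and
       its single digit can be overwritten in place. */
    static PyObject *
    rangeiter_next_sketch(rangeiterobject *r)
    {
        long value;
        PyObject *result;
        PyLongObject *old;
        int slot;

        if (r->index >= r->len)
            return NULL;
        value = r->start + r->index * r->step;
        slot = (int)(r->index & 1);        /* alternate between the two slots */
        r->index++;

        /* Second trick: interned small ints must be returned as the shared
           objects, never as a mutated copy. */
        if (value >= -5 && value <= 256)
            return PyLong_FromLong(value);

        old = r->cached[slot];             /* assumed two-element cache */
        if (old != NULL && Py_REFCNT(old) == 1 && Py_SIZE(old) == 1
                && value > 0 && value < (1L << PyLong_SHIFT)) {
            old->ob_digit[0] = (digit)value;   /* overwrite in place */
            Py_INCREF(old);
            return (PyObject *)old;
        }

        result = PyLong_FromLong(value);
        if (result != NULL) {
            /* Remember this object so it can be recycled two iterations on. */
            Py_XDECREF(old);
            Py_INCREF(result);
            r->cached[slot] = (PyLongObject *)result;
        }
        return result;
    }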

With this patch applied all regression tests pass.  And, on my desktop machine, the benchmark they used in the above link:

./python -mtimeit -n 5 -r 2 -s"cnt = 0" "for i in range(10000000): cnt += 1"

drops from 410ms to 318ms ("5 loops, best of 2: 318 msec per loop").


This implementation requires the rangeiterobject to have intimate knowledge of the implementation of PyLongObject, including copy-and-pasting some information that isn't otherwise exposed (essentially the maximum small int).  At the very least that information would need to be exposed properly, so rangeiterobject could use it correctly, before this could be checked in.

It might be cleaner for longobject.c to expose a private "set this long object to this value" function.  It would fail if we can't do it safely: if the value was an interned (small) int, or if the long was of the wrong size (Py_SIZE(o) != 1).
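
Such a helper might look something like this (a hedged sketch; the name _PyLong_SetValueInPlace is invented here, and longobject.c exposes nothing like it today):

    /* Hypothetical private helper: overwrite a positive 1-digit long in
       place.  Returns 0 on success, -1 if it cannot be done safely. */
    static int
    _PyLong_SetValueInPlace(PyLongObject *op, long value)
    {
        if (Py_REFCNT(op) != 1)
            return -1;                       /* someone else can still see it */
        if (value >= -5 && value <= 256)
            return -1;                       /* must use the interned small int */
        if (Py_SIZE(op) != 1 || value <= 0 || value >= (1L << PyLong_SHIFT))
            return -1;                       /* only positive 1-digit longs */
        op->ob_digit[0] = (digit)value;
        return 0;
    }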


Is this interesting enough to pursue?  I'm happy to drop it.
msg242686 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-05-06 16:38
This regression was just discussed in issue24076. A general suggestion of a free list for small ints was proposed there. It could speed up other integer computations as well, but comprehensive benchmarking results are needed.

See also the similar issue23507 for tuples. Perhaps we need a general solution for fast specialized free lists.
msg242687 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2015-05-06 16:38
See issue 24076.
msg242691 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-05-06 17:37
Here is a patch that adds a free list for 1-digit long objects.

$ ./python -m timeit -s "r = range(10**3)" -- "for i in r: pass"
Unpatched: 10000 loops, best of 3: 54.4 usec per loop
With free list: 10000 loops, best of 3: 38 usec per loop
With hacked range: 10000 loops, best of 3: 34.5 usec per loop
Python 2.7: 10000 loops, best of 3: 37.1 usec per loop

$ ./python -m timeit -s "r = list(range(10**3))" -- "for i in r: pass"
10000 loops, best of 3: 30.7 usec per loop

In Python 2.7:
$ ./python -m timeit -s "r = xrange(10**3)" -- "for i in r: pass"
10000 loops, best of 3: 41.4 usec per loop
$ ./python -m timeit -s "r = range(10**3)" -- "for i in r: pass"
10000 loops, best of 3: 37.1 usec per loop
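
The general shape of such a free list is sketched below; this is only an illustration of the approach, and the names and size limit are assumptions rather than what int_free_list.patch actually does.

    /* Sketch of a free list for 1-digit longs (illustration only). */
    #define MAX_FREE_ONEDIGIT 100

    static PyLongObject *onedigit_free_list[MAX_FREE_ONEDIGIT];
    static int onedigit_numfree = 0;

    static PyLongObject *
    onedigit_alloc(void)
    {
        if (onedigit_numfree > 0) {
            PyLongObject *op = onedigit_free_list[--onedigit_numfree];
            _Py_NewReference((PyObject *)op);   /* refcount back to 1 */
            return op;                          /* caller fills ob_digit[0] */
        }
        return _PyLong_New(1);                  /* fall back to a fresh object */
    }

    static void
    onedigit_free(PyLongObject *op)
    {
        /* Called from the long deallocator when Py_SIZE(op) == 1. */
        if (onedigit_numfree < MAX_FREE_ONEDIGIT)
            onedigit_free_list[onedigit_numfree++] = op;
        else
            Py_TYPE(op)->tp_free((PyObject *)op);
    }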
msg242704 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2015-05-06 19:55
How does this apply for people like me, who scarcely if ever use range() in standard for loops?  There are plenty of recipes using itertools that show you don't need range(), so is this really needed?  No axe to grind, just curious.
msg242707 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2015-05-06 20:47
10**3 doesn't show off this hack as much as other numbers would; the hack only operates from 257 up to the maximum value that will fit in a single long "digit" (32767 on 32-bit, 2**30 on 64-bit).
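
Concretely, the in-place reuse can only kick in for values above the small-int cache and still within one digit; something like the following (illustrative only; 256 and PyLong_SHIFT are CPython's small-int bound and digit width):

    /* Rough applicability test for the range() hack (illustrative). */
    static int
    hack_applies(long value)
    {
        return value > 256                     /* above the interned small ints */
            && value < (1L << PyLong_SHIFT);   /* fits in one digit: up to 32767
                                                  on 32-bit, about 2**30 on 64-bit */
    }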

Anyway, a freelist for one-digit longs seems nearly as good, and it's a lot more general-purpose, so I'm much more interested in that.
msg242885 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2015-05-11 08:57
I like the idea of adding a free list of longs in Python 3, but I think we should extend this somewhat to cover more ground, e.g. by pre-allocating a block of 1-digit long objects, as we did for Python 2 ints, and perhaps allocating up to 4k (= 1 memory page) towards such a free list.

The added cache locality of having the data in a pre-allocated block should provide some more performance.
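
A rough sketch of that shape, loosely modelled on Python 2's int block allocator (names, sizes and the init/dealloc hooks here are all assumptions, not a patch):

    /* Illustrative only: a single pre-allocated ~4k block of 1-digit longs.
       The point is cache locality -- the objects live in one contiguous page. */
    #define PREALLOC_BYTES 4096                   /* about one memory page */
    #define N_PREALLOC (PREALLOC_BYTES / sizeof(PyLongObject))

    static PyLongObject prealloc_block[N_PREALLOC];
    static PyLongObject *prealloc_slots[N_PREALLOC];
    static Py_ssize_t prealloc_nfree = 0;

    static void
    prealloc_init(void)
    {
        for (Py_ssize_t i = 0; i < (Py_ssize_t)N_PREALLOC; i++)
            prealloc_slots[i] = &prealloc_block[i];
        prealloc_nfree = N_PREALLOC;
    }

    /* Allocate a positive 1-digit long with the given value. */
    static PyLongObject *
    prealloc_long(digit value)
    {
        PyLongObject *op;
        if (prealloc_nfree > 0) {
            op = prealloc_slots[--prealloc_nfree];
            (void)PyObject_INIT_VAR(op, &PyLong_Type, 1);  /* type, refcount, size */
        }
        else {
            op = _PyLong_New(1);           /* block exhausted: normal heap path */
            if (op == NULL)
                return NULL;
        }
        op->ob_digit[0] = value;
        return op;
    }

    /* The long deallocator would have to push block-resident objects back
       onto prealloc_slots instead of calling tp_free; omitted for brevity. */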
msg242886 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-05-11 09:42
According to my and Larry's measurements [1], the distribution of created ints by size (number of digits) while running the Python test suite is as follows (columns: count, % of ints, cumulative %):

On 32-bit Linux:
int                               42828741  13.40%
                              0     425353   0.99%   0.99%
                              1   21399290  49.96%  50.96%
                              2   10496856  24.51%  75.47%
                              3    4873346  11.38%  86.85%
                              4    1021563   2.39%  89.23%
                              5    1246444   2.91%  92.14%
                              6     733676   1.71%  93.85%
                              7     123074   0.29%  94.14%
                              8     139203   0.33%  94.47%
...

On 64-bit Linux:
int                              47600237  13.53%
                             0     294964   0.62%   0.62%
                             1   36135772  75.92%  76.53%
                             2    4504046   9.46%  86.00%
                             3    2109837   4.43%  90.43%
                             4    1277995   2.68%  93.11%
                             5     542775   1.14%  94.25%
                             6     485451   1.02%  95.27%
...

86% of ints have size <= 3 digits on 32-bit and <= 2 digits on 64-bit. That is enough to represent 32-bit integers (as Python 2's int did). I think we should support a free list not only for 1-digit ints, but for up to 3 digits on 32-bit builds and up to 2 digits on 64-bit builds. Another natural limit is 3 digits on a 64-bit build (enough to represent a 64-bit C long on Linux, or a pointer on all platforms). Larger integers are perhaps encountered mainly in tests.

Pre-allocating a block has a disadvantage: it is hard to free the allocated block. A program can create a lot of integers, then drop most of them and need the memory for other purposes, but blocks once allocated for integers would never be freed. This is a non-trivial design decision and should be discussed on Python-Dev and accepted by the BDFL.
msg242887 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-05-11 09:43
[1] http://comments.gmane.org/gmane.comp.python.devel/153078
msg242888 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-05-11 10:09
I'd rather stick to the simple freelist approach. People interested in (allegedly) more ambitious designs can open new issues, together with patches to evaluate their efficiency.
msg242889 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2015-05-11 10:10
I'd rather have the general-purpose freelist for ints too.  How about we close this issue now?  Assuming the freelist for ints goes in, we can abandon this approach entirely.
msg242891 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2015-05-11 11:05
On 11.05.2015 11:42, Serhiy Storchaka wrote:
> 
> Pre-allocating a block has a disadvantage: it is hard to free the allocated block. A program can create a lot of integers, then drop most of them and need the memory for other purposes, but blocks once allocated for integers would never be freed. This is a non-trivial design decision and should be discussed on Python-Dev and accepted by the BDFL.

True, but if it's only 1-4k RAM, I don't think anyone would mind :-)

Python 2 is doing exactly that, with 1k of RAM for the integer free list. I think it was one of the first free lists ever added to Python, so it was done at a time when RAM was expensive ;-).
msg242892 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-05-11 12:36
Well, 1-4k of RAM can be consumed just after start-up (it's only 100-300 integers) and never freed. What next?
msg402137 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2021-09-18 17:44
Should we try the free list for integers again?
msg402138 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2021-09-18 17:55
(Never mind, that should have gone to https://bugs.python.org/issue24165, which is the integer freelist -- conclusion there in 2015 was also that it didn't do much.)
History
Date                 User              Action  Args
2022-04-11 14:58:16  admin             set     github: 68326
2021-09-18 17:55:25  gvanrossum        set     messages: + msg402138
2021-09-18 17:44:32  gvanrossum        set     nosy: + gvanrossum; messages: + msg402137
2015-05-11 12:36:52  serhiy.storchaka  set     messages: + msg242892
2015-05-11 11:05:13  lemburg           set     messages: + msg242891
2015-05-11 10:10:56  larry             set     status: open -> closed; resolution: rejected; messages: + msg242889; stage: patch review -> resolved
2015-05-11 10:09:17  pitrou            set     messages: + msg242888
2015-05-11 09:43:06  serhiy.storchaka  set     messages: + msg242887
2015-05-11 09:42:21  serhiy.storchaka  set     messages: + msg242886
2015-05-11 08:57:43  lemburg           set     nosy: + lemburg; messages: + msg242885
2015-05-07 20:48:43  ethan.furman      set     nosy: + ethan.furman
2015-05-06 20:47:57  larry             set     messages: + msg242707
2015-05-06 19:55:59  BreamoreBoy       set     nosy: + BreamoreBoy; messages: + msg242704
2015-05-06 17:37:42  serhiy.storchaka  set     files: + int_free_list.patch; keywords: + patch
2015-05-06 17:37:23  serhiy.storchaka  set     messages: + msg242691
2015-05-06 16:38:52  scoder            set     nosy: + scoder; messages: + msg242687
2015-05-06 16:38:08  serhiy.storchaka  set     messages: + msg242686
2015-05-06 16:28:20  mark.dickinson    set     nosy: + mark.dickinson
2015-05-06 16:18:56  larry             create