New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Long object free list optimization #46297
Comments
The patch adds a free list of long objects with size 1 or -1. It has $ ./python -m timeit "for i in range(100): list(range(1000))" Without patch: With patch: |
The updated patch limits the free list. |
I don't get the same impressive speedup as you do, although it's still $ ./python -m timeit "for i in range(100): list(range(1000))"
Without patch:
100 loops, best of 3: 5.05 msec per loop
With patch:
100 loops, best of 3: 3.57 msec per loop Also, your patch is leaky. I'm attaching a fixed version (for py3k, |
Christian, maybe you did your measurements with the DEBUG flag turned Also, I'm not sure the patch is useful for 2.x since long objects with |
Antoine Pitrou wrote:
Yes, I've used a Py_Debug build to measure the speed difference. You are right. The patch makes no sense for the 2.x series. Christian |
Here are some updated timings against the current py3k branch: $ ./python -m timeit "sum(range(10000))" Without patch: $ ./python -m timeit -s "n=1000000" "sum(range(n, n+10000))" Without patch: $ ./python -m timeit "min(range(255))" Without patch: As you can see the patch makes things quite a bit better for 1-digit |
I'm attaching a slightly improved version of the longfreelist2 patch. ........................ --- issue2013-without-patch.txt 2008-03-22 12:18:37.000000000 -0700
+++ issue2013-longfreelist2-gps01-size-first.txt 2008-03-22
12:34:44.000000000 -0700
@@ -1,43 +1,43 @@
-3.0a3+ (py3k:61746M, Mar 22 2008, 12:12:29)
+3.0a3+ (py3k:61746M, Mar 22 2008, 12:33:47)
[GCC 4.0.1 (Apple Computer, Inc. build 5367)]
./python.exe -m timeit min(range(255))
-100000 loops, best of 3: 17.2 usec per loop
+100000 loops, best of 3: 15.8 usec per loop
./python.exe -m timeit -s n=1000000 sum(range(n,n+10000))
-1000 loops, best of 3: 1.21 msec per loop
+1000 loops, best of 3: 1.22 msec per loop
./python.exe -m timeit sum(range(-32768,32768))
-100 loops, best of 3: 3.98 msec per loop
+100 loops, best of 3: 2.63 msec per loop
./python.exe -m timeit sum(range(-1000000,1000000))
-10 loops, best of 3: 296 msec per loop
+10 loops, best of 3: 273 msec per loop
./python.exe -m timeit sum(range(4000))
-1000 loops, best of 3: 239 usec per loop
+10000 loops, best of 3: 151 usec per loop
./python.exe -m timeit sum(list(range(4000)))
-1000 loops, best of 3: 360 usec per loop
+1000 loops, best of 3: 274 usec per loop
./python.exe -m timeit sum(range(10000))
-1000 loops, best of 3: 615 usec per loop
+1000 loops, best of 3: 382 usec per loop
./python.exe -m timeit sum(list(range(10000)))
-1000 loops, best of 3: 887 usec per loop
+1000 loops, best of 3: 802 usec per loop
./python.exe -m timeit sum(range(40000))
-100 loops, best of 3: 2.55 msec per loop
+100 loops, best of 3: 1.79 msec per loop
./python.exe -m timeit sum(range(80000))
-100 loops, best of 3: 6.4 msec per loop
+100 loops, best of 3: 5.77 msec per loop
........................ --- issue2013-without-patch.txt 2008-03-22 12:18:37.000000000 -0700
+++ issue2013-longfreelist2.txt 2008-03-22 12:19:46.000000000 -0700
@@ -1,43 +1,43 @@
-3.0a3+ (py3k:61746M, Mar 22 2008, 12:12:29)
+3.0a3+ (py3k:61746M, Mar 22 2008, 12:18:57)
[GCC 4.0.1 (Apple Computer, Inc. build 5367)]
./python.exe -m timeit min(range(255))
-100000 loops, best of 3: 17.2 usec per loop
+100000 loops, best of 3: 16.1 usec per loop
./python.exe -m timeit -s n=1000000 sum(range(n,n+10000))
-1000 loops, best of 3: 1.21 msec per loop
+1000 loops, best of 3: 1.25 msec per loop
./python.exe -m timeit sum(range(-32768,32768))
-100 loops, best of 3: 3.98 msec per loop
+100 loops, best of 3: 2.95 msec per loop
./python.exe -m timeit sum(range(-1000000,1000000))
-10 loops, best of 3: 296 msec per loop
+10 loops, best of 3: 283 msec per loop
./python.exe -m timeit sum(range(4000))
-1000 loops, best of 3: 239 usec per loop
+1000 loops, best of 3: 177 usec per loop
./python.exe -m timeit sum(list(range(4000)))
-1000 loops, best of 3: 360 usec per loop
+1000 loops, best of 3: 276 usec per loop
./python.exe -m timeit sum(range(10000))
-1000 loops, best of 3: 615 usec per loop
+1000 loops, best of 3: 453 usec per loop
./python.exe -m timeit sum(list(range(10000)))
-1000 loops, best of 3: 887 usec per loop
+1000 loops, best of 3: 792 usec per loop
./python.exe -m timeit sum(range(40000))
-100 loops, best of 3: 2.55 msec per loop
+100 loops, best of 3: 2.03 msec per loop
./python.exe -m timeit sum(range(80000))
-100 loops, best of 3: 6.4 msec per loop
+100 loops, best of 3: 6.01 msec per loop
........................
And diffs of longfreelist2 vs longfreelist2-gps01:
........................
--- issue2013-longfreelist2.txt 2008-03-22 12:19:46.000000000 -0700
+++ issue2013-longfreelist2-gps01-size-first.txt 2008-03-22
12:34:44.000000000 -0700
@@ -1,43 +1,43 @@
-3.0a3+ (py3k:61746M, Mar 22 2008, 12:18:57)
+3.0a3+ (py3k:61746M, Mar 22 2008, 12:33:47)
[GCC 4.0.1 (Apple Computer, Inc. build 5367)]
./python.exe -m timeit min(range(255))
-100000 loops, best of 3: 16.1 usec per loop
+100000 loops, best of 3: 15.8 usec per loop
./python.exe -m timeit -s n=1000000 sum(range(n,n+10000))
-1000 loops, best of 3: 1.25 msec per loop
+1000 loops, best of 3: 1.22 msec per loop
./python.exe -m timeit sum(range(-32768,32768))
-100 loops, best of 3: 2.95 msec per loop
+100 loops, best of 3: 2.63 msec per loop
./python.exe -m timeit sum(range(-1000000,1000000))
-10 loops, best of 3: 283 msec per loop
+10 loops, best of 3: 273 msec per loop
./python.exe -m timeit sum(range(4000))
-1000 loops, best of 3: 177 usec per loop
+10000 loops, best of 3: 151 usec per loop
./python.exe -m timeit sum(list(range(4000)))
-1000 loops, best of 3: 276 usec per loop
+1000 loops, best of 3: 274 usec per loop
./python.exe -m timeit sum(range(10000))
-1000 loops, best of 3: 453 usec per loop
+1000 loops, best of 3: 382 usec per loop
./python.exe -m timeit sum(list(range(10000)))
-1000 loops, best of 3: 792 usec per loop
+1000 loops, best of 3: 802 usec per loop
./python.exe -m timeit sum(range(40000))
-100 loops, best of 3: 2.03 msec per loop
+100 loops, best of 3: 1.79 msec per loop
./python.exe -m timeit sum(range(80000))
-100 loops, best of 3: 6.01 msec per loop
+100 loops, best of 3: 5.77 msec per loop |
Looking at how much memory was actually used by PyLongObjects the way why? sizeof(PyVarObject) == 12, sizeof(digit) == 2. Objects/obmalloc.c patch attached as -gps02. |
And the benchmarks of gps02... see the attached file. At the end I had it run pybench rather than these microbenchmarks that Things left for someone to do? Determine what a good size for the |
The problem with choosing a sensible freelist size is that we don't have I thought the patch to compact freelists at each full gc collection had |
Gregory, your patch probably deserves checking in, it doesn't seem to me |
preventing this right now is that when i apply this to py3k today, it first, my patch has an invalid assert(size > 0) in it in _PyLong_New as then things at least run but you'll end up in an infinite loop when the things work great in a non-pydebug build. i believe the reason is this change is not properly looking at the i'm investigating. the free_list code in floatobject.c does not have and yet another reason why a more general free list library for various |
attached is an updated patch that also works in debug mode. to do this simplifying it to only freelist one digit longs the only useful i'm closing this as not worth it as things are written. if someone |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: