This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Free list for single-digit ints
Type: performance
Stage: resolved
Components: Interpreter Core
Versions: Python 3.10
process
Status: closed
Resolution: rejected
Dependencies:
Superseder:
Assigned To:
Nosy List: gvanrossum, larry, lemburg, mark.dickinson, methane, pablogsal, pitrou, rhettinger, scoder, serhiy.storchaka, vstinner, yselivanov
Priority: normal
Keywords: patch

Created on 2015-05-11 13:59 by serhiy.storchaka, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name                        Uploaded
int_free_list_2.patch            serhiy.storchaka, 2015-05-11 13:59
int_free_list_multidigit.patch   serhiy.storchaka, 2015-05-11 21:43
Pull Requests
PR 22884: closed (methane, 2020-10-22 10:44)
Messages (30)
msg242894 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-05-11 13:59
The proposed patch adds a free list for single-digit PyLong objects. In the Python test suite, 7% of created objects are ints; 50% of them fit in 15 bits (single-digit on a 32-bit build) and 75% fit in 30 bits (single-digit on a 64-bit build). See the start of the discussion in issue24138.
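For illustration, here is a minimal, self-contained sketch of the free-list technique, with hypothetical names and a plain struct standing in for PyLongObject and its alloc/dealloc machinery; it is not the actual patch:

```c
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for a single-digit int object; `next` links dead objects
 * into the free list while they are out of use. */
typedef struct small_int {
    struct small_int *next;
    long digit;
} small_int;

#define MAX_FREE 100              /* cap the list, like other CPython free lists */
static small_int *free_list = NULL;
static int num_free = 0;

static small_int *small_int_alloc(long value)
{
    small_int *obj;
    if (free_list != NULL) {      /* pop a recycled object: no allocator call */
        obj = free_list;
        free_list = obj->next;
        num_free--;
    }
    else {
        obj = malloc(sizeof(small_int));
        if (obj == NULL)
            return NULL;
    }
    obj->digit = value;
    return obj;
}

static void small_int_dealloc(small_int *obj)
{
    if (num_free < MAX_FREE) {    /* park the object instead of freeing it */
        obj->next = free_list;
        free_list = obj;
        num_free++;
    }
    else {
        free(obj);
    }
}

int main(void)
{
    /* Tight alloc/free cycles, like temporaries in arithmetic, keep
     * recycling the same memory instead of hitting the allocator. */
    for (long i = 0; i < 1000000; i++) {
        small_int *tmp = small_int_alloc(i);
        small_int_dealloc(tmp);
    }
    printf("objects parked on the free list: %d\n", num_free);
    return 0;
}
```

Popping a dead object replaces an allocator call with two pointer moves, which is where the speedup on temporary-heavy code would come from.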
msg242896 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2015-05-11 14:16
Any chance of running hg.python.org/benchmarks to see what kind of performance this would get us?
msg242907 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-05-11 18:25
Report on Linux xarax 3.13.0-52-generic #86-Ubuntu SMP Mon May 4 04:32:15 UTC 2015 i686 athlon
Total CPU cores: 2

### 2to3 ###
15.796000 -> 15.652000: 1.01x faster

### etree_generate ###
Min: 0.687270 -> 0.715218: 1.04x slower
Avg: 0.698458 -> 0.722657: 1.03x slower
Significant (t=-9.02)
Stddev: 0.01846 -> 0.00431: 4.2808x smaller

### etree_iterparse ###
Min: 1.145829 -> 1.117311: 1.03x faster
Avg: 1.159865 -> 1.129438: 1.03x faster
Significant (t=21.95)
Stddev: 0.00835 -> 0.00513: 1.6297x smaller

### etree_parse ###
Min: 0.816515 -> 0.867189: 1.06x slower
Avg: 0.825879 -> 0.877618: 1.06x slower
Significant (t=-48.87)
Stddev: 0.00405 -> 0.00630: 1.5556x larger

### etree_process ###
Min: 0.542221 -> 0.565161: 1.04x slower
Avg: 0.548276 -> 0.569324: 1.04x slower
Significant (t=-28.38)
Stddev: 0.00380 -> 0.00361: 1.0540x smaller

### json_load ###
Min: 1.020657 -> 0.995001: 1.03x faster
Avg: 1.025593 -> 0.998038: 1.03x faster
Significant (t=28.37)
Stddev: 0.00503 -> 0.00468: 1.0738x smaller

### nbody ###
Min: 0.577393 -> 0.588626: 1.02x slower
Avg: 0.578246 -> 0.590917: 1.02x slower
Significant (t=-43.51)
Stddev: 0.00037 -> 0.00203: 5.4513x larger

### regex_v8 ###
Min: 0.123794 -> 0.119950: 1.03x faster
Avg: 0.124631 -> 0.121131: 1.03x faster
Significant (t=4.92)
Stddev: 0.00340 -> 0.00371: 1.0917x larger

The following not significant results are hidden, use -v to show them:
django_v2, fastpickle, fastunpickle, json_dump_v2, tornado_http.
msg242910 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2015-05-11 19:57
I got similar results on a 64-bit build for my original patch (very similar to what Serhiy used now). The numbers are not really conclusive.

Report on Linux leppy 3.13.0-46-generic #77-Ubuntu SMP Mon Mar 2 18:23:39 UTC 2015 x86_64 x86_64
Total CPU cores: 4

### 2to3 ###
6.885334 -> 6.829016: 1.01x faster

### etree_process ###
Min: 0.249504 -> 0.253876: 1.02x slower
Med: 0.252730 -> 0.258274: 1.02x slower
Avg: 0.254332 -> 0.261100: 1.03x slower
Significant (t=-5.99)
Stddev: 0.00478 -> 0.00640: 1.3391x larger

### fastpickle ###
Min: 0.402085 -> 0.416765: 1.04x slower
Med: 0.405595 -> 0.424729: 1.05x slower
Avg: 0.405882 -> 0.429707: 1.06x slower
Significant (t=-12.45)
Stddev: 0.00228 -> 0.01334: 5.8585x larger

### json_dump_v2 ###
Min: 2.611031 -> 2.522507: 1.04x faster
Med: 2.678369 -> 2.544085: 1.05x faster
Avg: 2.706089 -> 2.552111: 1.06x faster
Significant (t=10.40)
Stddev: 0.09551 -> 0.04290: 2.2265x smaller

### nbody ###
Min: 0.217901 -> 0.214968: 1.01x faster
Med: 0.224340 -> 0.216781: 1.03x faster
Avg: 0.226012 -> 0.216981: 1.04x faster
Significant (t=6.03)
Stddev: 0.01049 -> 0.00142: 7.4102x smaller

### regex_v8 ###
Min: 0.040856 -> 0.039377: 1.04x faster
Med: 0.041847 -> 0.040082: 1.04x faster
Avg: 0.042468 -> 0.040726: 1.04x faster
Significant (t=3.20)
Stddev: 0.00291 -> 0.00252: 1.1549x smaller

The following not significant results are hidden, use -v to show them:
etree_generate, etree_iterparse, etree_parse, fastunpickle, json_load.
msg242911 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-05-11 19:59
You probably need a workload that uses integers quite heavily to see a difference. And even then, it would also depend on the allocation pattern.
msg242913 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2015-05-11 20:06
Well, as I've shown in issue 24076 (I'm copying the numbers here), even simple arithmetic expressions can benefit from a free list: basically anything that produces temporary integer results.

Original:

$ ./python -m timeit 'sum(range(1, 100000))'
1000 loops, best of 3: 1.86 msec per loop

$ ./python -m timeit -s 'l = list(range(1000, 10000))' '[(i*2+5) // 7 for i in l]'
1000 loops, best of 3: 1.05 msec per loop


With freelist:

$ ./python -m timeit 'sum(range(1, 100000))'
1000 loops, best of 3: 1.52 msec per loop

$ ./python -m timeit -s 'l = list(range(1000, 10000))' '[(i*2+5) // 7 for i in l]'
1000 loops, best of 3: 931 usec per loop
msg242915 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-05-11 20:22
Yes, but I meant a realistic workload, not a micro-benchmark. There are tons of ways to make Python look faster on micro-benchmarks that have no relevant impact on actual applications. (Note that I'm still sympathetic to the freelist approach.)
msg242919 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-05-11 21:43
Oh, sorry, Stefan, I didn't notice your patch. I wouldn't have written mine if I had noticed yours.

int_free_list_2.patch adds a free list only for single-digit ints. The following patch adds a free list for multi-digit ints (up to 3 digits on a 32-bit build, 2 on a 64-bit build), enough to represent 32-bit integers. Unfortunately, it makes allocating/deallocating single-digit ints slower.
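A rough sketch of the multi-digit variant (again hypothetical names, not the patch itself): one free-list head per digit count, so every alloc/dealloc pays an extra size check and array index, which is plausibly why the single-digit path gets slightly slower than a dedicated list:

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_FREE_DIGITS 2      /* e.g. 2 digits covers 32-bit ints on 64-bit builds */
#define MAX_FREE_PER_SIZE 100

typedef struct node { struct node *next; } node;

static node *free_lists[MAX_FREE_DIGITS + 1];  /* one bucket per digit count */
static int num_free[MAX_FREE_DIGITS + 1];

/* Allocate an object of `ndigits` digits, reusing a parked one if possible. */
static void *multi_alloc(int ndigits, size_t objsize)
{
    if (ndigits <= MAX_FREE_DIGITS && free_lists[ndigits] != NULL) {
        node *obj = free_lists[ndigits];       /* pop from the right bucket */
        free_lists[ndigits] = obj->next;
        num_free[ndigits]--;
        return obj;
    }
    return malloc(objsize);
}

/* Park the object on the bucket for its size, or really release it. */
static void multi_dealloc(void *p, int ndigits)
{
    if (ndigits <= MAX_FREE_DIGITS && num_free[ndigits] < MAX_FREE_PER_SIZE) {
        node *obj = p;
        obj->next = free_lists[ndigits];
        free_lists[ndigits] = obj;
        num_free[ndigits]++;
    }
    else {
        free(p);
    }
}

int main(void)
{
    for (int i = 0; i < 1000; i++) {
        void *p = multi_alloc(2, 64);          /* hypothetical 2-digit object size */
        multi_dealloc(p, 2);
    }
    printf("2-digit objects parked: %d\n", num_free[2]);
    return 0;
}
```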

Microbenchmarks:

$ ./python -m timeit -s "r = range(10**4)" -- "for i in r: pass"
Unpatched: 1000 loops, best of 3: 603 usec per loop
1-digit free list: 1000 loops, best of 3: 390 usec per loop
Multi-digit free list: 1000 loops, best of 3: 428 usec per loop

$ ./python -m timeit -s "r = range(10**5)" -- "for i in r: pass"
Unpatched: 100 loops, best of 3: 6.12 msec per loop
1-digit free list: 100 loops, best of 3: 5.69 msec per loop
Multi-digit free list: 100 loops, best of 3: 4.36 msec per loop

$ ./python -m timeit -s "a = list(range(10**4))" -- "for i, x in enumerate(a): pass"
Unpatched: 1000 loops, best of 3: 1.25 msec per loop
1-digit free list: 1000 loops, best of 3: 929 usec per loop
Multi-digit free list: 1000 loops, best of 3: 968 usec per loop

$ ./python -m timeit -s "a = list(range(10**5))" -- "for i, x in enumerate(a): pass"
Unpatched: 100 loops, best of 3: 11.7 msec per loop
1-digit free list: 100 loops, best of 3: 10.9 msec per loop
Multi-digit free list: 100 loops, best of 3: 9.99 msec per loop

As for more realistic cases, base85 encoding is 5% faster with the multi-digit free list.

$ ./python -m timeit -s "from base64 import b85encode; a = bytes(range(256))*100" -- "b85encode(a)"
Unpatched: 100 loops, best of 3: 10 msec per loop
1-digit free list: 100 loops, best of 3: 9.85 msec per loop
Multi-digit free list: 100 loops, best of 3: 9.48 msec per loop
msg260128 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2016-02-11 19:38
I think we only need to add a free list for 1-digit longs.  Please see my patch & explanation in issue #26341.
msg260129 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-02-11 19:57
Did you test on a platform with 30-bit digits? I tested with 15-bit digits.

Could you repeat my microbenchmarks from msg242919?
msg260130 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2016-02-11 19:59
> Did you test on platform with 30-bit digits?

Yes.

> Could you repeat my microbenchmarks from msg242919?

Sure. With your patches or with mine from issue #26341?
msg260131 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-02-11 20:09
With all three patches, if you want (I don't expect a difference between your patch and my single-digit patch).
msg260133 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2016-02-11 21:29
Best of 5s:

-m timeit -s "r = range(10**4)" -- "for i in r: pass"
orig: 239 usec
my patch: 148
int_free_list_2: 151
int_free_list_multi: 156


-m timeit -s "r = range(10**5)" -- "for i in r: pass"
orig: 2.4 msec
my patch: 1.47
int_free_list_2: 1.53
int_free_list_multi: 1.57


-m timeit -s "a = list(range(10**4))" -- "for i, x in enumerate(a): pass"
orig: 416 usec
my: 314
int_free_list_2: 314
int_free_list_multi: 317


-m timeit -s "a = list(range(10**5))" -- "for i, x in enumerate(a): pass"
orig: 4.1 msec
my: 3.13
int_free_list_2: 3.14
int_free_list_multi: 3.13


-m timeit -s "from base64 import b85encode; a = bytes(range(256))*100" -- "b85encode(a)"
orig: 3.49 msec
my: 3.28
int_free_list_2: 3.30
int_free_list_multi: 3.31


-m timeit -s "loops=tuple(range(1000))" "for x in loops: x+x"
orig: 44.4 usec
my: 35.2
int_free_list_2: 35.4
int_free_list_multi: 35.5


spectral_norm (against default):
my: 1.12x faster
int_free_list_2: 1.12x faster
int_free_list_multi: 1.12x faster


==========

All in all, all patches show the same performance improvement.  I guess we can go with int_free_list_multi.
msg260148 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-02-11 23:48
I ran perf.py on long_fl.patch of issue #26341. It looks slightly slower on a couple of benchmarks and has no significant impact overall on such macro benchmarks.

~/bin/taskset_isolated.py time python3 -u perf.py --rigorous ../default/python.orig ../default/python_long_fl

# python rev 37bacf3fa1f5

Report on Linux smithers 4.3.4-300.fc23.x86_64 #1 SMP Mon Jan 25 13:39:23 UTC 2016 x86_64 x86_64
Total CPU cores: 8

### chameleon_v2 ###
Min: 5.660445 -> 5.809548: 1.03x slower
Avg: 5.707313 -> 5.851431: 1.03x slower
Significant (t=-31.76)
Stddev: 0.03655 -> 0.02690: 1.3585x smaller

### json_dump_v2 ###
Min: 2.745682 -> 2.819627: 1.03x slower
Avg: 2.769530 -> 2.838116: 1.02x slower
Significant (t=-42.78)
Stddev: 0.01019 -> 0.01238: 1.2147x larger

### regex_v8 ###
Min: 0.041680 -> 0.041081: 1.01x faster
Avg: 0.042383 -> 0.041265: 1.03x faster
Significant (t=6.49)
Stddev: 0.00122 -> 0.00121: 1.0077x smaller

The following not significant results are hidden, use -v to show them:
2to3, django_v3, fastpickle, fastunpickle, json_load, nbody, tornado_http.
msg260149 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2016-02-11 23:58
I also ran benchmarks.  For me, django was 1% faster, telco 5% slower, and the rest were the same.  telco is a decimal benchmark (ints aren't used there), and django/chameleon are unicode concatenation benchmarks.

I can see improvements in micro-benchmarks, but even more importantly, Serhiy's patch reduces memory fragmentation.  99% of all long allocations come from the free list when it's there.
msg260167 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-02-12 09:53
I ran the benchmark again on long_fl.patch of issue #26341 with -b all. The problem is that I don't know what to make of the results; to me all these numbers just look like noise :-/ If we ignore changes smaller than 1.05x (faster or slower), the patch has no impact on performance on such a macro benchmark.

I didn't say that the patches are useless :-) Should we focus on micro-benchmarks instead?

$ ~/bin/taskset_isolated.py time python3 -u perf.py --rigorous ../default/python.orig ../default/python_long_fl -b all

Report on Linux smithers 4.3.4-300.fc23.x86_64 #1 SMP Mon Jan 25 13:39:23 UTC 2016 x86_64 x86_64
Total CPU cores: 8

### call_method ###
Min: 0.316851 -> 0.308606: 1.03x faster
Avg: 0.317870 -> 0.309778: 1.03x faster
Significant (t=480.37)
Stddev: 0.00014 -> 0.00026: 1.8165x larger

### etree_parse ###
Min: 0.266148 -> 0.255969: 1.04x faster
Avg: 0.267591 -> 0.257492: 1.04x faster
Significant (t=67.72)
Stddev: 0.00108 -> 0.00103: 1.0478x smaller

### etree_process ###
Min: 0.218512 -> 0.225462: 1.03x slower
Avg: 0.220441 -> 0.227143: 1.03x slower
Significant (t=-37.15)
Stddev: 0.00128 -> 0.00127: 1.0035x smaller

### fannkuch ###
Min: 0.962323 -> 0.984226: 1.02x slower
Avg: 0.965782 -> 0.985413: 1.02x slower
Significant (t=-73.63)
Stddev: 0.00213 -> 0.00160: 1.3276x smaller

### float ###
Min: 0.252470 -> 0.257536: 1.02x slower
Avg: 0.259895 -> 0.265731: 1.02x slower
Significant (t=-9.15)
Stddev: 0.00426 -> 0.00474: 1.1125x larger

### json_dump_v2 ###
Min: 2.717022 -> 2.814488: 1.04x slower
Avg: 2.743981 -> 2.835444: 1.03x slower
Significant (t=-46.41)
Stddev: 0.01375 -> 0.01411: 1.0264x larger

### mako_v2 ###
Min: 0.039410 -> 0.037304: 1.06x faster
Avg: 0.040038 -> 0.038094: 1.05x faster
Significant (t=138.56)
Stddev: 0.00024 -> 0.00037: 1.5234x larger

### meteor_contest ###
Min: 0.182787 -> 0.191944: 1.05x slower
Avg: 0.183526 -> 0.193532: 1.05x slower
Significant (t=-147.53)
Stddev: 0.00031 -> 0.00060: 1.9114x larger

### nbody ###
Min: 0.232746 -> 0.221279: 1.05x faster
Avg: 0.233580 -> 0.222623: 1.05x faster
Significant (t=67.66)
Stddev: 0.00052 -> 0.00153: 2.9467x larger

### nqueens ###
Min: 0.254579 -> 0.263282: 1.03x slower
Avg: 0.256874 -> 0.264082: 1.03x slower
Significant (t=-57.86)
Stddev: 0.00110 -> 0.00059: 1.8689x smaller

### pickle_dict ###
Min: 0.502160 -> 0.490473: 1.02x faster
Avg: 0.502456 -> 0.490759: 1.02x faster
Significant (t=654.42)
Stddev: 0.00014 -> 0.00011: 1.1950x smaller

### raytrace ###
Min: 1.271059 -> 1.309407: 1.03x slower
Avg: 1.274115 -> 1.313171: 1.03x slower
Significant (t=-206.50)
Stddev: 0.00123 -> 0.00144: 1.1698x larger

### richards ###
Min: 0.162761 -> 0.158441: 1.03x faster
Avg: 0.164611 -> 0.160229: 1.03x faster
Significant (t=30.03)
Stddev: 0.00107 -> 0.00099: 1.0761x smaller

### simple_logging ###
Min: 0.279392 -> 0.286003: 1.02x slower
Avg: 0.280746 -> 0.287228: 1.02x slower
Significant (t=-59.16)
Stddev: 0.00075 -> 0.00080: 1.0760x larger

### telco ###   
Min: 0.012419 -> 0.011853: 1.05x faster
Avg: 0.012500 -> 0.011968: 1.04x faster
Significant (t=93.79)
Stddev: 0.00003 -> 0.00005: 1.3307x larger

The following not significant results are hidden, use -v to show them:
2to3, call_method_slots, call_method_unknown, call_simple, chameleon_v2, chaos, django_v3, etree_generate, etree_iterparse, fastpickle, fastunpickle, formatted_logging, go, hexiom2, json_load, normal_startup, pathlib, pickle_list, pidigits, regex_compile, regex_effbot, regex_v8, silent_logging, spectral_norm, startup_nosite, tornado_http, unpack_sequence, unpickle_list.
msg260171 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2016-02-12 10:34
I like Serhiy's patch, too, but it feels like the single-digit case should be enough. I found this comment by Yury a good argument:

"""
I can see improvements in micro-benchmarks, but even more importantly, Serhiy's patch reduces memory fragmentation.  99% of all long allocations come from the free list when it's there.
"""

Did that comment come from a benchmark suite run? (i.e. actual applications and not micro benchmarks?) And, does it show a difference between the single- and multi-digit cases?
msg260178 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2016-02-12 13:59
> Did that comment come from a benchmark suite run? (i.e. actual applications and not micro benchmarks?) And, does it show a difference between the single- and multi-digit cases?

Yes, more details here: http://bugs.python.org/issue26341#msg260124
msg260267 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-02-14 09:41
> 99% of all long allocations are coming from freelist when it's there.

See msg242886 for detailed statistics from a test suite run. Only half of the ints are single-digit with 15-bit digits, and 3/4 with 30-bit digits. 86% of the ints are 32-bit. The majority of ints (about 2/3) are small ints in the range [-5..256]; these patches don't affect them.

That is why the effect of the patches is not very significant.
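The small-int point is easy to check from the C API; a tiny embedding demo (not from this issue, it just relies on CPython's small-int cache covering [-5..256]):

```c
#include <Python.h>

int main(void)
{
    Py_Initialize();

    /* Values in [-5..256] come back from the shared small-int cache... */
    PyObject *a = PyLong_FromLong(7);
    PyObject *b = PyLong_FromLong(7);

    /* ...while larger values are freshly allocated on each call. */
    PyObject *c = PyLong_FromLong(100000);
    PyObject *d = PyLong_FromLong(100000);

    printf("7:      same object? %s\n", a == b ? "yes (cached)" : "no");
    printf("100000: same object? %s\n", c == d ? "yes" : "no (new allocation)");

    Py_DECREF(a); Py_DECREF(b); Py_DECREF(c); Py_DECREF(d);
    Py_Finalize();
    return 0;
}
```

Only the second kind of value could ever hit the proposed free list, which is why the small-int majority caps the possible gain.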
msg260268 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2016-02-14 10:12
The test suite can't really be representative of common workloads and it isn't meant to be.

The real question is not so much if the freelist helps reduce the number of integer allocations (it's obvious it will), it's whether doing so actually speeds up Python significantly. The small object allocator is quite fast.

If freelisting one-digit integers doesn't bring any tangible benefits, it's unlikely that freelisting two-digit integers will. The general distribution of integers probably follows some kind of power law (which is why small integers are interned).

And since most installs are probably 64-bit nowadays, single-digit integers go up to 2**30, which covers the immense majority of uses.
msg263410 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2016-04-14 14:47
FWIW, the patch still applies cleanly, but now a couple of tests in posix fail because the assertion text has changed.
msg283687 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2016-12-20 13:03
Performance version: 0.5.0
Python version: 3.7.0a0 (64-bit) revision 31df7d9863f3+
Report on Linux-4.8.0-30-generic-x86_64-with-debian-stretch-sid

Slower (13):
- nbody: 232 ms +- 3 ms -> 241 ms +- 6 ms: 1.04x slower
- unpack_sequence: 118 ns +- 3 ns -> 121 ns +- 0 ns: 1.03x slower
- call_method_slots: 14.7 ms +- 0.1 ms -> 15.1 ms +- 0.4 ms: 1.02x slower
- logging_silent: 724 ns +- 15 ns -> 740 ns +- 8 ns: 1.02x slower
- telco: 22.5 ms +- 0.5 ms -> 22.9 ms +- 0.5 ms: 1.02x slower
- sqlite_synth: 9.69 us +- 0.27 us -> 9.85 us +- 0.20 us: 1.02x slower
- pickle_list: 8.45 us +- 0.11 us -> 8.57 us +- 0.16 us: 1.01x slower
- pickle_dict: 61.5 us +- 0.5 us -> 62.1 us +- 4.1 us: 1.01x slower
- call_method: 15.2 ms +- 0.1 ms -> 15.3 ms +- 0.1 ms: 1.01x slower
- python_startup_no_site: 9.45 ms +- 0.02 ms -> 9.50 ms +- 0.02 ms: 1.00x slower
- call_method_unknown: 17.2 ms +- 0.2 ms -> 17.2 ms +- 0.2 ms: 1.00x slower
- meteor_contest: 197 ms +- 2 ms -> 198 ms +- 2 ms: 1.00x slower
- python_startup: 15.7 ms +- 0.0 ms -> 15.7 ms +- 0.0 ms: 1.00x slower

Faster (35):
- spectral_norm: 284 ms +- 7 ms -> 262 ms +- 10 ms: 1.08x faster
- scimark_sparse_mat_mult: 8.62 ms +- 0.30 ms -> 7.99 ms +- 0.22 ms: 1.08x faster
- mako: 45.5 ms +- 0.3 ms -> 43.4 ms +- 0.6 ms: 1.05x faster
- scimark_fft: 691 ms +- 13 ms -> 660 ms +- 13 ms: 1.05x faster
- chameleon: 30.5 ms +- 0.3 ms -> 29.4 ms +- 0.5 ms: 1.04x faster
- scimark_sor: 491 ms +- 9 ms -> 474 ms +- 8 ms: 1.04x faster
- fannkuch: 1.07 sec +- 0.03 sec -> 1.04 sec +- 0.01 sec: 1.04x faster
- crypto_pyaes: 229 ms +- 2 ms -> 222 ms +- 4 ms: 1.03x faster
- hexiom: 23.5 ms +- 0.1 ms -> 22.8 ms +- 0.2 ms: 1.03x faster
- regex_compile: 440 ms +- 5 ms -> 430 ms +- 3 ms: 1.03x faster
- pickle: 24.3 us +- 0.5 us -> 23.7 us +- 0.5 us: 1.02x faster
- unpickle: 31.6 us +- 0.3 us -> 30.9 us +- 0.3 us: 1.02x faster
- xml_etree_generate: 291 ms +- 5 ms -> 284 ms +- 7 ms: 1.02x faster
- xml_etree_process: 249 ms +- 3 ms -> 243 ms +- 4 ms: 1.02x faster
- json_loads: 62.6 us +- 0.8 us -> 61.2 us +- 1.1 us: 1.02x faster
- xml_etree_iterparse: 223 ms +- 6 ms -> 218 ms +- 5 ms: 1.02x faster
- scimark_monte_carlo: 263 ms +- 8 ms -> 257 ms +- 9 ms: 1.02x faster
- raytrace: 1.31 sec +- 0.01 sec -> 1.28 sec +- 0.01 sec: 1.02x faster
- pickle_pure_python: 1.31 ms +- 0.01 ms -> 1.29 ms +- 0.02 ms: 1.02x faster
- unpickle_pure_python: 923 us +- 15 us -> 906 us +- 32 us: 1.02x faster
- chaos: 298 ms +- 2 ms -> 294 ms +- 2 ms: 1.01x faster
- sympy_sum: 207 ms +- 6 ms -> 204 ms +- 6 ms: 1.01x faster
- call_simple: 14.0 ms +- 0.3 ms -> 13.9 ms +- 0.3 ms: 1.01x faster
- regex_v8: 46.0 ms +- 2.1 ms -> 45.5 ms +- 0.7 ms: 1.01x faster
- genshi_text: 88.5 ms +- 0.9 ms -> 87.4 ms +- 1.3 ms: 1.01x faster
- sympy_expand: 1.03 sec +- 0.01 sec -> 1.02 sec +- 0.01 sec: 1.01x faster
- 2to3: 737 ms +- 3 ms -> 730 ms +- 3 ms: 1.01x faster
- sympy_str: 462 ms +- 4 ms -> 458 ms +- 6 ms: 1.01x faster
- unpickle_list: 7.67 us +- 0.32 us -> 7.60 us +- 0.11 us: 1.01x faster
- go: 593 ms +- 3 ms -> 589 ms +- 5 ms: 1.01x faster
- dulwich_log: 153 ms +- 1 ms -> 152 ms +- 1 ms: 1.01x faster
- sqlalchemy_declarative: 311 ms +- 3 ms -> 309 ms +- 3 ms: 1.01x faster
- pathlib: 50.3 ms +- 1.4 ms -> 50.0 ms +- 0.6 ms: 1.01x faster
- django_template: 398 ms +- 3 ms -> 396 ms +- 5 ms: 1.01x faster
- pidigits: 310 ms +- 0 ms -> 308 ms +- 0 ms: 1.00x faster

Benchmark hidden because not significant (16): deltablue, float, genshi_xml, html5lib, json_dumps, logging_format, logging_simple, nqueens, regex_dna, regex_effbot, richards, scimark_lu, sqlalchemy_imperative, sympy_integrate, tornado_http, xml_etree_parse
msg379288 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2020-10-22 10:47
I updated the patch.
I cannot run pyperformance at the moment, because:

  AssertionError: would build wheel with unsupported tag ('cp310', 'cp310', 'linux_x86_64'

I added this config, but it does not solve the problem:

```
$ cat ~/.config/pip/pip.conf
[global]
no-cache-dir = true
```
msg379311 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-10-22 16:17
Inada-san, you can run pyperformance with this workaround:

python -m pip install pyperformance==1.0.0

We are fixing the error soon after https://discuss.python.org/t/pep-641-using-an-underscore-in-the-version-portion-of-python-3-10-compatibility-tags/5513 lands
msg379395 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2020-10-23 02:27
I heard that pyperformance 1.0.0 works; here is the result for PR 22884.

$ ./python-master -m pyperf compare_to master.json patched.json -G --min-speed=1
Slower (8):
- pathlib: 26.3 ms +- 0.3 ms -> 26.8 ms +- 0.4 ms: 1.02x slower (+2%)
- chameleon: 12.8 ms +- 0.1 ms -> 13.0 ms +- 0.1 ms: 1.02x slower (+2%)
- genshi_text: 38.3 ms +- 0.7 ms -> 38.9 ms +- 0.6 ms: 1.02x slower (+2%)
- sqlalchemy_imperative: 40.4 ms +- 0.9 ms -> 41.0 ms +- 0.8 ms: 1.02x slower (+2%)
- sympy_str: 441 ms +- 4 ms -> 448 ms +- 4 ms: 1.01x slower (+1%)
- chaos: 146 ms +- 1 ms -> 148 ms +- 2 ms: 1.01x slower (+1%)
- unpickle: 18.7 us +- 0.1 us -> 18.9 us +- 0.2 us: 1.01x slower (+1%)
- xml_etree_parse: 177 ms +- 2 ms -> 179 ms +- 3 ms: 1.01x slower (+1%)

Faster (11):
- scimark_sparse_mat_mult: 6.74 ms +- 0.18 ms -> 6.26 ms +- 0.03 ms: 1.08x faster (-7%)
- scimark_fft: 511 ms +- 7 ms -> 496 ms +- 4 ms: 1.03x faster (-3%)
- spectral_norm: 181 ms +- 2 ms -> 176 ms +- 3 ms: 1.03x faster (-3%)
- pidigits: 225 ms +- 1 ms -> 219 ms +- 1 ms: 1.03x faster (-3%)
- pickle_dict: 35.5 us +- 1.3 us -> 34.8 us +- 0.3 us: 1.02x faster (-2%)
- pickle_list: 5.32 us +- 0.09 us -> 5.23 us +- 0.09 us: 1.02x faster (-2%)
- pyflate: 883 ms +- 7 ms -> 867 ms +- 6 ms: 1.02x faster (-2%)
- scimark_sor: 264 ms +- 2 ms -> 259 ms +- 2 ms: 1.02x faster (-2%)
- sqlite_synth: 4.04 us +- 0.10 us -> 3.98 us +- 0.09 us: 1.02x faster (-1%)
- regex_dna: 243 ms +- 3 ms -> 240 ms +- 1 ms: 1.01x faster (-1%)
- crypto_pyaes: 165 ms +- 3 ms -> 163 ms +- 1 ms: 1.01x faster (-1%)

Benchmark hidden because not significant (41)
msg379396 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2020-10-23 02:33
Inada-san, how do you interpret the results? Looks like it's performance-neutral.
msg379399 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2020-10-23 03:21
I had suspected that pyperformance just doesn't have enough workload for non-small ints.

For example, spectral_norm is integer-heavy plus some float workload. But bm_spectral_norm uses `DEFAULT_N = 130`, so most integers fit into the small-int cache.

On the other hand, spectral_norm in the benchmarks game uses N=5500.
https://benchmarksgame-team.pages.debian.net/benchmarksgame/program/spectralnorm-python3-8.html

So I ran the benchmark on my machine:

master:
real    1m24.647s
user    5m37.515s

patched:
real    1m19.033s
user    5m14.682s

master + small-int cache increased from [-5, 256] to [-9, 1024]:
real    1m23.742s
user    5m33.569s


314.682 / 337.515 = 0.932 (patched vs. master user time), so there is only a ~7% speedup even with N=5500.

All in all, I think the benefit is doubtful. Let's shelve this idea until the situation changes.
msg379521 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2020-10-24 06:39
I am closing this issue for now. Please reopen it or create a new issue if you come up with a better idea.
msg379526 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-10-24 09:42
I agree that it is not worth adding this optimization.
msg380134 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-11-01 13:33
> Inada-san, how do you interpret the results? Looks like it's performance-neutral.

You should try the development branch of pyperf, which computes the geometric mean of all results and says whether it's faster or slower overall :-D
https://mail.python.org/archives/list/speed@python.org/thread/RANN6PQURUVPMNXS6GIOL42F2DIFV5LM/
(I'm still waiting for testers before releasing a new version including the new feature.)
History
Date User Action Args
2022-04-11 14:58:16  admin             set   github: 68353
2021-09-18 22:09:17  gvanrossum        set   nosy: + gvanrossum
2020-11-01 13:33:58  vstinner          set   messages: + msg380134
2020-10-24 09:42:11  serhiy.storchaka  set   messages: + msg379526
2020-10-24 06:39:38  methane           set   status: open -> closed
    versions: + Python 3.10, - Python 3.7
    messages: + msg379521
    resolution: rejected
    stage: patch review -> resolved
2020-10-23 03:21:19  methane           set   messages: + msg379399
2020-10-23 02:33:38  yselivanov        set   messages: + msg379396
2020-10-23 02:30:49  yselivanov        set   nosy: + pablogsal
2020-10-23 02:27:04  methane           set   nosy: - pablogsal
    messages: + msg379395
2020-10-22 16:17:04  pablogsal         set   nosy: + pablogsal
    messages: + msg379311
2020-10-22 10:47:43  methane           set   messages: + msg379288
2020-10-22 10:44:59  methane           set   pull_requests: + pull_request21823
2020-05-29 17:46:31  brett.cannon      set   nosy: - brett.cannon
2016-12-20 13:03:36  methane           set   messages: + msg283687
2016-12-19 13:11:30  methane           set   nosy: + methane
    versions: + Python 3.7, - Python 3.6
2016-04-14 14:47:54  larry             set   messages: + msg263410
2016-02-14 10:12:07  pitrou            set   messages: + msg260268
2016-02-14 09:41:24  serhiy.storchaka  set   messages: + msg260267
2016-02-12 13:59:25  yselivanov        set   messages: + msg260178
2016-02-12 10:34:57  scoder            set   messages: + msg260171
    versions: + Python 3.6, - Python 3.5
2016-02-12 09:54:01  vstinner          set   messages: + msg260167
2016-02-11 23:58:44  yselivanov        set   messages: + msg260149
2016-02-11 23:48:25  vstinner          set   messages: + msg260148
2016-02-11 21:29:37  yselivanov        set   messages: + msg260133
2016-02-11 20:09:28  serhiy.storchaka  set   messages: + msg260131
2016-02-11 19:59:31  yselivanov        set   messages: + msg260130
2016-02-11 19:57:51  BreamoreBoy       set   nosy: - BreamoreBoy
2016-02-11 19:57:32  serhiy.storchaka  set   messages: + msg260129
2016-02-11 19:38:16  yselivanov        set   nosy: + yselivanov
    messages: + msg260128
2016-02-11 19:37:31  yselivanov        link  issue26341 superseder
2015-09-22 10:31:20  vstinner          set   nosy: + vstinner
2015-07-21 07:15:25  ethan.furman      set   nosy: - ethan.furman
2015-05-11 21:43:57  serhiy.storchaka  set   files: + int_free_list_multidigit.patch
    messages: + msg242919
2015-05-11 20:22:10  pitrou            set   messages: + msg242915
2015-05-11 20:06:25  scoder            set   messages: + msg242913
2015-05-11 19:59:06  pitrou            set   messages: + msg242911
2015-05-11 19:57:07  scoder            set   messages: + msg242910
2015-05-11 18:25:03  serhiy.storchaka  set   messages: + msg242907
2015-05-11 14:16:43  brett.cannon      set   nosy: + brett.cannon
    messages: + msg242896
2015-05-11 13:59:29  serhiy.storchaka  create