This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Free list for single-digit ints
Type: performance
Stage: resolved
Components: Interpreter Core
Versions: Python 3.10
process
Status: closed
Resolution: rejected
Dependencies:
Superseder:
Assigned To:
Nosy List: gvanrossum, larry, lemburg, mark.dickinson, methane, pablogsal, pitrou, rhettinger, scoder, serhiy.storchaka, vstinner, yselivanov
Priority: normal
Keywords: patch

Created on 2015-05-11 13:59 by serhiy.storchaka, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name                        Uploaded
int_free_list_2.patch            serhiy.storchaka, 2015-05-11 13:59
int_free_list_multidigit.patch   serhiy.storchaka, 2015-05-11 21:43
Pull Requests
PR 22884: closed (methane, 2020-10-22 10:44)
Messages (30)
msg242894 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-05-11 13:59
The proposed patch adds a free list for single-digit PyLong objects. In the Python test suite, 7% of created objects are ints; 50% of them fit in 15 bits (single-digit on a 32-bit build) and 75% fit in 30 bits (single-digit on a 64-bit build). See the start of the discussion in issue24138.
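For illustration, here is a minimal, self-contained sketch of the free-list technique, with hypothetical names and a plain struct standing in for PyLongObject and its alloc/dealloc machinery; it is not the actual patch:

```c
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for a single-digit int object; `next` links dead objects
 * into the free list while they are out of use. */
typedef struct small_int {
    struct small_int *next;
    long digit;
} small_int;

#define MAX_FREE 100              /* cap the list, like other CPython free lists */
static small_int *free_list = NULL;
static int num_free = 0;

static small_int *small_int_alloc(long value)
{
    small_int *obj;
    if (free_list != NULL) {      /* pop a recycled object: no allocator call */
        obj = free_list;
        free_list = obj->next;
        num_free--;
    }
    else {
        obj = malloc(sizeof(small_int));
        if (obj == NULL)
            return NULL;
    }
    obj->digit = value;
    return obj;
}

static void small_int_dealloc(small_int *obj)
{
    if (num_free < MAX_FREE) {    /* park the object instead of freeing it */
        obj->next = free_list;
        free_list = obj;
        num_free++;
    }
    else {
        free(obj);
    }
}

int main(void)
{
    /* Tight alloc/free cycles, like temporaries in arithmetic, keep
     * recycling the same memory instead of hitting the allocator. */
    for (long i = 0; i < 1000000; i++) {
        small_int *tmp = small_int_alloc(i);
        small_int_dealloc(tmp);
    }
    printf("objects parked on the free list: %d\n", num_free);
    return 0;
}
```

Popping a dead object replaces an allocator call with two pointer moves, which is where the speedup on temporary-heavy code would come from.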
msg242896 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2015-05-11 14:16
Any chance of running hg.python.org/benchmarks to see what kind of performance this would get us?
msg242907 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-05-11 18:25
Report on Linux xarax 3.13.0-52-generic #86-Ubuntu SMP Mon May 4 04:32:15 UTC 2015 i686 athlon
Total CPU cores: 2

### 2to3 ###
15.796000 -> 15.652000: 1.01x faster

### etree_generate ###
Min: 0.687270 -> 0.715218: 1.04x slower
Avg: 0.698458 -> 0.722657: 1.03x slower
Significant (t=-9.02)
Stddev: 0.01846 -> 0.00431: 4.2808x smaller

### etree_iterparse ###
Min: 1.145829 -> 1.117311: 1.03x faster
Avg: 1.159865 -> 1.129438: 1.03x faster
Significant (t=21.95)
Stddev: 0.00835 -> 0.00513: 1.6297x smaller

### etree_parse ###
Min: 0.816515 -> 0.867189: 1.06x slower
Avg: 0.825879 -> 0.877618: 1.06x slower
Significant (t=-48.87)
Stddev: 0.00405 -> 0.00630: 1.5556x larger

### etree_process ###
Min: 0.542221 -> 0.565161: 1.04x slower
Avg: 0.548276 -> 0.569324: 1.04x slower
Significant (t=-28.38)
Stddev: 0.00380 -> 0.00361: 1.0540x smaller

### json_load ###
Min: 1.020657 -> 0.995001: 1.03x faster
Avg: 1.025593 -> 0.998038: 1.03x faster
Significant (t=28.37)
Stddev: 0.00503 -> 0.00468: 1.0738x smaller

### nbody ###
Min: 0.577393 -> 0.588626: 1.02x slower
Avg: 0.578246 -> 0.590917: 1.02x slower
Significant (t=-43.51)
Stddev: 0.00037 -> 0.00203: 5.4513x larger

### regex_v8 ###
Min: 0.123794 -> 0.119950: 1.03x faster
Avg: 0.124631 -> 0.121131: 1.03x faster
Significant (t=4.92)
Stddev: 0.00340 -> 0.00371: 1.0917x larger

The following not significant results are hidden, use -v to show them:
django_v2, fastpickle, fastunpickle, json_dump_v2, tornado_http.
msg242910 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2015-05-11 19:57
I got similar results on a 64-bit build for my original patch (very similar to what Serhiy used now). The numbers are not really conclusive.

Report on Linux leppy 3.13.0-46-generic #77-Ubuntu SMP Mon Mar 2 18:23:39 UTC 2015 x86_64 x86_64
Total CPU cores: 4

### 2to3 ###
6.885334 -> 6.829016: 1.01x faster

### etree_process ###
Min: 0.249504 -> 0.253876: 1.02x slower
Med: 0.252730 -> 0.258274: 1.02x slower
Avg: 0.254332 -> 0.261100: 1.03x slower
Significant (t=-5.99)
Stddev: 0.00478 -> 0.00640: 1.3391x larger

### fastpickle ###
Min: 0.402085 -> 0.416765: 1.04x slower
Med: 0.405595 -> 0.424729: 1.05x slower
Avg: 0.405882 -> 0.429707: 1.06x slower
Significant (t=-12.45)
Stddev: 0.00228 -> 0.01334: 5.8585x larger

### json_dump_v2 ###
Min: 2.611031 -> 2.522507: 1.04x faster
Med: 2.678369 -> 2.544085: 1.05x faster
Avg: 2.706089 -> 2.552111: 1.06x faster
Significant (t=10.40)
Stddev: 0.09551 -> 0.04290: 2.2265x smaller

### nbody ###
Min: 0.217901 -> 0.214968: 1.01x faster
Med: 0.224340 -> 0.216781: 1.03x faster
Avg: 0.226012 -> 0.216981: 1.04x faster
Significant (t=6.03)
Stddev: 0.01049 -> 0.00142: 7.4102x smaller

### regex_v8 ###
Min: 0.040856 -> 0.039377: 1.04x faster
Med: 0.041847 -> 0.040082: 1.04x faster
Avg: 0.042468 -> 0.040726: 1.04x faster
Significant (t=3.20)
Stddev: 0.00291 -> 0.00252: 1.1549x smaller

The following not significant results are hidden, use -v to show them:
etree_generate, etree_iterparse, etree_parse, fastunpickle, json_load.
msg242911 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-05-11 19:59
You probably need a workload that uses integers quite heavily to see a difference. And even then, it would also depend on the allocation pattern.
msg242913 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2015-05-11 20:06
Well, as I've shown in issue 24076 (I'm copying the numbers here), even simple arithmetic expressions can benefit from a free list: basically anything that produces temporary integer results.

Original:

$ ./python -m timeit 'sum(range(1, 100000))'
1000 loops, best of 3: 1.86 msec per loop

$ ./python -m timeit -s 'l = list(range(1000, 10000))' '[(i*2+5) // 7 for i in l]'
1000 loops, best of 3: 1.05 msec per loop


With freelist:

$ ./python -m timeit 'sum(range(1, 100000))'
1000 loops, best of 3: 1.52 msec per loop

$ ./python -m timeit -s 'l = list(range(1000, 10000))' '[(i*2+5) // 7 for i in l]'
1000 loops, best of 3: 931 usec per loop
msg242915 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-05-11 20:22
Yes, but I meant a realistic workload, not a micro-benchmark. There are tons of ways to make Python look faster on micro-benchmarks that have no relevant impact on actual applications. (Note that I'm still sympathetic to the freelist approach.)
msg242919 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-05-11 21:43
Oh, sorry, Stefan, I didn't notice your patch. I wouldn't have written mine if I had noticed yours.

int_free_list_2.patch adds a free list only for single-digit ints. The following patch adds a free list for multi-digit ints (up to 3 digits on a 32-bit build, 2 on a 64-bit build), enough to represent 32-bit integers. Unfortunately, it makes allocating/deallocating single-digit ints slower.
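A rough sketch of the multi-digit variant (again hypothetical names, not the patch itself): one free-list head per digit count, so every alloc/dealloc pays an extra size check and array index, which is plausibly why the single-digit path gets slightly slower than a dedicated list:

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_FREE_DIGITS 2      /* e.g. 2 digits covers 32-bit ints on 64-bit builds */
#define MAX_FREE_PER_SIZE 100

typedef struct node { struct node *next; } node;

static node *free_lists[MAX_FREE_DIGITS + 1];  /* one bucket per digit count */
static int num_free[MAX_FREE_DIGITS + 1];

/* Allocate an object of `ndigits` digits, reusing a parked one if possible. */
static void *multi_alloc(int ndigits, size_t objsize)
{
    if (ndigits <= MAX_FREE_DIGITS && free_lists[ndigits] != NULL) {
        node *obj = free_lists[ndigits];       /* pop from the right bucket */
        free_lists[ndigits] = obj->next;
        num_free[ndigits]--;
        return obj;
    }
    return malloc(objsize);
}

/* Park the object on the bucket for its size, or really release it. */
static void multi_dealloc(void *p, int ndigits)
{
    if (ndigits <= MAX_FREE_DIGITS && num_free[ndigits] < MAX_FREE_PER_SIZE) {
        node *obj = p;
        obj->next = free_lists[ndigits];
        free_lists[ndigits] = obj;
        num_free[ndigits]++;
    }
    else {
        free(p);
    }
}

int main(void)
{
    for (int i = 0; i < 1000; i++) {
        void *p = multi_alloc(2, 64);          /* hypothetical 2-digit object size */
        multi_dealloc(p, 2);
    }
    printf("2-digit objects parked: %d\n", num_free[2]);
    return 0;
}
```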

Microbenchmarks:

$ ./python -m timeit -s "r = range(10**4)" -- "for i in r: pass"
Unpatched: 1000 loops, best of 3: 603 usec per loop
1-digit free list: 1000 loops, best of 3: 390 usec per loop
Multi-digit free list: 1000 loops, best of 3: 428 usec per loop

$ ./python -m timeit -s "r = range(10**5)" -- "for i in r: pass"
Unpatched: 100 loops, best of 3: 6.12 msec per loop
1-digit free list: 100 loops, best of 3: 5.69 msec per loop
Multi-digit free list: 100 loops, best of 3: 4.36 msec per loop

$ ./python -m timeit -s "a = list(range(10**4))" -- "for i, x in enumerate(a): pass"
Unpatched: 1000 loops, best of 3: 1.25 msec per loop
1-digit free list: 1000 loops, best of 3: 929 usec per loop
Multi-digit free list: 1000 loops, best of 3: 968 usec per loop

$ ./python -m timeit -s "a = list(range(10**5))" -- "for i, x in enumerate(a): pass"
Unpatched: 100 loops, best of 3: 11.7 msec per loop
1-digit free list: 100 loops, best of 3: 10.9 msec per loop
Multi-digit free list: 100 loops, best of 3: 9.99 msec per loop

As for more realistic cases, base85 encoding is 5% faster with the multi-digit free list.

$ ./python -m timeit -s "from base64 import b85encode; a = bytes(range(256))*100" -- "b85encode(a)"
Unpatched: 100 loops, best of 3: 10 msec per loop
1-digit free list: 100 loops, best of 3: 9.85 msec per loop
Multi-digit free list: 100 loops, best of 3: 9.48 msec per loop
msg260128 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2016-02-11 19:38
I think we only need to add a free list for 1-digit longs.  Please see my patch & explanation in issue #26341.
msg260129 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-02-11 19:57
Did you test on a platform with 30-bit digits? I tested with 15-bit digits.

Could you repeat my microbenchmarks from msg242919?
msg260130 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2016-02-11 19:59
> Did you test on platform with 30-bit digits?

Yes.

> Could you repeat my microbenchmarks from msg242919?

Sure. With your patches or with mine from issue #26341?
msg260131 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-02-11 20:09
With all three patches, if you want (I don't expect a difference between your patch and my single-digit patch).
msg260133 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2016-02-11 21:29
Best of 5s:

-m timeit -s "r = range(10**4)" -- "for i in r: pass"
orig: 239 usec
my patch: 148
int_free_list_2: 151
int_free_list_multi: 156


-m timeit -s "r = range(10**5)" -- "for i in r: pass"
orig: 2.4 msec
my patch: 1.47
int_free_list_2: 1.53
int_free_list_multi: 1.57


-m timeit -s "a = list(range(10**4))" -- "for i, x in enumerate(a): pass"
orig: 416 usec
my: 314
int_free_list_2: 314
int_free_list_multi: 317


-m timeit -s "a = list(range(10**5))" -- "for i, x in enumerate(a): pass"
orig: 4.1 msec
my: 3.13
int_free_list_2: 3.14
int_free_list_multi: 3.13


-m timeit -s "from base64 import b85encode; a = bytes(range(256))*100" -- "b85encode(a)"
orig: 3.49 msec
my: 3.28
int_free_list_2: 3.30
int_free_list_multi: 3.31


-m timeit -s "loops=tuple(range(1000))" "for x in loops: x+x"
orig: 44.4 usec
my: 35.2
int_free_list_2: 35.4
int_free_list_multi: 35.5


spectral_norm (against default):
my: 1.12x faster
int_free_list_2: 1.12x faster
int_free_list_multi: 1.12x faster


==========

All in all, all patches show the same performance improvement.  I guess we can go with int_free_list_multi.
msg260148 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-02-11 23:48
I ran perf.py on long_fl.patch of issue #26341. It looks slightly slower on a couple of benchmarks and has no significant impact overall on such macro benchmarks.

~/bin/taskset_isolated.py time python3 -u perf.py --rigorous ../default/python.orig ../default/python_long_fl

# python rev 37bacf3fa1f5

Report on Linux smithers 4.3.4-300.fc23.x86_64 #1 SMP Mon Jan 25 13:39:23 UTC 2016 x86_64 x86_64
Total CPU cores: 8

### chameleon_v2 ###
Min: 5.660445 -> 5.809548: 1.03x slower
Avg: 5.707313 -> 5.851431: 1.03x slower
Significant (t=-31.76)
Stddev: 0.03655 -> 0.02690: 1.3585x smaller

### json_dump_v2 ###
Min: 2.745682 -> 2.819627: 1.03x slower
Avg: 2.769530 -> 2.838116: 1.02x slower
Significant (t=-42.78)
Stddev: 0.01019 -> 0.01238: 1.2147x larger

### regex_v8 ###
Min: 0.041680 -> 0.041081: 1.01x faster
Avg: 0.042383 -> 0.041265: 1.03x faster
Significant (t=6.49)
Stddev: 0.00122 -> 0.00121: 1.0077x smaller

The following not significant results are hidden, use -v to show them:
2to3, django_v3, fastpickle, fastunpickle, json_load, nbody, tornado_http.
msg260149 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2016-02-11 23:58
I also ran benchmarks.  For me, django was 1% faster, telco 5% slower, and the rest were the same.  telco is a decimal benchmark (ints aren't used there), and django/chameleon are unicode concatenation benchmarks.

I can see improvements in micro-benchmarks, but even more importantly, Serhiy's patch reduces memory fragmentation.  99% of all long allocations come from the free list when it's there.
msg260167 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-02-12 09:53
I ran the benchmark again on long_fl.patch of issue #26341 with -b all. The problem is that I don't know what to make of the results; to me all these numbers just look like noise :-/ If we ignore changes smaller than 1.05x (faster or slower), the patch has no impact on performance on such a macro benchmark.

I didn't say that the patches are useless :-) Should we focus on micro-benchmarks instead?

$ ~/bin/taskset_isolated.py time python3 -u perf.py --rigorous ../default/python.orig ../default/python_long_fl -b all

Report on Linux smithers 4.3.4-300.fc23.x86_64 #1 SMP Mon Jan 25 13:39:23 UTC 2016 x86_64 x86_64
Total CPU cores: 8

### call_method ###
Min: 0.316851 -> 0.308606: 1.03x faster
Avg: 0.317870 -> 0.309778: 1.03x faster
Significant (t=480.37)
Stddev: 0.00014 -> 0.00026: 1.8165x larger

### etree_parse ###
Min: 0.266148 -> 0.255969: 1.04x faster
Avg: 0.267591 -> 0.257492: 1.04x faster
Significant (t=67.72)
Stddev: 0.00108 -> 0.00103: 1.0478x smaller

### etree_process ###
Min: 0.218512 -> 0.225462: 1.03x slower
Avg: 0.220441 -> 0.227143: 1.03x slower
Significant (t=-37.15)
Stddev: 0.00128 -> 0.00127: 1.0035x smaller

### fannkuch ###
Min: 0.962323 -> 0.984226: 1.02x slower
Avg: 0.965782 -> 0.985413: 1.02x slower
Significant (t=-73.63)
Stddev: 0.00213 -> 0.00160: 1.3276x smaller

### float ###
Min: 0.252470 -> 0.257536: 1.02x slower
Avg: 0.259895 -> 0.265731: 1.02x slower
Significant (t=-9.15)
Stddev: 0.00426 -> 0.00474: 1.1125x larger

### json_dump_v2 ###
Min: 2.717022 -> 2.814488: 1.04x slower
Avg: 2.743981 -> 2.835444: 1.03x slower
Significant (t=-46.41)
Stddev: 0.01375 -> 0.01411: 1.0264x larger

### mako_v2 ###
Min: 0.039410 -> 0.037304: 1.06x faster
Avg: 0.040038 -> 0.038094: 1.05x faster
Significant (t=138.56)
Stddev: 0.00024 -> 0.00037: 1.5234x larger

### meteor_contest ###
Min: 0.182787 -> 0.191944: 1.05x slower
Avg: 0.183526 -> 0.193532: 1.05x slower
Significant (t=-147.53)
Stddev: 0.00031 -> 0.00060: 1.9114x larger

### nbody ###
Min: 0.232746 -> 0.221279: 1.05x faster
Avg: 0.233580 -> 0.222623: 1.05x faster
Significant (t=67.66)
Stddev: 0.00052 -> 0.00153: 2.9467x larger

### nqueens ###
Min: 0.254579 -> 0.263282: 1.03x slower
Avg: 0.256874 -> 0.264082: 1.03x slower
Significant (t=-57.86)
Stddev: 0.00110 -> 0.00059: 1.8689x smaller

### pickle_dict ###
Min: 0.502160 -> 0.490473: 1.02x faster
Avg: 0.502456 -> 0.490759: 1.02x faster
Significant (t=654.42)
Stddev: 0.00014 -> 0.00011: 1.1950x smaller

### raytrace ###
Min: 1.271059 -> 1.309407: 1.03x slower
Avg: 1.274115 -> 1.313171: 1.03x slower
Significant (t=-206.50)
Stddev: 0.00123 -> 0.00144: 1.1698x larger

### richards ###
Min: 0.162761 -> 0.158441: 1.03x faster
Avg: 0.164611 -> 0.160229: 1.03x faster
Significant (t=30.03)
Stddev: 0.00107 -> 0.00099: 1.0761x smaller

### simple_logging ###
Min: 0.279392 -> 0.286003: 1.02x slower
Avg: 0.280746 -> 0.287228: 1.02x slower
Significant (t=-59.16)
Stddev: 0.00075 -> 0.00080: 1.0760x larger

### telco ###   
Min: 0.012419 -> 0.011853: 1.05x faster
Avg: 0.012500 -> 0.011968: 1.04x faster
Significant (t=93.79)
Stddev: 0.00003 -> 0.00005: 1.3307x larger

The following not significant results are hidden, use -v to show them:
2to3, call_method_slots, call_method_unknown, call_simple, chameleon_v2, chaos, django_v3, etree_generate, etree_iterparse, fastpickle, fastunpickle, formatted_logging, go, hexiom2, json_load, normal_startup, pathlib, pickle_list, pidigits, regex_compile, regex_effbot, regex_v8, silent_logging, spectral_norm, startup_nosite, tornado_http, unpack_sequence, unpickle_list.
msg260171 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2016-02-12 10:34
I like Serhiy's patch, too, but it feels like the single-digit case should be enough. I found this comment by Yury a good argument:

"""
I can see improvements in micro-benchmarks, but even more importantly, Serhiy's patch reduces memory fragmentation.  99% of all long allocations come from the free list when it's there.
"""

Did that comment come from a benchmark suite run? (i.e. actual applications and not micro benchmarks?) And, does it show a difference between the single- and multi-digit cases?
msg260178 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2016-02-12 13:59
> Did that comment come from a benchmark suite run? (i.e. actual applications and not micro benchmarks?) And, does it show a difference between the single- and multi-digit cases?

Yes, more details here: http://bugs.python.org/issue26341#msg260124
msg260267 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-02-14 09:41
> 99% of all long allocations are coming from freelist when it's there.

See msg242886 for detailed statistics from a test suite run. Only half of the ints are single-digit with 15-bit digits, and 3/4 with 30-bit digits. 86% of the ints are 32-bit. The majority of ints (about 2/3) are small ints in the range [-5..256]; these patches don't affect them.

That is why the effect of the patches is not very significant.
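The small-int point is easy to check from the C API; a tiny embedding demo (not from this issue, it just relies on CPython's small-int cache covering [-5..256]):

```c
#include <Python.h>

int main(void)
{
    Py_Initialize();

    /* Values in [-5..256] come back from the shared small-int cache... */
    PyObject *a = PyLong_FromLong(7);
    PyObject *b = PyLong_FromLong(7);

    /* ...while larger values are freshly allocated on each call. */
    PyObject *c = PyLong_FromLong(100000);
    PyObject *d = PyLong_FromLong(100000);

    printf("7:      same object? %s\n", a == b ? "yes (cached)" : "no");
    printf("100000: same object? %s\n", c == d ? "yes" : "no (new allocation)");

    Py_DECREF(a); Py_DECREF(b); Py_DECREF(c); Py_DECREF(d);
    Py_Finalize();
    return 0;
}
```

Only the second kind of value could ever hit the proposed free list, which is why the small-int majority caps the possible gain.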
msg260268 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2016-02-14 10:12
The test suite can't really be representative of common workloads and it isn't meant to be.

The real question is not so much if the freelist helps reduce the number of integer allocations (it's obvious it will), it's whether doing so actually speeds up Python significantly. The small object allocator is quite fast.

If freelisting one-digit integers doesn't bring any tangible benefits, it's unlikely that freelisting two-digit integers will. The general distribution of integers probably follows some kind of power law (which is why small integers are interned).

And since most installs are probably 64-bit nowadays, single-digit integers go up to 2**30, which covers the immense majority of uses.
msg263410 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2016-04-14 14:47
FWIW, the patch still applies cleanly, but now a couple of tests in posix fail because the assertion text has changed.
msg283687 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2016-12-20 13:03
Performance version: 0.5.0
Python version: 3.7.0a0 (64-bit) revision 31df7d9863f3+
Report on Linux-4.8.0-30-generic-x86_64-with-debian-stretch-sid

Slower (13):
- nbody: 232 ms +- 3 ms -> 241 ms +- 6 ms: 1.04x slower
- unpack_sequence: 118 ns +- 3 ns -> 121 ns +- 0 ns: 1.03x slower
- call_method_slots: 14.7 ms +- 0.1 ms -> 15.1 ms +- 0.4 ms: 1.02x slower
- logging_silent: 724 ns +- 15 ns -> 740 ns +- 8 ns: 1.02x slower
- telco: 22.5 ms +- 0.5 ms -> 22.9 ms +- 0.5 ms: 1.02x slower
- sqlite_synth: 9.69 us +- 0.27 us -> 9.85 us +- 0.20 us: 1.02x slower
- pickle_list: 8.45 us +- 0.11 us -> 8.57 us +- 0.16 us: 1.01x slower
- pickle_dict: 61.5 us +- 0.5 us -> 62.1 us +- 4.1 us: 1.01x slower
- call_method: 15.2 ms +- 0.1 ms -> 15.3 ms +- 0.1 ms: 1.01x slower
- python_startup_no_site: 9.45 ms +- 0.02 ms -> 9.50 ms +- 0.02 ms: 1.00x slower
- call_method_unknown: 17.2 ms +- 0.2 ms -> 17.2 ms +- 0.2 ms: 1.00x slower
- meteor_contest: 197 ms +- 2 ms -> 198 ms +- 2 ms: 1.00x slower
- python_startup: 15.7 ms +- 0.0 ms -> 15.7 ms +- 0.0 ms: 1.00x slower

Faster (35):
- spectral_norm: 284 ms +- 7 ms -> 262 ms +- 10 ms: 1.08x faster
- scimark_sparse_mat_mult: 8.62 ms +- 0.30 ms -> 7.99 ms +- 0.22 ms: 1.08x faster
- mako: 45.5 ms +- 0.3 ms -> 43.4 ms +- 0.6 ms: 1.05x faster
- scimark_fft: 691 ms +- 13 ms -> 660 ms +- 13 ms: 1.05x faster
- chameleon: 30.5 ms +- 0.3 ms -> 29.4 ms +- 0.5 ms: 1.04x faster
- scimark_sor: 491 ms +- 9 ms -> 474 ms +- 8 ms: 1.04x faster
- fannkuch: 1.07 sec +- 0.03 sec -> 1.04 sec +- 0.01 sec: 1.04x faster
- crypto_pyaes: 229 ms +- 2 ms -> 222 ms +- 4 ms: 1.03x faster
- hexiom: 23.5 ms +- 0.1 ms -> 22.8 ms +- 0.2 ms: 1.03x faster
- regex_compile: 440 ms +- 5 ms -> 430 ms +- 3 ms: 1.03x faster
- pickle: 24.3 us +- 0.5 us -> 23.7 us +- 0.5 us: 1.02x faster
- unpickle: 31.6 us +- 0.3 us -> 30.9 us +- 0.3 us: 1.02x faster
- xml_etree_generate: 291 ms +- 5 ms -> 284 ms +- 7 ms: 1.02x faster
- xml_etree_process: 249 ms +- 3 ms -> 243 ms +- 4 ms: 1.02x faster
- json_loads: 62.6 us +- 0.8 us -> 61.2 us +- 1.1 us: 1.02x faster
- xml_etree_iterparse: 223 ms +- 6 ms -> 218 ms +- 5 ms: 1.02x faster
- scimark_monte_carlo: 263 ms +- 8 ms -> 257 ms +- 9 ms: 1.02x faster
- raytrace: 1.31 sec +- 0.01 sec -> 1.28 sec +- 0.01 sec: 1.02x faster
- pickle_pure_python: 1.31 ms +- 0.01 ms -> 1.29 ms +- 0.02 ms: 1.02x faster
- unpickle_pure_python: 923 us +- 15 us -> 906 us +- 32 us: 1.02x faster
- chaos: 298 ms +- 2 ms -> 294 ms +- 2 ms: 1.01x faster
- sympy_sum: 207 ms +- 6 ms -> 204 ms +- 6 ms: 1.01x faster
- call_simple: 14.0 ms +- 0.3 ms -> 13.9 ms +- 0.3 ms: 1.01x faster
- regex_v8: 46.0 ms +- 2.1 ms -> 45.5 ms +- 0.7 ms: 1.01x faster
- genshi_text: 88.5 ms +- 0.9 ms -> 87.4 ms +- 1.3 ms: 1.01x faster
- sympy_expand: 1.03 sec +- 0.01 sec -> 1.02 sec +- 0.01 sec: 1.01x faster
- 2to3: 737 ms +- 3 ms -> 730 ms +- 3 ms: 1.01x faster
- sympy_str: 462 ms +- 4 ms -> 458 ms +- 6 ms: 1.01x faster
- unpickle_list: 7.67 us +- 0.32 us -> 7.60 us +- 0.11 us: 1.01x faster
- go: 593 ms +- 3 ms -> 589 ms +- 5 ms: 1.01x faster
- dulwich_log: 153 ms +- 1 ms -> 152 ms +- 1 ms: 1.01x faster
- sqlalchemy_declarative: 311 ms +- 3 ms -> 309 ms +- 3 ms: 1.01x faster
- pathlib: 50.3 ms +- 1.4 ms -> 50.0 ms +- 0.6 ms: 1.01x faster
- django_template: 398 ms +- 3 ms -> 396 ms +- 5 ms: 1.01x faster
- pidigits: 310 ms +- 0 ms -> 308 ms +- 0 ms: 1.00x faster

Benchmark hidden because not significant (16): deltablue, float, genshi_xml, html5lib, json_dumps, logging_format, logging_simple, nqueens, regex_dna, regex_effbot, richards, scimark_lu, sqlalchemy_imperative, sympy_integrate, tornado_http, xml_etree_parse
msg379288 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2020-10-22 10:47
I updated the patch.
I cannot run pyperformance at the moment, because:

  AssertionError: would build wheel with unsupported tag ('cp310', 'cp310', 'linux_x86_64'

I added this config, but it does not solve the problem:

```
$ cat ~/.config/pip/pip.conf
[global]
no-cache-dir = true
```
msg379311 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-10-22 16:17
Inada-san, you can run pyperformance with this workaround:

python -m pip install pyperformance==1.0.0

We are fixing the error soon after https://discuss.python.org/t/pep-641-using-an-underscore-in-the-version-portion-of-python-3-10-compatibility-tags/5513 lands
msg379395 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2020-10-23 02:27
I heard that pyperformance 1.0.0 works; here is the result for PR 22884.

$ ./python-master -m pyperf compare_to master.json patched.json -G --min-speed=1
Slower (8):
- pathlib: 26.3 ms +- 0.3 ms -> 26.8 ms +- 0.4 ms: 1.02x slower (+2%)
- chameleon: 12.8 ms +- 0.1 ms -> 13.0 ms +- 0.1 ms: 1.02x slower (+2%)
- genshi_text: 38.3 ms +- 0.7 ms -> 38.9 ms +- 0.6 ms: 1.02x slower (+2%)
- sqlalchemy_imperative: 40.4 ms +- 0.9 ms -> 41.0 ms +- 0.8 ms: 1.02x slower (+2%)
- sympy_str: 441 ms +- 4 ms -> 448 ms +- 4 ms: 1.01x slower (+1%)
- chaos: 146 ms +- 1 ms -> 148 ms +- 2 ms: 1.01x slower (+1%)
- unpickle: 18.7 us +- 0.1 us -> 18.9 us +- 0.2 us: 1.01x slower (+1%)
- xml_etree_parse: 177 ms +- 2 ms -> 179 ms +- 3 ms: 1.01x slower (+1%)

Faster (11):
- scimark_sparse_mat_mult: 6.74 ms +- 0.18 ms -> 6.26 ms +- 0.03 ms: 1.08x faster (-7%)
- scimark_fft: 511 ms +- 7 ms -> 496 ms +- 4 ms: 1.03x faster (-3%)
- spectral_norm: 181 ms +- 2 ms -> 176 ms +- 3 ms: 1.03x faster (-3%)
- pidigits: 225 ms +- 1 ms -> 219 ms +- 1 ms: 1.03x faster (-3%)
- pickle_dict: 35.5 us +- 1.3 us -> 34.8 us +- 0.3 us: 1.02x faster (-2%)
- pickle_list: 5.32 us +- 0.09 us -> 5.23 us +- 0.09 us: 1.02x faster (-2%)
- pyflate: 883 ms +- 7 ms -> 867 ms +- 6 ms: 1.02x faster (-2%)
- scimark_sor: 264 ms +- 2 ms -> 259 ms +- 2 ms: 1.02x faster (-2%)
- sqlite_synth: 4.04 us +- 0.10 us -> 3.98 us +- 0.09 us: 1.02x faster (-1%)
- regex_dna: 243 ms +- 3 ms -> 240 ms +- 1 ms: 1.01x faster (-1%)
- crypto_pyaes: 165 ms +- 3 ms -> 163 ms +- 1 ms: 1.01x faster (-1%)

Benchmark hidden because not significant (41)
msg379396 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2020-10-23 02:33
Inada-san, how do you interpret the results? Looks like it's performance-neutral.
msg379399 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2020-10-23 03:21
I had suspected that pyperformance just doesn't have enough workload for non-small ints.

For example, spectral_norm is integer-heavy plus some float workload. But bm_spectral_norm uses `DEFAULT_N = 130`, so most integers fit into the small-int cache.

On the other hand, spectral_norm in the benchmarks game uses N=5500.
https://benchmarksgame-team.pages.debian.net/benchmarksgame/program/spectralnorm-python3-8.html

So I ran the benchmark on my machine:

master:
real    1m24.647s
user    5m37.515s

patched:
real    1m19.033s
user    5m14.682s

master + small-int cache increased from [-5, 256] to [-9, 1024]:
real    1m23.742s
user    5m33.569s


314.682 / 337.515 = 0.932 (patched vs. master user time), so there is only a ~7% speedup even with N=5500.

All in all, I think the benefit is doubtful. Let's shelve this idea until the situation changes.
msg379521 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2020-10-24 06:39
I am closing this issue for now. Please reopen it or create a new issue if you come up with a better idea.
msg379526 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-10-24 09:42
I agree that it is not worth adding this optimization.
msg380134 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-11-01 13:33
> Inada-san, how do you interpret the results? Looks like it's performance-neutral.

You should try the development branch of pyperf, which computes the geometric mean of all results and says whether it's faster or slower overall :-D
https://mail.python.org/archives/list/speed@python.org/thread/RANN6PQURUVPMNXS6GIOL42F2DIFV5LM/
(I'm still waiting for testers before releasing a new version including the new feature.)
History
Date User Action Args
2022-04-11 14:58:16  admin             set   github: 68353
2021-09-18 22:09:17  gvanrossum        set   nosy: + gvanrossum
2020-11-01 13:33:58  vstinner          set   messages: + msg380134
2020-10-24 09:42:11  serhiy.storchaka  set   messages: + msg379526
2020-10-24 06:39:38  methane           set   status: open -> closed
    versions: + Python 3.10, - Python 3.7
    messages: + msg379521
    resolution: rejected
    stage: patch review -> resolved
2020-10-23 03:21:19  methane           set   messages: + msg379399
2020-10-23 02:33:38  yselivanov        set   messages: + msg379396
2020-10-23 02:30:49  yselivanov        set   nosy: + pablogsal
2020-10-23 02:27:04  methane           set   nosy: - pablogsal
    messages: + msg379395
2020-10-22 16:17:04  pablogsal         set   nosy: + pablogsal
    messages: + msg379311
2020-10-22 10:47:43  methane           set   messages: + msg379288
2020-10-22 10:44:59  methane           set   pull_requests: + pull_request21823
2020-05-29 17:46:31  brett.cannon      set   nosy: - brett.cannon
2016-12-20 13:03:36  methane           set   messages: + msg283687
2016-12-19 13:11:30  methane           set   nosy: + methane
    versions: + Python 3.7, - Python 3.6
2016-04-14 14:47:54  larry             set   messages: + msg263410
2016-02-14 10:12:07  pitrou            set   messages: + msg260268
2016-02-14 09:41:24  serhiy.storchaka  set   messages: + msg260267
2016-02-12 13:59:25  yselivanov        set   messages: + msg260178
2016-02-12 10:34:57  scoder            set   messages: + msg260171
    versions: + Python 3.6, - Python 3.5
2016-02-12 09:54:01  vstinner          set   messages: + msg260167
2016-02-11 23:58:44  yselivanov        set   messages: + msg260149
2016-02-11 23:48:25  vstinner          set   messages: + msg260148
2016-02-11 21:29:37  yselivanov        set   messages: + msg260133
2016-02-11 20:09:28  serhiy.storchaka  set   messages: + msg260131
2016-02-11 19:59:31  yselivanov        set   messages: + msg260130
2016-02-11 19:57:51  BreamoreBoy       set   nosy: - BreamoreBoy
2016-02-11 19:57:32  serhiy.storchaka  set   messages: + msg260129
2016-02-11 19:38:16  yselivanov        set   nosy: + yselivanov
    messages: + msg260128
2016-02-11 19:37:31  yselivanov        link  issue26341 superseder
2015-09-22 10:31:20  vstinner          set   nosy: + vstinner
2015-07-21 07:15:25  ethan.furman      set   nosy: - ethan.furman
2015-05-11 21:43:57  serhiy.storchaka  set   files: + int_free_list_multidigit.patch
    messages: + msg242919
2015-05-11 20:22:10  pitrou            set   messages: + msg242915
2015-05-11 20:06:25  scoder            set   messages: + msg242913
2015-05-11 19:59:06  pitrou            set   messages: + msg242911
2015-05-11 19:57:07  scoder            set   messages: + msg242910
2015-05-11 18:25:03  serhiy.storchaka  set   messages: + msg242907
2015-05-11 14:16:43  brett.cannon      set   nosy: + brett.cannon
    messages: + msg242896
2015-05-11 13:59:29  serhiy.storchaka  create