Issue24165
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in Python's Developer Guide.
Created on 2015-05-11 13:59 by serhiy.storchaka, last changed 2022-04-11 14:58 by admin. This issue is now closed.
Files

File name | Uploaded | Description | Edit
---|---|---|---
int_free_list_2.patch | serhiy.storchaka, 2015-05-11 13:59 | | review
int_free_list_multidigit.patch | serhiy.storchaka, 2015-05-11 21:43 | | review
Pull Requests

URL | Status | Linked | Edit
---|---|---|---
PR 22884 | closed | methane, 2020-10-22 10:44 |
Messages (30)
msg242894 - Author: Serhiy Storchaka (serhiy.storchaka) - Date: 2015-05-11 13:59

Proposed patch adds a free list for single-digit PyLong objects. In Python tests, 7% of created objects are ints; 50% of them are 15-bit (single-digit on a 32-bit build) and 75% of them are 30-bit (single-digit on a 64-bit build). See the start of the discussion in issue24138.
msg242896 - Author: Brett Cannon (brett.cannon) - Date: 2015-05-11 14:16

Any chance of running hg.python.org/benchmarks to see what kind of performance this would get us?
msg242907 - Author: Serhiy Storchaka (serhiy.storchaka) - Date: 2015-05-11 18:25

    Report on Linux xarax 3.13.0-52-generic #86-Ubuntu SMP Mon May 4 04:32:15 UTC 2015 i686 athlon
    Total CPU cores: 2

    ### 2to3 ###
    15.796000 -> 15.652000: 1.01x faster

    ### etree_generate ###
    Min: 0.687270 -> 0.715218: 1.04x slower
    Avg: 0.698458 -> 0.722657: 1.03x slower
    Significant (t=-9.02)
    Stddev: 0.01846 -> 0.00431: 4.2808x smaller

    ### etree_iterparse ###
    Min: 1.145829 -> 1.117311: 1.03x faster
    Avg: 1.159865 -> 1.129438: 1.03x faster
    Significant (t=21.95)
    Stddev: 0.00835 -> 0.00513: 1.6297x smaller

    ### etree_parse ###
    Min: 0.816515 -> 0.867189: 1.06x slower
    Avg: 0.825879 -> 0.877618: 1.06x slower
    Significant (t=-48.87)
    Stddev: 0.00405 -> 0.00630: 1.5556x larger

    ### etree_process ###
    Min: 0.542221 -> 0.565161: 1.04x slower
    Avg: 0.548276 -> 0.569324: 1.04x slower
    Significant (t=-28.38)
    Stddev: 0.00380 -> 0.00361: 1.0540x smaller

    ### json_load ###
    Min: 1.020657 -> 0.995001: 1.03x faster
    Avg: 1.025593 -> 0.998038: 1.03x faster
    Significant (t=28.37)
    Stddev: 0.00503 -> 0.00468: 1.0738x smaller

    ### nbody ###
    Min: 0.577393 -> 0.588626: 1.02x slower
    Avg: 0.578246 -> 0.590917: 1.02x slower
    Significant (t=-43.51)
    Stddev: 0.00037 -> 0.00203: 5.4513x larger

    ### regex_v8 ###
    Min: 0.123794 -> 0.119950: 1.03x faster
    Avg: 0.124631 -> 0.121131: 1.03x faster
    Significant (t=4.92)
    Stddev: 0.00340 -> 0.00371: 1.0917x larger

    The following not significant results are hidden, use -v to show them:
    django_v2, fastpickle, fastunpickle, json_dump_v2, tornado_http.
msg242910 - Author: Stefan Behnel (scoder) - Date: 2015-05-11 19:57

I got similar results on 64 bits for my original patch (very similar to what Serhiy used now). The numbers are not really conclusive.

    Report on Linux leppy 3.13.0-46-generic #77-Ubuntu SMP Mon Mar 2 18:23:39 UTC 2015 x86_64 x86_64
    Total CPU cores: 4

    ### 2to3 ###
    6.885334 -> 6.829016: 1.01x faster

    ### etree_process ###
    Min: 0.249504 -> 0.253876: 1.02x slower
    Med: 0.252730 -> 0.258274: 1.02x slower
    Avg: 0.254332 -> 0.261100: 1.03x slower
    Significant (t=-5.99)
    Stddev: 0.00478 -> 0.00640: 1.3391x larger

    ### fastpickle ###
    Min: 0.402085 -> 0.416765: 1.04x slower
    Med: 0.405595 -> 0.424729: 1.05x slower
    Avg: 0.405882 -> 0.429707: 1.06x slower
    Significant (t=-12.45)
    Stddev: 0.00228 -> 0.01334: 5.8585x larger

    ### json_dump_v2 ###
    Min: 2.611031 -> 2.522507: 1.04x faster
    Med: 2.678369 -> 2.544085: 1.05x faster
    Avg: 2.706089 -> 2.552111: 1.06x faster
    Significant (t=10.40)
    Stddev: 0.09551 -> 0.04290: 2.2265x smaller

    ### nbody ###
    Min: 0.217901 -> 0.214968: 1.01x faster
    Med: 0.224340 -> 0.216781: 1.03x faster
    Avg: 0.226012 -> 0.216981: 1.04x faster
    Significant (t=6.03)
    Stddev: 0.01049 -> 0.00142: 7.4102x smaller

    ### regex_v8 ###
    Min: 0.040856 -> 0.039377: 1.04x faster
    Med: 0.041847 -> 0.040082: 1.04x faster
    Avg: 0.042468 -> 0.040726: 1.04x faster
    Significant (t=3.20)
    Stddev: 0.00291 -> 0.00252: 1.1549x smaller

    The following not significant results are hidden, use -v to show them:
    etree_generate, etree_iterparse, etree_parse, fastunpickle, json_load.
msg242911 - Author: Antoine Pitrou (pitrou) - Date: 2015-05-11 19:59

You probably need a workload that uses integers quite heavily to see a difference. And even then, it would also depend on the allocation pattern.
msg242913 - Author: Stefan Behnel (scoder) - Date: 2015-05-11 20:06

Well, as I've shown in issue 24076 (I'm copying the numbers here), even simple arithmetic expressions can benefit from a free list. Basically anything that uses temporary integer results.

Original:

    $ ./python -m timeit 'sum(range(1, 100000))'
    1000 loops, best of 3: 1.86 msec per loop
    $ ./python -m timeit -s 'l = list(range(1000, 10000))' '[(i*2+5) // 7 for i in l]'
    1000 loops, best of 3: 1.05 msec per loop

With freelist:

    $ ./python -m timeit 'sum(range(1, 100000))'
    1000 loops, best of 3: 1.52 msec per loop
    $ ./python -m timeit -s 'l = list(range(1000, 10000))' '[(i*2+5) // 7 for i in l]'
    1000 loops, best of 3: 931 usec per loop
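For reference, the shell one-liners above can also be driven from a script with the stdlib `timeit` module. This is a generic sketch, not part of any patch; the iteration counts are picked arbitrarily to keep the run short, and absolute timings will vary by machine:

```python
import timeit

# Same two expressions as the shell runs above, driven from Python.
# min() over repeats mimics the "best of N" that `python -m timeit` reports.
t1 = min(timeit.repeat("sum(range(1, 100000))", number=100, repeat=3))
t2 = min(timeit.repeat("[(i*2+5) // 7 for i in l]",
                       setup="l = list(range(1000, 10000))",
                       number=100, repeat=3))
print(f"sum(range)        : {t1 / 100 * 1e3:.3f} msec per loop")
print(f"list comprehension: {t2 / 100 * 1e3:.3f} msec per loop")
```

Both statements allocate many short-lived, non-small ints per iteration, which is exactly the allocation pattern a PyLong free list targets.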
msg242915 - Author: Antoine Pitrou (pitrou) - Date: 2015-05-11 20:22

Yes, but I meant a realistic workload, not a micro-benchmark. There are tons of ways to make Python look faster on micro-benchmarks but that have no relevant impact on actual applications. (Note that I'm still sympathetic to the freelist approach.)
msg242919 - Author: Serhiy Storchaka (serhiy.storchaka) - Date: 2015-05-11 21:43

Oh, sorry, Stefan, I didn't notice your patch. I wouldn't have written my patch if I had noticed yours.

int_free_list_2.patch adds a free list only for single-digit ints. The following patch adds a free list for multi-digit ints (3 digits on a 32-bit build, 2 on a 64-bit build), enough to represent 32-bit integers. Unfortunately it makes allocating/deallocating single-digit ints slower.

Microbenchmarks:

    $ ./python -m timeit -s "r = range(10**4)" -- "for i in r: pass"
    Unpatched:             1000 loops, best of 3: 603 usec per loop
    1-digit free list:     1000 loops, best of 3: 390 usec per loop
    Multi-digit free list: 1000 loops, best of 3: 428 usec per loop

    $ ./python -m timeit -s "r = range(10**5)" -- "for i in r: pass"
    Unpatched:             100 loops, best of 3: 6.12 msec per loop
    1-digit free list:     100 loops, best of 3: 5.69 msec per loop
    Multi-digit free list: 100 loops, best of 3: 4.36 msec per loop

    $ ./python -m timeit -s "a = list(range(10**4))" -- "for i, x in enumerate(a): pass"
    Unpatched:             1000 loops, best of 3: 1.25 msec per loop
    1-digit free list:     1000 loops, best of 3: 929 usec per loop
    Multi-digit free list: 1000 loops, best of 3: 968 usec per loop

    $ ./python -m timeit -s "a = list(range(10**5))" -- "for i, x in enumerate(a): pass"
    Unpatched:             100 loops, best of 3: 11.7 msec per loop
    1-digit free list:     100 loops, best of 3: 10.9 msec per loop
    Multi-digit free list: 100 loops, best of 3: 9.99 msec per loop

As for more realistic cases, base85 encoding is 5% faster with the multi-digit free list.

    $ ./python -m timeit -s "from base64 import b85encode; a = bytes(range(256))*100" -- "b85encode(a)"
    Unpatched:             100 loops, best of 3: 10 msec per loop
    1-digit free list:     100 loops, best of 3: 9.85 msec per loop
    Multi-digit free list: 100 loops, best of 3: 9.48 msec per loop
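The digit counts discussed here follow directly from a value's bit length. The helper below is illustrative only (not from either patch) and shows why a full 32-bit value needs 3 digits on a 15-bit-digit build but only 2 on a 30-bit-digit build:

```python
def ndigits(n, digit_bits):
    """Number of base-2**digit_bits digits CPython needs to store abs(n)."""
    n = abs(n)
    if n == 0:
        return 1  # zero is stored as a single zero digit
    return -(-n.bit_length() // digit_bits)  # ceiling division

v = 2**32 - 1                # a full 32-bit value
print(ndigits(v, 15))        # 3 digits on a 15-bit-digit (32-bit) build
print(ndigits(v, 30))        # 2 digits on a 30-bit-digit (64-bit) build
print(ndigits(2**30 - 1, 30))  # 1: largest single-digit value on 64-bit builds
```

This is the arithmetic behind the multi-digit patch's sizing: a free list holding 3 (15-bit) or 2 (30-bit) digits covers every 32-bit integer.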
msg260128 - Author: Yury Selivanov (yselivanov) - Date: 2016-02-11 19:38

I think that we only need to add a free list for 1-digit longs. Please see my patch & explanation in issue #26341.
msg260129 - Author: Serhiy Storchaka (serhiy.storchaka) - Date: 2016-02-11 19:57

Did you test on a platform with 30-bit digits? I tested with 15-bit digits. Could you repeat my microbenchmarks from msg242919?
msg260130 - Author: Yury Selivanov (yselivanov) - Date: 2016-02-11 19:59

> Did you test on a platform with 30-bit digits?

Yes.

> Could you repeat my microbenchmarks from msg242919?

Sure. With your patches or with mine from issue #26341?
msg260131 - Author: Serhiy Storchaka (serhiy.storchaka) - Date: 2016-02-11 20:09

With all three patches if you want (I don't expect a difference between your patch and my single-digit patch).
msg260133 - Author: Yury Selivanov (yselivanov) - Date: 2016-02-11 21:29

Best of 5:

    -m timeit -s "r = range(10**4)" -- "for i in r: pass"
    orig: 239 usec
    my patch: 148
    int_free_list_2: 151
    int_free_list_multi: 156

    -m timeit -s "r = range(10**5)" -- "for i in r: pass"
    orig: 2.4 msec
    my patch: 1.47
    int_free_list_2: 1.53
    int_free_list_multi: 1.57

    -m timeit -s "a = list(range(10**4))" -- "for i, x in enumerate(a): pass"
    orig: 416 usec
    my: 314
    int_free_list_2: 314
    int_free_list_multi: 317

    -m timeit -s "a = list(range(10**5))" -- "for i, x in enumerate(a): pass"
    orig: 4.1 msec
    my: 3.13
    int_free_list_2: 3.14
    int_free_list_multi: 3.13

    -m timeit -s "from base64 import b85encode; a = bytes(range(256))*100" -- "b85encode(a)"
    orig: 3.49 msec
    my: 3.28
    int_free_list_2: 3.30
    int_free_list_multi: 3.31

    -m timeit -s "loops=tuple(range(1000))" "for x in loops: x+x"
    orig: 44.4 usec
    my: 35.2
    int_free_list_2: 35.4
    int_free_list_multi: 35.5

    spectral_norm (against default):
    my: 1.12x faster
    int_free_list_2: 1.12x faster
    int_free_list_multi: 1.12x faster

All in all, all patches show the same performance improvement. I guess we can go with int_free_list_multi.
msg260148 - Author: STINNER Victor (vstinner) - Date: 2016-02-11 23:48

I ran perf.py on long_fl.patch of issue #26341. It looks slower and has no impact on such a macro benchmark.

    ~/bin/taskset_isolated.py time python3 -u perf.py --rigorous ../default/python.orig ../default/python_long_fl
    # python rev 37bacf3fa1f5
    Report on Linux smithers 4.3.4-300.fc23.x86_64 #1 SMP Mon Jan 25 13:39:23 UTC 2016 x86_64 x86_64
    Total CPU cores: 8

    ### chameleon_v2 ###
    Min: 5.660445 -> 5.809548: 1.03x slower
    Avg: 5.707313 -> 5.851431: 1.03x slower
    Significant (t=-31.76)
    Stddev: 0.03655 -> 0.02690: 1.3585x smaller

    ### json_dump_v2 ###
    Min: 2.745682 -> 2.819627: 1.03x slower
    Avg: 2.769530 -> 2.838116: 1.02x slower
    Significant (t=-42.78)
    Stddev: 0.01019 -> 0.01238: 1.2147x larger

    ### regex_v8 ###
    Min: 0.041680 -> 0.041081: 1.01x faster
    Avg: 0.042383 -> 0.041265: 1.03x faster
    Significant (t=6.49)
    Stddev: 0.00122 -> 0.00121: 1.0077x smaller

    The following not significant results are hidden, use -v to show them:
    2to3, django_v3, fastpickle, fastunpickle, json_load, nbody, tornado_http.
msg260149 - Author: Yury Selivanov (yselivanov) - Date: 2016-02-11 23:58

I also ran benchmarks. For me, django was 1% faster, telco 5% slower, and the rest were the same. telco is a decimal benchmark (ints aren't used there), and django/chameleon are unicode concatenation benchmarks.

I can see improvements in micro benchmarks, but even more importantly, Serhiy's patch reduces memory fragmentation. 99% of all long allocations come from the freelist when it's there.
msg260167 - Author: STINNER Victor (vstinner) - Date: 2016-02-12 09:53

I ran the benchmark again on long_fl.patch of issue #26341 with -b all. The problem is that I don't know what to think about the benchmark; to me all these numbers just look like noise :-/ If we ignore changes smaller than 1.05 (positive or negative), the patch has no impact on performance on such a macro benchmark. I didn't say that the patches are useless :-) We may focus on micro-benchmarks?

    $ ~/bin/taskset_isolated.py time python3 -u perf.py --rigorous ../default/python.orig ../default/python_long_fl -b all
    Report on Linux smithers 4.3.4-300.fc23.x86_64 #1 SMP Mon Jan 25 13:39:23 UTC 2016 x86_64 x86_64
    Total CPU cores: 8

    ### call_method ###
    Min: 0.316851 -> 0.308606: 1.03x faster
    Avg: 0.317870 -> 0.309778: 1.03x faster
    Significant (t=480.37)
    Stddev: 0.00014 -> 0.00026: 1.8165x larger

    ### etree_parse ###
    Min: 0.266148 -> 0.255969: 1.04x faster
    Avg: 0.267591 -> 0.257492: 1.04x faster
    Significant (t=67.72)
    Stddev: 0.00108 -> 0.00103: 1.0478x smaller

    ### etree_process ###
    Min: 0.218512 -> 0.225462: 1.03x slower
    Avg: 0.220441 -> 0.227143: 1.03x slower
    Significant (t=-37.15)
    Stddev: 0.00128 -> 0.00127: 1.0035x smaller

    ### fannkuch ###
    Min: 0.962323 -> 0.984226: 1.02x slower
    Avg: 0.965782 -> 0.985413: 1.02x slower
    Significant (t=-73.63)
    Stddev: 0.00213 -> 0.00160: 1.3276x smaller

    ### float ###
    Min: 0.252470 -> 0.257536: 1.02x slower
    Avg: 0.259895 -> 0.265731: 1.02x slower
    Significant (t=-9.15)
    Stddev: 0.00426 -> 0.00474: 1.1125x larger

    ### json_dump_v2 ###
    Min: 2.717022 -> 2.814488: 1.04x slower
    Avg: 2.743981 -> 2.835444: 1.03x slower
    Significant (t=-46.41)
    Stddev: 0.01375 -> 0.01411: 1.0264x larger

    ### mako_v2 ###
    Min: 0.039410 -> 0.037304: 1.06x faster
    Avg: 0.040038 -> 0.038094: 1.05x faster
    Significant (t=138.56)
    Stddev: 0.00024 -> 0.00037: 1.5234x larger

    ### meteor_contest ###
    Min: 0.182787 -> 0.191944: 1.05x slower
    Avg: 0.183526 -> 0.193532: 1.05x slower
    Significant (t=-147.53)
    Stddev: 0.00031 -> 0.00060: 1.9114x larger

    ### nbody ###
    Min: 0.232746 -> 0.221279: 1.05x faster
    Avg: 0.233580 -> 0.222623: 1.05x faster
    Significant (t=67.66)
    Stddev: 0.00052 -> 0.00153: 2.9467x larger

    ### nqueens ###
    Min: 0.254579 -> 0.263282: 1.03x slower
    Avg: 0.256874 -> 0.264082: 1.03x slower
    Significant (t=-57.86)
    Stddev: 0.00110 -> 0.00059: 1.8689x smaller

    ### pickle_dict ###
    Min: 0.502160 -> 0.490473: 1.02x faster
    Avg: 0.502456 -> 0.490759: 1.02x faster
    Significant (t=654.42)
    Stddev: 0.00014 -> 0.00011: 1.1950x smaller

    ### raytrace ###
    Min: 1.271059 -> 1.309407: 1.03x slower
    Avg: 1.274115 -> 1.313171: 1.03x slower
    Significant (t=-206.50)
    Stddev: 0.00123 -> 0.00144: 1.1698x larger

    ### richards ###
    Min: 0.162761 -> 0.158441: 1.03x faster
    Avg: 0.164611 -> 0.160229: 1.03x faster
    Significant (t=30.03)
    Stddev: 0.00107 -> 0.00099: 1.0761x smaller

    ### simple_logging ###
    Min: 0.279392 -> 0.286003: 1.02x slower
    Avg: 0.280746 -> 0.287228: 1.02x slower
    Significant (t=-59.16)
    Stddev: 0.00075 -> 0.00080: 1.0760x larger

    ### telco ###
    Min: 0.012419 -> 0.011853: 1.05x faster
    Avg: 0.012500 -> 0.011968: 1.04x faster
    Significant (t=93.79)
    Stddev: 0.00003 -> 0.00005: 1.3307x larger

    The following not significant results are hidden, use -v to show them:
    2to3, call_method_slots, call_method_unknown, call_simple, chameleon_v2, chaos,
    django_v3, etree_generate, etree_iterparse, fastpickle, fastunpickle,
    formatted_logging, go, hexiom2, json_load, normal_startup, pathlib, pickle_list,
    pidigits, regex_compile, regex_effbot, regex_v8, silent_logging, spectral_norm,
    startup_nosite, tornado_http, unpack_sequence, unpickle_list.
msg260171 - Author: Stefan Behnel (scoder) - Date: 2016-02-12 10:34

I like Serhiy's patch, too, but it feels like the single-digit case should be enough. I found this comment by Yury a good argument:

> I can see improvements in micro benchmarks, but even more importantly, Serhiy's patch reduces memory fragmentation. 99% of all long allocations come from the freelist when it's there.

Did that comment come from a benchmark suite run? (i.e. actual applications and not micro benchmarks?) And does it show a difference between the single- and multi-digit cases?
msg260178 - Author: Yury Selivanov (yselivanov) - Date: 2016-02-12 13:59

> Did that comment come from a benchmark suite run? (i.e. actual applications and not micro benchmarks?) And does it show a difference between the single- and multi-digit cases?

Yes, more details here: http://bugs.python.org/issue26341#msg260124
msg260267 - Author: Serhiy Storchaka (serhiy.storchaka) - Date: 2016-02-14 09:41

> 99% of all long allocations are coming from freelist when it's there.

For detailed statistics from a test suite run, see msg242886. Only half of all ints are single-digit with 15-bit digits, and 3/4 with 30-bit digits. 86% of ints are 32-bit. The majority of ints (about 2/3) are small ints in the range [-5..256]. These patches don't affect them. That is why the effect of the patches is not very significant.
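The small-int range mentioned here ([-5..256]) is easy to observe from Python. Note this is a CPython implementation detail, not a language guarantee; `int(str(...))` is used below to defeat constant folding, which would otherwise merge equal literals:

```python
# CPython caches ("interns") the ints in [-5, 256]: constructing such a
# value always returns the same object, so it never touches a free list
# or the allocator.  This is an implementation detail of CPython.
a = int("100")
b = int("100")
print(a is b)        # True: 100 is served from the small-int cache

c = int("257")
d = int("257")
print(c is d)        # False: 257 is allocated anew each time
```

This is the population the free-list patches can never reach, which is why roughly 2/3 of all int creations are unaffected by them.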
msg260268 - Author: Antoine Pitrou (pitrou) - Date: 2016-02-14 10:12

The test suite can't really be representative of common workloads and it isn't meant to be.

The real question is not so much whether the freelist helps reduce the number of integer allocations (it's obvious it will), it's whether doing so actually speeds up Python significantly. The small object allocator is quite fast.

If freelisting one-digit integers doesn't bring any tangible benefits, it's unlikely that freelisting two-digit integers will. The general distribution of integers probably follows some kind of power law (which is why small integers are interned). And since most installs are probably 64-bit nowadays, single-digit integers go up to 2**30, which covers the immense majority of uses.
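The single-digit boundary at 2**30 mentioned above can be seen indirectly through `sys.getsizeof`, whose result grows by one digit's worth of storage at each digit boundary (the exact byte counts are a CPython build detail, so none are hard-coded here):

```python
import sys

# On a 64-bit CPython build digits are 30 bits wide, so 2**30 - 1 is the
# largest one-digit magnitude and 2**30 spills into a second digit.
one_digit = 2**30 - 1
two_digit = 2**30
print(sys.getsizeof(one_digit), sys.getsizeof(two_digit))
# The second value is larger: the int now carries an extra internal digit.
```

On a 15-bit-digit build the same comparison still holds (2 vs. 3 digits), so the size increase at 2**30 is observable on either build.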
msg263410 - Author: Larry Hastings (larry) - Date: 2016-04-14 14:47

FWIW, the patch still cleanly applies, but now a couple of tests in posix fail because the assertion text has changed.
msg283687 - Author: Inada Naoki (methane) - Date: 2016-12-20 13:03

    Performance version: 0.5.0
    Python version: 3.7.0a0 (64-bit) revision 31df7d9863f3+
    Report on Linux-4.8.0-30-generic-x86_64-with-debian-stretch-sid

    Slower (13):
    - nbody: 232 ms +- 3 ms -> 241 ms +- 6 ms: 1.04x slower
    - unpack_sequence: 118 ns +- 3 ns -> 121 ns +- 0 ns: 1.03x slower
    - call_method_slots: 14.7 ms +- 0.1 ms -> 15.1 ms +- 0.4 ms: 1.02x slower
    - logging_silent: 724 ns +- 15 ns -> 740 ns +- 8 ns: 1.02x slower
    - telco: 22.5 ms +- 0.5 ms -> 22.9 ms +- 0.5 ms: 1.02x slower
    - sqlite_synth: 9.69 us +- 0.27 us -> 9.85 us +- 0.20 us: 1.02x slower
    - pickle_list: 8.45 us +- 0.11 us -> 8.57 us +- 0.16 us: 1.01x slower
    - pickle_dict: 61.5 us +- 0.5 us -> 62.1 us +- 4.1 us: 1.01x slower
    - call_method: 15.2 ms +- 0.1 ms -> 15.3 ms +- 0.1 ms: 1.01x slower
    - python_startup_no_site: 9.45 ms +- 0.02 ms -> 9.50 ms +- 0.02 ms: 1.00x slower
    - call_method_unknown: 17.2 ms +- 0.2 ms -> 17.2 ms +- 0.2 ms: 1.00x slower
    - meteor_contest: 197 ms +- 2 ms -> 198 ms +- 2 ms: 1.00x slower
    - python_startup: 15.7 ms +- 0.0 ms -> 15.7 ms +- 0.0 ms: 1.00x slower

    Faster (35):
    - spectral_norm: 284 ms +- 7 ms -> 262 ms +- 10 ms: 1.08x faster
    - scimark_sparse_mat_mult: 8.62 ms +- 0.30 ms -> 7.99 ms +- 0.22 ms: 1.08x faster
    - mako: 45.5 ms +- 0.3 ms -> 43.4 ms +- 0.6 ms: 1.05x faster
    - scimark_fft: 691 ms +- 13 ms -> 660 ms +- 13 ms: 1.05x faster
    - chameleon: 30.5 ms +- 0.3 ms -> 29.4 ms +- 0.5 ms: 1.04x faster
    - scimark_sor: 491 ms +- 9 ms -> 474 ms +- 8 ms: 1.04x faster
    - fannkuch: 1.07 sec +- 0.03 sec -> 1.04 sec +- 0.01 sec: 1.04x faster
    - crypto_pyaes: 229 ms +- 2 ms -> 222 ms +- 4 ms: 1.03x faster
    - hexiom: 23.5 ms +- 0.1 ms -> 22.8 ms +- 0.2 ms: 1.03x faster
    - regex_compile: 440 ms +- 5 ms -> 430 ms +- 3 ms: 1.03x faster
    - pickle: 24.3 us +- 0.5 us -> 23.7 us +- 0.5 us: 1.02x faster
    - unpickle: 31.6 us +- 0.3 us -> 30.9 us +- 0.3 us: 1.02x faster
    - xml_etree_generate: 291 ms +- 5 ms -> 284 ms +- 7 ms: 1.02x faster
    - xml_etree_process: 249 ms +- 3 ms -> 243 ms +- 4 ms: 1.02x faster
    - json_loads: 62.6 us +- 0.8 us -> 61.2 us +- 1.1 us: 1.02x faster
    - xml_etree_iterparse: 223 ms +- 6 ms -> 218 ms +- 5 ms: 1.02x faster
    - scimark_monte_carlo: 263 ms +- 8 ms -> 257 ms +- 9 ms: 1.02x faster
    - raytrace: 1.31 sec +- 0.01 sec -> 1.28 sec +- 0.01 sec: 1.02x faster
    - pickle_pure_python: 1.31 ms +- 0.01 ms -> 1.29 ms +- 0.02 ms: 1.02x faster
    - unpickle_pure_python: 923 us +- 15 us -> 906 us +- 32 us: 1.02x faster
    - chaos: 298 ms +- 2 ms -> 294 ms +- 2 ms: 1.01x faster
    - sympy_sum: 207 ms +- 6 ms -> 204 ms +- 6 ms: 1.01x faster
    - call_simple: 14.0 ms +- 0.3 ms -> 13.9 ms +- 0.3 ms: 1.01x faster
    - regex_v8: 46.0 ms +- 2.1 ms -> 45.5 ms +- 0.7 ms: 1.01x faster
    - genshi_text: 88.5 ms +- 0.9 ms -> 87.4 ms +- 1.3 ms: 1.01x faster
    - sympy_expand: 1.03 sec +- 0.01 sec -> 1.02 sec +- 0.01 sec: 1.01x faster
    - 2to3: 737 ms +- 3 ms -> 730 ms +- 3 ms: 1.01x faster
    - sympy_str: 462 ms +- 4 ms -> 458 ms +- 6 ms: 1.01x faster
    - unpickle_list: 7.67 us +- 0.32 us -> 7.60 us +- 0.11 us: 1.01x faster
    - go: 593 ms +- 3 ms -> 589 ms +- 5 ms: 1.01x faster
    - dulwich_log: 153 ms +- 1 ms -> 152 ms +- 1 ms: 1.01x faster
    - sqlalchemy_declarative: 311 ms +- 3 ms -> 309 ms +- 3 ms: 1.01x faster
    - pathlib: 50.3 ms +- 1.4 ms -> 50.0 ms +- 0.6 ms: 1.01x faster
    - django_template: 398 ms +- 3 ms -> 396 ms +- 5 ms: 1.01x faster
    - pidigits: 310 ms +- 0 ms -> 308 ms +- 0 ms: 1.00x faster

    Benchmark hidden because not significant (16): deltablue, float, genshi_xml,
    html5lib, json_dumps, logging_format, logging_simple, nqueens, regex_dna,
    regex_effbot, richards, scimark_lu, sqlalchemy_imperative, sympy_integrate,
    tornado_http, xml_etree_parse
msg379288 - Author: Inada Naoki (methane) - Date: 2020-10-22 10:47

I updated the patch. I cannot run pyperformance for now, because:

```
AssertionError: would build wheel with unsupported tag ('cp310', 'cp310', 'linux_x86_64')
```

I added this config, but it does not solve the problem:

```
$ cat ~/.config/pip/pip.conf
[global]
no-cache-dir = true
```
msg379311 - Author: Pablo Galindo Salgado (pablogsal) - Date: 2020-10-22 16:17

Inada-san, you can run pyperformance with this workaround:

    python -m pip install pyperformance==1.0.0

We are fixing the error soon, after https://discuss.python.org/t/pep-641-using-an-underscore-in-the-version-portion-of-python-3-10-compatibility-tags/5513 lands.
msg379395 - Author: Inada Naoki (methane) - Date: 2020-10-23 02:27

I heard pyperformance 1.0.0 works, and here is the result of PR-22884.

    $ ./python-master -m pyperf compare_to master.json patched.json -G --min-speed=1

    Slower (8):
    - pathlib: 26.3 ms +- 0.3 ms -> 26.8 ms +- 0.4 ms: 1.02x slower (+2%)
    - chameleon: 12.8 ms +- 0.1 ms -> 13.0 ms +- 0.1 ms: 1.02x slower (+2%)
    - genshi_text: 38.3 ms +- 0.7 ms -> 38.9 ms +- 0.6 ms: 1.02x slower (+2%)
    - sqlalchemy_imperative: 40.4 ms +- 0.9 ms -> 41.0 ms +- 0.8 ms: 1.02x slower (+2%)
    - sympy_str: 441 ms +- 4 ms -> 448 ms +- 4 ms: 1.01x slower (+1%)
    - chaos: 146 ms +- 1 ms -> 148 ms +- 2 ms: 1.01x slower (+1%)
    - unpickle: 18.7 us +- 0.1 us -> 18.9 us +- 0.2 us: 1.01x slower (+1%)
    - xml_etree_parse: 177 ms +- 2 ms -> 179 ms +- 3 ms: 1.01x slower (+1%)

    Faster (11):
    - scimark_sparse_mat_mult: 6.74 ms +- 0.18 ms -> 6.26 ms +- 0.03 ms: 1.08x faster (-7%)
    - scimark_fft: 511 ms +- 7 ms -> 496 ms +- 4 ms: 1.03x faster (-3%)
    - spectral_norm: 181 ms +- 2 ms -> 176 ms +- 3 ms: 1.03x faster (-3%)
    - pidigits: 225 ms +- 1 ms -> 219 ms +- 1 ms: 1.03x faster (-3%)
    - pickle_dict: 35.5 us +- 1.3 us -> 34.8 us +- 0.3 us: 1.02x faster (-2%)
    - pickle_list: 5.32 us +- 0.09 us -> 5.23 us +- 0.09 us: 1.02x faster (-2%)
    - pyflate: 883 ms +- 7 ms -> 867 ms +- 6 ms: 1.02x faster (-2%)
    - scimark_sor: 264 ms +- 2 ms -> 259 ms +- 2 ms: 1.02x faster (-2%)
    - sqlite_synth: 4.04 us +- 0.10 us -> 3.98 us +- 0.09 us: 1.02x faster (-1%)
    - regex_dna: 243 ms +- 3 ms -> 240 ms +- 1 ms: 1.01x faster (-1%)
    - crypto_pyaes: 165 ms +- 3 ms -> 163 ms +- 1 ms: 1.01x faster (-1%)

    Benchmark hidden because not significant (41)
msg379396 - Author: Yury Selivanov (yselivanov) - Date: 2020-10-23 02:33

Inada-san, how do you interpret the results? Looks like it's performance-neutral.
msg379399 - Author: Inada Naoki (methane) - Date: 2020-10-23 03:21

I had suspected that pyperformance just doesn't have enough workload for non-small ints. For example, spectral_norm is integer-heavy with some float workload, but bm_spectral_norm uses `DEFAULT_N = 130`, so most integers fit into the small-int cache. On the other hand, spectral_norm in the benchmarks game uses N=5500: https://benchmarksgame-team.pages.debian.net/benchmarksgame/program/spectralnorm-python3-8.html

So I ran that benchmark on my machine:

    master:
    real 1m24.647s
    user 5m37.515s

    patched:
    real 1m19.033s
    user 5m14.682s

    master + small-int cache increased from [-5, 256] to [-9, 1024]:
    real 1m23.742s
    user 5m33.569s

314.682/337.515 = 0.9323496733478512, so there is only a 7% speedup even with N=5500. After all, I think it is doubtful. Let's stop this idea until the situation changes.
msg379521 - Author: Inada Naoki (methane) - Date: 2020-10-24 06:39

I am closing this issue for now. Please reopen or create a new issue if you come up with a better idea.
msg379526 - Author: Serhiy Storchaka (serhiy.storchaka) - Date: 2020-10-24 09:42

I agree that it is not worth adding this optimization.
msg380134 - Author: STINNER Victor (vstinner) - Date: 2020-11-01 13:33

> Inada-san, how do you interpret the results? Looks like it's performance-neutral.

You should try the development branch of pyperf, which computes the geometric mean of all results and says whether it's faster or slower overall :-D
https://mail.python.org/archives/list/speed@python.org/thread/RANN6PQURUVPMNXS6GIOL42F2DIFV5LM/
(I'm still waiting for testers before releasing a new version including the new feature.)
History

Date | User | Action | Args
---|---|---|---
2022-04-11 14:58:16 | admin | set | github: 68353
2021-09-18 22:09:17 | gvanrossum | set | nosy: + gvanrossum
2020-11-01 13:33:58 | vstinner | set | messages: + msg380134
2020-10-24 09:42:11 | serhiy.storchaka | set | messages: + msg379526
2020-10-24 06:39:38 | methane | set | status: open -> closed; versions: + Python 3.10, - Python 3.7; messages: + msg379521; resolution: rejected; stage: patch review -> resolved
2020-10-23 03:21:19 | methane | set | messages: + msg379399
2020-10-23 02:33:38 | yselivanov | set | messages: + msg379396
2020-10-23 02:30:49 | yselivanov | set | nosy: + pablogsal
2020-10-23 02:27:04 | methane | set | nosy: - pablogsal; messages: + msg379395
2020-10-22 16:17:04 | pablogsal | set | nosy: + pablogsal; messages: + msg379311
2020-10-22 10:47:43 | methane | set | messages: + msg379288
2020-10-22 10:44:59 | methane | set | pull_requests: + pull_request21823
2020-05-29 17:46:31 | brett.cannon | set | nosy: - brett.cannon
2016-12-20 13:03:36 | methane | set | messages: + msg283687
2016-12-19 13:11:30 | methane | set | nosy: + methane; versions: + Python 3.7, - Python 3.6
2016-04-14 14:47:54 | larry | set | messages: + msg263410
2016-02-14 10:12:07 | pitrou | set | messages: + msg260268
2016-02-14 09:41:24 | serhiy.storchaka | set | messages: + msg260267
2016-02-12 13:59:25 | yselivanov | set | messages: + msg260178
2016-02-12 10:34:57 | scoder | set | messages: + msg260171; versions: + Python 3.6, - Python 3.5
2016-02-12 09:54:01 | vstinner | set | messages: + msg260167
2016-02-11 23:58:44 | yselivanov | set | messages: + msg260149
2016-02-11 23:48:25 | vstinner | set | messages: + msg260148
2016-02-11 21:29:37 | yselivanov | set | messages: + msg260133
2016-02-11 20:09:28 | serhiy.storchaka | set | messages: + msg260131
2016-02-11 19:59:31 | yselivanov | set | messages: + msg260130
2016-02-11 19:57:51 | BreamoreBoy | set | nosy: - BreamoreBoy
2016-02-11 19:57:32 | serhiy.storchaka | set | messages: + msg260129
2016-02-11 19:38:16 | yselivanov | set | nosy: + yselivanov; messages: + msg260128
2016-02-11 19:37:31 | yselivanov | link | issue26341 superseder
2015-09-22 10:31:20 | vstinner | set | nosy: + vstinner
2015-07-21 07:15:25 | ethan.furman | set | nosy: - ethan.furman
2015-05-11 21:43:57 | serhiy.storchaka | set | files: + int_free_list_multidigit.patch; messages: + msg242919
2015-05-11 20:22:10 | pitrou | set | messages: + msg242915
2015-05-11 20:06:25 | scoder | set | messages: + msg242913
2015-05-11 19:59:06 | pitrou | set | messages: + msg242911
2015-05-11 19:57:07 | scoder | set | messages: + msg242910
2015-05-11 18:25:03 | serhiy.storchaka | set | messages: + msg242907
2015-05-11 14:16:43 | brett.cannon | set | nosy: + brett.cannon; messages: + msg242896
2015-05-11 13:59:29 | serhiy.storchaka | create |