Message336752
To check whether the optimization depends on the size of the CPU cache, I repeated the microbenchmarks on a computer with a 6 MiB cache and on two computers with 512 KiB caches (one 64-bit, one 32-bit).
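For readers who want to try a comparable measurement on their own machine, here is a minimal sketch using the standard library's timeit module. It is not the harness used for the tables in this message (that script is not shown here); the helper name measure_ns and the statement list are assumptions made for illustration, with the statements taken from the benchmark names after dropping the trailing underscore.

```python
import timeit


def measure_ns(stmt, setup="pass", number=100_000, repeat=5):
    """Return the best observed time per execution of `stmt`, in nanoseconds.

    Hypothetical helper: takes the minimum over `repeat` runs of `number`
    executions each, which reduces noise from other processes.
    """
    timer = timeit.Timer(stmt, setup=setup)
    best_total = min(timer.repeat(repeat=repeat, number=number))
    return best_total / number * 1e9


if __name__ == "__main__":
    # Assumed statements, loosely mirroring a few rows of the tables.
    for stmt in ("round(4.2)", "sum(())", "'abc'.split()", "int('5')"):
        print(f"{stmt:<20} {measure_ns(stmt):8.1f} ns")
```

Running the same script against a baseline build and a patched build gives two numbers per statement that can be compared in the same way as the baseline and inline columns.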
Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz (cache size: 6144 KB):
+---------------------------------+----------+------------------------------+
| Benchmark | baseline | inline |
+=================================+==========+==============================+
| round_(4.2) | 113 ns | 81.3 ns: 1.39x faster (-28%) |
+---------------------------------+----------+------------------------------+
| sum_(()) | 83.8 ns | 56.7 ns: 1.48x faster (-32%) |
+---------------------------------+----------+------------------------------+
| sum_(a) | 98.0 ns | 72.1 ns: 1.36x faster (-26%) |
+---------------------------------+----------+------------------------------+
| 'abc'.split() | 107 ns | 83.1 ns: 1.29x faster (-22%) |
+---------------------------------+----------+------------------------------+
| b'abc'.split() | 101 ns | 75.4 ns: 1.34x faster (-25%) |
+---------------------------------+----------+------------------------------+
| 'abc'.split('-') | 123 ns | 89.9 ns: 1.37x faster (-27%) |
+---------------------------------+----------+------------------------------+
| 'abc'.encode() | 79.6 ns | 59.2 ns: 1.34x faster (-26%) |
+---------------------------------+----------+------------------------------+
| b'abc'.decode() | 105 ns | 84.7 ns: 1.24x faster (-20%) |
+---------------------------------+----------+------------------------------+
| int_(4.2) | 88.9 ns | 64.1 ns: 1.39x faster (-28%) |
+---------------------------------+----------+------------------------------+
| int_('5') | 137 ns | 108 ns: 1.28x faster (-22%) |
+---------------------------------+----------+------------------------------+
| 42 .to_bytes(2, 'little') | 113 ns | 77.6 ns: 1.45x faster (-31%) |
+---------------------------------+----------+------------------------------+
| int_from_bytes(b'ab', 'little') | 83.4 ns | 51.5 ns: 1.62x faster (-38%) |
+---------------------------------+----------+------------------------------+
| struct_i32_unpack_from(b'abcd') | 96.0 ns | 71.6 ns: 1.34x faster (-25%) |
+---------------------------------+----------+------------------------------+
| re_word_match('a') | 221 ns | 180 ns: 1.22x faster (-18%) |
+---------------------------------+----------+------------------------------+
| datetime_now() | 282 ns | 248 ns: 1.14x faster (-12%) |
+---------------------------------+----------+------------------------------+
Not significant (1): zlib_compress(b'abc')
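As a reading aid for the tables: the text in the inline column appears to be derived from the two timings, with the speedup factor being baseline divided by inline and the percentage being the relative change in time. The sketch below reproduces that arithmetic for rows of the table above; the function name compare is hypothetical and the formatting is an assumption based on how the rows look, not a copy of the tool that produced them.

```python
def compare(baseline_ns: float, inline_ns: float) -> str:
    """Format a timing comparison in the style of the table rows.

    Assumed arithmetic: factor = slower time / faster time,
    percentage = (inline - baseline) / baseline.
    """
    change_pct = (inline_ns - baseline_ns) / baseline_ns * 100
    if inline_ns <= baseline_ns:
        factor, word = baseline_ns / inline_ns, "faster"
    else:
        factor, word = inline_ns / baseline_ns, "slower"
    return f"{factor:.2f}x {word} ({change_pct:+.0f}%)"


if __name__ == "__main__":
    # round_(4.2) on the i7-6700HQ: 113 ns -> 81.3 ns
    print(compare(113, 81.3))   # 1.39x faster (-28%)
    # int_from_bytes(b'ab', 'little'): 83.4 ns -> 51.5 ns
    print(compare(83.4, 51.5))  # 1.62x faster (-38%)
```

Because the published timings are already rounded, recomputing other rows this way may differ from the table in the last digit.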
AMD Athlon(tm) 64 X2 Dual Core Processor 4600+ (cache size: 512 KB):
+---------------------------------+----------+-----------------------------+
| Benchmark | baseline | inline |
+=================================+==========+=============================+
| round_(4.2) | 391 ns | 272 ns: 1.44x faster (-31%) |
+---------------------------------+----------+-----------------------------+
| sum_(()) | 212 ns | 160 ns: 1.32x faster (-24%) |
+---------------------------------+----------+-----------------------------+
| sum_(a) | 256 ns | 211 ns: 1.21x faster (-18%) |
+---------------------------------+----------+-----------------------------+
| 'abc'.split() | 290 ns | 233 ns: 1.25x faster (-20%) |
+---------------------------------+----------+-----------------------------+
| b'abc'.split() | 263 ns | 226 ns: 1.16x faster (-14%) |
+---------------------------------+----------+-----------------------------+
| 'abc'.split('-') | 316 ns | 262 ns: 1.21x faster (-17%) |
+---------------------------------+----------+-----------------------------+
| 'abc'.encode() | 197 ns | 154 ns: 1.28x faster (-22%) |
+---------------------------------+----------+-----------------------------+
| b'abc'.decode() | 303 ns | 250 ns: 1.21x faster (-18%) |
+---------------------------------+----------+-----------------------------+
| int_(4.2) | 234 ns | 171 ns: 1.37x faster (-27%) |
+---------------------------------+----------+-----------------------------+
| int_('5') | 372 ns | 310 ns: 1.20x faster (-17%) |
+---------------------------------+----------+-----------------------------+
| 42 .to_bytes(2, 'little') | 370 ns | 245 ns: 1.51x faster (-34%) |
+---------------------------------+----------+-----------------------------+
| int_from_bytes(b'ab', 'little') | 251 ns | 167 ns: 1.50x faster (-33%) |
+---------------------------------+----------+-----------------------------+
| struct_i32_unpack_from(b'abcd') | 252 ns | 202 ns: 1.24x faster (-20%) |
+---------------------------------+----------+-----------------------------+
| re_word_match('a') | 625 ns | 524 ns: 1.19x faster (-16%) |
+---------------------------------+----------+-----------------------------+
| datetime_now() | 2.05 us | 1.99 us: 1.03x faster (-3%) |
+---------------------------------+----------+-----------------------------+
| zlib_compress(b'abc') | 28.6 us | 28.0 us: 1.02x faster (-2%) |
+---------------------------------+----------+-----------------------------+
Intel(R) Atom(TM) CPU N570 @ 1.66GHz (cache size: 512 KB), 32-bit:
+---------------------------------+----------+------------------------------+
| Benchmark | baseline | inline |
+=================================+==========+==============================+
| round_(4.2) | 1.95 us | 1.29 us: 1.51x faster (-34%) |
+---------------------------------+----------+------------------------------+
| sum_(()) | 1.15 us | 821 ns: 1.40x faster (-29%) |
+---------------------------------+----------+------------------------------+
| sum_(a) | 1.32 us | 1.02 us: 1.30x faster (-23%) |
+---------------------------------+----------+------------------------------+
| 'abc'.split() | 1.32 us | 1.11 us: 1.19x faster (-16%) |
+---------------------------------+----------+------------------------------+
| b'abc'.split() | 1.22 us | 1.03 us: 1.18x faster (-15%) |
+---------------------------------+----------+------------------------------+
| 'abc'.split('-') | 1.78 us | 1.15 us: 1.54x faster (-35%) |
+---------------------------------+----------+------------------------------+
| 'abc'.encode() | 1.05 us | 883 ns: 1.19x faster (-16%) |
+---------------------------------+----------+------------------------------+
| b'abc'.decode() | 1.34 us | 1.17 us: 1.15x faster (-13%) |
+---------------------------------+----------+------------------------------+
| int_(4.2) | 1.23 us | 859 ns: 1.43x faster (-30%) |
+---------------------------------+----------+------------------------------+
| int_('5') | 2.20 us | 1.41 us: 1.56x faster (-36%) |
+---------------------------------+----------+------------------------------+
| 42 .to_bytes(2, 'little') | 1.45 us | 1.09 us: 1.33x faster (-25%) |
+---------------------------------+----------+------------------------------+
| int_from_bytes(b'ab', 'little') | 1.07 us | 737 ns: 1.45x faster (-31%) |
+---------------------------------+----------+------------------------------+
| struct_i32_unpack_from(b'abcd') | 1.31 us | 1.08 us: 1.21x faster (-18%) |
+---------------------------------+----------+------------------------------+
| re_word_match('a') | 2.85 us | 2.06 us: 1.39x faster (-28%) |
+---------------------------------+----------+------------------------------+
| datetime_now() | 6.20 us | 5.92 us: 1.05x faster (-4%) |
+---------------------------------+----------+------------------------------+
| zlib_compress(b'abc') | 28.7 us | 26.9 us: 1.07x faster (-6%) |
+---------------------------------+----------+------------------------------+
The speedup is significant on all computers.
Date | User | Action | Args
2019-02-27 13:00:34 | serhiy.storchaka | set | recipients: + serhiy.storchaka, vstinner, larry, josh.r, ammar2, xtreak
2019-02-27 13:00:34 | serhiy.storchaka | set | messageid: <1551272434.07.0.338437739622.issue36127@roundup.psfhosted.org>
2019-02-27 13:00:34 | serhiy.storchaka | link | issue36127 messages
2019-02-27 13:00:33 | serhiy.storchaka | create |