Author serhiy.storchaka
Recipients ammar2, josh.r, larry, serhiy.storchaka, vstinner, xtreak
Date 2019-02-27.13:00:33
Message-id <1551272434.07.0.338437739622.issue36127@roundup.psfhosted.org>
In-reply-to
Content
As for whether the optimization depends on the size of the CPU cache, I have repeated the microbenchmarks on a computer with a 6 MiB cache and on two computers with 512 KiB caches (64- and 32-bit).

Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz (cache size: 6144 KB):

+---------------------------------+----------+------------------------------+
| Benchmark                       | baseline | inline                       |
+=================================+==========+==============================+
| round_(4.2)                     | 113 ns   | 81.3 ns: 1.39x faster (-28%) |
+---------------------------------+----------+------------------------------+
| sum_(())                        | 83.8 ns  | 56.7 ns: 1.48x faster (-32%) |
+---------------------------------+----------+------------------------------+
| sum_(a)                         | 98.0 ns  | 72.1 ns: 1.36x faster (-26%) |
+---------------------------------+----------+------------------------------+
| 'abc'.split()                   | 107 ns   | 83.1 ns: 1.29x faster (-22%) |
+---------------------------------+----------+------------------------------+
| b'abc'.split()                  | 101 ns   | 75.4 ns: 1.34x faster (-25%) |
+---------------------------------+----------+------------------------------+
| 'abc'.split('-')                | 123 ns   | 89.9 ns: 1.37x faster (-27%) |
+---------------------------------+----------+------------------------------+
| 'abc'.encode()                  | 79.6 ns  | 59.2 ns: 1.34x faster (-26%) |
+---------------------------------+----------+------------------------------+
| b'abc'.decode()                 | 105 ns   | 84.7 ns: 1.24x faster (-20%) |
+---------------------------------+----------+------------------------------+
| int_(4.2)                       | 88.9 ns  | 64.1 ns: 1.39x faster (-28%) |
+---------------------------------+----------+------------------------------+
| int_('5')                       | 137 ns   | 108 ns: 1.28x faster (-22%)  |
+---------------------------------+----------+------------------------------+
| 42 .to_bytes(2, 'little')       | 113 ns   | 77.6 ns: 1.45x faster (-31%) |
+---------------------------------+----------+------------------------------+
| int_from_bytes(b'ab', 'little') | 83.4 ns  | 51.5 ns: 1.62x faster (-38%) |
+---------------------------------+----------+------------------------------+
| struct_i32_unpack_from(b'abcd') | 96.0 ns  | 71.6 ns: 1.34x faster (-25%) |
+---------------------------------+----------+------------------------------+
| re_word_match('a')              | 221 ns   | 180 ns: 1.22x faster (-18%)  |
+---------------------------------+----------+------------------------------+
| datetime_now()                  | 282 ns   | 248 ns: 1.14x faster (-12%)  |
+---------------------------------+----------+------------------------------+

Not significant (1): zlib_compress(b'abc')

AMD Athlon(tm) 64 X2 Dual Core Processor 4600+ (cache size: 512 KB):

+---------------------------------+----------+-----------------------------+
| Benchmark                       | baseline | inline                      |
+=================================+==========+=============================+
| round_(4.2)                     | 391 ns   | 272 ns: 1.44x faster (-31%) |
+---------------------------------+----------+-----------------------------+
| sum_(())                        | 212 ns   | 160 ns: 1.32x faster (-24%) |
+---------------------------------+----------+-----------------------------+
| sum_(a)                         | 256 ns   | 211 ns: 1.21x faster (-18%) |
+---------------------------------+----------+-----------------------------+
| 'abc'.split()                   | 290 ns   | 233 ns: 1.25x faster (-20%) |
+---------------------------------+----------+-----------------------------+
| b'abc'.split()                  | 263 ns   | 226 ns: 1.16x faster (-14%) |
+---------------------------------+----------+-----------------------------+
| 'abc'.split('-')                | 316 ns   | 262 ns: 1.21x faster (-17%) |
+---------------------------------+----------+-----------------------------+
| 'abc'.encode()                  | 197 ns   | 154 ns: 1.28x faster (-22%) |
+---------------------------------+----------+-----------------------------+
| b'abc'.decode()                 | 303 ns   | 250 ns: 1.21x faster (-18%) |
+---------------------------------+----------+-----------------------------+
| int_(4.2)                       | 234 ns   | 171 ns: 1.37x faster (-27%) |
+---------------------------------+----------+-----------------------------+
| int_('5')                       | 372 ns   | 310 ns: 1.20x faster (-17%) |
+---------------------------------+----------+-----------------------------+
| 42 .to_bytes(2, 'little')       | 370 ns   | 245 ns: 1.51x faster (-34%) |
+---------------------------------+----------+-----------------------------+
| int_from_bytes(b'ab', 'little') | 251 ns   | 167 ns: 1.50x faster (-33%) |
+---------------------------------+----------+-----------------------------+
| struct_i32_unpack_from(b'abcd') | 252 ns   | 202 ns: 1.24x faster (-20%) |
+---------------------------------+----------+-----------------------------+
| re_word_match('a')              | 625 ns   | 524 ns: 1.19x faster (-16%) |
+---------------------------------+----------+-----------------------------+
| datetime_now()                  | 2.05 us  | 1.99 us: 1.03x faster (-3%) |
+---------------------------------+----------+-----------------------------+
| zlib_compress(b'abc')           | 28.6 us  | 28.0 us: 1.02x faster (-2%) |
+---------------------------------+----------+-----------------------------+

Intel(R) Atom(TM) CPU N570   @ 1.66GHz (cache size: 512 KB), 32-bit:

+---------------------------------+----------+------------------------------+
| Benchmark                       | baseline | inline                       |
+=================================+==========+==============================+
| round_(4.2)                     | 1.95 us  | 1.29 us: 1.51x faster (-34%) |
+---------------------------------+----------+------------------------------+
| sum_(())                        | 1.15 us  | 821 ns: 1.40x faster (-29%)  |
+---------------------------------+----------+------------------------------+
| sum_(a)                         | 1.32 us  | 1.02 us: 1.30x faster (-23%) |
+---------------------------------+----------+------------------------------+
| 'abc'.split()                   | 1.32 us  | 1.11 us: 1.19x faster (-16%) |
+---------------------------------+----------+------------------------------+
| b'abc'.split()                  | 1.22 us  | 1.03 us: 1.18x faster (-15%) |
+---------------------------------+----------+------------------------------+
| 'abc'.split('-')                | 1.78 us  | 1.15 us: 1.54x faster (-35%) |
+---------------------------------+----------+------------------------------+
| 'abc'.encode()                  | 1.05 us  | 883 ns: 1.19x faster (-16%)  |
+---------------------------------+----------+------------------------------+
| b'abc'.decode()                 | 1.34 us  | 1.17 us: 1.15x faster (-13%) |
+---------------------------------+----------+------------------------------+
| int_(4.2)                       | 1.23 us  | 859 ns: 1.43x faster (-30%)  |
+---------------------------------+----------+------------------------------+
| int_('5')                       | 2.20 us  | 1.41 us: 1.56x faster (-36%) |
+---------------------------------+----------+------------------------------+
| 42 .to_bytes(2, 'little')       | 1.45 us  | 1.09 us: 1.33x faster (-25%) |
+---------------------------------+----------+------------------------------+
| int_from_bytes(b'ab', 'little') | 1.07 us  | 737 ns: 1.45x faster (-31%)  |
+---------------------------------+----------+------------------------------+
| struct_i32_unpack_from(b'abcd') | 1.31 us  | 1.08 us: 1.21x faster (-18%) |
+---------------------------------+----------+------------------------------+
| re_word_match('a')              | 2.85 us  | 2.06 us: 1.39x faster (-28%) |
+---------------------------------+----------+------------------------------+
| datetime_now()                  | 6.20 us  | 5.92 us: 1.05x faster (-4%)  |
+---------------------------------+----------+------------------------------+
| zlib_compress(b'abc')           | 28.7 us  | 26.9 us: 1.07x faster (-6%)  |
+---------------------------------+----------+------------------------------+

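The speedup is significant on all three computers, regardless of cache size. Most benchmarks run roughly 1.15x to 1.6x faster; the smallest gains are for datetime_now() and zlib_compress(b'abc').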
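For anyone reading the tables, the "N.NNx faster (-NN%)" column can be reproduced from the two timings. The sketch below is only an illustration of that arithmetic: the function name `speedup_summary` is hypothetical, and it works from the rounded values printed in the tables, so the last digit may occasionally differ from what the benchmarking tool computed from its unrounded means.

```python
def speedup_summary(baseline: float, inline: float) -> str:
    """Format a comparison of two timings given in the same unit.

    Hypothetical helper: it mirrors the layout of the "inline" column
    above, but it is not the code that produced those tables.
    """
    ratio = baseline / inline                        # > 1 means inline is faster
    percent = (inline - baseline) / baseline * 100   # negative means less time
    return f"{ratio:.2f}x faster ({percent:+.0f}%)"


# Two rows from the i7-6700HQ table (values in ns).
print(speedup_summary(113, 81.3))    # round_(4.2)
print(speedup_summary(83.4, 51.5))   # int_from_bytes(b'ab', 'little')
```

With the i7-6700HQ numbers, round_(4.2) gives 113 / 81.3 ≈ 1.39 and (81.3 - 113) / 113 ≈ -28%, which matches the first row of that table.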
History
Date                 User              Action  Args
2019-02-27 13:00:34  serhiy.storchaka  set     recipients: + serhiy.storchaka, vstinner, larry, josh.r, ammar2, xtreak
2019-02-27 13:00:34  serhiy.storchaka  set     messageid: <1551272434.07.0.338437739622.issue36127@roundup.psfhosted.org>
2019-02-27 13:00:34  serhiy.storchaka  link    issue36127 messages
2019-02-27 13:00:33  serhiy.storchaka  create