Message 330302 - Python tracker

Author	vstinner
Recipients	Aaron Hall, benjamin.peterson, eric.snow, mark.dickinson, miss-islington, pablogsal, serhiy.storchaka, thatiparthy, vstinner
Date	2018-11-23.10:55:41
Message-id	<1542970542.73.0.788709270274.issue35059@psf.upfronthosting.co.za>
In-reply-to

Content

I wrote a pull request to replace static inline functions with C macros: PR 10669. I ran a benchmark on the speed.python.org server using the "performance" benchmark suite:
http://pyperformance.readthedocs.io/

From the benchmark results, I understand that converting macros to static inline functions has no significant impact on performance. Two benchmarks are 1.06x and 1.07x faster, but that can be explained by PGO compilation, which is not reliable. One benchmark is way slower, but it seems like the benchmark itself has an issue. If you look at the 3 latest runs on speed.python.org, I see:

* 38.2 us (Sept 24)
* 43.5 us (Nov 21)
* 43.7 us (Nov 22)
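
To put numbers on the gap between these three runs, here is a minimal sketch that computes the relative change from one run to the next. The timings are copied from the list above; the helper names (`relative_change`, `successive_changes`) are made up for this illustration and are not part of perf or the benchmark suite.

```python
# pickle_dict timings copied from the three runs listed above (microseconds).
runs = [("Sept 24", 38.2), ("Nov 21", 43.5), ("Nov 22", 43.7)]


def relative_change(old, new):
    """Return the change from old to new as a percentage of old."""
    return (new - old) / old * 100.0


def successive_changes(samples):
    """Pair each run with the percent change from the run before it."""
    return [
        (prev_label, label, relative_change(prev, cur))
        for (prev_label, prev), (label, cur) in zip(samples, samples[1:])
    ]


for before, after, pct in successive_changes(runs):
    print(f"{before} -> {after}: {pct:+.1f}%")
```

With these three values, the change is about +13.9% between Sept 24 and Nov 21, and about +0.5% between Nov 21 and Nov 22. The arithmetic only shows where the jump sits on the timeline; it says nothing about the cause.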

I don't think that any change in _pickle or pickle explains this significant slowdown. IMHO it's just that the benchmark is not reliable :-/ We have a performance timeline covering the last 5 years, and this benchmark doesn't follow a straight line: we can see that the result is a little bit random :-/

--

speed.python.org runs Ubuntu 16.04 with gcc 5.4.0.

The results are the two attached (compressed) JSON files:

* 2018-11-22_17-38-master-3bb183d7fb83-patch-10669.json.gz: reference, Python using C macros
* 2018-11-22_17-38-master-3bb183d7fb83.json.gz: static inline, current master branch

Comparison ignoring differences between -5% and +5%, macros are the reference:

vstinner@apu$ python3 -m perf compare_to --table -G --min-speed=5 macros.json.gz inline.json.gz 
+--------------+---------+------------------------------+
| Benchmark    | macros  | inline                       |
+==============+=========+==============================+
| regex_dna    | 288 ms  | 269 ms: 1.07x faster (-7%)   |
+--------------+---------+------------------------------+
| crypto_pyaes | 236 ms  | 223 ms: 1.06x faster (-5%)   |
+--------------+---------+------------------------------+
| pickle_dict  | 37.8 us | 43.7 us: 1.16x slower (+16%) |
+--------------+---------+------------------------------+

Not significant (54): (...)
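
For readers who want to check the labels in the table by hand, here is a rough sketch of the arithmetic behind "1.07x faster (-7%)" and of a 5% cutoff. It is not perf's actual code: the function names `describe` and `is_reported` are hypothetical, and the exact rule perf applies for --min-speed is an assumption here. The inputs are the rounded values printed in the tables, so a label can differ by one in the last digit from what perf computes on the full data.

```python
def describe(ref, changed):
    """Label `changed` relative to `ref`, in the style of the table above.

    Both timings must use the same unit.
    """
    percent = (changed - ref) / ref * 100.0
    if changed < ref:
        return f"{ref / changed:.2f}x faster ({percent:+.0f}%)"
    return f"{changed / ref:.2f}x slower ({percent:+.0f}%)"


def is_reported(ref, changed, min_speed=5.0):
    """Assumed cutoff: keep a row only if it differs by at least min_speed percent."""
    return abs(changed - ref) / ref * 100.0 >= min_speed


# (name, macros timing, inline timing), copied from the tables.
rows = [
    ("regex_dna", 288, 269),      # ms
    ("scimark_sor", 357, 373),    # ms
    ("pickle_dict", 37.8, 43.7),  # us
]

for name, macros, inline in rows:
    status = "reported" if is_reported(macros, inline) else "filtered out"
    print(f"{name}: {describe(macros, inline)} [{status}]")
```

Under this assumed rule, regex_dna and pickle_dict pass the 5% cutoff, while scimark_sor (about +4.5% on the rounded values) does not. That is consistent with the filtered table showing only three rows.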


Raw comparison, full data, macros are the reference:

vstinner@apu$ python3 -m perf compare_to --table -G macros.json.gz inline.json.gz 
+-------------------------+----------+------------------------------+
| Benchmark               | macros   | inline                       |
+=========================+==========+==============================+
| regex_dna               | 288 ms   | 269 ms: 1.07x faster (-7%)   |
+-------------------------+----------+------------------------------+
| crypto_pyaes            | 236 ms   | 223 ms: 1.06x faster (-5%)   |
+-------------------------+----------+------------------------------+
| nqueens                 | 199 ms   | 195 ms: 1.02x faster (-2%)   |
+-------------------------+----------+------------------------------+
| raytrace                | 1.01 sec | 984 ms: 1.02x faster (-2%)   |
+-------------------------+----------+------------------------------+
| chaos                   | 224 ms   | 221 ms: 1.02x faster (-2%)   |
+-------------------------+----------+------------------------------+
| logging_simple          | 21.2 us  | 20.9 us: 1.01x faster (-1%)  |
+-------------------------+----------+------------------------------+
| unpack_sequence         | 117 ns   | 116 ns: 1.01x faster (-1%)   |
+-------------------------+----------+------------------------------+
| spectral_norm           | 247 ms   | 244 ms: 1.01x faster (-1%)   |
+-------------------------+----------+------------------------------+
| regex_v8                | 41.8 ms  | 41.4 ms: 1.01x faster (-1%)  |
+-------------------------+----------+------------------------------+
| pickle                  | 19.9 us  | 19.7 us: 1.01x faster (-1%)  |
+-------------------------+----------+------------------------------+
| logging_format          | 24.0 us  | 23.8 us: 1.01x faster (-1%)  |
+-------------------------+----------+------------------------------+
| scimark_sparse_mat_mult | 9.06 ms  | 8.99 ms: 1.01x faster (-1%)  |
+-------------------------+----------+------------------------------+
| pathlib                 | 42.1 ms  | 41.8 ms: 1.01x faster (-1%)  |
+-------------------------+----------+------------------------------+
| meteor_contest          | 188 ms   | 187 ms: 1.01x faster (-1%)   |
+-------------------------+----------+------------------------------+
| unpickle_pure_python    | 704 us   | 700 us: 1.01x faster (-1%)   |
+-------------------------+----------+------------------------------+
| python_startup          | 12.0 ms  | 12.0 ms: 1.00x faster (-0%)  |
+-------------------------+----------+------------------------------+
| python_startup_no_site  | 7.91 ms  | 7.94 ms: 1.00x slower (+0%)  |
+-------------------------+----------+------------------------------+
| sympy_integrate         | 35.0 ms  | 35.3 ms: 1.01x slower (+1%)  |
+-------------------------+----------+------------------------------+
| pidigits                | 283 ms   | 285 ms: 1.01x slower (+1%)   |
+-------------------------+----------+------------------------------+
| hexiom                  | 17.3 ms  | 17.4 ms: 1.01x slower (+1%)  |
+-------------------------+----------+------------------------------+
| sympy_str               | 403 ms   | 407 ms: 1.01x slower (+1%)   |
+-------------------------+----------+------------------------------+
| mako                    | 31.9 ms  | 32.2 ms: 1.01x slower (+1%)  |
+-------------------------+----------+------------------------------+
| django_template         | 293 ms   | 296 ms: 1.01x slower (+1%)   |
+-------------------------+----------+------------------------------+
| tornado_http            | 368 ms   | 372 ms: 1.01x slower (+1%)   |
+-------------------------+----------+------------------------------+
| sqlalchemy_declarative  | 287 ms   | 290 ms: 1.01x slower (+1%)   |
+-------------------------+----------+------------------------------+
| html5lib                | 190 ms   | 192 ms: 1.01x slower (+1%)   |
+-------------------------+----------+------------------------------+
| xml_etree_iterparse     | 181 ms   | 183 ms: 1.01x slower (+1%)   |
+-------------------------+----------+------------------------------+
| 2to3                    | 601 ms   | 610 ms: 1.02x slower (+2%)   |
+-------------------------+----------+------------------------------+
| unpickle_list           | 6.49 us  | 6.60 us: 1.02x slower (+2%)  |
+-------------------------+----------+------------------------------+
| scimark_monte_carlo     | 210 ms   | 214 ms: 1.02x slower (+2%)   |
+-------------------------+----------+------------------------------+
| sympy_sum               | 198 ms   | 202 ms: 1.02x slower (+2%)   |
+-------------------------+----------+------------------------------+
| telco                   | 14.8 ms  | 15.1 ms: 1.02x slower (+2%)  |
+-------------------------+----------+------------------------------+
| fannkuch                | 855 ms   | 874 ms: 1.02x slower (+2%)   |
+-------------------------+----------+------------------------------+
| sympy_expand            | 857 ms   | 883 ms: 1.03x slower (+3%)   |
+-------------------------+----------+------------------------------+
| scimark_lu              | 298 ms   | 307 ms: 1.03x slower (+3%)   |
+-------------------------+----------+------------------------------+
| xml_etree_generate      | 207 ms   | 213 ms: 1.03x slower (+3%)   |
+-------------------------+----------+------------------------------+
| float                   | 210 ms   | 216 ms: 1.03x slower (+3%)   |
+-------------------------+----------+------------------------------+
| xml_etree_process       | 173 ms   | 180 ms: 1.04x slower (+4%)   |
+-------------------------+----------+------------------------------+
| scimark_sor             | 357 ms   | 373 ms: 1.05x slower (+5%)   |
+-------------------------+----------+------------------------------+
| json_dumps              | 25.8 ms  | 27.0 ms: 1.05x slower (+5%)  |
+-------------------------+----------+------------------------------+
| pickle_dict             | 37.8 us  | 43.7 us: 1.16x slower (+16%) |
+-------------------------+----------+------------------------------+

Not significant (16): regex_effbot; go; json_loads; xml_etree_parse; dulwich_log; nbody; regex_compile; scimark_fft; pickle_pure_python; unpickle; pickle_list; sqlite_synth; richards; logging_silent; deltablue; sqlalchemy_imperative

History
Date	User	Action	Args
2018-11-23 10:55:43	vstinner	set	recipients: + vstinner, mark.dickinson, benjamin.peterson, eric.snow, serhiy.storchaka, thatiparthy, Aaron Hall, pablogsal, miss-islington
2018-11-23 10:55:42	vstinner	set	messageid: <1542970542.73.0.788709270274.issue35059@psf.upfronthosting.co.za>
2018-11-23 10:55:42	vstinner	link	issue35059 messages
2018-11-23 10:55:41	vstinner	create