Message 370771 - Python tracker


Author	vstinner
Recipients	Mark.Shannon, corona10, vstinner
Date	2020-06-05.17:32:32
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1591378353.08.0.640052481625.issue40521@roundup.psfhosted.org>
In-reply-to

Content

pyperformance comparison between:

* commit dc24b8a2ac32114313bae519db3ccc21fe45c982 (before "Make tuple free list per-interpreter" change)
* PR 20645 (dict free lists), which accumulates all free list changes (already committed + the PR)

Extract of the tested patch, new PyInterpreterState members:
--------------------
diff --git a/Include/internal/pycore_interp.h b/Include/internal/pycore_interp.h
index f04ea330d0..b1a25e0ed4 100644
--- a/Include/internal/pycore_interp.h
+++ b/Include/internal/pycore_interp.h
(...)
@@ -157,6 +233,18 @@ struct _is {
     */
     PyLongObject* small_ints[_PY_NSMALLNEGINTS + _PY_NSMALLPOSINTS];
 #endif
+    struct _Py_unicode_state unicode;
+    struct _Py_float_state float_state;
+    /* Using a cache is very effective since typically only a single slice is
+       created and then deleted again. */
+    PySliceObject *slice_cache;
+
+    struct _Py_tuple_state tuple;
+    struct _Py_list_state list;
+    struct _Py_dict_state dict_state;
+    struct _Py_frame_state frame;
+    struct _Py_async_gen_state async_gen;
+    struct _Py_context_state context;
 };
--------------------

Results:
--------------------
$ python3 -m pyperf compare_to 2020-06-04_20-10-master-dc24b8a2ac32.json.gz 2020-06-04_20-10-master-dc24b8a2ac32-patch-free_lists.json.gz -G 
Slower (10):
- chameleon: 20.1 ms +- 0.4 ms -> 23.1 ms +- 4.0 ms: 1.15x slower (+15%)
- logging_silent: 334 ns +- 51 ns -> 371 ns +- 70 ns: 1.11x slower (+11%)
- spectral_norm: 274 ms +- 37 ms -> 302 ms +- 55 ms: 1.10x slower (+10%)
- logging_format: 22.5 us +- 0.4 us -> 24.5 us +- 2.7 us: 1.09x slower (+9%)
- json_dumps: 26.6 ms +- 4.0 ms -> 28.7 ms +- 5.5 ms: 1.08x slower (+8%)
- sympy_sum: 390 ms +- 3 ms -> 415 ms +- 45 ms: 1.06x slower (+6%)
- float: 217 ms +- 3 ms -> 231 ms +- 30 ms: 1.06x slower (+6%)
- pidigits: 306 ms +- 32 ms -> 323 ms +- 47 ms: 1.06x slower (+6%)
- python_startup_no_site: 8.71 ms +- 0.77 ms -> 8.94 ms +- 0.91 ms: 1.03x slower (+3%)
- xml_etree_process: 130 ms +- 1 ms -> 133 ms +- 2 ms: 1.02x slower (+2%)

Faster (9):
- pickle_pure_python: 1.05 ms +- 0.16 ms -> 964 us +- 19 us: 1.09x faster (-9%)
- scimark_sparse_mat_mult: 11.4 ms +- 2.1 ms -> 10.5 ms +- 1.7 ms: 1.09x faster (-8%)
- hexiom: 19.5 ms +- 4.1 ms -> 18.0 ms +- 3.0 ms: 1.08x faster (-7%)
- telco: 15.7 ms +- 3.1 ms -> 14.5 ms +- 0.4 ms: 1.08x faster (-7%)
- unpickle: 31.8 us +- 5.7 us -> 29.5 us +- 4.9 us: 1.08x faster (-7%)
- scimark_lu: 292 ms +- 60 ms -> 274 ms +- 34 ms: 1.07x faster (-6%)
- django_template: 123 ms +- 16 ms -> 119 ms +- 2 ms: 1.04x faster (-3%)
- xml_etree_generate: 160 ms +- 4 ms -> 156 ms +- 3 ms: 1.02x faster (-2%)
- xml_etree_iterparse: 178 ms +- 3 ms -> 177 ms +- 2 ms: 1.01x faster (-1%)

Benchmark hidden because not significant (41): (...)
--------------------

If we ignore differences smaller than 5%:
--------------------
$ python3 -m pyperf compare_to 2020-06-04_20-10-master-dc24b8a2ac32.json.gz 2020-06-04_20-10-master-dc24b8a2ac32-patch-free_lists.json.gz -G --min-speed=5
Slower (8):
- chameleon: 20.1 ms +- 0.4 ms -> 23.1 ms +- 4.0 ms: 1.15x slower (+15%)
- logging_silent: 334 ns +- 51 ns -> 371 ns +- 70 ns: 1.11x slower (+11%)
- spectral_norm: 274 ms +- 37 ms -> 302 ms +- 55 ms: 1.10x slower (+10%)
- logging_format: 22.5 us +- 0.4 us -> 24.5 us +- 2.7 us: 1.09x slower (+9%)
- json_dumps: 26.6 ms +- 4.0 ms -> 28.7 ms +- 5.5 ms: 1.08x slower (+8%)
- sympy_sum: 390 ms +- 3 ms -> 415 ms +- 45 ms: 1.06x slower (+6%)
- float: 217 ms +- 3 ms -> 231 ms +- 30 ms: 1.06x slower (+6%)
- pidigits: 306 ms +- 32 ms -> 323 ms +- 47 ms: 1.06x slower (+6%)

Faster (6):
- pickle_pure_python: 1.05 ms +- 0.16 ms -> 964 us +- 19 us: 1.09x faster (-9%)
- scimark_sparse_mat_mult: 11.4 ms +- 2.1 ms -> 10.5 ms +- 1.7 ms: 1.09x faster (-8%)
- hexiom: 19.5 ms +- 4.1 ms -> 18.0 ms +- 3.0 ms: 1.08x faster (-7%)
- telco: 15.7 ms +- 3.1 ms -> 14.5 ms +- 0.4 ms: 1.08x faster (-7%)
- unpickle: 31.8 us +- 5.7 us -> 29.5 us +- 4.9 us: 1.08x faster (-7%)
- scimark_lu: 292 ms +- 60 ms -> 274 ms +- 34 ms: 1.07x faster (-6%)

Benchmark hidden because not significant (46): (...)
--------------------
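
To illustrate what the 5% cut-off does, here is a rough sketch that recomputes the relative change from a few of the means printed above and drops the entries below a threshold. The helper names (`relative_change_percent`, `filter_min_speed`) are hypothetical, and the formula is an assumption about how such a filter can work, not pyperf's actual implementation. Since the means are rounded, the percentages may differ by about one point from the ones pyperf printed.

```python
# (mean_before, mean_after) copied from the first pyperf table above.
# Each pair uses a single unit, so the relative change is unit-free.
MEANS = {
    "chameleon": (20.1, 23.1),               # ms
    "pidigits": (306.0, 323.0),              # ms
    "python_startup_no_site": (8.71, 8.94),  # ms
    "xml_etree_process": (130.0, 133.0),     # ms
    "telco": (15.7, 14.5),                   # ms
    "django_template": (123.0, 119.0),       # ms
}


def relative_change_percent(before, after):
    """Signed change in percent: positive means slower, negative means faster."""
    return (after - before) / before * 100.0


def filter_min_speed(means, min_speed):
    """Keep only benchmarks whose absolute change is at least min_speed percent.

    Assumption: this mimics the intent of --min-speed, not its exact code.
    """
    kept = {}
    for name, (before, after) in means.items():
        change = relative_change_percent(before, after)
        if abs(change) >= min_speed:
            kept[name] = change
    return kept


if __name__ == "__main__":
    for name, change in filter_min_speed(MEANS, 5.0).items():
        print(f"{name}: {change:+.1f}%")
```

With a threshold of 5, python_startup_no_site, xml_etree_process and django_template drop out of this sample, which matches the benchmarks missing from the second table.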

Honestly, I'm surprised by these results. I don't see how these free list changes could make between 6 and 9 benchmarks faster (e.g. 1.08x faster for telco!?). To me, it sounds like the speed.python.org runner has some trouble. You can notice it if you look at the last 3 runs at https://speed.python.org/ : there are some spikes (in both directions, faster or slower) which are very surprising.
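
One quick way to see why these numbers look suspicious is to compare each change in the mean to the standard deviations reported next to it. The sketch below uses values copied from the output above. The "within noise" rule (change smaller than the larger of the two standard deviations) is a crude heuristic assumed for illustration; it is not the significance test pyperf runs, and the function names are hypothetical.

```python
# (mean_before, std_before, mean_after, std_after), copied from the pyperf
# output above. All four rows are in milliseconds.
RESULTS = {
    "chameleon": (20.1, 0.4, 23.1, 4.0),
    "sympy_sum": (390.0, 3.0, 415.0, 45.0),
    "telco": (15.7, 3.1, 14.5, 0.4),
    "scimark_lu": (292.0, 60.0, 274.0, 34.0),
}


def within_noise(before, std_before, after, std_after):
    """Crude heuristic (an assumption, not pyperf's test): the change counts
    as noise if it is smaller than the larger standard deviation."""
    return abs(after - before) < max(std_before, std_after)


def summarize(results):
    """Return the mean delta, the larger std dev, and the noise verdict per benchmark."""
    rows = {}
    for name, (before, std_before, after, std_after) in results.items():
        rows[name] = {
            "delta": after - before,
            "max_std": max(std_before, std_after),
            "within_noise": within_noise(before, std_before, after, std_after),
        }
    return rows


if __name__ == "__main__":
    for name, row in summarize(RESULTS).items():
        print(
            f"{name}: delta={row['delta']:+.1f} ms, "
            f"max std={row['max_std']:.1f} ms, "
            f"within noise={row['within_noise']}"
        )
```

For all four of these benchmarks the change is smaller than the larger standard deviation, and in each case one of the two runs has a much wider spread than the other, which is consistent with an unstable runner rather than a real effect of the patch.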

Pablo recently upgraded Ubuntu on the benchmark runner server. I don't know if it's related.

I plan to recompute all benchmarks run on the benchmark runner server, since over the last few years pyperf and pyperformance were upgraded multiple times (old data was computed with old versions) and the system (Ubuntu) was upgraded (again, old data was computed with older Ubuntu packages).

History
Date	User	Action	Args
2020-06-05 17:32:33	vstinner	set	recipients: + vstinner, Mark.Shannon, corona10
2020-06-05 17:32:33	vstinner	set	messageid: <1591378353.08.0.640052481625.issue40521@roundup.psfhosted.org>
2020-06-05 17:32:33	vstinner	link	issue40521 messages
2020-06-05 17:32:32	vstinner	create