URL | Status | Linked |
PR 19933 | merged | vstinner, 2020-05-05 16:11 |
PR 19937 | merged | vstinner, 2020-05-05 16:48 |
PR 19959 | merged | vstinner, 2020-05-06 15:44 |
PR 19960 | merged | vstinner, 2020-05-06 15:48 |
PR 20081 | merged | vstinner, 2020-05-13 23:09 |
PR 20082 | closed | vstinner, 2020-05-13 23:13 |
PR 20085 | merged | vstinner, 2020-05-14 00:56 |
PR 20246 | merged | vstinner, 2020-05-19 22:44 |
PR 20247 | merged | vstinner, 2020-05-19 23:22 |
PR 20636 | merged | vstinner, 2020-06-04 22:01 |
PR 20637 | merged | vstinner, 2020-06-04 22:53 |
PR 20638 | merged | vstinner, 2020-06-04 23:20 |
PR 20642 | merged | vstinner, 2020-06-04 23:44 |
PR 20643 | merged | vstinner, 2020-06-05 00:10 |
PR 20644 | merged | vstinner, 2020-06-05 00:36 |
PR 20645 | merged | vstinner, 2020-06-05 01:02 |
PR 21068 | merged | vstinner, 2020-06-23 10:41 |
PR 21073 | merged | rhettinger, 2020-06-23 13:12 |
PR 21074 | merged | vstinner, 2020-06-23 13:14 |
PR 21082 | merged | vstinner, 2020-06-23 13:57 |
PR 21085 | merged | rhettinger, 2020-06-23 15:02 |
PR 21086 | merged | vstinner, 2020-06-23 15:11 |
PR 21096 | merged | vstinner, 2020-06-23 21:48 |
PR 21099 | merged | vstinner, 2020-06-23 22:14 |
PR 21101 | merged | vstinner, 2020-06-23 22:45 |
PR 21103 | merged | vstinner, 2020-06-24 00:34 |
PR 21116 | merged | vstinner, 2020-06-24 12:56 |
PR 21142 | merged | vstinner, 2020-06-25 11:15 |
PR 21265 | merged | vstinner, 2020-07-01 17:31 |
PR 22376 | merged | vstinner, 2020-09-23 10:55 |
PR 24821 | merged | JunyiXie, 2021-03-11 09:44 |
PR 24944 | merged | rhettinger, 2021-03-20 15:15 |
PR 24964 | merged | vstinner, 2021-03-22 10:03 |
PR 25906 | merged | rhettinger, 2021-05-04 22:25 |
PR 30422 | merged | vstinner, 2022-01-05 16:26 |
PR 30425 | merged | vstinner, 2022-01-06 07:59 |
PR 30433 | closed | vstinner, 2022-01-06 14:30 |
msg368175 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2020-05-05 15:48 |
tuple, dict and frame use free lists to optimize the creation of objects.
Unicode uses "interned" strings to reduce the Python memory footprint and speed up dictionary lookups.
Unicode also uses singletons for single letter Latin1 characters ([U+0000; U+00FF] range).
All these optimizations are incompatible with isolated subinterpreters, since the caches are currently shared by all interpreters. These caches should be made per-interpreter. See bpo-40512 "Meta issue: per-interpreter GIL" for the rationale.
I already made small integer singletons per interpreter in bpo-38858:
* commit 5dcc06f6e0d7b5d6589085692b86c63e35e2325e
* commit 630c8df5cf126594f8c1c4579c1888ca80a29d59.
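As a minimal illustration of what interning provides at the Python level, here is a sketch that relies only on the documented sys.intern() function. It does not show how CPython stores the interned table, and the variable names are made up for the example.

```python
import sys

# Build two equal strings at run time so the compiler cannot merge them.
parts = ["per", "-", "interpreter"]
a = "".join(parts)
b = "".join(parts)

# Equal contents; whether a and b are distinct objects is an
# implementation detail and is not relied on here.
assert a == b

# sys.intern() returns the canonical copy from the interned-string table,
# so both calls hand back the very same object.
ia = sys.intern(a)
ib = sys.intern(b)
print(ia is ib)
```

Identity comparison on interned strings is what makes dictionary lookups cheaper; a table shared by all interpreters means every interpreter can reach, and modify the lifetime of, the same string objects.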
|
msg368177 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2020-05-05 16:50 |
New changeset 607b1027fec7b4a1602aab7df57795fbcec1c51b by Victor Stinner in branch 'master':
bpo-40521: Disable Unicode caches in isolated subinterpreters (GH-19933)
https://github.com/python/cpython/commit/607b1027fec7b4a1602aab7df57795fbcec1c51b
|
msg368187 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2020-05-05 17:55 |
New changeset b4b53868d7d6cd13505321d3802fd00865b25e05 by Victor Stinner in branch 'master':
bpo-40521: Disable free lists in subinterpreters (GH-19937)
https://github.com/python/cpython/commit/b4b53868d7d6cd13505321d3802fd00865b25e05
|
msg368278 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2020-05-06 16:24 |
New changeset 89fc4a34cf7a01df9dd269d32d3706c68a72d130 by Victor Stinner in branch 'master':
bpo-40521: Disable method cache in subinterpreters (GH-19960)
https://github.com/python/cpython/commit/89fc4a34cf7a01df9dd269d32d3706c68a72d130
|
msg368283 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2020-05-06 17:05 |
New changeset b7aa23d29fa48238dab3692d02e1f0a7e8a5af9c by Victor Stinner in branch 'master':
bpo-40521: Disable list free list in subinterpreters (GH-19959)
https://github.com/python/cpython/commit/b7aa23d29fa48238dab3692d02e1f0a7e8a5af9c
|
msg368807 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2020-05-13 23:35 |
I wrote a draft PR to make interned strings per-interpreter. It currently crashes because it first requires making the method cache and _PyUnicode_FromId() (bpo-39465) compatible with subinterpreters.
|
msg368808 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2020-05-13 23:48 |
New changeset 3d17c045b4c3d09b72bbd95ed78af1ae6f0d98d2 by Victor Stinner in branch 'master':
bpo-40521: Add PyInterpreterState.unicode (GH-20081)
https://github.com/python/cpython/commit/3d17c045b4c3d09b72bbd95ed78af1ae6f0d98d2
|
msg369407 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2020-05-19 23:57 |
New changeset 0509c4547fc95cc32a91ac446a26192c3bfdf157 by Victor Stinner in branch 'master':
bpo-40521: Fix update_slot() when INTERN_NAME_STRINGS is not defined (#20246)
https://github.com/python/cpython/commit/0509c4547fc95cc32a91ac446a26192c3bfdf157
|
msg370636 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2020-06-02 22:43 |
Microbenchmark for the tuple free list to measure the overhead of PR 20247: microbench_tuple.py. It requires applying bench_tuple.patch.
|
msg370733 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2020-06-04 21:38 |
New changeset 69ac6e58fd98de339c013fe64cd1cf763e4f9bca by Victor Stinner in branch 'master':
bpo-40521: Make tuple free list per-interpreter (GH-20247)
https://github.com/python/cpython/commit/69ac6e58fd98de339c013fe64cd1cf763e4f9bca
|
msg370734 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2020-06-04 22:50 |
New changeset 2ba59370c3dda2ac229c14510e53a05074b133d1 by Victor Stinner in branch 'master':
bpo-40521: Make float free list per-interpreter (GH-20636)
https://github.com/python/cpython/commit/2ba59370c3dda2ac229c14510e53a05074b133d1
|
msg370735 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2020-06-04 23:14 |
New changeset 7daba6f221e713f7f60c613b246459b07d179f91 by Victor Stinner in branch 'master':
bpo-40521: Make slice cache per-interpreter (GH-20637)
https://github.com/python/cpython/commit/7daba6f221e713f7f60c613b246459b07d179f91
|
msg370737 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2020-06-04 23:39 |
New changeset 3744ed2c9c0b3905947602fc375de49533790cb9 by Victor Stinner in branch 'master':
bpo-40521: Make frame free list per-interpreter (GH-20638)
https://github.com/python/cpython/commit/3744ed2c9c0b3905947602fc375de49533790cb9
|
msg370740 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2020-06-05 00:05 |
New changeset 88ec9190105c9b03f49aaef601ce02b242a75273 by Victor Stinner in branch 'master':
bpo-40521: Make list free list per-interpreter (GH-20642)
https://github.com/python/cpython/commit/88ec9190105c9b03f49aaef601ce02b242a75273
|
msg370741 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2020-06-05 00:34 |
New changeset 78a02c2568714562e23e885b6dc5730601f35226 by Victor Stinner in branch 'master':
bpo-40521: Make async gen free lists per-interpreter (GH-20643)
https://github.com/python/cpython/commit/78a02c2568714562e23e885b6dc5730601f35226
|
msg370742 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2020-06-05 00:56 |
New changeset e005ead49b1ee2b1507ceea94e6f89c28ecf1f81 by Victor Stinner in branch 'master':
bpo-40521: Make context free list per-interpreter (GH-20644)
https://github.com/python/cpython/commit/e005ead49b1ee2b1507ceea94e6f89c28ecf1f81
|
msg370754 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2020-06-05 09:47 |
> bpo-40521: Make list free list per-interpreter (GH-20642)
> https://github.com/python/cpython/commit/88ec9190105c9b03f49aaef601ce02b242a75273
This change contains an interesting fix:
* _PyGC_Fini() clears gcstate->garbage list which can be stored in
the list free list. Call _PyGC_Fini() before _PyList_Fini() to
prevent leaking this list.
Maybe "Fini" functions should disable free lists to prevent subsequent code from adding something to a free list during Python finalization.
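The idea of a "Fini" function that disables its free list can be sketched with a toy model. Everything below is hypothetical (the ToyFreeList class, its finalized flag, the variable names); it is not CPython code, only an illustration of the ordering and of rejecting late additions.

```python
class ToyFreeList:
    """Hypothetical model of a bounded free list (not CPython code)."""

    def __init__(self, maxsize=4):
        self.items = []
        self.maxsize = maxsize
        self.finalized = False

    def release(self, obj):
        """Return True if obj was cached for reuse, False if it must be freed."""
        if self.finalized or len(self.items) >= self.maxsize:
            return False
        self.items.append(obj)
        return True

    def fini(self):
        """Drop cached objects and refuse further additions."""
        self.items.clear()
        self.finalized = True


list_free_list = ToyFreeList()
garbage = ["stand-in for gcstate->garbage"]

# Order from the commit: the GC releases its list first, so the list may land
# in the free list, and only then is the free list finalized and cleared.
cached_before_fini = list_free_list.release(garbage)
list_free_list.fini()

# Anything released after fini() is rejected instead of lingering in the cache.
cached_after_fini = list_free_list.release(["late object"])
print(cached_before_fini, cached_after_fini, len(list_free_list.items))
```

With the opposite order (fini() first, then the GC release) and no finalized flag, the released list would sit in the cache with nobody left to clear it, which is the leak described above.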
|
msg370755 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2020-06-05 09:50 |
bench_dict.patch: Microbenchmark on the C function PyDict_New() to measure the overhead of PR 20645.
|
msg370756 - (view) |
Author: Mark Shannon (Mark.Shannon) * |
Date: 2020-06-05 09:57 |
I'm worried about the performance impact of these changes, especially as many of the changes haven't been reviewed.
Have you done any performance analysis or tests of the cumulative effect of all these changes?
|
msg370757 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2020-06-05 09:59 |
> Have you done any performance analysis or tests of the cumulative effect of all these changes?
No. It would be interesting to measure that using pyperformance.
|
msg370771 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2020-06-05 17:32 |
pyperformance comparison between:
* commit dc24b8a2ac32114313bae519db3ccc21fe45c982 (before "Make tuple free list per-interpreter" change)
* PR 20645 (dict free lists) which accumulates all free list changes (already committed + the PR)
Extract of the tested patch, new PyInterpreterState members:
--------------------
diff --git a/Include/internal/pycore_interp.h b/Include/internal/pycore_interp.h
index f04ea330d0..b1a25e0ed4 100644
--- a/Include/internal/pycore_interp.h
+++ b/Include/internal/pycore_interp.h
(...)
@@ -157,6 +233,18 @@ struct _is {
*/
PyLongObject* small_ints[_PY_NSMALLNEGINTS + _PY_NSMALLPOSINTS];
#endif
+ struct _Py_unicode_state unicode;
+ struct _Py_float_state float_state;
+ /* Using a cache is very effective since typically only a single slice is
+ created and then deleted again. */
+ PySliceObject *slice_cache;
+
+ struct _Py_tuple_state tuple;
+ struct _Py_list_state list;
+ struct _Py_dict_state dict_state;
+ struct _Py_frame_state frame;
+ struct _Py_async_gen_state async_gen;
+ struct _Py_context_state context;
};
--------------------
Results:
--------------------
$ python3 -m pyperf compare_to 2020-06-04_20-10-master-dc24b8a2ac32.json.gz 2020-06-04_20-10-master-dc24b8a2ac32-patch-free_lists.json.gz -G
Slower (10):
- chameleon: 20.1 ms +- 0.4 ms -> 23.1 ms +- 4.0 ms: 1.15x slower (+15%)
- logging_silent: 334 ns +- 51 ns -> 371 ns +- 70 ns: 1.11x slower (+11%)
- spectral_norm: 274 ms +- 37 ms -> 302 ms +- 55 ms: 1.10x slower (+10%)
- logging_format: 22.5 us +- 0.4 us -> 24.5 us +- 2.7 us: 1.09x slower (+9%)
- json_dumps: 26.6 ms +- 4.0 ms -> 28.7 ms +- 5.5 ms: 1.08x slower (+8%)
- sympy_sum: 390 ms +- 3 ms -> 415 ms +- 45 ms: 1.06x slower (+6%)
- float: 217 ms +- 3 ms -> 231 ms +- 30 ms: 1.06x slower (+6%)
- pidigits: 306 ms +- 32 ms -> 323 ms +- 47 ms: 1.06x slower (+6%)
- python_startup_no_site: 8.71 ms +- 0.77 ms -> 8.94 ms +- 0.91 ms: 1.03x slower (+3%)
- xml_etree_process: 130 ms +- 1 ms -> 133 ms +- 2 ms: 1.02x slower (+2%)
Faster (9):
- pickle_pure_python: 1.05 ms +- 0.16 ms -> 964 us +- 19 us: 1.09x faster (-9%)
- scimark_sparse_mat_mult: 11.4 ms +- 2.1 ms -> 10.5 ms +- 1.7 ms: 1.09x faster (-8%)
- hexiom: 19.5 ms +- 4.1 ms -> 18.0 ms +- 3.0 ms: 1.08x faster (-7%)
- telco: 15.7 ms +- 3.1 ms -> 14.5 ms +- 0.4 ms: 1.08x faster (-7%)
- unpickle: 31.8 us +- 5.7 us -> 29.5 us +- 4.9 us: 1.08x faster (-7%)
- scimark_lu: 292 ms +- 60 ms -> 274 ms +- 34 ms: 1.07x faster (-6%)
- django_template: 123 ms +- 16 ms -> 119 ms +- 2 ms: 1.04x faster (-3%)
- xml_etree_generate: 160 ms +- 4 ms -> 156 ms +- 3 ms: 1.02x faster (-2%)
- xml_etree_iterparse: 178 ms +- 3 ms -> 177 ms +- 2 ms: 1.01x faster (-1%)
Benchmark hidden because not significant (41): (...)
--------------------
If we ignore differences smaller than 5%:
--------------------
$ python3 -m pyperf compare_to 2020-06-04_20-10-master-dc24b8a2ac32.json.gz 2020-06-04_20-10-master-dc24b8a2ac32-patch-free_lists.json.gz -G --min-speed=5
Slower (8):
- chameleon: 20.1 ms +- 0.4 ms -> 23.1 ms +- 4.0 ms: 1.15x slower (+15%)
- logging_silent: 334 ns +- 51 ns -> 371 ns +- 70 ns: 1.11x slower (+11%)
- spectral_norm: 274 ms +- 37 ms -> 302 ms +- 55 ms: 1.10x slower (+10%)
- logging_format: 22.5 us +- 0.4 us -> 24.5 us +- 2.7 us: 1.09x slower (+9%)
- json_dumps: 26.6 ms +- 4.0 ms -> 28.7 ms +- 5.5 ms: 1.08x slower (+8%)
- sympy_sum: 390 ms +- 3 ms -> 415 ms +- 45 ms: 1.06x slower (+6%)
- float: 217 ms +- 3 ms -> 231 ms +- 30 ms: 1.06x slower (+6%)
- pidigits: 306 ms +- 32 ms -> 323 ms +- 47 ms: 1.06x slower (+6%)
Faster (6):
- pickle_pure_python: 1.05 ms +- 0.16 ms -> 964 us +- 19 us: 1.09x faster (-9%)
- scimark_sparse_mat_mult: 11.4 ms +- 2.1 ms -> 10.5 ms +- 1.7 ms: 1.09x faster (-8%)
- hexiom: 19.5 ms +- 4.1 ms -> 18.0 ms +- 3.0 ms: 1.08x faster (-7%)
- telco: 15.7 ms +- 3.1 ms -> 14.5 ms +- 0.4 ms: 1.08x faster (-7%)
- unpickle: 31.8 us +- 5.7 us -> 29.5 us +- 4.9 us: 1.08x faster (-7%)
- scimark_lu: 292 ms +- 60 ms -> 274 ms +- 34 ms: 1.07x faster (-6%)
Benchmark hidden because not significant (46): (...)
--------------------
Honestly, I'm surprised by these results. I don't see how these free list changes can make between 6 and 9 benchmarks faster (ex: 1.08x faster for telco!?). To me, it sounds like the speed.python.org runner has some trouble. You can notice it if you look at the last 3 runs at https://speed.python.org/ : there are some spikes (in both directions, faster or slower) which are very surprising.
Pablo recently upgraded Ubuntu on the benchmark runner server. I don't know if it's related.
I plan to recompute all benchmarks run on the benchmark runner server, since over the last years pyperf and pyperformance were upgraded multiple times (old data were computed with old versions) and the system (Ubuntu) was upgraded (again, old data were computed with older Ubuntu packages).
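To make the arithmetic behind the "1.15x slower" and "1.08x faster" labels explicit, here is a small sketch that recomputes them from the rounded means printed above. The describe() helper and the noise check are assumptions made for illustration: pyperf works on the raw samples and uses its own significance test, so this is only a rough reading aid.

```python
# (mean_before, std_before, mean_after, std_after) in milliseconds,
# copied from the pyperf output above.
results = {
    "chameleon": (20.1, 0.4, 23.1, 4.0),
    "telco": (15.7, 3.1, 14.5, 0.4),
}


def describe(before, after):
    """Return a pyperf-style label computed from two means."""
    if after >= before:
        return f"{after / before:.2f}x slower"
    return f"{before / after:.2f}x faster"


labels = {}
within_noise = {}
for name, (before, std_before, after, std_after) in results.items():
    labels[name] = describe(before, after)
    # Crude indicator only: is the shift smaller than the larger std dev?
    within_noise[name] = abs(after - before) <= max(std_before, std_after)

print(labels)
print(within_noise)
```

For both benchmarks the shift in the mean is smaller than the larger of the two standard deviations, which is consistent with the suspicion that the runner was noisy.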
|
msg370928 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2020-06-07 23:38 |
See also bpo-40887: "Free lists are still used after being finalized (cleared)".
|
msg370969 - (view) |
Author: Mark Shannon (Mark.Shannon) * |
Date: 2020-06-08 09:24 |
I'd be interested to see if you can get more consistent results.
Performance of modern hardware is very sensitive to memory layout, so some sort of address randomization might be needed to remove artifacts of layout.
It is possible that the objects on the free lists for telco are better aligned with cache lines, or fit in cache better in some way.
And conversely, in chameleon, objects fit cache in a worse way.
Just a guess, of course.
Thanks for trying to get some benchmark results.
|
msg372146 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2020-06-23 09:33 |
New changeset b4e85cadfbc2b1b24ec5f3159e351dbacedaa5e0 by Victor Stinner in branch 'master':
bpo-40521: Make dict free lists per-interpreter (GH-20645)
https://github.com/python/cpython/commit/b4e85cadfbc2b1b24ec5f3159e351dbacedaa5e0
|
msg372148 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2020-06-23 09:38 |
All free lists are now per-interpreter! See Modules/gcmodule.c:
static void
clear_freelists(PyThreadState *tstate)
{
_PyFrame_ClearFreeList(tstate);
_PyTuple_ClearFreeList(tstate);
_PyFloat_ClearFreeList(tstate);
_PyList_ClearFreeList(tstate);
_PyDict_ClearFreeList(tstate);
_PyAsyncGen_ClearFreeLists(tstate);
_PyContext_ClearFreeList(tstate);
}
I'm still working on the Unicode caches.
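What "per-interpreter" means for these caches can be sketched with a toy model. The ToyInterpreter class and its methods are hypothetical stand-ins for PyInterpreterState and the C functions listed above; the sketch only shows that each interpreter owns and clears its own caches.

```python
class ToyInterpreter:
    """Hypothetical stand-in for PyInterpreterState (not CPython code)."""

    def __init__(self, name):
        self.name = name
        # One free list per object kind, owned by this interpreter only.
        self.free_lists = {"tuple": [], "float": [], "list": [], "dict": []}

    def release(self, kind, obj):
        """Cache obj for later reuse by this interpreter."""
        self.free_lists[kind].append(obj)

    def clear_freelists(self):
        """Toy counterpart of clear_freelists(): empty every cache."""
        freed = sum(len(cache) for cache in self.free_lists.values())
        for cache in self.free_lists.values():
            cache.clear()
        return freed


main = ToyInterpreter("main")
sub = ToyInterpreter("sub")
main.release("tuple", object())
main.release("dict", object())
sub.release("list", object())

# Clearing one interpreter's caches leaves the other interpreter untouched.
freed_in_main = main.clear_freelists()
remaining_in_sub = sum(len(cache) for cache in sub.free_lists.values())
print(freed_in_main, remaining_in_sub)
```

Because no cache is reachable from two interpreters, no lock is needed around release() or clear_freelists() in this model, which is the property the real change is after.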
|
msg372161 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2020-06-23 12:08 |
New changeset 261cfedf7657a515e04428bba58eba2a9bb88208 by Victor Stinner in branch 'master':
bpo-40521: Make the empty frozenset per interpreter (GH-21068)
https://github.com/python/cpython/commit/261cfedf7657a515e04428bba58eba2a9bb88208
|
msg372168 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2020-06-23 13:50 |
New changeset 32f2eda85957365d208f499b730d30b7eb419741 by Raymond Hettinger in branch 'master':
bpo-40521: Remove freelist from collections.deque() (GH-21073)
https://github.com/python/cpython/commit/32f2eda85957365d208f499b730d30b7eb419741
|
msg372169 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2020-06-23 13:54 |
New changeset c41eed1a874e2f22bde45c3c89418414b7a37f46 by Victor Stinner in branch 'master':
bpo-40521: Make bytes singletons per interpreter (GH-21074)
https://github.com/python/cpython/commit/c41eed1a874e2f22bde45c3c89418414b7a37f46
|
msg372176 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2020-06-23 14:40 |
New changeset 522691c46e2ae51faaad5bbbce7d959dd61770df by Victor Stinner in branch 'master':
bpo-40521: Cleanup code of free lists (GH-21082)
https://github.com/python/cpython/commit/522691c46e2ae51faaad5bbbce7d959dd61770df
|
msg372181 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2020-06-23 15:43 |
New changeset f9bd05e83e32bece49de5af0c9a232325c57648a by Raymond Hettinger in branch 'master':
bpo-40521: Empty frozenset is no longer a singleton (GH-21085)
https://github.com/python/cpython/commit/f9bd05e83e32bece49de5af0c9a232325c57648a
|
msg372207 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2020-06-23 20:55 |
New changeset 281cce1106568ef9fec17e3c72d289416fac02a5 by Victor Stinner in branch 'master':
bpo-40521: Make MemoryError free list per interpreter (GH-21086)
https://github.com/python/cpython/commit/281cce1106568ef9fec17e3c72d289416fac02a5
|
msg372209 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2020-06-23 22:10 |
New changeset f363d0a6e9cfa50677a6de203735fbc0d06c2f49 by Victor Stinner in branch 'master':
bpo-40521: Make empty Unicode string per interpreter (GH-21096)
https://github.com/python/cpython/commit/f363d0a6e9cfa50677a6de203735fbc0d06c2f49
|
msg372216 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2020-06-23 22:34 |
New changeset 90ed8a6d71b2d6e0853c14e8e6f85fe730a4329a by Victor Stinner in branch 'master':
bpo-40521: Optimize PyUnicode_New(0, maxchar) (GH-21099)
https://github.com/python/cpython/commit/90ed8a6d71b2d6e0853c14e8e6f85fe730a4329a
|
msg372220 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2020-06-24 00:22 |
New changeset 2f9ada96e0d420fed0d09a032b37197f08ef167a by Victor Stinner in branch 'master':
bpo-40521: Make Unicode latin1 singletons per interpreter (GH-21101)
https://github.com/python/cpython/commit/2f9ada96e0d420fed0d09a032b37197f08ef167a
|
msg372223 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2020-06-24 01:21 |
New changeset cde283d16d87024f455e45c6f1b4e4f7d8905836 by Victor Stinner in branch 'master':
bpo-40521: Fix _PyContext_Fini() (GH-21103)
https://github.com/python/cpython/commit/cde283d16d87024f455e45c6f1b4e4f7d8905836
|
msg372250 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2020-06-24 13:22 |
New changeset 0430dfac629b4eb0e899a09b899a494aa92145f6 by Victor Stinner in branch 'master':
bpo-40521: Always create the empty tuple singleton (GH-21116)
https://github.com/python/cpython/commit/0430dfac629b4eb0e899a09b899a494aa92145f6
|
msg372357 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2020-06-25 12:07 |
New changeset 91698d8caa4b5bb6e8dbb64b156e8afe9e32cac1 by Victor Stinner in branch 'master':
bpo-40521: Optimize PyBytes_FromStringAndSize(str, 0) (GH-21142)
https://github.com/python/cpython/commit/91698d8caa4b5bb6e8dbb64b156e8afe9e32cac1
|
msg372795 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2020-07-01 21:21 |
New changeset 90db4653ae37ef90754cfd2cd6ec6857b87a88e6 by Victor Stinner in branch 'master':
bpo-40521: Cleanup finalize_interp_types() (GH-21265)
https://github.com/python/cpython/commit/90db4653ae37ef90754cfd2cd6ec6857b87a88e6
|
msg377368 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2020-09-23 12:05 |
New changeset 7f413a5d95e6d7ddddd6e2c9844c33594d6288f4 by Victor Stinner in branch 'master':
bpo-40521: Fix PyUnicode_InternInPlace() (GH-22376)
https://github.com/python/cpython/commit/7f413a5d95e6d7ddddd6e2c9844c33594d6288f4
|
msg383789 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2020-12-26 01:58 |
New changeset ea251806b8dffff11b30d2182af1e589caf88acf by Victor Stinner in branch 'master':
bpo-40521: Per-interpreter interned strings (GH-20085)
https://github.com/python/cpython/commit/ea251806b8dffff11b30d2182af1e589caf88acf
|
msg383790 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2020-12-26 02:00 |
> bpo-40521: Per-interpreter interned strings (GH-20085)
That one wasn't easy, but it's now done! I close the issue.
|
msg383829 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2020-12-26 22:09 |
> New changeset ea251806b8dffff11b30d2182af1e589caf88acf by Victor Stinner in branch 'master':
> bpo-40521: Per-interpreter interned strings (GH-20085)
I reopen the issue. This change caused a regression, reproduced by the attached interned_bug.py. Output:
---
$ ./python interned_bug.py
Exception ignored deletion of interned string failed:
KeyError: 'out of memory'
python: Objects/unicodeobject.c:1946: unicode_dealloc: Assertion `Py_REFCNT(unicode) == 1' failed.
Abandon (core dumped)
---
Running "import xml.parsers.expat" in a subinterpreter causes two issues when the subinterpreter completes:
* pyexpat.errors and pyexpat.model dictionaries are cleared: all values set to None
* unicode_dealloc() logs an error on an interned string in the subinterpreter, because the string doesn't exist in the subinterpreter interned dictionary.
The interned string is created in the main interpreter and so stored in the main interpreter interned dictionary.
The string is stored in two dictionaries of the pyexpat.errors module:
>>> pyexpat.errors.messages[1]
'out of memory'
>>> pyexpat.errors.codes['out of memory']
1
When the subinterpreter clears pyexpat.errors and pyexpat.model dictionaries, the interned string is deleted: unicode_dealloc() is called. But unicode_dealloc() fails to delete the interned string in the subinterpreter interned dictionary.
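The failure mode can be reproduced in a toy model where each interpreter has its own interned table. The ToyInterpreter class and its methods are hypothetical and only mimic the bookkeeping described above; they are not the unicode_dealloc() implementation.

```python
class ToyInterpreter:
    """Hypothetical interpreter holding its own interned-string table."""

    def __init__(self, name):
        self.name = name
        self.interned = {}

    def intern(self, text):
        """Register text in this interpreter's table, return the canonical copy."""
        return self.interned.setdefault(text, text)

    def dealloc_interned(self, text):
        """Mimic deallocation: remove text from *this* interpreter's table."""
        del self.interned[text]  # raises KeyError if it was interned elsewhere


main = ToyInterpreter("main")
sub = ToyInterpreter("sub")

# The string is interned while the main interpreter is running...
message = main.intern("out of memory")

# ...but its last reference goes away while the subinterpreter is running.
try:
    sub.dealloc_interned(message)
    error = None
except KeyError as exc:
    error = exc

print(repr(error))
```

The KeyError carries the same key as in the output above: the entry still lives in the main interpreter's table, so the subinterpreter has nothing to delete.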
pyexpat.errors and pyexpat.model modules are cleared because they are stored under different names in sys.modules by Lib/xml/parsers/expat.py:
sys.modules['xml.parsers.expat.model'] = model
sys.modules['xml.parsers.expat.errors'] = errors
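The aliasing itself is easy to demonstrate with a throwaway module. The module and key names below (toy_errors, toy_pkg.errors) are hypothetical; the sketch only shows that one module object can be registered under two sys.modules keys, not how finalization then treats it.

```python
import sys
import types

# Hypothetical module and names, used only for this illustration.
errors = types.ModuleType("toy_errors")
errors.messages = {1: "out of memory"}

# Register the same module object under two different names.
sys.modules["toy_errors"] = errors
sys.modules["toy_pkg.errors"] = errors

same_object = sys.modules["toy_errors"] is sys.modules["toy_pkg.errors"]
aliases = sorted(name for name, mod in list(sys.modules.items()) if mod is errors)
print(same_object, aliases)

# Clean up so the toy entries do not linger in sys.modules.
del sys.modules["toy_errors"], sys.modules["toy_pkg.errors"]
```

Both keys refer to a single module object with a single __dict__, so whatever clears that dictionary through one name is visible through the other.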
|
msg385950 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2021-01-29 22:00 |
> I reopen the issue. This change caused a regression in attached interned_bug.py.
Fixed by:
commit c8a87addb1fa35dec79ed8f227eba3694fc36234
Author: Mohamed Koubaa <koubaa.m@gmail.com>
Date: Mon Jan 4 08:34:26 2021 -0600
bpo-1635741: Port pyexpat to multi-phase init (PEP 489) (GH-22222)
|
msg388492 - (view) |
Author: junyixie (JunyiXie) * |
Date: 2021-03-11 09:44 |
The dtoa bigint free list should be made per-interpreter:
static Bigint *bigint_freelist[Kmax+1]; -> _is { Bigint *bigint_freelist[Kmax+1]; }
|
msg388493 - (view) |
Author: junyixie (JunyiXie) * |
Date: 2021-03-11 09:46 |
https://github.com/python/cpython/pull/24821/commits/9d7681dbd273b5025fd9b19d1be0a1f978a0b12e
|
msg388617 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2021-03-13 13:25 |
New changeset 5bd1059184b154d339f1bd53d23c98b5bcf14c8c by junyixie in branch 'master':
bpo-40521: Make dtoa bigint free list per-interpreter (GH-24821)
https://github.com/python/cpython/commit/5bd1059184b154d339f1bd53d23c98b5bcf14c8c
|
msg389226 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2021-03-21 11:59 |
Hi Victor,
I just noticed the change to dtoa.c in GH-24821. Please could you explain what the benefit of this change was?
In general, we need to be very conservative with changes to dtoa.c: it's a complex, fragile, performance-critical piece of code, and ideally we'd like it not to diverge from the upstream code any more than it already has, in case we need to integrate bugfixes from upstream.
It's feeling as though the normal Python development process is being bypassed here. As I understand it, this and similar changes are in aid of per-subinterpreter GILs. Has there been agreement from the core devs or steering council that this is a desirable goal? Should there be a PEP before more changes like this are made? (Or maybe there's already a PEP, that I missed? I know about PEP 554, but that PEP is explicit that GIL sharing is out of scope.)
|
msg389294 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2021-03-22 10:02 |
Mark Dickinson: "I just noticed the change to dtoa.c in GH-24821. Please could you explain what the benefit of this change was?"
The rationale is explained in bpo-40512. The goal is to run multiple Python interpreters in parallel in the same process.
dtoa.c had global variables shared by all interpreters without locking, so two interpreters could corrupt the consistency of the free list.
Mark Dickinson: "In general, we need to be very conservative with changes to dtoa.c: it's a complex, fragile, performance-critical piece of code, and ideally we'd like it not to diverge from the upstream code any more than it already has, in case we need to integrate bugfixes from upstream."
I know that dtoa.c was copied from a third party project. But the change in commit 5bd1059184b154d339f1bd53d23c98b5bcf14c8c only makes sense in Python; I don't think that it would make sense to propose it upstream.
dtoa.c _Py_dg_strtod() is called by:
* float.__round__()
* _PyOS_ascii_strtod()
_PyOS_ascii_strtod() is called by PyOS_string_to_double(), which is called by:
* float_from_string_inner()
* complex_from_string_inner()
* pickle load_float()
* parser parsenumber_raw()
* marshal r_float_str
dtoa.c _Py_dg_dtoa() is called by:
* float.__round__()
* PyOS_double_to_string()
PyOS_double_to_string() is called by:
* float_repr()
* complex_repr()
* bytes % float: _PyBytes_FormatEx()
* str % float: PyUnicode_Format()
* _PyLong_FormatAdvancedWriter()
* _PyComplex_FormatAdvancedWriter()
* pickle save_float()
* marshal w_float_str
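A rough way to time the Python-level operations that reach the call paths listed above is a timeit loop such as the following. This is a hypothetical sketch, not the bench_dtoa.py attached to this issue; the statements, loop counts and names are assumptions chosen for the example.

```python
import timeit

# Use variables rather than literals so the compiler cannot fold anything.
setup = "x = 1.5; s = '1.0'"
statements = {
    "float(str)": "float(s)",
    "str(float)": "str(x)",
    "round(float, 1)": "round(x, 1)",
    "str % float": "'%.3f' % x",
}

loops = 10_000
timings = {}
for label, stmt in statements.items():
    # Best of 3 repeats, converted to nanoseconds per loop.
    best = min(timeit.repeat(stmt, setup=setup, number=loops, repeat=3))
    timings[label] = best / loops * 1e9

for label, ns in timings.items():
    print(f"{label}: {ns:.1f} ns per loop")
```

Numbers from such a loop are only indicative; pyperf, used for the results in this message, spawns worker processes and calibrates the number of loops itself.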
I guess that the most important use cases are float(str) and str(float). I wrote the attached bench_dtoa.py to measure the effect of commit 5bd1059184b154d339f1bd53d23c98b5bcf14c8c on performance:
---
$ python3 -m pyperf compare_to before.json after.json
float('0'): Mean +- std dev: [before] 80.5 ns +- 3.1 ns -> [after] 90.1 ns +- 3.6 ns: 1.12x slower
float('1.0'): Mean +- std dev: [before] 89.5 ns +- 4.3 ns -> [after] 97.2 ns +- 2.6 ns: 1.09x slower
float('340282366920938463463374607431768211455'): Mean +- std dev: [before] 480 ns +- 42 ns -> [after] 514 ns +- 13 ns: 1.07x slower
float('1044388881413152506691752710716624382579964249047383780384233483283953907971557456848826811934997558340890106714439262837987573438185793607263236087851365277945956976543709998340361590134383718314428070011855946226376318839397712745672334684344586617496807908705803704071284048740118609114467977783598029006686938976881787785946905630190260940599579453432823469303026696443059025015972399867714215541693835559885291486318237914434496734087811872639496475100189041349008417061675093668333850551032972088269550769983616369411933015213796825837188091833656751221318492846368125550225998300412344784862595674492194617023806505913245610825731835380087608622102834270197698202313169017678006675195485079921636419370285375124784014907159135459982790513399611551794271106831134090584272884279791554849782954323534517065223269061394905987693002122963395687782878948440616007412945674919823050571642377154816321380631045902916136926708342856440730447899971901781465763473223850267253059899795996090799469201774624817718449867455659250178329070473119433165550807568221846571746373296884912819520317457002440926616910874148385078411929804522981857338977648103126085903001302413467189726673216491511131602920781738033436090243804708340403154190335'): Mean +- std dev: [before] 717 ns +- 36 ns -> [after] 990 ns +- 27 ns: 1.38x slower
str(0.0): Mean +- std dev: [before] 113 ns +- 8 ns -> [after] 106 ns +- 4 ns: 1.06x faster
str(1.0): Mean +- std dev: [before] 141 ns +- 11 ns -> [after] 135 ns +- 17 ns: 1.05x faster
str(inf): Mean +- std dev: [before] 110 ns +- 11 ns -> [after] 98.9 ns +- 3.3 ns: 1.12x faster
Benchmark hidden because not significant (1): str(3.402823669209385e+38)
Geometric mean: 1.05x slower
---
I built Python with "./configure --enable-optimizations --with-lto" on Fedora 33 (GCC 10.2.1). I didn't use CPU isolation.
Oh, float(str) is between 1.07x and 1.38x slower.
On the other hand, str(float) is between 1.05x and 1.12x faster, I'm not sure why. I guess that the problem is that a PGO+LTO build is not reproducible: GCC might prefer to optimize some functions or others depending on the PROFILE_TASK (Makefile.pre.in, command used by the GCC profiler).
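The "Geometric mean: 1.05x slower" summary line can be recomputed from the rounded ratios printed above. This is a sketch under two assumptions: the hidden, not significant benchmark is treated as a ratio of exactly 1.0, and the rounded ratios stand in for the raw values that pyperf actually uses.

```python
import math

# Speed ratios (after / before) read off the pyperf output above:
# values above 1.0 are slowdowns, values below 1.0 are speedups.
ratios = [
    1.12,      # float('0')
    1.09,      # float('1.0')
    1.07,      # float('340282366920938463463374607431768211455')
    1.38,      # float() of the very long digit string
    1 / 1.06,  # str(0.0)
    1 / 1.05,  # str(1.0)
    1 / 1.12,  # str(inf)
    1.0,       # str(3.402823669209385e+38): assumed unchanged
]

# Geometric mean: exponential of the average of the logarithms.
geometric_mean = math.exp(sum(math.log(r) for r in ratios) / len(ratios))
print(f"{geometric_mean:.2f}x slower")
```

The single 1.38x slowdown weighs heavily here: the three str(float) speedups do not offset the four float(str) slowdowns, so the overall mean stays on the slower side.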
Mark Dickinson: "It's feeling as though the normal Python development process is being bypassed here. As I understand it, this and similar changes are in aid of per-subinterpreter GILs. Has there been agreement from the core devs or steering council that this is a desirable goal? Should there be a PEP before more changes like this are made? (Or maybe there's already a PEP, that I missed? I know about PEP 554, but that PEP is explicit that GIL sharing is out of scope.)"
Honestly, I didn't expect the change to have any significant impact on performance. So I merged the PR as I merge other fixes for subinterpreters. It seems like I underestimated the number of Balloc/Bmalloc calls per float(str) or str(float) call.
There is no PEP about running multiple Python interpreters in the same process. There is no consensus on this topic. I discussed it in private with some core devs, but that's not relevant here.
My plan is to merge changes which have no significant impact on performance, and wait for a PEP for changes which have a significant impact on performance. Most changes fix bugs in subinterpreters which still share a GIL. This use case is not new and has been supported for 10-20 years.
For now, I will revert the commit 5bd1059184b154d339f1bd53d23c98b5bcf14c8c.
|
msg389305 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2021-03-22 10:59 |
New changeset 39f643614d03748a5fad462fe7ed26a174a522fa by Victor Stinner in branch 'master':
Revert "bpo-40521: Make dtoa bigint free list per-interpreter (GH-24821)" (GH-24964)
https://github.com/python/cpython/commit/39f643614d03748a5fad462fe7ed26a174a522fa
|
msg389525 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2021-03-25 20:32 |
New changeset 3bb19873abd572879cc9a8810b1db9db1f704070 by Raymond Hettinger in branch 'master':
Revert "bpo-40521: Remove freelist from collections.deque() (GH-21073)" (GH-24944)
https://github.com/python/cpython/commit/3bb19873abd572879cc9a8810b1db9db1f704070
|
msg389527 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2021-03-25 20:51 |
I reopen the issue to remind me that collections.deque() freelist is shared by all interpreters.
|
msg393787 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2021-05-17 06:59 |
> I reopen the issue to remind me that collections.deque() freelist is shared by all interpreters.
Each deque instance now has its own free list.
But dtoa.c still has a per-process cache, shared by all interpreters.
|
msg395860 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2021-06-15 04:58 |
[Victor Stinner]
> My plan is to merge changes which have no significant
> impact on performances
FWIW, PyFloat_FromDouble() is the most performance critical function in floatobject.c.
|
msg409819 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2022-01-06 07:53 |
New changeset 35d6540c904ef07b8602ff014e520603f84b5886 by Victor Stinner in branch 'main':
bpo-46006: Revert "bpo-40521: Per-interpreter interned strings (GH-20085)" (GH-30422)
https://github.com/python/cpython/commit/35d6540c904ef07b8602ff014e520603f84b5886
|
msg409856 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2022-01-06 15:12 |
New changeset 72c260cf0c71eb01eb13100b751e9d5007d00b70 by Victor Stinner in branch '3.10':
[3.10] bpo-46006: Revert "bpo-40521: Per-interpreter interned strings (GH-20085)" (GH-30422) (GH-30425)
https://github.com/python/cpython/commit/72c260cf0c71eb01eb13100b751e9d5007d00b70
|
msg409862 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2022-01-06 15:24 |
My commit ea251806b8dffff11b30d2182af1e589caf88acf (interned strings) introduced bpo-46006 "[subinterpreter] _PyUnicode_EqualToASCIIId() issue with subinterpreters" regression.
To unblock the Python 3.11.0a4 release, I just reverted the change. It reintroduces the issue, so I created bpo-46283: "[subinterpreters] Unicode interned strings must not be shared between interpreters".
|
|
Date | User | Action | Args |
2022-04-11 14:59:30 | admin | set | github: 84701 |
2022-01-06 15:24:07 | vstinner | set | messages:
+ msg409862 |
2022-01-06 15:12:36 | vstinner | set | messages:
+ msg409856 |
2022-01-06 14:30:07 | vstinner | set | pull_requests:
+ pull_request28640 |
2022-01-06 07:59:48 | vstinner | set | pull_requests:
+ pull_request28631 |
2022-01-06 07:53:52 | vstinner | set | messages:
+ msg409819 |
2022-01-05 16:26:42 | vstinner | set | pull_requests:
+ pull_request28626 |
2021-06-15 04:58:25 | rhettinger | set | messages:
+ msg395860 |
2021-05-17 06:59:17 | vstinner | set | messages:
+ msg393787 |
2021-05-04 22:25:11 | rhettinger | set | stage: resolved -> patch review
pull_requests:
+ pull_request24574 |
2021-03-25 20:51:19 | vstinner | set | status: closed -> open
resolution: fixed ->
messages:
+ msg389527 |
2021-03-25 20:32:32 | rhettinger | set | messages:
+ msg389525 |
2021-03-22 10:59:05 | vstinner | set | messages:
+ msg389305 |
2021-03-22 10:03:06 | vstinner | set | pull_requests:
+ pull_request23723 |
2021-03-22 10:02:15 | vstinner | set | files:
+ bench_dtoa.py
messages:
+ msg389294 |
2021-03-21 11:59:21 | mark.dickinson | set | messages:
+ msg389226 |
2021-03-20 15:15:25 | rhettinger | set | pull_requests:
+ pull_request23703 |
2021-03-17 20:27:43 | mark.dickinson | set | nosy:
+ mark.dickinson |
2021-03-13 13:25:31 | vstinner | set | messages:
+ msg388617 |
2021-03-11 09:46:00 | JunyiXie | set | messages:
+ msg388493 |
2021-03-11 09:44:53 | JunyiXie | set | nosy:
+ JunyiXie
messages:
+ msg388492
pull_requests:
+ pull_request23587 |
2021-01-29 22:00:28 | vstinner | set | status: open -> closed
resolution: fixed
messages:
+ msg385950 |
2020-12-26 22:09:06 | vstinner | set | status: closed -> open
files:
+ interned_bug.py
resolution: fixed -> (no value)
messages:
+ msg383829 |
2020-12-26 02:00:18 | vstinner | set | status: open -> closed
versions:
+ Python 3.10, - Python 3.9
messages:
+ msg383790
resolution: fixed
stage: patch review -> resolved |
2020-12-26 01:58:40 | vstinner | set | messages:
+ msg383789 |
2020-09-23 12:05:36 | vstinner | set | messages:
+ msg377368 |
2020-09-23 10:55:08 | vstinner | set | pull_requests:
+ pull_request21415 |
2020-07-01 21:21:43 | vstinner | set | messages:
+ msg372795 |
2020-07-01 17:31:50 | vstinner | set | pull_requests:
+ pull_request20413 |
2020-06-25 12:07:46 | vstinner | set | messages:
+ msg372357 |
2020-06-25 11:15:32 | vstinner | set | pull_requests:
+ pull_request20301 |
2020-06-24 13:22:05 | vstinner | set | messages:
+ msg372250 |
2020-06-24 12:56:03 | vstinner | set | pull_requests:
+ pull_request20279 |
2020-06-24 01:21:22 | vstinner | set | messages:
+ msg372223 |
2020-06-24 00:34:38 | vstinner | set | pull_requests:
+ pull_request20271 |
2020-06-24 00:22:28 | vstinner | set | messages:
+ msg372220 |
2020-06-23 22:45:54 | vstinner | set | pull_requests:
+ pull_request20270 |
2020-06-23 22:34:14 | vstinner | set | messages:
+ msg372216 |
2020-06-23 22:14:05 | vstinner | set | pull_requests:
+ pull_request20268 |
2020-06-23 22:10:47 | vstinner | set | messages:
+ msg372209 |
2020-06-23 21:48:46 | vstinner | set | pull_requests:
+ pull_request20263 |
2020-06-23 20:55:53 | vstinner | set | messages:
+ msg372207 |
2020-06-23 15:43:02 | vstinner | set | messages:
+ msg372181 |
2020-06-23 15:11:50 | vstinner | set | pull_requests:
+ pull_request20252 |
2020-06-23 15:02:57 | rhettinger | set | pull_requests:
+ pull_request20251 |
2020-06-23 14:40:49 | vstinner | set | messages:
+ msg372176 |
2020-06-23 13:57:08 | vstinner | set | pull_requests:
+ pull_request20248 |
2020-06-23 13:54:44 | vstinner | set | messages:
+ msg372169 |
2020-06-23 13:50:23 | rhettinger | set | messages:
+ msg372168 |
2020-06-23 13:14:11 | vstinner | set | pull_requests:
+ pull_request20242 |
2020-06-23 13:12:37 | rhettinger | set | nosy:
+ rhettinger
pull_requests:
+ pull_request20241 |
2020-06-23 12:08:00 | vstinner | set | messages:
+ msg372161 |
2020-06-23 10:41:07 | vstinner | set | pull_requests:
+ pull_request20237 |
2020-06-23 09:38:05 | vstinner | set | messages:
+ msg372148 |
2020-06-23 09:33:34 | vstinner | set | messages:
+ msg372146 |
2020-06-08 09:24:46 | Mark.Shannon | set | messages:
+ msg370969 |
2020-06-07 23:38:21 | vstinner | set | messages:
+ msg370928 |
2020-06-07 03:54:31 | shihai1991 | set | nosy:
+ shihai1991 |
2020-06-05 17:32:33 | vstinner | set | messages:
+ msg370771 |
2020-06-05 09:59:07 | vstinner | set | messages:
+ msg370757 |
2020-06-05 09:57:23 | Mark.Shannon | set | nosy:
+ Mark.Shannon
messages:
+ msg370756 |
2020-06-05 09:50:14 | vstinner | set | files:
+ bench_dict.patch
messages:
+ msg370755 |
2020-06-05 09:47:03 | vstinner | set | messages:
+ msg370754 |
2020-06-05 01:02:15 | vstinner | set | pull_requests:
+ pull_request19865 |
2020-06-05 00:56:43 | vstinner | set | messages:
+ msg370742 |
2020-06-05 00:36:00 | vstinner | set | pull_requests:
+ pull_request19864 |
2020-06-05 00:34:22 | vstinner | set | messages:
+ msg370741 |
2020-06-05 00:10:02 | vstinner | set | pull_requests:
+ pull_request19863 |
2020-06-05 00:05:49 | vstinner | set | messages:
+ msg370740 |
2020-06-04 23:44:42 | vstinner | set | pull_requests:
+ pull_request19862 |
2020-06-04 23:39:28 | vstinner | set | messages:
+ msg370737 |
2020-06-04 23:20:04 | vstinner | set | pull_requests:
+ pull_request19858 |
2020-06-04 23:14:47 | vstinner | set | messages:
+ msg370735 |
2020-06-04 22:53:58 | vstinner | set | pull_requests:
+ pull_request19857 |
2020-06-04 22:50:19 | vstinner | set | messages:
+ msg370734 |
2020-06-04 22:01:04 | vstinner | set | pull_requests:
+ pull_request19856 |
2020-06-04 21:38:44 | vstinner | set | messages:
+ msg370733 |
2020-06-02 23:20:25 | vstinner | set | files:
+ microbench_tuple.py |
2020-06-02 23:20:17 | vstinner | set | files:
+ bench_tuple.patch |
2020-06-02 23:19:48 | vstinner | set | files:
- bench_tuple.patch |
2020-06-02 23:19:47 | vstinner | set | files:
- microbench_tuple.py |
2020-06-02 22:43:43 | vstinner | set | files:
+ bench_tuple.patch |
2020-06-02 22:43:24 | vstinner | set | files:
+ microbench_tuple.py
messages:
+ msg370636 |
2020-05-19 23:57:22 | vstinner | set | messages:
+ msg369407 |
2020-05-19 23:22:10 | vstinner | set | pull_requests:
+ pull_request19535 |
2020-05-19 22:44:42 | vstinner | set | pull_requests:
+ pull_request19534 |
2020-05-15 00:36:01 | vstinner | set | components:
+ Subinterpreters, - Interpreter Core
title: Make tuple, dict, frame free lists, unicode interned strings, unicode latin1 singletons per-interpreter -> [subinterpreters] Make free lists and unicode caches per-interpreter |
2020-05-14 11:25:02 | corona10 | set | nosy:
+ corona10 |
2020-05-14 00:56:17 | vstinner | set | pull_requests:
+ pull_request19389 |
2020-05-13 23:48:41 | vstinner | set | messages:
+ msg368808 |
2020-05-13 23:35:08 | vstinner | set | messages:
+ msg368807 |
2020-05-13 23:13:37 | vstinner | set | pull_requests:
+ pull_request19386 |
2020-05-13 23:09:46 | vstinner | set | pull_requests:
+ pull_request19385 |
2020-05-06 17:05:34 | vstinner | set | messages:
+ msg368283 |
2020-05-06 16:24:06 | vstinner | set | messages:
+ msg368278 |
2020-05-06 15:48:33 | vstinner | set | pull_requests:
+ pull_request19276 |
2020-05-06 15:44:12 | vstinner | set | pull_requests:
+ pull_request19275 |
2020-05-05 17:55:33 | vstinner | set | messages:
+ msg368187 |
2020-05-05 16:50:37 | vstinner | set | messages:
+ msg368177 |
2020-05-05 16:48:34 | vstinner | set | pull_requests:
+ pull_request19252 |
2020-05-05 16:11:07 | vstinner | set | keywords:
+ patch
stage: patch review
pull_requests:
+ pull_request19248 |
2020-05-05 15:48:02 | vstinner | create | |