This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: [subinterpreters] Make free lists and unicode caches per-interpreter
Type: Stage: patch review
Components: Subinterpreters Versions: Python 3.10
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: JunyiXie, Mark.Shannon, corona10, mark.dickinson, rhettinger, shihai1991, vstinner
Priority: normal Keywords: patch

Created on 2020-05-05 15:48 by vstinner, last changed 2022-04-11 14:59 by admin.

Files
File name Uploaded Description
bench_tuple.patch vstinner, 2020-06-02 23:20
microbench_tuple.py vstinner, 2020-06-02 23:20
bench_dict.patch vstinner, 2020-06-05 09:50
interned_bug.py vstinner, 2020-12-26 22:09
bench_dtoa.py vstinner, 2021-03-22 10:02
Pull Requests
URL Status Linked
PR 19933 merged vstinner, 2020-05-05 16:11
PR 19937 merged vstinner, 2020-05-05 16:48
PR 19959 merged vstinner, 2020-05-06 15:44
PR 19960 merged vstinner, 2020-05-06 15:48
PR 20081 merged vstinner, 2020-05-13 23:09
PR 20082 closed vstinner, 2020-05-13 23:13
PR 20085 merged vstinner, 2020-05-14 00:56
PR 20246 merged vstinner, 2020-05-19 22:44
PR 20247 merged vstinner, 2020-05-19 23:22
PR 20636 merged vstinner, 2020-06-04 22:01
PR 20637 merged vstinner, 2020-06-04 22:53
PR 20638 merged vstinner, 2020-06-04 23:20
PR 20642 merged vstinner, 2020-06-04 23:44
PR 20643 merged vstinner, 2020-06-05 00:10
PR 20644 merged vstinner, 2020-06-05 00:36
PR 20645 merged vstinner, 2020-06-05 01:02
PR 21068 merged vstinner, 2020-06-23 10:41
PR 21073 merged rhettinger, 2020-06-23 13:12
PR 21074 merged vstinner, 2020-06-23 13:14
PR 21082 merged vstinner, 2020-06-23 13:57
PR 21085 merged rhettinger, 2020-06-23 15:02
PR 21086 merged vstinner, 2020-06-23 15:11
PR 21096 merged vstinner, 2020-06-23 21:48
PR 21099 merged vstinner, 2020-06-23 22:14
PR 21101 merged vstinner, 2020-06-23 22:45
PR 21103 merged vstinner, 2020-06-24 00:34
PR 21116 merged vstinner, 2020-06-24 12:56
PR 21142 merged vstinner, 2020-06-25 11:15
PR 21265 merged vstinner, 2020-07-01 17:31
PR 22376 merged vstinner, 2020-09-23 10:55
PR 24821 merged JunyiXie, 2021-03-11 09:44
PR 24944 merged rhettinger, 2021-03-20 15:15
PR 24964 merged vstinner, 2021-03-22 10:03
PR 25906 merged rhettinger, 2021-05-04 22:25
PR 30422 merged vstinner, 2022-01-05 16:26
PR 30425 merged vstinner, 2022-01-06 07:59
PR 30433 closed vstinner, 2022-01-06 14:30
Messages (56)
msg368175 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-05-05 15:48
tuple, dict and frame use free lists to optimize the creation of objects.

Unicode uses "interned" strings to reduce the Python memory footprint and speed up dictionary lookups.

Unicode also uses singletons for single-character Latin-1 strings (the [U+0000; U+00FF] range).

All these optimizations are incompatible with isolated subinterpreters, since the caches are currently shared by all interpreters. These caches should be made per-interpreter. See bpo-40512 "Meta issue: per-interpreter GIL" for the rationale.
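As a minimal sketch of what "per-interpreter" means for a free list (a toy object type with hypothetical names: interp, obj_new(), obj_free(), interp_clear_free_list(); this is not CPython's API): the cache moves from a process-wide static variable into a structure owned by one interpreter, so an object released in one interpreter can only be reused by that same interpreter. In CPython this state lives in PyInterpreterState and is reached through the current thread state; the sketch passes it explicitly.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical names; this is not CPython's API. */
#define MAX_FREE 80  /* cap on cached objects per interpreter (illustrative) */

typedef struct object {
    struct object *next_free;  /* link used only while on a free list */
    double payload;
} object;

/* Cache state owned by one interpreter instead of a process-wide static. */
typedef struct interp {
    object *free_list;
    int numfree;
} interp;

object *obj_new(interp *in, double value)
{
    object *op = in->free_list;
    if (op != NULL) {
        /* Reuse an object previously released by this same interpreter. */
        in->free_list = op->next_free;
        in->numfree--;
    }
    else {
        op = malloc(sizeof(*op));
        if (op == NULL) {
            return NULL;
        }
    }
    op->next_free = NULL;
    op->payload = value;
    return op;
}

void obj_free(interp *in, object *op)
{
    assert(op != NULL);
    if (in->numfree < MAX_FREE) {
        /* Cache the object in this interpreter's list only. */
        op->next_free = in->free_list;
        in->free_list = op;
        in->numfree++;
    }
    else {
        free(op);
    }
}

/* Per-interpreter "clear free list": only touches the given interpreter. */
void interp_clear_free_list(interp *in)
{
    while (in->free_list != NULL) {
        object *op = in->free_list;
        in->free_list = op->next_free;
        free(op);
    }
    in->numfree = 0;
}
```

Two interpreters each get their own interp value; an object released to one list is never handed out by the other, so no lock is needed as long as each interpreter only touches its own state.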

I already made small integer singletons per interpreter in bpo-38858:

* commit 5dcc06f6e0d7b5d6589085692b86c63e35e2325e
* commit 630c8df5cf126594f8c1c4579c1888ca80a29d59.
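The small integer change applies the same idea to singletons. A hypothetical sketch (type and function names are invented for illustration; the -5..256 range mirrors CPython's default _PY_NSMALLNEGINTS and _PY_NSMALLPOSINTS values): each interpreter preallocates its own array, so a given value maps to the same object within one interpreter but to distinct objects in different interpreters.

```c
#include <assert.h>
#include <stddef.h>

/* Range mirrors CPython's defaults (-5..256); all names are hypothetical. */
#define NSMALLNEGINTS 5
#define NSMALLPOSINTS 257

typedef struct {
    long value;
} small_int;

/* Each interpreter owns its own preallocated singletons. */
typedef struct {
    small_int small_ints[NSMALLNEGINTS + NSMALLPOSINTS];
} interp_ints;

void interp_ints_init(interp_ints *in)
{
    assert(in != NULL);
    for (long v = -NSMALLNEGINTS; v < NSMALLPOSINTS; v++) {
        in->small_ints[v + NSMALLNEGINTS].value = v;
    }
}

/* Return this interpreter's singleton for v, or NULL when v is outside the
   cached range (a real implementation would allocate a new object then). */
small_int *get_small_int(interp_ints *in, long v)
{
    if (v >= -NSMALLNEGINTS && v < NSMALLPOSINTS) {
        return &in->small_ints[v + NSMALLNEGINTS];
    }
    return NULL;
}
```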
msg368177 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-05-05 16:50
New changeset 607b1027fec7b4a1602aab7df57795fbcec1c51b by Victor Stinner in branch 'master':
bpo-40521: Disable Unicode caches in isolated subinterpreters (GH-19933)
https://github.com/python/cpython/commit/607b1027fec7b4a1602aab7df57795fbcec1c51b
msg368187 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-05-05 17:55
New changeset b4b53868d7d6cd13505321d3802fd00865b25e05 by Victor Stinner in branch 'master':
bpo-40521: Disable free lists in subinterpreters (GH-19937)
https://github.com/python/cpython/commit/b4b53868d7d6cd13505321d3802fd00865b25e05
msg368278 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-05-06 16:24
New changeset 89fc4a34cf7a01df9dd269d32d3706c68a72d130 by Victor Stinner in branch 'master':
bpo-40521: Disable method cache in subinterpreters (GH-19960)
https://github.com/python/cpython/commit/89fc4a34cf7a01df9dd269d32d3706c68a72d130
msg368283 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-05-06 17:05
New changeset b7aa23d29fa48238dab3692d02e1f0a7e8a5af9c by Victor Stinner in branch 'master':
bpo-40521: Disable list free list in subinterpreters (GH-19959)
https://github.com/python/cpython/commit/b7aa23d29fa48238dab3692d02e1f0a7e8a5af9c
msg368807 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-05-13 23:35
I wrote a draft PR to make interned strings per-interpreter. It crashes, because it first requires making the method cache and _PyUnicode_FromId() (bpo-39465) compatible with subinterpreters.
msg368808 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-05-13 23:48
New changeset 3d17c045b4c3d09b72bbd95ed78af1ae6f0d98d2 by Victor Stinner in branch 'master':
bpo-40521: Add PyInterpreterState.unicode (GH-20081)
https://github.com/python/cpython/commit/3d17c045b4c3d09b72bbd95ed78af1ae6f0d98d2
msg369407 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-05-19 23:57
New changeset 0509c4547fc95cc32a91ac446a26192c3bfdf157 by Victor Stinner in branch 'master':
bpo-40521: Fix update_slot() when INTERN_NAME_STRINGS is not defined (#20246)
https://github.com/python/cpython/commit/0509c4547fc95cc32a91ac446a26192c3bfdf157
msg370636 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-02 22:43
Microbenchmark for the tuple free list, to measure the overhead of PR 20247: microbench_tuple.py. It requires applying bench_tuple.patch.
msg370733 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-04 21:38
New changeset 69ac6e58fd98de339c013fe64cd1cf763e4f9bca by Victor Stinner in branch 'master':
bpo-40521: Make tuple free list per-interpreter (GH-20247)
https://github.com/python/cpython/commit/69ac6e58fd98de339c013fe64cd1cf763e4f9bca
msg370734 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-04 22:50
New changeset 2ba59370c3dda2ac229c14510e53a05074b133d1 by Victor Stinner in branch 'master':
bpo-40521: Make float free list per-interpreter (GH-20636)
https://github.com/python/cpython/commit/2ba59370c3dda2ac229c14510e53a05074b133d1
msg370735 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-04 23:14
New changeset 7daba6f221e713f7f60c613b246459b07d179f91 by Victor Stinner in branch 'master':
bpo-40521: Make slice cache per-interpreter (GH-20637)
https://github.com/python/cpython/commit/7daba6f221e713f7f60c613b246459b07d179f91
msg370737 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-04 23:39
New changeset 3744ed2c9c0b3905947602fc375de49533790cb9 by Victor Stinner in branch 'master':
bpo-40521: Make frame free list per-interpreter (GH-20638)
https://github.com/python/cpython/commit/3744ed2c9c0b3905947602fc375de49533790cb9
msg370740 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-05 00:05
New changeset 88ec9190105c9b03f49aaef601ce02b242a75273 by Victor Stinner in branch 'master':
bpo-40521: Make list free list per-interpreter (GH-20642)
https://github.com/python/cpython/commit/88ec9190105c9b03f49aaef601ce02b242a75273
msg370741 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-05 00:34
New changeset 78a02c2568714562e23e885b6dc5730601f35226 by Victor Stinner in branch 'master':
bpo-40521: Make async gen free lists per-interpreter (GH-20643)
https://github.com/python/cpython/commit/78a02c2568714562e23e885b6dc5730601f35226
msg370742 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-05 00:56
New changeset e005ead49b1ee2b1507ceea94e6f89c28ecf1f81 by Victor Stinner in branch 'master':
bpo-40521: Make context free list per-interpreter (GH-20644)
https://github.com/python/cpython/commit/e005ead49b1ee2b1507ceea94e6f89c28ecf1f81
msg370754 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-05 09:47
> bpo-40521: Make list free list per-interpreter (GH-20642)
> https://github.com/python/cpython/commit/88ec9190105c9b03f49aaef601ce02b242a75273

This change contains an interesting fix:

* _PyGC_Fini() clears gcstate->garbage list which can be stored in
  the list free list. Call _PyGC_Fini() before _PyList_Fini() to
  prevent leaking this list.

Maybe "Fini" functions should disable free lists, to prevent code that runs later during Python finalization from adding objects to a free list.
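One possible way to implement that suggestion, sketched with hypothetical names (free_list, fl_acquire(), fl_release(), fl_fini()): finalization empties the list and then marks it as disabled, here with numfree == -1 as an arbitrary sentinel chosen for this sketch, so any object released afterwards is freed immediately instead of being cached and leaked.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_FREE 20  /* illustrative cap */

typedef struct node {
    struct node *next;
} node;

typedef struct {
    node *items;
    int numfree;  /* hypothetical convention: -1 means "finalized, don't cache" */
} free_list;

node *fl_acquire(free_list *fl)
{
    if (fl->numfree > 0) {
        node *n = fl->items;
        assert(n != NULL);
        fl->items = n->next;
        fl->numfree--;
        return n;
    }
    return malloc(sizeof(node));
}

void fl_release(free_list *fl, node *n)
{
    if (fl->numfree >= 0 && fl->numfree < MAX_FREE) {
        n->next = fl->items;
        fl->items = n;
        fl->numfree++;
    }
    else {
        /* List is full, or already finalized: release memory directly. */
        free(n);
    }
}

void fl_fini(free_list *fl)
{
    while (fl->items != NULL) {
        node *n = fl->items;
        fl->items = n->next;
        free(n);
    }
    fl->numfree = -1;  /* from now on, fl_release() bypasses the cache */
}
```

The design choice is that a late release during finalization costs one free() call instead of leaving memory on a list that nobody will clear again.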
msg370755 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-05 09:50
bench_dict.patch: Microbenchmark on the C function PyDict_New() to measure the overhead of PR 20645.
msg370756 - Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2020-06-05 09:57
I'm worried about the performance impact of these changes, especially as many of the changes haven't been reviewed.

Have you done any performance analysis or tests of the cumulative effect of all these changes?
msg370757 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-05 09:59
> Have you done any performance analysis or tests of the cumulative effect of all these changes?

No. It would be interesting to measure that using pyperformance.
msg370771 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-05 17:32
pyperformance comparison between:

* commit dc24b8a2ac32114313bae519db3ccc21fe45c982 (before the "Make tuple free list per-interpreter" change)
* PR 20645 (dict free lists), which combines all free list changes (already committed + the PR)

Extract of the tested patch, new PyInterpreterState members:
--------------------
diff --git a/Include/internal/pycore_interp.h b/Include/internal/pycore_interp.h
index f04ea330d0..b1a25e0ed4 100644
--- a/Include/internal/pycore_interp.h
+++ b/Include/internal/pycore_interp.h
(...)
@@ -157,6 +233,18 @@ struct _is {
     */
     PyLongObject* small_ints[_PY_NSMALLNEGINTS + _PY_NSMALLPOSINTS];
 #endif
+    struct _Py_unicode_state unicode;
+    struct _Py_float_state float_state;
+    /* Using a cache is very effective since typically only a single slice is
+       created and then deleted again. */
+    PySliceObject *slice_cache;
+
+    struct _Py_tuple_state tuple;
+    struct _Py_list_state list;
+    struct _Py_dict_state dict_state;
+    struct _Py_frame_state frame;
+    struct _Py_async_gen_state async_gen;
+    struct _Py_context_state context;
 };
--------------------

Results:
--------------------
$ python3 -m pyperf compare_to 2020-06-04_20-10-master-dc24b8a2ac32.json.gz 2020-06-04_20-10-master-dc24b8a2ac32-patch-free_lists.json.gz -G 
Slower (10):
- chameleon: 20.1 ms +- 0.4 ms -> 23.1 ms +- 4.0 ms: 1.15x slower (+15%)
- logging_silent: 334 ns +- 51 ns -> 371 ns +- 70 ns: 1.11x slower (+11%)
- spectral_norm: 274 ms +- 37 ms -> 302 ms +- 55 ms: 1.10x slower (+10%)
- logging_format: 22.5 us +- 0.4 us -> 24.5 us +- 2.7 us: 1.09x slower (+9%)
- json_dumps: 26.6 ms +- 4.0 ms -> 28.7 ms +- 5.5 ms: 1.08x slower (+8%)
- sympy_sum: 390 ms +- 3 ms -> 415 ms +- 45 ms: 1.06x slower (+6%)
- float: 217 ms +- 3 ms -> 231 ms +- 30 ms: 1.06x slower (+6%)
- pidigits: 306 ms +- 32 ms -> 323 ms +- 47 ms: 1.06x slower (+6%)
- python_startup_no_site: 8.71 ms +- 0.77 ms -> 8.94 ms +- 0.91 ms: 1.03x slower (+3%)
- xml_etree_process: 130 ms +- 1 ms -> 133 ms +- 2 ms: 1.02x slower (+2%)

Faster (9):
- pickle_pure_python: 1.05 ms +- 0.16 ms -> 964 us +- 19 us: 1.09x faster (-9%)
- scimark_sparse_mat_mult: 11.4 ms +- 2.1 ms -> 10.5 ms +- 1.7 ms: 1.09x faster (-8%)
- hexiom: 19.5 ms +- 4.1 ms -> 18.0 ms +- 3.0 ms: 1.08x faster (-7%)
- telco: 15.7 ms +- 3.1 ms -> 14.5 ms +- 0.4 ms: 1.08x faster (-7%)
- unpickle: 31.8 us +- 5.7 us -> 29.5 us +- 4.9 us: 1.08x faster (-7%)
- scimark_lu: 292 ms +- 60 ms -> 274 ms +- 34 ms: 1.07x faster (-6%)
- django_template: 123 ms +- 16 ms -> 119 ms +- 2 ms: 1.04x faster (-3%)
- xml_etree_generate: 160 ms +- 4 ms -> 156 ms +- 3 ms: 1.02x faster (-2%)
- xml_etree_iterparse: 178 ms +- 3 ms -> 177 ms +- 2 ms: 1.01x faster (-1%)

Benchmark hidden because not significant (41): (...)
--------------------

If we ignore differences smaller than 5%:
--------------------
$ python3 -m pyperf compare_to 2020-06-04_20-10-master-dc24b8a2ac32.json.gz 2020-06-04_20-10-master-dc24b8a2ac32-patch-free_lists.json.gz -G --min-speed=5
Slower (8):
- chameleon: 20.1 ms +- 0.4 ms -> 23.1 ms +- 4.0 ms: 1.15x slower (+15%)
- logging_silent: 334 ns +- 51 ns -> 371 ns +- 70 ns: 1.11x slower (+11%)
- spectral_norm: 274 ms +- 37 ms -> 302 ms +- 55 ms: 1.10x slower (+10%)
- logging_format: 22.5 us +- 0.4 us -> 24.5 us +- 2.7 us: 1.09x slower (+9%)
- json_dumps: 26.6 ms +- 4.0 ms -> 28.7 ms +- 5.5 ms: 1.08x slower (+8%)
- sympy_sum: 390 ms +- 3 ms -> 415 ms +- 45 ms: 1.06x slower (+6%)
- float: 217 ms +- 3 ms -> 231 ms +- 30 ms: 1.06x slower (+6%)
- pidigits: 306 ms +- 32 ms -> 323 ms +- 47 ms: 1.06x slower (+6%)

Faster (6):
- pickle_pure_python: 1.05 ms +- 0.16 ms -> 964 us +- 19 us: 1.09x faster (-9%)
- scimark_sparse_mat_mult: 11.4 ms +- 2.1 ms -> 10.5 ms +- 1.7 ms: 1.09x faster (-8%)
- hexiom: 19.5 ms +- 4.1 ms -> 18.0 ms +- 3.0 ms: 1.08x faster (-7%)
- telco: 15.7 ms +- 3.1 ms -> 14.5 ms +- 0.4 ms: 1.08x faster (-7%)
- unpickle: 31.8 us +- 5.7 us -> 29.5 us +- 4.9 us: 1.08x faster (-7%)
- scimark_lu: 292 ms +- 60 ms -> 274 ms +- 34 ms: 1.07x faster (-6%)

Benchmark hidden because not significant (46): (...)
--------------------

Honestly, I'm surprised by these results. I don't see how these free list changes can make between 6 and 9 benchmarks faster (ex: 1.08x faster for telco!?). To me, it sounds like the speed.python.org runner has some trouble. You can notice it if you look at the last 3 runs at https://speed.python.org/ : there are some spikes (in both directions, faster or slower) which are very surprising.

Pablo recently upgraded Ubuntu on the benchmark runner server. I don't know if it's related.

I plan to recompute all benchmarks run on the benchmark runner server, since over the last years pyperf and pyperformance were upgraded multiple times (old data were computed with old versions) and the system (Ubuntu) was upgraded (again, old data were computed with older Ubuntu packages).
msg370928 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-07 23:38
See also bpo-40887: "Free lists are still used after being finalized (cleared)".
msg370969 - Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2020-06-08 09:24
I'd be interested to see if you can get more consistent results.

Performance of modern hardware is very sensitive to memory layout, so some sort of address randomization might be needed to remove artifacts of layout.
It is possible that the objects on the free lists for telco are better aligned with cache lines, or fit in cache better in some way.
And conversely, in chameleon, objects fit cache in a worse way.
Just a guess, of course.

Thanks for trying to get some benchmark results.
msg372146 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-23 09:33
New changeset b4e85cadfbc2b1b24ec5f3159e351dbacedaa5e0 by Victor Stinner in branch 'master':
bpo-40521: Make dict free lists per-interpreter (GH-20645)
https://github.com/python/cpython/commit/b4e85cadfbc2b1b24ec5f3159e351dbacedaa5e0
msg372148 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-23 09:38
All free lists are now per-interpreter! See Modules/gcmodule.c:

static void
clear_freelists(PyThreadState *tstate)
{
    _PyFrame_ClearFreeList(tstate);
    _PyTuple_ClearFreeList(tstate);
    _PyFloat_ClearFreeList(tstate);
    _PyList_ClearFreeList(tstate);
    _PyDict_ClearFreeList(tstate);
    _PyAsyncGen_ClearFreeLists(tstate);
    _PyContext_ClearFreeList(tstate);
}

I'm still working on the Unicode caches.
msg372161 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-23 12:08
New changeset 261cfedf7657a515e04428bba58eba2a9bb88208 by Victor Stinner in branch 'master':
bpo-40521: Make the empty frozenset per interpreter (GH-21068)
https://github.com/python/cpython/commit/261cfedf7657a515e04428bba58eba2a9bb88208
msg372168 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-06-23 13:50
New changeset 32f2eda85957365d208f499b730d30b7eb419741 by Raymond Hettinger in branch 'master':
bpo-40521: Remove freelist from collections.deque() (GH-21073)
https://github.com/python/cpython/commit/32f2eda85957365d208f499b730d30b7eb419741
msg372169 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-23 13:54
New changeset c41eed1a874e2f22bde45c3c89418414b7a37f46 by Victor Stinner in branch 'master':
bpo-40521: Make bytes singletons per interpreter (GH-21074)
https://github.com/python/cpython/commit/c41eed1a874e2f22bde45c3c89418414b7a37f46
msg372176 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-23 14:40
New changeset 522691c46e2ae51faaad5bbbce7d959dd61770df by Victor Stinner in branch 'master':
bpo-40521: Cleanup code of free lists (GH-21082)
https://github.com/python/cpython/commit/522691c46e2ae51faaad5bbbce7d959dd61770df
msg372181 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-23 15:43
New changeset f9bd05e83e32bece49de5af0c9a232325c57648a by Raymond Hettinger in branch 'master':
bpo-40521: Empty frozenset is no longer a singleton (GH-21085)
https://github.com/python/cpython/commit/f9bd05e83e32bece49de5af0c9a232325c57648a
msg372207 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-23 20:55
New changeset 281cce1106568ef9fec17e3c72d289416fac02a5 by Victor Stinner in branch 'master':
bpo-40521: Make MemoryError free list per interpreter (GH-21086)
https://github.com/python/cpython/commit/281cce1106568ef9fec17e3c72d289416fac02a5
msg372209 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-23 22:10
New changeset f363d0a6e9cfa50677a6de203735fbc0d06c2f49 by Victor Stinner in branch 'master':
bpo-40521: Make empty Unicode string per interpreter (GH-21096)
https://github.com/python/cpython/commit/f363d0a6e9cfa50677a6de203735fbc0d06c2f49
msg372216 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-23 22:34
New changeset 90ed8a6d71b2d6e0853c14e8e6f85fe730a4329a by Victor Stinner in branch 'master':
bpo-40521: Optimize PyUnicode_New(0, maxchar) (GH-21099)
https://github.com/python/cpython/commit/90ed8a6d71b2d6e0853c14e8e6f85fe730a4329a
msg372220 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-24 00:22
New changeset 2f9ada96e0d420fed0d09a032b37197f08ef167a by Victor Stinner in branch 'master':
bpo-40521: Make Unicode latin1 singletons per interpreter (GH-21101)
https://github.com/python/cpython/commit/2f9ada96e0d420fed0d09a032b37197f08ef167a
msg372223 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-24 01:21
New changeset cde283d16d87024f455e45c6f1b4e4f7d8905836 by Victor Stinner in branch 'master':
bpo-40521: Fix _PyContext_Fini() (GH-21103)
https://github.com/python/cpython/commit/cde283d16d87024f455e45c6f1b4e4f7d8905836
msg372250 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-24 13:22
New changeset 0430dfac629b4eb0e899a09b899a494aa92145f6 by Victor Stinner in branch 'master':
bpo-40521: Always create the empty tuple singleton (GH-21116)
https://github.com/python/cpython/commit/0430dfac629b4eb0e899a09b899a494aa92145f6
msg372357 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-25 12:07
New changeset 91698d8caa4b5bb6e8dbb64b156e8afe9e32cac1 by Victor Stinner in branch 'master':
bpo-40521: Optimize PyBytes_FromStringAndSize(str, 0) (GH-21142)
https://github.com/python/cpython/commit/91698d8caa4b5bb6e8dbb64b156e8afe9e32cac1
msg372795 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-07-01 21:21
New changeset 90db4653ae37ef90754cfd2cd6ec6857b87a88e6 by Victor Stinner in branch 'master':
bpo-40521: Cleanup finalize_interp_types() (GH-21265)
https://github.com/python/cpython/commit/90db4653ae37ef90754cfd2cd6ec6857b87a88e6
msg377368 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-09-23 12:05
New changeset 7f413a5d95e6d7ddddd6e2c9844c33594d6288f4 by Victor Stinner in branch 'master':
bpo-40521: Fix PyUnicode_InternInPlace() (GH-22376)
https://github.com/python/cpython/commit/7f413a5d95e6d7ddddd6e2c9844c33594d6288f4
msg383789 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-12-26 01:58
New changeset ea251806b8dffff11b30d2182af1e589caf88acf by Victor Stinner in branch 'master':
bpo-40521: Per-interpreter interned strings (GH-20085)
https://github.com/python/cpython/commit/ea251806b8dffff11b30d2182af1e589caf88acf
msg383790 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-12-26 02:00
> bpo-40521: Per-interpreter interned strings (GH-20085)

That one wasn't easy, but it's now done! I'm closing the issue.
msg383829 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-12-26 22:09
> New changeset ea251806b8dffff11b30d2182af1e589caf88acf by Victor Stinner in branch 'master':
> bpo-40521: Per-interpreter interned strings (GH-20085)

I'm reopening the issue. This change caused a regression in the attached interned_bug.py. Output:
---
$ ./python interned_bug.py 
Exception ignored deletion of interned string failed:
KeyError: 'out of memory'
python: Objects/unicodeobject.c:1946: unicode_dealloc: Assertion `Py_REFCNT(unicode) == 1' failed.
Abandon (core dumped)
---

Running "import xml.parsers.expat" in a subinterpreter causes two issues when the subinterpreter completes:

* pyexpat.errors and pyexpat.model dictionaries are cleared: all values set to None
* unicode_dealloc() logs an error on an interned string in the subinterpreter, because the string doesn't exist in the subinterpreter interned dictionary.

The interned string is created in the main interpreter and so stored in the main interpreter interned dictionary.

The string is stored in two pyexpat.errors dictionaries:

>>> pyexpat.errors.messages[1]
'out of memory'
>>> pyexpat.errors.codes['out of memory']
1

When the subinterpreter clears pyexpat.errors and pyexpat.model dictionaries, the interned string is deleted: unicode_dealloc() is called. But unicode_dealloc() fails to delete the interned string in the subinterpreter interned dictionary.

pyexpat.errors and pyexpat.model modules are cleared because they are stored under different names in sys.modules by Lib/xml/parsers/expat.py:

sys.modules['xml.parsers.expat.model'] = model
sys.modules['xml.parsers.expat.errors'] = errors
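A toy C model of this failure, with hypothetical types that are far simpler than CPython's interned dictionary: each interpreter keeps its own table of interned strings, and deallocation looks the string up in the table of whichever interpreter is currently running. A string interned by the main interpreter but released while a subinterpreter runs is not found there, which is analogous to the "deletion of interned string failed" error above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TABLE_SIZE 16  /* illustrative capacity */

typedef struct {
    const char *text;
    bool interned;
} str_obj;

/* One interned-string table per interpreter (toy stand-in for a dict). */
typedef struct {
    str_obj *interned[TABLE_SIZE];
    size_t count;
} interp_strings;

bool intern_in(interp_strings *in, str_obj *s)
{
    if (in->count == TABLE_SIZE) {
        return false;
    }
    in->interned[in->count++] = s;
    s->interned = true;
    return true;
}

/* Called when the last reference goes away, with the interpreter that is
   *currently* running. Returns false if the string is not in that table. */
bool dealloc_in(interp_strings *current, str_obj *s)
{
    assert(s != NULL);
    if (!s->interned) {
        return true;
    }
    for (size_t i = 0; i < current->count; i++) {
        if (current->interned[i] == s) {
            /* Swap-remove the entry. */
            current->interned[i] = current->interned[--current->count];
            s->interned = false;
            return true;
        }
    }
    return false;
}
```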
msg385950 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-01-29 22:00
> I reopen the issue. This change caused a regression in attached interned_bug.py.

Fixed by:

commit c8a87addb1fa35dec79ed8f227eba3694fc36234
Author: Mohamed Koubaa <koubaa.m@gmail.com>
Date:   Mon Jan 4 08:34:26 2021 -0600

    bpo-1635741: Port pyexpat to multi-phase init (PEP 489) (GH-22222)
msg388492 - Author: junyixie (JunyiXie) * Date: 2021-03-11 09:44
The dtoa bigint free list should be made per-interpreter:

static Bigint *bigint_freelist[Kmax+1]; -> _is { Bigint *bigint_freelist[Kmax+1]; }
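A sketch of the proposed layout, using simplified stand-ins for dtoa.c's Bigint, Balloc() and Bfree() (the _in suffix, the dtoa_state struct and the Kmax value here are illustrative assumptions, not dtoa.c code): one free list per size class k, where a Bigint of class k holds 1 << k 32-bit words, stored in a per-interpreter struct instead of a static array. In a real build each call would presumably first have to locate the current interpreter's state; this sketch passes it explicitly instead.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define Kmax 7  /* illustrative value for the largest cached size class */

typedef struct Bigint {
    struct Bigint *next;
    int k, maxwds, sign, wds;
    uint32_t x[];  /* maxwds 32-bit words, allocated with the struct */
} Bigint;

/* Per-interpreter home for what used to be a static array. */
typedef struct {
    Bigint *bigint_freelist[Kmax + 1];
} dtoa_state;

Bigint *Balloc_in(dtoa_state *st, int k)
{
    Bigint *rv = NULL;
    assert(k >= 0);
    if (k <= Kmax && (rv = st->bigint_freelist[k]) != NULL) {
        /* Reuse a cached Bigint of size class k. */
        st->bigint_freelist[k] = rv->next;
    }
    else {
        int x = 1 << k;
        rv = malloc(sizeof(Bigint) + (size_t)x * sizeof(uint32_t));
        if (rv == NULL) {
            return NULL;
        }
        rv->k = k;
        rv->maxwds = x;
    }
    rv->sign = rv->wds = 0;
    return rv;
}

void Bfree_in(dtoa_state *st, Bigint *v)
{
    if (v == NULL) {
        return;
    }
    if (v->k > Kmax) {
        free(v);  /* large Bigints are never cached */
    }
    else {
        v->next = st->bigint_freelist[v->k];
        st->bigint_freelist[v->k] = v;
    }
}
```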
msg388493 - Author: junyixie (JunyiXie) * Date: 2021-03-11 09:46
https://github.com/python/cpython/pull/24821/commits/9d7681dbd273b5025fd9b19d1be0a1f978a0b12e
msg388617 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-03-13 13:25
New changeset 5bd1059184b154d339f1bd53d23c98b5bcf14c8c by junyixie in branch 'master':
bpo-40521: Make dtoa bigint free list per-interpreter (GH-24821)
https://github.com/python/cpython/commit/5bd1059184b154d339f1bd53d23c98b5bcf14c8c
msg389226 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2021-03-21 11:59
Hi Victor,

I just noticed the change to dtoa.c in GH-24821. Please could you explain what the benefit of this change was?

In general, we need to be very conservative with changes to dtoa.c: it's a complex, fragile, performance-critical piece of code, and ideally we'd like it not to diverge from the upstream code any more than it already has, in case we need to integrate bugfixes from upstream.

It's feeling as though the normal Python development process is being bypassed here. As I understand it, this and similar changes are in aid of per-subinterpreter GILs. Has there been agreement from the core devs or steering council that this is a desirable goal? Should there be a PEP before more changes like this are made? (Or maybe there's already a PEP, that I missed? I know about PEP 554, but that PEP is explicit that GIL sharing is out of scope.)
msg389294 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-03-22 10:02
Mark Dickinson: "I just noticed the change to dtoa.c in GH-24821. Please could you explain what the benefit of this change was?"

The rationale is explained in bpo-40512. The goal is to run multiple Python interpreters in parallel in the same process.

dtoa.c had global variables shared by all interpreters without locking, so two interpreters could corrupt the consistency of the free list.


Mark Dickinson: "In general, we need to be very conservative with changes to dtoa.c: it's a complex, fragile, performance-critical piece of code, and ideally we'd like it not to diverge from the upstream code any more than it already has, in case we need to integrate bugfixes from upstream."

I know that dtoa.c was copied from a third-party project. But commit 5bd1059184b154d339f1bd53d23c98b5bcf14c8c only makes sense in Python; I don't think that it would make sense to propose it upstream.

dtoa.c _Py_dg_strtod() is called by:

* float.__round__()
* _PyOS_ascii_strtod()

_PyOS_ascii_strtod() is called by PyOS_string_to_double(), which is called by:

* float_from_string_inner()
* complex_from_string_inner()
* pickle load_float()
* parser parsenumber_raw()
* marshal r_float_str

dtoa.c _Py_dg_dtoa() is called by:

* float.__round__()
* PyOS_double_to_string()

PyOS_double_to_string() is called by:

* float_repr()
* complex_repr()
* bytes % float: _PyBytes_FormatEx()
* str % float: PyUnicode_Format()
* _PyLong_FormatAdvancedWriter()
* _PyComplex_FormatAdvancedWriter()
* pickle save_float()
* marshal w_float_str


I guess that the most important use cases are float(str) and str(float). I wrote the attached bench_dtoa.py to measure the effect of commit 5bd1059184b154d339f1bd53d23c98b5bcf14c8c on performance:
---
$ python3 -m pyperf compare_to before.json after.json 
float('0'): Mean +- std dev: [before] 80.5 ns +- 3.1 ns -> [after] 90.1 ns +- 3.6 ns: 1.12x slower
float('1.0'): Mean +- std dev: [before] 89.5 ns +- 4.3 ns -> [after] 97.2 ns +- 2.6 ns: 1.09x slower
float('340282366920938463463374607431768211455'): Mean +- std dev: [before] 480 ns +- 42 ns -> [after] 514 ns +- 13 ns: 1.07x slower
float('1044388881413152506691752710716624382579964249047383780384233483283953907971557456848826811934997558340890106714439262837987573438185793607263236087851365277945956976543709998340361590134383718314428070011855946226376318839397712745672334684344586617496807908705803704071284048740118609114467977783598029006686938976881787785946905630190260940599579453432823469303026696443059025015972399867714215541693835559885291486318237914434496734087811872639496475100189041349008417061675093668333850551032972088269550769983616369411933015213796825837188091833656751221318492846368125550225998300412344784862595674492194617023806505913245610825731835380087608622102834270197698202313169017678006675195485079921636419370285375124784014907159135459982790513399611551794271106831134090584272884279791554849782954323534517065223269061394905987693002122963395687782878948440616007412945674919823050571642377154816321380631045902916136926708342856440730447899971901781465763473223850267253059899795996090799469201774624817718449867455659250178329070473119433165550807568221846571746373296884912819520317457002440926616910874148385078411929804522981857338977648103126085903001302413467189726673216491511131602920781738033436090243804708340403154190335'): Mean +- std dev: [before] 717 ns +- 36 ns -> [after] 990 ns +- 27 ns: 1.38x slower
str(0.0): Mean +- std dev: [before] 113 ns +- 8 ns -> [after] 106 ns +- 4 ns: 1.06x faster
str(1.0): Mean +- std dev: [before] 141 ns +- 11 ns -> [after] 135 ns +- 17 ns: 1.05x faster
str(inf): Mean +- std dev: [before] 110 ns +- 11 ns -> [after] 98.9 ns +- 3.3 ns: 1.12x faster

Benchmark hidden because not significant (1): str(3.402823669209385e+38)

Geometric mean: 1.05x slower
---

I built Python with "./configure --enable-optimizations --with-lto" on Fedora 33 (GCC 10.2.1). I didn't use CPU isolation.

Oh, float(str) is between 1.09x slower and 1.38x slower.

On the other hand, str(float) is between 1.06x and 1.12x faster, and I'm not sure why. I guess that the problem is that a PGO+LTO build is not reproducible: GCC might prefer to optimize some functions over others depending on the PROFILE_TASK (Makefile.pre.in, the command used by the GCC profiler).


Mark Dickinson: "It's feeling as though the normal Python development process is being bypassed here. As I understand it, this and similar changes are in aid of per-subinterpreter GILs. Has there been agreement from the core devs or steering council that this is a desirable goal? Should there be a PEP before more changes like this are made? (Or maybe there's already a PEP, that I missed? I know about PEP 554, but that PEP is explicit that GIL sharing is out of scope.)"


Honestly, I didn't expect this change to have any significant impact on performance, so I merged the PR as I merge other fixes for subinterpreters. It seems like I underestimated the number of Balloc/Bmalloc calls per float(str) or str(float) call.

There is no PEP about running multiple Python interpreters in the same process. There is no consensus on this topic. I discussed it in private with some core devs, but that's not relevant here.

My plan is to merge changes which have no significant impact on performance, and wait for a PEP for changes which have a significant impact on performance. Most changes fix bugs in subinterpreters which still share a GIL. This use case is not new and has been supported for 10-20 years.


For now, I will revert commit 5bd1059184b154d339f1bd53d23c98b5bcf14c8c.
msg389305 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-03-22 10:59
New changeset 39f643614d03748a5fad462fe7ed26a174a522fa by Victor Stinner in branch 'master':
Revert "bpo-40521: Make dtoa bigint free list per-interpreter (GH-24821)" (GH-24964)
https://github.com/python/cpython/commit/39f643614d03748a5fad462fe7ed26a174a522fa
msg389525 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2021-03-25 20:32
New changeset 3bb19873abd572879cc9a8810b1db9db1f704070 by Raymond Hettinger in branch 'master':
Revert "bpo-40521: Remove freelist from collections.deque() (GH-21073)" (GH-24944)
https://github.com/python/cpython/commit/3bb19873abd572879cc9a8810b1db9db1f704070
msg389527 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-03-25 20:51
I reopen the issue to remind me that collections.deque() freelist is shared by all interpreters.
msg393787 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-05-17 06:59
> I reopen the issue to remind me that collections.deque() freelist is shared by all interpreters.

Each deque instance now has its own free list.
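A minimal sketch of the per-instance approach, with hypothetical names and illustrative sizes (the BLOCKLEN and MAXFREEBLOCKS values are assumptions for this example): the cache of spare blocks lives inside each deque object, so no process-wide or per-interpreter state is involved and no interpreter lookup is needed on the hot path, as long as a given deque is only used by one interpreter.

```c
#include <assert.h>
#include <stdlib.h>

#define BLOCKLEN 64        /* illustrative items per block */
#define MAXFREEBLOCKS 16   /* illustrative per-deque cache size */

typedef struct block {
    struct block *leftlink;
    struct block *rightlink;
    void *data[BLOCKLEN];
} block;

/* The part of a deque object that holds its private cache of blocks. */
typedef struct {
    block *freeblocks[MAXFREEBLOCKS];
    int numfreeblocks;
} deque_cache;

block *newblock(deque_cache *d)
{
    if (d->numfreeblocks > 0) {
        /* Reuse a block this same deque released earlier. */
        return d->freeblocks[--d->numfreeblocks];
    }
    return malloc(sizeof(block));
}

void freeblock(deque_cache *d, block *b)
{
    assert(b != NULL);
    if (d->numfreeblocks < MAXFREEBLOCKS) {
        d->freeblocks[d->numfreeblocks++] = b;
    }
    else {
        free(b);
    }
}

/* Called when the deque itself is deallocated. */
void deque_cache_clear(deque_cache *d)
{
    while (d->numfreeblocks > 0) {
        free(d->freeblocks[--d->numfreeblocks]);
    }
}
```

The trade-off in this design is that spare blocks are no longer shared between deque instances: a block released by one deque can only be reused by that same deque.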

But dtoa.c still has a per-process cache, shared by all interpreters.
msg395860 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2021-06-15 04:58
[Victor Stinner]
> My plan is to merge changes which have no significant
> impact on performances

FWIW, PyFloat_FromDouble() is the most performance-critical function in floatobject.c.
msg409819 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2022-01-06 07:53
New changeset 35d6540c904ef07b8602ff014e520603f84b5886 by Victor Stinner in branch 'main':
bpo-46006: Revert "bpo-40521: Per-interpreter interned strings (GH-20085)" (GH-30422)
https://github.com/python/cpython/commit/35d6540c904ef07b8602ff014e520603f84b5886
msg409856 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2022-01-06 15:12
New changeset 72c260cf0c71eb01eb13100b751e9d5007d00b70 by Victor Stinner in branch '3.10':
[3.10] bpo-46006: Revert "bpo-40521: Per-interpreter interned strings (GH-20085)" (GH-30422) (GH-30425)
https://github.com/python/cpython/commit/72c260cf0c71eb01eb13100b751e9d5007d00b70
msg409862 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2022-01-06 15:24
My commit ea251806b8dffff11b30d2182af1e589caf88acf (per-interpreter interned strings) introduced a regression: bpo-46006 "[subinterpreter] _PyUnicode_EqualToASCIIId() issue with subinterpreters".

To unblock the Python 3.11.0a4 release, I reverted the change. Since the revert reintroduces the original issue, I created bpo-46283: "[subinterpreters] Unicode interned strings must not be shared between interpreters".
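As a rough illustration of how per-interpreter interning can break code written for a single process-wide table, here is a toy Python model. It is not CPython code: Str, InternTable and fast_equal() are hypothetical names, and the model only assumes an equality check with a shortcut that treats two interned but distinct objects as unequal, the general kind of assumption at stake in a function like _PyUnicode_EqualToASCIIId(). With one table, equal interned strings are always the same object, so the shortcut is safe; with one table per interpreter, the same text can be interned twice and the shortcut gives the wrong answer.

```python
class Str:
    """Stand-in for a string object; identity matters, like a PyObject* pointer."""

    def __init__(self, value):
        self.value = value
        self.interned = False


class InternTable:
    """Toy intern table: one canonical object per distinct string value."""

    def __init__(self):
        self._table = {}

    def intern(self, s):
        # Return the canonical object for this value, registering s if it is new.
        canonical = self._table.get(s.value)
        if canonical is None:
            s.interned = True
            self._table[s.value] = s
            canonical = s
        return canonical


def fast_equal(left, right):
    """Hypothetical equality check with an interning shortcut."""
    if left is right:
        return True
    # Shortcut: two interned but distinct objects are assumed to differ.
    # This is only valid if every interned string comes from a single table.
    if left.interned and right.interned:
        return False
    return left.value == right.value


if __name__ == "__main__":
    shared = InternTable()
    x, y = shared.intern(Str("spam")), shared.intern(Str("spam"))
    print(fast_equal(x, y))  # True: one table, one canonical object

    interp_a, interp_b = InternTable(), InternTable()
    x, y = interp_a.intern(Str("spam")), interp_b.intern(Str("spam"))
    print(fast_equal(x, y), x.value == y.value)  # False True: shortcut is wrong
```

In this model, going back to a single shared table makes the shortcut safe again, but it also means interned strings are once more shared between interpreters, which is what bpo-46283 tracks.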
History
Date | User | Action | Args
2022-04-11 14:59:30 | admin | set | github: 84701
2022-01-06 15:24:07 | vstinner | set | messages: + msg409862
2022-01-06 15:12:36 | vstinner | set | messages: + msg409856
2022-01-06 14:30:07 | vstinner | set | pull_requests: + pull_request28640
2022-01-06 07:59:48 | vstinner | set | pull_requests: + pull_request28631
2022-01-06 07:53:52 | vstinner | set | messages: + msg409819
2022-01-05 16:26:42 | vstinner | set | pull_requests: + pull_request28626
2021-06-15 04:58:25 | rhettinger | set | messages: + msg395860
2021-05-17 06:59:17 | vstinner | set | messages: + msg393787
2021-05-04 22:25:11 | rhettinger | set | stage: resolved -> patch review; pull_requests: + pull_request24574
2021-03-25 20:51:19 | vstinner | set | status: closed -> open; resolution: fixed ->; messages: + msg389527
2021-03-25 20:32:32 | rhettinger | set | messages: + msg389525
2021-03-22 10:59:05 | vstinner | set | messages: + msg389305
2021-03-22 10:03:06 | vstinner | set | pull_requests: + pull_request23723
2021-03-22 10:02:15 | vstinner | set | files: + bench_dtoa.py; messages: + msg389294
2021-03-21 11:59:21 | mark.dickinson | set | messages: + msg389226
2021-03-20 15:15:25 | rhettinger | set | pull_requests: + pull_request23703
2021-03-17 20:27:43 | mark.dickinson | set | nosy: + mark.dickinson
2021-03-13 13:25:31 | vstinner | set | messages: + msg388617
2021-03-11 09:46:00 | JunyiXie | set | messages: + msg388493
2021-03-11 09:44:53 | JunyiXie | set | nosy: + JunyiXie; messages: + msg388492; pull_requests: + pull_request23587
2021-01-29 22:00:28 | vstinner | set | status: open -> closed; resolution: fixed; messages: + msg385950
2020-12-26 22:09:06 | vstinner | set | status: closed -> open; files: + interned_bug.py; resolution: fixed -> (no value); messages: + msg383829
2020-12-26 02:00:18 | vstinner | set | status: open -> closed; versions: + Python 3.10, - Python 3.9; messages: + msg383790; resolution: fixed; stage: patch review -> resolved
2020-12-26 01:58:40 | vstinner | set | messages: + msg383789
2020-09-23 12:05:36 | vstinner | set | messages: + msg377368
2020-09-23 10:55:08 | vstinner | set | pull_requests: + pull_request21415
2020-07-01 21:21:43 | vstinner | set | messages: + msg372795
2020-07-01 17:31:50 | vstinner | set | pull_requests: + pull_request20413
2020-06-25 12:07:46 | vstinner | set | messages: + msg372357
2020-06-25 11:15:32 | vstinner | set | pull_requests: + pull_request20301
2020-06-24 13:22:05 | vstinner | set | messages: + msg372250
2020-06-24 12:56:03 | vstinner | set | pull_requests: + pull_request20279
2020-06-24 01:21:22 | vstinner | set | messages: + msg372223
2020-06-24 00:34:38 | vstinner | set | pull_requests: + pull_request20271
2020-06-24 00:22:28 | vstinner | set | messages: + msg372220
2020-06-23 22:45:54 | vstinner | set | pull_requests: + pull_request20270
2020-06-23 22:34:14 | vstinner | set | messages: + msg372216
2020-06-23 22:14:05 | vstinner | set | pull_requests: + pull_request20268
2020-06-23 22:10:47 | vstinner | set | messages: + msg372209
2020-06-23 21:48:46 | vstinner | set | pull_requests: + pull_request20263
2020-06-23 20:55:53 | vstinner | set | messages: + msg372207
2020-06-23 15:43:02 | vstinner | set | messages: + msg372181
2020-06-23 15:11:50 | vstinner | set | pull_requests: + pull_request20252
2020-06-23 15:02:57 | rhettinger | set | pull_requests: + pull_request20251
2020-06-23 14:40:49 | vstinner | set | messages: + msg372176
2020-06-23 13:57:08 | vstinner | set | pull_requests: + pull_request20248
2020-06-23 13:54:44 | vstinner | set | messages: + msg372169
2020-06-23 13:50:23 | rhettinger | set | messages: + msg372168
2020-06-23 13:14:11 | vstinner | set | pull_requests: + pull_request20242
2020-06-23 13:12:37 | rhettinger | set | nosy: + rhettinger; pull_requests: + pull_request20241
2020-06-23 12:08:00 | vstinner | set | messages: + msg372161
2020-06-23 10:41:07 | vstinner | set | pull_requests: + pull_request20237
2020-06-23 09:38:05 | vstinner | set | messages: + msg372148
2020-06-23 09:33:34 | vstinner | set | messages: + msg372146
2020-06-08 09:24:46 | Mark.Shannon | set | messages: + msg370969
2020-06-07 23:38:21 | vstinner | set | messages: + msg370928
2020-06-07 03:54:31 | shihai1991 | set | nosy: + shihai1991
2020-06-05 17:32:33 | vstinner | set | messages: + msg370771
2020-06-05 09:59:07 | vstinner | set | messages: + msg370757
2020-06-05 09:57:23 | Mark.Shannon | set | nosy: + Mark.Shannon; messages: + msg370756
2020-06-05 09:50:14 | vstinner | set | files: + bench_dict.patch; messages: + msg370755
2020-06-05 09:47:03 | vstinner | set | messages: + msg370754
2020-06-05 01:02:15 | vstinner | set | pull_requests: + pull_request19865
2020-06-05 00:56:43 | vstinner | set | messages: + msg370742
2020-06-05 00:36:00 | vstinner | set | pull_requests: + pull_request19864
2020-06-05 00:34:22 | vstinner | set | messages: + msg370741
2020-06-05 00:10:02 | vstinner | set | pull_requests: + pull_request19863
2020-06-05 00:05:49 | vstinner | set | messages: + msg370740
2020-06-04 23:44:42 | vstinner | set | pull_requests: + pull_request19862
2020-06-04 23:39:28 | vstinner | set | messages: + msg370737
2020-06-04 23:20:04 | vstinner | set | pull_requests: + pull_request19858
2020-06-04 23:14:47 | vstinner | set | messages: + msg370735
2020-06-04 22:53:58 | vstinner | set | pull_requests: + pull_request19857
2020-06-04 22:50:19 | vstinner | set | messages: + msg370734
2020-06-04 22:01:04 | vstinner | set | pull_requests: + pull_request19856
2020-06-04 21:38:44 | vstinner | set | messages: + msg370733
2020-06-02 23:20:25 | vstinner | set | files: + microbench_tuple.py
2020-06-02 23:20:17 | vstinner | set | files: + bench_tuple.patch
2020-06-02 23:19:48 | vstinner | set | files: - bench_tuple.patch
2020-06-02 23:19:47 | vstinner | set | files: - microbench_tuple.py
2020-06-02 22:43:43 | vstinner | set | files: + bench_tuple.patch
2020-06-02 22:43:24 | vstinner | set | files: + microbench_tuple.py; messages: + msg370636
2020-05-19 23:57:22 | vstinner | set | messages: + msg369407
2020-05-19 23:22:10 | vstinner | set | pull_requests: + pull_request19535
2020-05-19 22:44:42 | vstinner | set | pull_requests: + pull_request19534
2020-05-15 00:36:01 | vstinner | set | components: + Subinterpreters, - Interpreter Core; title: Make tuple, dict, frame free lists, unicode interned strings, unicode latin1 singletons per-interpreter -> [subinterpreters] Make free lists and unicode caches per-interpreter
2020-05-14 11:25:02 | corona10 | set | nosy: + corona10
2020-05-14 00:56:17 | vstinner | set | pull_requests: + pull_request19389
2020-05-13 23:48:41 | vstinner | set | messages: + msg368808
2020-05-13 23:35:08 | vstinner | set | messages: + msg368807
2020-05-13 23:13:37 | vstinner | set | pull_requests: + pull_request19386
2020-05-13 23:09:46 | vstinner | set | pull_requests: + pull_request19385
2020-05-06 17:05:34 | vstinner | set | messages: + msg368283
2020-05-06 16:24:06 | vstinner | set | messages: + msg368278
2020-05-06 15:48:33 | vstinner | set | pull_requests: + pull_request19276
2020-05-06 15:44:12 | vstinner | set | pull_requests: + pull_request19275
2020-05-05 17:55:33 | vstinner | set | messages: + msg368187
2020-05-05 16:50:37 | vstinner | set | messages: + msg368177
2020-05-05 16:48:34 | vstinner | set | pull_requests: + pull_request19252
2020-05-05 16:11:07 | vstinner | set | keywords: + patch; stage: patch review; pull_requests: + pull_request19248
2020-05-05 15:48:02 | vstinner | create |