[subinterpreters] Make free lists and unicode caches per-interpreter #84701

vstinner · 2020-05-05T15:48:02Z

BPO	40521
Nosy	@rhettinger, @mdickinson, @vstinner, @markshannon, @corona10, @shihai1991, @JunyiXie
PRs	bpo-40521: Disable Unicode caches in isolated subinterpreters #19933 bpo-40521: Disable free lists in subinterpreters #19937 bpo-40521: Disable list free list in subinterpreters #19959 bpo-40521: Disable method cache in subinterpreters #19960 bpo-40521: Add PyInterpreterState.unicode #20081 bpo-40521: Add PyInterpreterState.unicode #20082 bpo-40521: Per-interpreter interned strings #20085 bpo-40521: Fix update_slot() whne INTERN_NAME_STRINGS is not defined #20246 bpo-40521: Make tuple free list per-interpreter #20247 bpo-40521: Make float free list per-interpreter #20636 bpo-40521: Make slice cache per-interpreter #20637 bpo-40521: Make frame free list per-interpreter #20638 bpo-40521: Make list free list per-interpreter #20642 bpo-40521: Make async gen free lists per-interpreter #20643 bpo-40521: Make context free list per-interpreter #20644 bpo-40521: Make dict free lists per-interpreter #20645 bpo-40521: Make the empty frozenset per interpreter #21068 bpo-40521: Remove freelist from collections.deque() #21073 bpo-40521: Make bytes singletons per interpreter #21074 bpo-40521: Cleanup code of free lists #21082 bpo-40521: Empty frozenset is no longer a singleton #21085 bpo-40521: Make MemoryError free list per interpreter #21086 bpo-40521: Make empty Unicode string per interpreter #21096 bpo-40521: Optimize PyUnicode_New(0, maxchar) #21099 bpo-40521: Make Unicode latin1 singletons per interpreter #21101 bpo-40521: Fix _PyContext_Fini() #21103 bpo-40521: Always create the empty tuple singleton #21116 bpo-40521: Optimize PyBytes_FromStringAndSize(str, 0) #21142 bpo-40521: Cleanup finalize_interp_types() #21265 bpo-40521: Fix PyUnicode_InternInPlace() #22376 bpo-40521: [subinterpreters] Make dtoa bigint free list per-interpreter #24821 Revert "bpo-40521: Remove freelist from collections.deque() (GH-21073)" #24944 Revert "bpo-40521: [subinterpreters] Make dtoa bigint free list per-interpreter" #24964 bpo-40521: Convert deque freelist from global vars to instance vars #25906 bpo-46006: Revert "bpo-40521: Per-interpreter interned strings (GH-20085)" #30422 [3.10] bpo-46006: Revert "bpo-40521: Per-interpreter interned strings (GH-20085)" (GH-30422) #30425 [3.10] bpo-46006: Revert "bpo-40521: Per-interpreter interned strings (GH-20085)" (GH-30422) #30433
Files	bench_tuple.patch microbench_tuple.py bench_dict.patch interned_bug.py bench_dtoa.py

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2020-05-05.15:48:02.489>
labels = ['expert-subinterpreters', '3.10']
title = '[subinterpreters] Make free lists and unicode caches per-interpreter'
updated_at = <Date 2022-01-06.15:24:07.868>
user = 'https://github.com/vstinner'

bugs.python.org fields:

activity = <Date 2022-01-06.15:24:07.868>
actor = 'vstinner'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Subinterpreters']
creation = <Date 2020-05-05.15:48:02.489>
creator = 'vstinner'
dependencies = []
files = ['49212', '49213', '49216', '49699', '49899']
hgrepos = []
issue_num = 40521
keywords = ['patch']
message_count = 56.0
messages = ['368175', '368177', '368187', '368278', '368283', '368807', '368808', '369407', '370636', '370733', '370734', '370735', '370737', '370740', '370741', '370742', '370754', '370755', '370756', '370757', '370771', '370928', '370969', '372146', '372148', '372161', '372168', '372169', '372176', '372181', '372207', '372209', '372216', '372220', '372223', '372250', '372357', '372795', '377368', '383789', '383790', '383829', '385950', '388492', '388493', '388617', '389226', '389294', '389305', '389525', '389527', '393787', '395860', '409819', '409856', '409862']
nosy_count = 7.0
nosy_names = ['rhettinger', 'mark.dickinson', 'vstinner', 'Mark.Shannon', 'corona10', 'shihai1991', 'JunyiXie']
pr_nums = ['19933', '19937', '19959', '19960', '20081', '20082', '20085', '20246', '20247', '20636', '20637', '20638', '20642', '20643', '20644', '20645', '21068', '21073', '21074', '21082', '21085', '21086', '21096', '21099', '21101', '21103', '21116', '21142', '21265', '22376', '24821', '24944', '24964', '25906', '30422', '30425', '30433']
priority = 'normal'
resolution = None
stage = 'patch review'
status = 'open'
superseder = None
type = None
url = 'https://bugs.python.org/issue40521'
versions = ['Python 3.10']

vstinner · 2020-05-05T15:48:02Z

tuple, dict and frame use free lists to optimize the creation of objects.

Unicode uses "interned" strings to reduce the Python memory footprint and speedup dictionary lookups.

Unicode also uses singletons for single letter Latin1 characters ([U+0000; U+00FF] range).

All these optimizations are incompatible with isolated subinterpreters, since caches are currently shared by all inteprepreters. These caches should be made per-intepreter. See bpo-40512 "Meta issue: per-interpreter GIL" for the rationale.

I already made small integer singletons per interpreter in bpo-38858:

commit 5dcc06f
commit 630c8df.

vstinner · 2020-05-05T16:50:38Z

New changeset 607b102 by Victor Stinner in branch 'master':
bpo-40521: Disable Unicode caches in isolated subinterpreters (GH-19933)
607b102

vstinner · 2020-05-05T17:55:33Z

New changeset b4b5386 by Victor Stinner in branch 'master':
bpo-40521: Disable free lists in subinterpreters (GH-19937)
b4b5386

vstinner · 2020-05-06T16:24:06Z

New changeset 89fc4a3 by Victor Stinner in branch 'master':
bpo-40521: Disable method cache in subinterpreters (GH-19960)
89fc4a3

vstinner · 2020-05-06T17:05:34Z

New changeset b7aa23d by Victor Stinner in branch 'master':
bpo-40521: Disable list free list in subinterpreters (GH-19959)
b7aa23d

vstinner · 2020-05-13T23:35:09Z

I wrote a draft PR to make interned strings per-interpreter. It does crash because it requires to make method cache and _PyUnicode_FromId() (bpo-39465) compatible with subinterpreters.

vstinner · 2020-05-13T23:48:42Z

New changeset 3d17c04 by Victor Stinner in branch 'master':
bpo-40521: Add PyInterpreterState.unicode (GH-20081)
3d17c04

vstinner · 2020-05-19T23:57:22Z

New changeset 0509c45 by Victor Stinner in branch 'master':
bpo-40521: Fix update_slot() when INTERN_NAME_STRINGS is not defined (bpo-20246)
0509c45

vstinner · 2020-06-02T22:43:24Z

Microbenchmark for tuple free list to measure PR 20247 overhead: microbench_tuple.py. It requires to apply bench_tuple.patch.

vstinner · 2020-06-04T21:38:44Z

New changeset 69ac6e5 by Victor Stinner in branch 'master':
bpo-40521: Make tuple free list per-interpreter (GH-20247)
69ac6e5

vstinner · 2020-06-04T22:50:20Z

New changeset 2ba5937 by Victor Stinner in branch 'master':
bpo-40521: Make float free list per-interpreter (GH-20636)
2ba5937

vstinner · 2020-06-04T23:14:47Z

New changeset 7daba6f by Victor Stinner in branch 'master':
bpo-40521: Make slice cache per-interpreter (GH-20637)
7daba6f

vstinner · 2020-06-04T23:39:29Z

New changeset 3744ed2 by Victor Stinner in branch 'master':
bpo-40521: Make frame free list per-interpreter (GH-20638)
3744ed2

vstinner · 2020-06-05T00:05:49Z

New changeset 88ec919 by Victor Stinner in branch 'master':
bpo-40521: Make list free list per-interpreter (GH-20642)
88ec919

vstinner · 2020-06-05T00:34:22Z

New changeset 78a02c2 by Victor Stinner in branch 'master':
bpo-40521: Make async gen free lists per-interpreter (GH-20643)
78a02c2

vstinner · 2020-06-05T00:56:44Z

New changeset e005ead by Victor Stinner in branch 'master':
bpo-40521: Make context free list per-interpreter (GH-20644)
e005ead

vstinner · 2020-06-05T09:47:04Z

bpo-40521: Make list free list per-interpreter (GH-20642)
88ec919

This change contains an interesting fix:

_PyGC_Fini() clears gcstate->garbage list which can be stored in
the list free list. Call _PyGC_Fini() before _PyList_Fini() to
prevent leaking this list.

Maybe "Fini" functions should disable free lists to prevent following code to add something to a free list, during Python finalization.

vstinner · 2020-06-05T09:50:14Z

bench_dict.patch: Microbenchmark on the C function PyDict_New() to measure the overhead of PR 20645.

markshannon · 2020-06-05T09:57:22Z

I'm worried about the performance impact of these changes, especially as many of the changes haven't been reviewed.

Have you done any performance analysis or tests of the cumulative effect of all these changes?

vstinner · 2020-06-05T09:59:07Z

Have you done any performance analysis or tests of the cumulative effect of all these changes?

No. It would be interesting to measure that using pyperformance.

vstinner · 2020-12-26T22:09:06Z

New changeset ea25180 by Victor Stinner in branch 'master':
bpo-40521: Per-interpreter interned strings (GH-20085)

I reopen the issue. This change caused a regression in attached interned_bug.py. Output:
---

$ ./python interned_bug.py 
Exception ignored deletion of interned string failed:
KeyError: 'out of memory'
python: Objects/unicodeobject.c:1946: unicode_dealloc: Assertion `Py_REFCNT(unicode) == 1' failed.
Abandon (core dumped)

Running "import xml.parsers.expat" in a subinterpreter causes two issues when the subinterpreter completes:

pyexpat.errors and pyexpat.model dictionaries are cleared: all values set to None
unicode_dealloc() logs an error on an interned string in the subinterpreter, because the string doesn't exist in the subinterpreter interned dictionary.

The interned string is created in the main interpreter and so stored in the main interpreter interned dictionary.

The string is stored in 2 dictionaries of pyexpat.errors dictionaries:

>>> pyexpat.errors.messages[1]
'out of memory'
>>> pyexpat.errors.codes['out of memory']
1

When the subinterpreter clears pyexpat.errors and pyexpat.model dictionaries, the interned string is deleted: unicode_dealloc() is called. But unicode_dealloc() fails to delete the interned string in the subinterpreter interned dictionary.

pyexpat.errors and pyexpat.model modules are cleared because they are stored as different names in sys.modules by Lib/xml/parsers/expat.py:

sys.modules['xml.parsers.expat.model'] = model
sys.modules['xml.parsers.expat.errors'] = errors

vstinner · 2021-01-29T22:00:28Z

I reopen the issue. This change caused a regression in attached interned_bug.py.

Fixed by:

commit c8a87ad
Author: Mohamed Koubaa <koubaa.m@gmail.com>
Date: Mon Jan 4 08:34:26 2021 -0600

bpo-1635741: Port pyexpat to multi-phase init (PEP-489) (GH-22222)

JunyiXie · 2021-03-11T09:44:53Z

Should Make dtoa bigint free list per-interpreter.

static Bigint *bigint_freelist[Kmax+1]; -> _is { Bigint *bigint_freelist[Kmax+1]; }

JunyiXie · 2021-03-11T09:46:00Z

9d7681d

vstinner · 2021-03-13T13:25:31Z

New changeset 5bd1059 by junyixie in branch 'master':
bpo-40521: Make dtoa bigint free list per-interpreter (GH-24821)
5bd1059

mdickinson · 2021-03-21T11:59:22Z

Hi Victor,

I just noticed the change to dtoa.c in #69009. Please could you explain what the benefit of this change was?

In general, we need to be very conservative with changes to dtoa.c: it's a complex, fragile, performance-critical piece of code, and ideally we'd like it not to diverge from the upstream code any more than it already has, in case we need to integrate bugfixes from upstream.

It's feeling as though the normal Python development process is being bypassed here. As I understand it, this and similar changes are in aid of per-subinterpreter GILs. Has there been agreement from the core devs or steering council that this is a desirable goal? Should there be a PEP before more changes like this are made? (Or maybe there's already a PEP, that I missed? I know about PEP-554, but that PEP is explicit that GIL sharing is out of scope.)

vstinner · 2021-03-22T10:02:14Z

Mark Dickinson: "I just noticed the change to dtoa.c in #69009. Please could you explain what the benefit of this change was?"

The rationale is explained in bpo-40512. The goal is to run multiple Python interpreters in parallel in the same process.

dtoa.c had global variables shared by all interpreters without locking, so two intepreters could corrupt the freelist consistency.

Mark Dickinson: "In general, we need to be very conservative with changes to dtoa.c: it's a complex, fragile, performance-critical piece of code, and ideally we'd like it not to diverge from the upstream code any more than it already has, in case we need to integrate bugfixes from upstream."

I know that dtoa.c was copied from a third party project. But the commit 5bd1059 change change only makes sense in Python, I don't think that it would make sense to propose it upstream.

dtoa.c _Py_dg_strtod() is called by:

float.__round__()
_PyOS_ascii_strtod()

_PyOS_ascii_strtod() is called PyOS_string_to_double() which is called by:

float_from_string_inner()
complex_from_string_inner()
pickle load_float()
parser parsenumber_raw()
marshal r_float_str

dtoa.c _Py_dg_dtoa() is called by:

float.__round__()
PyOS_double_to_string()

PyOS_double_to_string() is called by:

float_repr()
complex_repr()
bytes % float: _PyBytes_FormatEx()
str % float: PyUnicode_Format()
_PyLong_FormatAdvancedWriter()
_PyComplex_FormatAdvancedWriter()
pickle save_float()
marshal w_float_str

I guess that the most important use case are float(str) and str(float). I wrote attached bench_dtoa.py to measure the effect on performance of the commit 5bd1059:
---

$ python3 -m pyperf compare_to before.json after.json 
float('0'): Mean +- std dev: [before] 80.5 ns +- 3.1 ns -> [after] 90.1 ns +- 3.6 ns: 1.12x slower
float('1.0'): Mean +- std dev: [before] 89.5 ns +- 4.3 ns -> [after] 97.2 ns +- 2.6 ns: 1.09x slower
float('340282366920938463463374607431768211455'): Mean +- std dev: [before] 480 ns +- 42 ns -> [after] 514 ns +- 13 ns: 1.07x slower
float('1044388881413152506691752710716624382579964249047383780384233483283953907971557456848826811934997558340890106714439262837987573438185793607263236087851365277945956976543709998340361590134383718314428070011855946226376318839397712745672334684344586617496807908705803704071284048740118609114467977783598029006686938976881787785946905630190260940599579453432823469303026696443059025015972399867714215541693835559885291486318237914434496734087811872639496475100189041349008417061675093668333850551032972088269550769983616369411933015213796825837188091833656751221318492846368125550225998300412344784862595674492194617023806505913245610825731835380087608622102834270197698202313169017678006675195485079921636419370285375124784014907159135459982790513399611551794271106831134090584272884279791554849782954323534517065223269061394905987693002122963395687782878948440616007412945674919823050571642377154816321380631045902916136926708342856440730447899971901781465763473223850267253059899795996090799469201774624817718449867455659250178329070473119433165550807568221846571746373296884912819520317457002440926616910874148385078411929804522981857338977648103126085903001302413467189726673216491511131602920781738033436090243804708340403154190335'): Mean +- std dev: [before] 717 ns +- 36 ns -> [after] 990 ns +- 27 ns: 1.38x slower
str(0.0): Mean +- std dev: [before] 113 ns +- 8 ns -> [after] 106 ns +- 4 ns: 1.06x faster
str(1.0): Mean +- std dev: [before] 141 ns +- 11 ns -> [after] 135 ns +- 17 ns: 1.05x faster
str(inf): Mean +- std dev: [before] 110 ns +- 11 ns -> [after] 98.9 ns +- 3.3 ns: 1.12x faster

Benchmark hidden because not significant (1): str(3.402823669209385e+38)

Geometric mean: 1.05x slower
---

I built Python with "./configure --enable-optimizations --with-lto" on Fedora 33 (GCC 10.2.1). I didn't use CPU isolation.

Oh, float(str) is between 1.09x slower and 1.38x slower.

On the other side, str(float) is between 1.06x and 1.12x faster, I'm not sure why. I guess that the problem is that PGO+LTO build is not reproducible, GCC might prefer to optimize some functions or others depending on the PROFILE_TASK (Makefile.pre.in, command used by GCC profiler).

Mark Dickinson: "It's feeling as though the normal Python development process is being bypassed here. As I understand it, this and similar changes are in aid of per-subinterpreter GILs. Has there been agreement from the core devs or steering council that this is a desirable goal? Should there be a PEP before more changes like this are made? (Or maybe there's already a PEP, that I missed? I know about PEP-554, but that PEP is explicit that GIL sharing is out of scope.)"

Honestly, I didn't expect any significant impact on performance on the change. So I merged the PR as I merge other fixes for subinterpreters. It seems like I underestimated the number of Balloc/Bmalloc calls per float(str) or str(float) call.

There is no PEP about running multiple Python interpreters in the same process. There is no consensus on this topic. I discussed it in private with some core devs, but that's not relevant here.

My plan is to merge changes which have no significant impact on performances, and wait for a PEP for changes which have a significant impact on performances. Most changes fix bugs in subinterpreters which still share a GIL. This use case is not new and is supported for 10-20 years.

For now, I will the commit 5bd1059.

vstinner · 2021-03-22T10:59:06Z

New changeset 39f6436 by Victor Stinner in branch 'master':
Revert "bpo-40521: Make dtoa bigint free list per-interpreter (GH-24821)" (GH-24964)
39f6436

rhettinger · 2021-03-25T20:32:33Z

New changeset 3bb1987 by Raymond Hettinger in branch 'master':
Revert "bpo-40521: Remove freelist from collections.deque() (GH-21073)" (GH-24944)
3bb1987

vstinner · 2021-03-25T20:51:20Z

I reopen the issue to remind me that collections.deque() freelist is shared by all interpreters.

vstinner · 2021-05-17T06:59:18Z

I reopen the issue to remind me that collections.deque() freelist is shared by all interpreters.

Each deque instance now has its own free list.

But dtoa.c still has a per-process cache, shared by all interpreters.

rhettinger · 2021-06-15T04:58:25Z

[Victor Stinner]

My plan is to merge changes which have no significant
impact on performances

FWIW, PyFloat_FromDouble() is the most performance critical function in floatobject.c.

vstinner · 2022-01-06T07:53:52Z

New changeset 35d6540 by Victor Stinner in branch 'main':
bpo-46006: Revert "bpo-40521: Per-interpreter interned strings (GH-20085)" (GH-30422)
35d6540

vstinner · 2022-01-06T15:12:36Z

New changeset 72c260c by Victor Stinner in branch '3.10':
[3.10] bpo-46006: Revert "bpo-40521: Per-interpreter interned strings (GH-20085)" (GH-30422) (GH-30425)
72c260c

vstinner · 2022-01-06T15:24:08Z

My commit ea25180 (interned strings) introduced bpo-46006 "[subinterpreter] _PyUnicode_EqualToASCIIId() issue with subinterpreters" regression.

To unblock the Python 3.11.0a4 release, I just reverted the change. It reintroduces the issue, so I created bpo-46283: "[subinterpreters] Unicode interned strings must not be shared between interpreters".

vstinner · 2022-11-03T17:55:33Z

I don't have the bandwidth to fix the remaining issues, so I just close again the issue.

vstinner added interpreter-core (Objects, Python, Grammar, and Parser dirs) 3.9 only security fixes labels May 5, 2020

vstinner added topic-subinterpreters and removed interpreter-core (Objects, Python, Grammar, and Parser dirs) labels May 15, 2020

vstinner changed the title ~~Make tuple, dict, frame free lists, unicode interned strings, unicode latin1 singletons per-interpreter~~ [subinterpreters] Make free lists and unicode caches per-interpreter May 15, 2020

vstinner added topic-subinterpreters and removed interpreter-core (Objects, Python, Grammar, and Parser dirs) labels May 15, 2020

vstinner changed the title ~~Make tuple, dict, frame free lists, unicode interned strings, unicode latin1 singletons per-interpreter~~ [subinterpreters] Make free lists and unicode caches per-interpreter May 15, 2020

vstinner added 3.10 only security fixes and removed 3.9 only security fixes labels Dec 26, 2020

vstinner closed this as completed Dec 26, 2020

vstinner added 3.10 only security fixes and removed 3.9 only security fixes labels Dec 26, 2020

vstinner closed this as completed Dec 26, 2020

vstinner reopened this Dec 26, 2020

vstinner closed this as completed Jan 29, 2021

vstinner reopened this Mar 25, 2021

ezio-melotti transferred this issue from another repository Apr 10, 2022

vstinner closed this as completed Nov 3, 2022

[subinterpreters] Make free lists and unicode caches per-interpreter #84701

[subinterpreters] Make free lists and unicode caches per-interpreter #84701

Comments

vstinner commented May 5, 2020

vstinner commented May 5, 2020

vstinner commented May 5, 2020

vstinner commented May 5, 2020

vstinner commented May 6, 2020

vstinner commented May 6, 2020

vstinner commented May 13, 2020

vstinner commented May 13, 2020

vstinner commented May 19, 2020

vstinner commented Jun 2, 2020

vstinner commented Jun 4, 2020

vstinner commented Jun 4, 2020

vstinner commented Jun 4, 2020

vstinner commented Jun 4, 2020

vstinner commented Jun 5, 2020

vstinner commented Jun 5, 2020

vstinner commented Jun 5, 2020

vstinner commented Jun 5, 2020

vstinner commented Jun 5, 2020

markshannon commented Jun 5, 2020

vstinner commented Jun 5, 2020

vstinner commented Dec 26, 2020

vstinner commented Jan 29, 2021

JunyiXie mannequin commented Mar 11, 2021

JunyiXie mannequin commented Mar 11, 2021

vstinner commented Mar 13, 2021

mdickinson commented Mar 21, 2021

vstinner commented Mar 22, 2021

vstinner commented Mar 22, 2021

rhettinger commented Mar 25, 2021

vstinner commented Mar 25, 2021

vstinner commented May 17, 2021

rhettinger commented Jun 15, 2021

vstinner commented Jan 6, 2022

vstinner commented Jan 6, 2022

vstinner commented Jan 6, 2022

vstinner commented Nov 3, 2022