This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author vstinner
Recipients JunyiXie, Mark.Shannon, corona10, mark.dickinson, rhettinger, shihai1991, vstinner
Date 2021-03-22.10:02:14
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <>
Mark Dickinson: "I just noticed the change to dtoa.c in GH-24821. Please could you explain what the benefit of this change was?"

The rationale is explained in bpo-40512. The goal is to run multiple Python interpreters in parallel in the same process.

dtoa.c had global variables shared by all interpreters without locking, so two intepreters could corrupt the freelist consistency.

Mark Dickinson: "In general, we need to be very conservative with changes to dtoa.c: it's a complex, fragile, performance-critical piece of code, and ideally we'd like it not to diverge from the upstream code any more than it already has, in case we need to integrate bugfixes from upstream."

I know that dtoa.c was copied from a third party project. But the commit 5bd1059184b154d339f1bd53d23c98b5bcf14c8c change change only makes sense in Python, I don't think that it would make sense to propose it upstream.

dtoa.c _Py_dg_strtod() is called by:

* float.__round__()
* _PyOS_ascii_strtod()

_PyOS_ascii_strtod() is called PyOS_string_to_double() which is called by:

* float_from_string_inner()
* complex_from_string_inner()
* pickle load_float()
* parser parsenumber_raw()
* marshal r_float_str

dtoa.c _Py_dg_dtoa() is called by:

* float.__round__()
* PyOS_double_to_string()

PyOS_double_to_string() is called by:

* float_repr()
* complex_repr()
* bytes % float: _PyBytes_FormatEx()
* str % float: PyUnicode_Format()
* _PyLong_FormatAdvancedWriter()
* _PyComplex_FormatAdvancedWriter()
* pickle save_float()
* marshal w_float_str

I guess that the most important use case are float(str) and str(float). I wrote attached to measure the effect on performance of the commit 5bd1059184b154d339f1bd53d23c98b5bcf14c8c:
$ python3 -m pyperf compare_to before.json after.json 
float('0'): Mean +- std dev: [before] 80.5 ns +- 3.1 ns -> [after] 90.1 ns +- 3.6 ns: 1.12x slower
float('1.0'): Mean +- std dev: [before] 89.5 ns +- 4.3 ns -> [after] 97.2 ns +- 2.6 ns: 1.09x slower
float('340282366920938463463374607431768211455'): Mean +- std dev: [before] 480 ns +- 42 ns -> [after] 514 ns +- 13 ns: 1.07x slower
float('1044388881413152506691752710716624382579964249047383780384233483283953907971557456848826811934997558340890106714439262837987573438185793607263236087851365277945956976543709998340361590134383718314428070011855946226376318839397712745672334684344586617496807908705803704071284048740118609114467977783598029006686938976881787785946905630190260940599579453432823469303026696443059025015972399867714215541693835559885291486318237914434496734087811872639496475100189041349008417061675093668333850551032972088269550769983616369411933015213796825837188091833656751221318492846368125550225998300412344784862595674492194617023806505913245610825731835380087608622102834270197698202313169017678006675195485079921636419370285375124784014907159135459982790513399611551794271106831134090584272884279791554849782954323534517065223269061394905987693002122963395687782878948440616007412945674919823050571642377154816321380631045902916136926708342856440730447899971901781465763473223850267253059899795996090799469201774624817718449867455659250178329070473119433165550807568221846571746373296884912819520317457002440926616910874148385078411929804522981857338977648103126085903001302413467189726673216491511131602920781738033436090243804708340403154190335'): Mean +- std dev: [before] 717 ns +- 36 ns -> [after] 990 ns +- 27 ns: 1.38x slower
str(0.0): Mean +- std dev: [before] 113 ns +- 8 ns -> [after] 106 ns +- 4 ns: 1.06x faster
str(1.0): Mean +- std dev: [before] 141 ns +- 11 ns -> [after] 135 ns +- 17 ns: 1.05x faster
str(inf): Mean +- std dev: [before] 110 ns +- 11 ns -> [after] 98.9 ns +- 3.3 ns: 1.12x faster

Benchmark hidden because not significant (1): str(3.402823669209385e+38)

Geometric mean: 1.05x slower

I built Python with "./configure --enable-optimizations --with-lto" on Fedora 33 (GCC 10.2.1). I didn't use CPU isolation.

Oh, float(str) is between 1.09x slower and 1.38x slower.

On the other side, str(float) is between 1.06x and 1.12x faster, I'm not sure why. I guess that the problem is that PGO+LTO build is not reproducible, GCC might prefer to optimize some functions or others depending on the PROFILE_TASK (, command used by GCC profiler).

Mark Dickinson: "It's feeling as though the normal Python development process is being bypassed here. As I understand it, this and similar changes are in aid of per-subinterpreter GILs. Has there been agreement from the core devs or steering council that this is a desirable goal? Should there be a PEP before more changes like this are made? (Or maybe there's already a PEP, that I missed? I know about PEP 554, but that PEP is explicit that GIL sharing is out of scope.)"

Honestly, I didn't expect any significant impact on performance on the change. So I merged the PR as I merge other fixes for subinterpreters. It seems like I underestimated the number of Balloc/Bmalloc calls per float(str) or str(float) call.

There is no PEP about running multiple Python interpreters in the same process. There is no consensus on this topic. I discussed it in private with some core devs, but that's not relevant here.

My plan is to merge changes which have no significant impact on performances, and wait for a PEP for changes which have a significant impact on performances. Most changes fix bugs in subinterpreters which still share a GIL. This use case is not new and is supported for 10-20 years.

For now, I will the commit 5bd1059184b154d339f1bd53d23c98b5bcf14c8c.
Date User Action Args
2021-03-22 10:02:15vstinnersetrecipients: + vstinner, rhettinger, mark.dickinson, Mark.Shannon, corona10, shihai1991, JunyiXie
2021-03-22 10:02:15vstinnersetmessageid: <>
2021-03-22 10:02:15vstinnerlinkissue40521 messages
2021-03-22 10:02:14vstinnercreate