
Author neonene
Recipients Mark.Shannon, neonene, paul.moore, rhettinger, steve.dower, tim.golden, vstinner, zach.ware
Date 2021-09-07.18:13:04
Message-id <1631038384.95.0.122782466984.issue45116@roundup.psfhosted.org>
Content
@vstinner: __forceinline suggestion

Since PR25244 (mentioned above), link.exe seems to get stuck on python310.dll.
Before the PR, linking took around 10x or more longer with __forceinline functions than without them.
I can confirm this with only _Py_DECREF() and _Py_XDECREF() forced inline and a single training job (the more functions are forced inline, or the more training jobs are used, the slower the link).
Have you tried __forceinline with a PGO build?


> I don't understand how to read the table.

The overhead field is the output of a pyperf command, not a subtraction (the values just happen to match).

Example, 3.10rc1 x86 PGO:
     PGO      : pyperf compare_to 3.10a7 left
     patched  : pyperf compare_to 3.10a7 right
     overhead : pyperf compare_to right  left
  give, respectively:
     1.15x slower (slower 52, faster  4, not significant  2)
     1.13x slower (slower 50, faster  4, not significant  4)
     1.02x slower (slower 29, faster 14, not significant 15)
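
As a rough illustration of why a subtraction can happen to agree with the measured overhead: if both slowdown factors are relative to the same baseline, the relative slowdown between them is a ratio, and for factors close to 1 that ratio is nearly 1 plus their difference. The sketch below is plain arithmetic, not pyperf's implementation; the function name overhead_estimates is hypothetical, and it assumes the summary figures can be compared this way.

```python
def overhead_estimates(left: float, right: float) -> tuple[float, float]:
    """Return (difference-based, ratio-based) estimates of left relative to right.

    Both inputs are slowdown factors against the same baseline (an assumption).
    """
    by_difference = 1.0 + (left - right)  # reading the table as a subtraction
    by_ratio = left / right               # relative slowdown of left versus right
    return by_difference, by_ratio


# Figures from the example above: both estimates round to the same value.
diff, ratio = overhead_estimates(1.15, 1.13)
print(f"{diff:.2f}x vs {ratio:.2f}x")  # 1.02x vs 1.02x

# Made-up larger factors: the two estimates no longer agree.
diff, ratio = overhead_estimates(2.00, 1.50)
print(f"{diff:.2f}x vs {ratio:.2f}x")  # 1.50x vs 1.33x
```

The second call uses invented numbers purely to show that the agreement is a coincidence of small differences, not a general rule.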


> I'm not sure if PGO builds are reproducible,

MSVC does not produce the same code on each build. Inlining (all or nothing) might be quite a special case in the hottest section.
I suspect the profiler works poorly only for _PyEval_EvalFrameDefault(), including its branch/alignment optimization.
So the macro I posted, like the inlining, is only for measurement, not a solution.
History
Date User Action Args
2021-09-07 18:13:04  neonene  set     recipients: + neonene, rhettinger, paul.moore, vstinner, tim.golden, Mark.Shannon, zach.ware, steve.dower
2021-09-07 18:13:04  neonene  set     messageid: <1631038384.95.0.122782466984.issue45116@roundup.psfhosted.org>
2021-09-07 18:13:04  neonene  link    issue45116 messages
2021-09-07 18:13:04  neonene  create