
Author neonene
Recipients Mark.Shannon, brandtbucher, erlendaasland, gvanrossum, kj, lemburg, malin, neonene, pablogsal, paul.moore, rhettinger, steve.dower, tim.golden, vstinner, zach.ware
Date 2022-04-07.23:41:04
Message-id <1649374864.49.0.499792007543.issue45116@roundup.psfhosted.org>
Content
>What exactly does "pgo hard reject" mean?

As I understand it, "pgo hard reject" comes from the PGOptimizer's heuristic, and the "reject" decision is related to the probe count (whether the code is hot or cold).

  https://developercommunity.visualstudio.com/t/1531987#T-N1535774


The MSVC team has since replied and closed the issue. MSVC will not be fixed in the near future.

  https://developercommunity.visualstudio.com/t/1595341#T-N1695626

From the reply and my investigation, 3.11 would need the following:

1. Some call sites, such as calls through tp_* pointers, should not have their fast paths inlined into the eval switch-case, because they often conflict. Each pointer call needs to be wrapped in a function, or perhaps _PyEval_EvalFrameDefault needs to be enclosed in an "inline_depth(0)" pragma.

2. __assume(0) should be replaced with some other function, whether it appears inside the eval switch-case or in the inlined paths of callees. This is critical with PGO.

3. For inlining, use __forceinline, a macro, or a const function pointer.

   When a ton of Py_DECREF()s are force-inlined in the eval loop, MSVC getting stuck can be avoided in many ways, as long as the tp_dealloc call does not create an inlined call site:

     void
     _Py_Dealloc(PyObject *op)
     {
      ...
     #pragma inline_depth(0) // takes effect from here; PGO accepts only 0.
         (*dealloc)(op);     // conflicts when inlined.
     }
     #pragma inline_depth()  // can be reset only outside the function.
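
The "wrap each pointer in a function" idea from point 1 could be sketched roughly as below. This is only an illustration under assumptions, not CPython code: the `obj` and `type_info` types, `call_dealloc`, `obj_decref` and the `NOINLINE` macro are all hypothetical stand-ins, and whether this shape actually keeps the MSVC optimizer from choking on the eval loop has not been verified here.

```c
#include <stddef.h>

/* Hypothetical "never inline" marker for this sketch (not a CPython macro). */
#if defined(_MSC_VER)
#  define NOINLINE __declspec(noinline)
#else
#  define NOINLINE __attribute__((noinline))
#endif

/* Hypothetical, simplified stand-ins for PyObject and PyTypeObject. */
typedef struct obj obj;
typedef void (*dealloc_fn)(obj *);
typedef struct { dealloc_fn tp_dealloc; } type_info;
struct obj { long refcnt; const type_info *type; };

/* A trivial deallocator that only counts how often it was called. */
static int freed_count = 0;
static void count_dealloc(obj *op) { (void)op; freed_count++; }
static const type_info counted_type = { count_dealloc };

/* The indirect call lives in its own non-inlined function, so callers
   (for example a large switch-based loop) contain only a direct call. */
NOINLINE void
call_dealloc(obj *op)
{
    dealloc_fn dealloc = op->type->tp_dealloc;
    (*dealloc)(op);
}

/* The cheap refcount check may still be inlined into hot code. */
static inline void
obj_decref(obj *op)
{
    if (--op->refcnt == 0) {
        call_dealloc(op);
    }
}
```

Here the decrement stays on the inlined fast path, while the call through the tp_dealloc-style pointer is pushed behind a function boundary.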



* Virtual Call Speculation:
  https://docs.microsoft.com/en-us/cpp/build/profile-guided-optimizations?view=msvc-170#optimizations-performed-by-pgo


* The profiler runs with the /GENPROFILE:PATH option, but for the big ceval function the optimizer merges the profiles into one, as in /GENPROFILE:NOPATH mode:
  https://docs.microsoft.com/en-us/cpp/build/reference/genprofile-fastgenprofile-generate-profiling-instrumented-build?view=msvc-170#arguments


* __assume(0) (Py_UNREACHABLE):
  https://devblogs.microsoft.com/cppblog/visual-studio-2017-throughput-improvements-and-advice/#remove-usages-of-__assume
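
A minimal sketch of point 2, assuming the replacement for __assume(0) is a call to a non-inlined, non-returning helper. The names `fatal_unreachable`, `SKETCH_UNREACHABLE`, `apply_op` and the `NOINLINE` / `NORETURN` macros are made up for this example and are not the actual Py_UNREACHABLE definition:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical portability macros for this sketch. */
#if defined(_MSC_VER)
#  define NOINLINE __declspec(noinline)
#  define NORETURN __declspec(noreturn)
#else
#  define NOINLINE __attribute__((noinline))
#  define NORETURN __attribute__((noreturn))
#endif

/* Hypothetical helper: a real function call instead of the __assume(0)
   optimizer hint. It reports the location and aborts. */
NORETURN NOINLINE void
fatal_unreachable(const char *file, int line)
{
    fprintf(stderr, "%s:%d: unreachable code reached\n", file, line);
    abort();
}

#define SKETCH_UNREACHABLE() fatal_unreachable(__FILE__, __LINE__)

enum opcode { OP_NOP = 0, OP_INC = 1, OP_DEC = 2 };

/* A tiny switch standing in for the eval switch-case. */
int
apply_op(enum opcode op, int value)
{
    switch (op) {
    case OP_NOP:
        return value;
    case OP_INC:
        return value + 1;
    case OP_DEC:
        return value - 1;
    default:
        SKETCH_UNREACHABLE();  /* function call, not __assume(0) */
    }
}
```

The default branch now costs a cold function call rather than giving the optimizer an assumption to propagate, which is the trade-off the linked post describes.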