Author neonene
Date 2022-04-07.23:41:04
>What exactly does "pgo hard reject" mean?

As I understand it, "pgo hard reject" comes from the PGOptimizer's heuristics, and "reject" is related to the probe count (hot/cold).

The MSVC team has replied and closed the issue. MSVC won't be fixed in the near future.

From the reply and my investigation, 3.11 would need the following:

1. Callsites that go through a pointer such as tp_* should not have their fast paths inlined into the eval switch-case, because they often conflict. Each pointer needs to be wrapped in a function, or maybe _PyEval_EvalFrameDefault needs to be enclosed in an "inline_depth(0)" pragma.

2. __assume(0) should be replaced with a call to some other function wherever it appears inside the eval switch-case or in the inlined paths of callees. This is critical with PGO.

3. For inlining, use __forceinline, a macro, or a const function pointer.

   MSVC getting stuck can be avoided in many ways when force-inlining a ton of Py_DECREF()s into the eval loop, as long as tp_dealloc does not create an inlined callsite:

     void
     _Py_Dealloc(PyObject *op)
     {
         destructor dealloc = Py_TYPE(op)->tp_dealloc;
     #pragma inline_depth(0) // takes effect from here; PGO accepts only 0.
         (*dealloc)(op);     // conflicts when inlined.
     }
     #pragma inline_depth()  // can be reset only outside the function.
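
To make the three points above more concrete, here is a minimal, self-contained sketch of the shape I have in mind. It is not CPython code: toy_object, toy_type, toy_call_dealloc, toy_unreachable, toy_decref, toy_eval and the SKETCH_* macros are all hypothetical names made up for illustration, and the non-MSVC branches of the macros are only there so the sketch also builds with other compilers. Whether this exact shape is enough to keep the PGO optimizer from rejecting the function is an assumption, not something the sketch proves.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Portability shims. These are assumptions for the sketch, not CPython macros. */
#if defined(_MSC_VER)
#  define SKETCH_NOINLINE    __declspec(noinline)
#  define SKETCH_FORCEINLINE static __forceinline
#elif defined(__GNUC__)
#  define SKETCH_NOINLINE    __attribute__((noinline))
#  define SKETCH_FORCEINLINE static inline __attribute__((always_inline))
#else
#  define SKETCH_NOINLINE
#  define SKETCH_FORCEINLINE static inline
#endif

/* Toy stand-ins for PyObject / PyTypeObject (hypothetical). */
typedef struct toy_object toy_object;
typedef void (*toy_destructor)(toy_object *);

typedef struct {
    toy_destructor tp_dealloc;
} toy_type;

struct toy_object {
    long refcnt;
    const toy_type *type;
};

static int dealloc_calls = 0;

/* A trivial destructor that only counts how often it runs. */
static void toy_dealloc_impl(toy_object *op)
{
    (void)op;
    dealloc_calls++;
}

/* Point 1: the indirect tp_* call lives in its own non-inlined function,
   so the eval switch only contains a direct call to this wrapper. */
SKETCH_NOINLINE static void toy_call_dealloc(toy_object *op)
{
    op->type->tp_dealloc(op);
}

/* Point 2: an out-of-line function that aborts, used where __assume(0)
   would otherwise mark a path as unreachable. */
SKETCH_NOINLINE static void toy_unreachable(const char *where)
{
    fprintf(stderr, "unreachable code reached: %s\n", where);
    abort();
}

/* Point 3: the small refcount fast path is force-inlined; only the
   deallocation slow path goes through the wrapper above. */
SKETCH_FORCEINLINE void toy_decref(toy_object *op)
{
    assert(op->refcnt > 0);
    if (--op->refcnt == 0) {
        toy_call_dealloc(op);
    }
}

/* Miniature eval switch. Returns the number of opcodes executed. */
static int toy_eval(const int *code, int n, toy_object *obj)
{
    int executed = 0;
    for (int i = 0; i < n; i++) {
        switch (code[i]) {
        case 0: /* "INCREF" */
            obj->refcnt++;
            break;
        case 1: /* "DECREF" */
            toy_decref(obj);
            break;
        default:
            toy_unreachable("toy_eval");
        }
        executed++;
    }
    return executed;
}
```

Running toy_eval over the opcodes {0, 1, 1} on an object whose refcnt starts at 1 executes three opcodes and reaches the destructor exactly once, through toy_call_dealloc rather than through an indirect call expanded inside the switch.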

* Virtual Call Speculation:

* The profiler runs with the /GENPROFILE:PATH option, but for the big ceval function the optimizer merges the profiles into one, as in /GENPROFILE:NOPATH mode.

* __assume(0) (Py_UNREACHABLE):
