Message350406
Hmm, I'm a bit confused because:
* Your patch at GH-15251 replaces a number of calls to PyLong_FromLong with calls to the new _PyLong_FromUnsignedChar.
* That function, in turn, just calls PyLong_FromSize_t.
* And that function begins:
    PyObject *
    PyLong_FromSize_t(size_t ival)
    {
        PyLongObject *v;
        size_t t;
        int ndigits = 0;

        if (ival < PyLong_BASE)
            return PyLong_FromLong((long)ival);
        // ...
* So, it seems like after your patch we still end up calling PyLong_FromLong at each of these callsites, just after a couple more indirections than before.
Given the magic of compilers and of hardware branch prediction, it wouldn't at all surprise me for those indirections to not make anything slower... but if the measurements are coming out *faster*, then I feel like something else must be going on. ;-)
Ohhh, I see -- I bet it's that inside _PyLong_FromUnsignedChar, the compiler can see that `is_small_int(ival)` is always true (an unsigned char is at most 255, which is inside the small-int cache), so the whole function just turns into get_small_int. When compiling a call to PyLong_FromLong from some other file (another translation unit), on the other hand, it can't see that and can't make the optimization.
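To make that guess concrete, here is a minimal standalone sketch of the situation as I understand it. Everything in it is a hypothetical stand-in: `fake_int`, `from_long`, `from_unsigned_char`, and the cache bounds are illustrative names and values, not the actual code in Objects/longobject.c, and it assumes an 8-bit unsigned char. The point is only that in the narrow entry point the range check depends on nothing but the argument type, so a compiler that can see `is_small_int` there is free to fold it to "always true".

```c
#include <assert.h>

/* Hypothetical stand-ins for the small-int cache; the names and
   bounds are illustrative, not copied from CPython. */
#define NSMALLNEGINTS 5
#define NSMALLPOSINTS 257

typedef struct { long value; int cached; } fake_int;

static fake_int small_ints[NSMALLNEGINTS + NSMALLPOSINTS];
static fake_int heap_slot;  /* stands in for a freshly allocated object */

static inline int
is_small_int(long ival)
{
    return -NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS;
}

static inline fake_int *
get_small_int(long ival)
{
    assert(is_small_int(ival));
    fake_int *v = &small_ints[ival + NSMALLNEGINTS];
    v->value = ival;
    v->cached = 1;
    return v;
}

/* General entry point: the argument can be any long, so the range
   check has to be evaluated at run time. */
fake_int *
from_long(long ival)
{
    if (is_small_int(ival))
        return get_small_int(ival);
    heap_slot.value = ival;
    heap_slot.cached = 0;
    return &heap_slot;
}

/* Narrow entry point: with an 8-bit unsigned char the argument is
   always in [0, 255], so the check below is true for every input.
   A compiler that sees is_small_int in this translation unit may
   reduce the function to the get_small_int call alone. */
fake_int *
from_unsigned_char(unsigned char ival)
{
    if (is_small_int((long)ival))
        return get_small_int((long)ival);
    heap_slot.value = ival;
    heap_slot.cached = 0;
    return &heap_slot;
}
```

Whether a given compiler actually folds the branch away depends on the optimization level and on what it inlines, so comparing the generated assembly for the two functions would be the way to confirm it.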
Two questions, then:
* How do the measurements look under LTO? I wonder if with LTO the linker is able to make the same optimization that this change helps the compiler make.
* Is there a particular reason to specifically call PyLong_FromSize_t? Seems like PyLong_FromLong is the natural default (and what we default to in the rest of the code), and it's what this ends up calling anyway.
Date | User | Action | Args
2019-08-24 23:03:05 | Greg Price | set | recipients: + Greg Price, jdemeyer, sir-sigurd
2019-08-24 23:03:05 | Greg Price | set | messageid: <1566687785.63.0.996496441368.issue37837@roundup.psfhosted.org>
2019-08-24 23:03:05 | Greg Price | link | issue37837 messages
2019-08-24 23:03:05 | Greg Price | create |