Message347615
Without PGO, the compiler doesn't know which code is hot.
As a result, gcc fails to inline the hot code of pymalloc_alloc and pymalloc_free
into _PyObject_Malloc and _PyObject_Free. For example, only this code is inlined into _PyObject_Malloc:
    if (nbytes == 0) {
        return 0;
    }
    if (nbytes > SMALL_REQUEST_THRESHOLD) {
        return 0;
    }
But the hottest part is taking a memory block from the freelist in the pool.
To optimize it:
* Make pymalloc_alloc and pymalloc_free inline functions
* Split the code for rare / slow paths out into new functions
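
To illustrate the two points above, here is a minimal sketch of the pattern on a toy fixed-size allocator. This is not the code from the PR: every name (toy_alloc, toy_alloc_slow, toy_free, TOY_BLOCK_SIZE, TOY_BLOCKS_PER_POOL, TOY_NOINLINE, toy_refill_count) is hypothetical, and pools are never returned to the system. It assumes a C99 compiler; the noinline attribute is only applied for gcc and clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sizes for this sketch; not pymalloc's real parameters. */
#define TOY_BLOCK_SIZE      32
#define TOY_BLOCKS_PER_POOL 64

#if defined(__GNUC__) || defined(__clang__)
#  define TOY_NOINLINE __attribute__((noinline))
#else
#  define TOY_NOINLINE
#endif

/* A free block stores the link to the next free block in its own memory. */
typedef struct toy_block { struct toy_block *next; } toy_block;

static toy_block *toy_freelist = NULL;
static size_t toy_refill_count = 0;   /* how many times the slow path ran */

/* Rare / slow path, kept out of line so the inlined caller stays small:
   allocate a new pool, push blocks 1..N-1 onto the freelist, return block 0. */
static TOY_NOINLINE void *
toy_alloc_slow(void)
{
    char *pool = malloc((size_t)TOY_BLOCK_SIZE * TOY_BLOCKS_PER_POOL);
    if (pool == NULL) {
        return NULL;
    }
    toy_refill_count++;
    for (size_t i = TOY_BLOCKS_PER_POOL - 1; i >= 1; i--) {
        toy_block *b = (toy_block *)(pool + i * TOY_BLOCK_SIZE);
        b->next = toy_freelist;
        toy_freelist = b;
    }
    return pool;
}

/* Hot path, meant to be inlined into the caller: size check + freelist pop. */
static inline void *
toy_alloc(size_t nbytes)
{
    if (nbytes == 0 || nbytes > TOY_BLOCK_SIZE) {
        return NULL;   /* the caller would fall back to another allocator */
    }
    toy_block *b = toy_freelist;
    if (b != NULL) {
        toy_freelist = b->next;
        return b;
    }
    return toy_alloc_slow();
}

/* Hot path for free: push the block back onto the freelist. */
static inline void
toy_free(void *p)
{
    assert(p != NULL);
    toy_block *b = p;
    b->next = toy_freelist;
    toy_freelist = b;
}
```

With this layout the compiler only has to inline a size check and a pointer pop or push at each call site, while the pool setup stays behind a single out-of-line call.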
With PR 14674, pymalloc is now as fast as mimalloc in the spectral_norm benchmark.
$ ./python bm_spectral_norm.py --compare-to=./python-master
python-master: ..................... 199 ms +- 1 ms
python: ..................... 176 ms +- 1 ms
Mean +- std dev: [python-master] 199 ms +- 1 ms -> [python] 176 ms +- 1 ms: 1.13x faster (-11%)
Date | User | Action | Args
2019-07-10 10:59:54 | methane | set | recipients: + methane
2019-07-10 10:59:54 | methane | set | messageid: <1562756394.12.0.431999649056.issue37543@roundup.psfhosted.org>
2019-07-10 10:59:54 | methane | link | issue37543 messages
2019-07-10 10:59:53 | methane | create |