List object memory allocator #70570
Comments
Hi All,

This is Catalin from the Server Scripting Languages Optimization Team at Intel Corporation. I would like to submit a patch that replaces the 'malloc' allocator used by the list object (Objects/listobject.c) with the small object allocator (obmalloc.c), and that simplifies the 'list_resize' function by removing a redundant check and properly handling resizing to zero.

Replacing PyMem_* calls with PyObject_* calls inside the list implementation is beneficial because many PyMem_* requests are for sizes that are better handled by the small object allocator. For example, when running Tools/pybench.py -w 1, the list implementation makes a total of 48,295,840 allocation requests (either through 'PyMem_MALLOC' directly or through 'PyMem_RESIZE'), of which 42,581,993 (88%) request sizes that the small object allocator can handle (512 bytes or less).

The changes to 'list_resize' further improve performance by removing a redundant check and handling the 'resize to zero' case separately. 'PyList_New' defines the 'empty' state of a list as having a NULL 'ob_item' pointer and the 'ob_size' and 'allocated' members both equal to 0. Previously, when called with a size of zero, 'list_resize' would set 'ob_size' and 'allocated' to zero, but it would also call 'PyMem_RESIZE' which, by design, calls 'realloc' with a size of 1, thus allocating an unnecessary byte and setting the 'ob_item' pointer to the newly obtained address. The proposed implementation instead frees the buffer pointed to by 'ob_item' and sets 'ob_size', 'allocated' and 'ob_item' to zero when it receives a 'resize to zero' request.
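The 'resize to zero' handling described above can be sketched in plain C. This is an illustrative stand-in, not the actual CPython code: 'toy_list' and 'toy_list_resize' are hypothetical names, the growth formula is only an approximation of listobject.c's, and plain malloc/realloc/free are used in place of the PyObject_* small object allocator.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a list object, mirroring the fields the
 * description mentions: ob_item, ob_size and allocated. */
typedef struct {
    void **ob_item;
    size_t ob_size;
    size_t allocated;
} toy_list;

/* Sketch of the proposed logic: a "resize to zero" request frees the
 * buffer and restores the empty state (NULL / 0 / 0) instead of going
 * through realloc with a nonzero size. */
int toy_list_resize(toy_list *self, size_t newsize)
{
    if (newsize == 0) {
        /* Resize to zero: delete the buffer, restore the empty state. */
        free(self->ob_item);
        self->ob_item = NULL;
        self->ob_size = 0;
        self->allocated = 0;
        return 0;
    }
    if (self->allocated >= newsize && newsize >= (self->allocated >> 1)) {
        /* Enough slack already allocated: no reallocation needed. */
        self->ob_size = newsize;
        return 0;
    }
    /* Over-allocate a little, loosely following the list growth idea. */
    size_t new_allocated = newsize + (newsize >> 3) + 6;
    void **items = realloc(self->ob_item, new_allocated * sizeof(void *));
    if (items == NULL)
        return -1;  /* allocation failure: leave the list untouched */
    self->ob_item = items;
    self->ob_size = newsize;
    self->allocated = new_allocated;
    return 0;
}
```

After a successful resize to zero, the structure is indistinguishable from a freshly created empty list, which is the invariant the patch relies on.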
Hardware and OS Configuration

- BIOS settings: Intel Turbo Boost Technology: false
- OS: Ubuntu 14.04.2 LTS
- OS configuration: Address Space Layout Randomization (ASLR) disabled to reduce run-to-run variation
- GCC version: 5.1.0
- Benchmark: Grand Unified Python Benchmark

Measurements and Results
B. Results:

[The result tables did not survive migration; only their captions remain. Each table listed per-benchmark percentage deltas (%D).]

- Table 1. CPython 3 results on Intel XEON (Haswell-EP) @ 2.3 GHz
- Table 2. CPython 3 results on Intel XEON (Broadwell-EP) @ 2.3 GHz
- Table 3. CPython 2 results on Intel XEON (Haswell-EP) @ 2.3 GHz
- Table 4. CPython 2 results on Intel XEON (Broadwell-EP) @ 2.3 GHz
Instead of modifying individual files, I proposed modifying PyMem_Malloc to use the PyObject_Malloc allocator: issue bpo-26249. But the patch for Python 2 still makes sense.
Hi Victor,

This patch follows the same idea as your proposal, but it's focused on a single object type. I think doing this incrementally is the safer approach, allowing us to have finer control over the new …
Catalin Gabriel Manciu: "(...) allowing us to have finer control over (...)"

Ah, interesting, do you think that it's possible that my change can …
Theoretically, an object type that consistently allocates more than the small object threshold would perform a bit slower because it would first jump to the small object allocator, do the size comparison and then jump to malloc.

I will post some benchmark results on your issue page as soon as I get them.
"Theoretically, an object type that consistently allocates more than the small object threshold would perform a bit slower because it would first jump to the small object allocator, do the size comparison and then jump to malloc."

I expect that the cost of the extra check is *very* cheap (completely negligible) compared to the cost of a call to malloc(). To get an idea of the cost of the Python code around system allocators, you can take a look at the Performance section of my PEP 445, which added an indirection to all Python allocators: I was unable to measure an overhead on macro benchmarks (perf.py). The overhead on microbenchmarks was really hard to measure because it was so low that the benchmarks were very unstable.
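The "extra check" being discussed is a single size comparison before the allocator either takes the small-object path or falls through to the system malloc(). A minimal sketch of that branch shape, with the pool path stubbed out (the 512-byte cutoff comes from the thread; 'sketch_alloc' is a hypothetical name, not the pymalloc entry point):

```c
#include <stdlib.h>

/* Cutoff below which requests go to the small object allocator,
 * as cited in this issue (requests of 512 bytes or less). */
#define SMALL_REQUEST_THRESHOLD 512

/* Illustrative dispatch: one cheap comparison decides between the
 * small-object path and the system allocator. The pool path is
 * stubbed with malloc() here; only the branch structure matters
 * for the cost argument above. */
static void *sketch_alloc(size_t nbytes)
{
    if (nbytes != 0 && nbytes <= SMALL_REQUEST_THRESHOLD) {
        /* Small request: pool allocation would happen here. */
        return malloc(nbytes);
    }
    /* Large (or zero-adjusted) request: fall through to malloc(). */
    return malloc(nbytes);
}
```

For an object type that always allocates above the threshold, the only added cost per allocation is this comparison and the extra call frame, which is what the comment above argues is negligible next to malloc() itself.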
Terry J. Reedy added the comment:

My impression is that we do not do such performance enhancements to 2.7 for the same reason we would not do them to current 3.x -- the risk of breakage. Have I misunderstood?

Breakage of what? The change looks very safe.
Our Haswell-EP OpenStack Swift setup shows a 1% improvement in throughput rate using CPython 2.7 (5715a6d9ff12) with this patch.
Update patch for Python 2.7 |
I don't understand your change: in Python 3.6, PyMem now uses exactly the same allocator as PyObject.
I know the PyMem and PyObject allocators are the same by default. But it's configurable.
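The configurability mentioned here works by routing an allocator domain through a table of function pointers that an embedder can swap out, similar in spirit to CPython's PyMemAllocatorEx from PEP 445. The sketch below uses illustrative names only ('allocator_table', 'table_alloc'), not the actual CPython API:

```c
#include <stdlib.h>

/* Illustrative allocator table: every caller in a domain goes through
 * these function pointers, so replacing the table replaces the
 * allocator for the whole domain at once. */
typedef struct {
    void *(*alloc)(size_t nbytes);
    void  (*release)(void *ptr);
} allocator_table;

/* Default domain wiring: plain malloc/free. An embedder could install
 * a debug or tracing allocator here instead. */
static allocator_table mem_table = { malloc, free };

static void *table_alloc(size_t nbytes) { return mem_table.alloc(nbytes); }
static void  table_free(void *ptr)      { mem_table.release(ptr); }
```

This is why "PyMem and PyObject are the same by default" is only a default: the two domains have separate tables, and configuring one does not necessarily configure the other.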
Maybe PyObject_MALLOC remains only for backward compatibility?
"I know the PyMem and PyObject allocators are the same by default. But it's configurable."

The two functions always use the same allocator. Sorry, but which issue are you trying to fix here? Can you please elaborate?

As I wrote before, only Python 2 should be modified now (if you consider …).
OK. I didn't know the PyMem and PyObject allocators are always the same.

Off topic: I want to know which of the PyMem and PyObject allocators is preferred …
FYI, the Python 3.6 change in PyMem_Malloc() required implementing a new, complex check on the GIL. Search for "PyMem_Malloc() now fails if the GIL is not held" in my blog post. Requiring that the GIL is held is a backward incompatible change. I suggest running your code with PYTHONMALLOC=debug on Python 3.6 ;-)