
classification
Title: ShareableList memory bloat and performance improvement
Type: performance Stage: patch review
Components: Library (Lib) Versions: Python 3.11, Python 3.10, Python 3.9
process
Status: open Resolution: remind
Dependencies: Superseder:
Assigned To: Nosy List: davin, pitrou, tcl326
Priority: normal Keywords: patch

Created on 2022-02-19 12:02 by tcl326, last changed 2022-04-11 14:59 by admin.

Files
File name Uploaded Description
shareable_list.py tcl326, 2022-02-19 12:02 Custom Implementation of ShareableList
Pull Requests
URL Status Linked
PR 31467 open tcl326, 2022-02-21 14:06
Messages (3)
msg413544 - (view) Author: Ting-Che Lin (tcl326) * Date: 2022-02-19 12:02
The current implementation of ShareableList keeps an unnecessary list of offsets in self._allocated_offsets. This list can have a large memory footprint when the ShareableList holds many items. It is also copied into every process that needs access to the ShareableList, which sometimes negates the benefit of using shared memory. Furthermore, the current implementation keeps different metadata in different sections of shared memory, so a single __getitem__ call requires multiple struct.unpack_from calls.

I have attached a prototype that merges the allocated offsets and the packing formats into a single section of shared memory. This lets a single struct.unpack_from operation obtain both the allocated offset and the packing format. By removing the self._allocated_offsets list and reducing the number of struct.unpack_from operations, we can drastically reduce memory usage and increase read performance by 10%. When the ShareableList contains only integers, memory usage is reduced by half.

The attached implementation also fixes https://bugs.python.org/issue44170, which causes an error when reading some Unicode characters. I am happy to adapt this implementation into a proper bugfix/patch if it is deemed reasonable.
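
To illustrate the idea of reading the offset and the packing format with one unpack call, here is a minimal sketch. It is not the layout used in the attached shareable_list.py: the record format "<q8s", the names META, build and get_item, and the use of a plain bytearray in place of a shared memory buffer are all assumptions made for illustration, and only ints and bytes are handled.

```python
import struct

# Hypothetical metadata record: 8-byte data offset + 8-byte struct format.
# This layout is an assumption for illustration, not the attached prototype.
META = struct.Struct("<q8s")


def build(items):
    """Pack ints and bytes into one buffer: metadata table, then item data."""
    formats = ["q" if isinstance(i, int) else f"{len(i)}s" for i in items]
    pos = META.size * len(items)  # item data starts right after the table
    offsets = []
    for fmt in formats:
        offsets.append(pos)
        pos += struct.calcsize("<" + fmt)
    buf = bytearray(pos)  # stand-in for a shared memory buffer
    for idx, (item, fmt, off) in enumerate(zip(items, formats, offsets)):
        # The 's' code pads the short format string with null bytes.
        META.pack_into(buf, idx * META.size, off, fmt.encode("ascii"))
        struct.pack_into("<" + fmt, buf, off, item)
    return buf


def get_item(buf, index):
    """Read one item without keeping any per-process list of offsets."""
    # A single unpack_from yields both the offset and the packing format.
    offset, raw_fmt = META.unpack_from(buf, index * META.size)
    fmt = raw_fmt.rstrip(b"\x00").decode("ascii")
    (value,) = struct.unpack_from("<" + fmt, buf, offset)
    return value


buf = build([1, b"hello", -42])
print(get_item(buf, 1))
```

Because the record size is fixed, the metadata for item i always sits at i * META.size, so nothing besides the buffer itself has to be stored in each process.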
msg414145 - (view) Author: Ting-Che Lin (tcl326) * Date: 2022-02-27 08:44
So I wrote a patch for this issue and submitted a PR. While working on the patch, I realized that there is another issue, related to how the size alignment of strings and byte arrays is calculated. The calculation is here: https://github.com/python/cpython/blob/3.10/Lib/multiprocessing/shared_memory.py#L303

>>> from multiprocessing.shared_memory import ShareableList
>>> s_list = ShareableList(["12345678"])
>>> s_list.format
'16s'

I changed the calculation of 
self._alignment * (len(item) // self._alignment + 1),
to
self._alignment * max(1, (len(item) - 1) // self._alignment + 1)

With the patch, this will give
>>> from multiprocessing.shared_memory import ShareableList
>>> s_list = ShareableList(["12345678"])
>>> s_list.format
'8s'
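
The two formulas can also be compared in isolation. The sketch below assumes an alignment of 8 bytes (the value of ShareableList._alignment in the 3.10 source linked above); old_size and new_size are hypothetical helper names, and n stands for len(item).

```python
ALIGNMENT = 8  # assumed to match ShareableList._alignment in CPython 3.10


def old_size(n, alignment=ALIGNMENT):
    # Current formula: adds a full extra block when n is already a multiple.
    return alignment * (n // alignment + 1)


def new_size(n, alignment=ALIGNMENT):
    # Patched formula: rounds up to the next multiple, minimum one block.
    return alignment * max(1, (n - 1) // alignment + 1)


for n in (0, 1, 7, 8, 9, 16):
    print(n, old_size(n), new_size(n))
```

The two formulas differ only when the length is a nonzero multiple of the alignment; the max(1, ...) keeps one block reserved for empty items.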
msg416124 - (view) Author: Ting-Che Lin (tcl326) * Date: 2022-03-27 13:33
A gentle ping to the multiprocessing lib maintainers. Is there anything else I can do to move this forward?
History
Date User Action Args
2022-04-11 14:59:56 admin set github: 90955
2022-03-27 13:33:18 tcl326 set resolution: remind
messages: + msg416124
2022-02-27 08:44:39 tcl326 set messages: + msg414145
2022-02-21 14:06:13 tcl326 set keywords: + patch
stage: patch review
pull_requests: + pull_request29597
2022-02-19 12:02:30 tcl326 create