Message 413544 - Python tracker

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Author	tcl326
Recipients	davin, pitrou, tcl326
Date	2022-02-19.12:02:30
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1645272150.7.0.530571625794.issue46799@roundup.psfhosted.org>
In-reply-to

Content
The current implementation of ShareableList keeps an unnecessary list of offsets in self._allocated_offsets. This list can have a large memory footprint when the ShareableList holds many items. Additionally, the list is copied into each process that needs access to the ShareableList, which can negate the benefit of using shared memory. Furthermore, the current implementation keeps different pieces of metadata in different sections of the shared memory, so a single __getitem__ call requires multiple struct.unpack_from calls.

I have attached a prototype that merges the allocated offsets and the packing formats into a single section of the shared memory. This allows a single struct.unpack_from operation to obtain both the allocated offset and the packing format of an item. By removing the self._allocated_offsets list and reducing the number of struct.unpack_from operations, we can drastically reduce memory usage and increase reading performance by 10%. When the ShareableList contains only integers, we can reduce memory usage by half.

The attached implementation also fixes https://bugs.python.org/issue44170, which causes an error when reading some Unicode characters. I am happy to adapt this implementation into a proper bugfix/patch if it is deemed reasonable.
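
The prototype itself is only described in prose here, so the following is a minimal sketch of the general idea under a simplified layout of my own, not the attached code and not the multiprocessing.shared_memory internals. The names pack, PackedList, HEADER_FMT and RECORD_FMT, and the fixed 8-byte slot for each format string, are hypothetical. Each item gets one record holding its data offset and its struct format side by side, so reading an item takes one struct.unpack_from for the metadata and one for the value, and a reader caches only the item count rather than a per-process list of offsets. Strings are sized by their encoded UTF-8 byte length rather than their character count, so multi-byte characters are not truncated.

```python
import struct

# Hypothetical layout, assumed for illustration (not the attached prototype):
#   header : item count                              -> "q"
#   records: one (data offset, item format) per item -> "q8s" each
#   data   : the packed item values
HEADER_FMT = "q"
RECORD_FMT = "q8s"
HEADER_SIZE = struct.calcsize(HEADER_FMT)
RECORD_SIZE = struct.calcsize(RECORD_FMT)


def _item_format(item):
    """Return (struct format, value to pack) for an int, float, or str."""
    if isinstance(item, bool):
        raise TypeError("bool is not supported in this sketch")
    if isinstance(item, int):
        return "q", item
    if isinstance(item, float):
        return "d", item
    if isinstance(item, str):
        # Size by encoded byte length, not character count, so multi-byte
        # UTF-8 characters are never truncated.
        encoded = item.encode("utf-8")
        return f"{len(encoded)}s", encoded
    raise TypeError(f"unsupported type: {type(item).__name__}")


def pack(items):
    """Lay items out in a bytearray standing in for a shared memory block."""
    prepared = [_item_format(item) for item in items]
    data_start = HEADER_SIZE + RECORD_SIZE * len(prepared)
    sizes = [struct.calcsize(fmt) for fmt, _ in prepared]
    buf = bytearray(data_start + sum(sizes))
    struct.pack_into(HEADER_FMT, buf, 0, len(prepared))
    offset = data_start
    for i, ((fmt, value), size) in enumerate(zip(prepared, sizes)):
        if len(fmt) > 8:
            raise ValueError("format string does not fit in its 8-byte slot")
        # Offset and format are written side by side in a single record.
        struct.pack_into(RECORD_FMT, buf, HEADER_SIZE + i * RECORD_SIZE,
                         offset, fmt.encode("ascii"))
        struct.pack_into(fmt, buf, offset, value)
        offset += size
    return buf


class PackedList:
    """Read-only view that caches only the item count, not a list of offsets."""

    def __init__(self, buf):
        self._buf = buf
        (self._count,) = struct.unpack_from(HEADER_FMT, buf, 0)

    def __len__(self):
        return self._count

    def __getitem__(self, index):
        if index < 0:
            index += self._count
        if not 0 <= index < self._count:
            raise IndexError("index out of range")
        # One unpack_from call returns both the offset and the format.
        offset, raw_fmt = struct.unpack_from(
            RECORD_FMT, self._buf, HEADER_SIZE + index * RECORD_SIZE)
        fmt = raw_fmt.rstrip(b"\x00").decode("ascii")
        (value,) = struct.unpack_from(fmt, self._buf, offset)
        return value.decode("utf-8") if fmt.endswith("s") else value
```

A bytearray stands in for the shared memory block here; struct.pack_into and struct.unpack_from work on any writable buffer, such as the buf memoryview of a multiprocessing.shared_memory.SharedMemory. The sketch is read-only and supports only int, float and str, whereas ShareableList also supports item assignment and other types.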

History
Date	User	Action	Args
2022-02-19 12:02:30	tcl326	set	recipients: + tcl326, pitrou, davin
2022-02-19 12:02:30	tcl326	set	messageid: <1645272150.7.0.530571625794.issue46799@roundup.psfhosted.org>
2022-02-19 12:02:30	tcl326	link	issue46799 messages
2022-02-19 12:02:30	tcl326	create