
Title: ShareableList cannot safely handle multibyte utf-8 characters
Type: behavior Stage: patch review
Components: Library (Lib) Versions: Python 3.8
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: andrei.avk, huwcbjones, python-dev
Priority: normal Keywords: patch

Created on 2021-05-18 11:12 by huwcbjones, last changed 2022-04-11 14:59 by admin.

Files: uploaded by huwcbjones, 2021-05-18 11:12. Description: Minimum working example to trigger observed behaviour
Pull Requests
URL Status Linked Edit
PR 26328 open python-dev, 2021-05-24 11:07
Messages (4)
msg393868 - Author: Huw Jones (huwcbjones) Date: 2021-05-18 11:12
I've experienced a UnicodeDecodeError when adding unicode strings that contain multibyte UTF-8 characters to a ShareableList.
My observation is that ShareableList chunks the list of strings before sending it over the process boundary. However, this chunking is not multibyte-aware and can split the data in the middle of a multibyte character.
On the other end, the ShareableList throws a UnicodeDecodeError when it tries to decode an incomplete multibyte UTF-8 character.

From running the attached MWE, I see that the string is sent in two chunks, the first being b'Boom \xf0\x9f\x92\xa5 \xf0\x9f\x92\xa5 \xf0' which clearly splits the 4 bytes of the 💥 character into the first byte and remaining 3 bytes.
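The failure mode described above can be illustrated without shared memory at all. The following is a minimal sketch, not the attached MWE: the test string is an assumption chosen so that cutting its UTF-8 encoding after 16 bytes yields the same first chunk quoted above, and `try_decode` is a hypothetical helper used only for this illustration.

```python
# Assumed test string (the attached MWE is not reproduced here).
text = "Boom 💥 💥 💥"
encoded = text.encode("utf-8")

# 10 characters, but 19 bytes: each 💥 takes 4 bytes in UTF-8.
char_count = len(text)
byte_count = len(encoded)

# Cut after 16 bytes, which lands one byte into the third 💥.
first_chunk = encoded[:16]


def try_decode(data: bytes) -> str:
    """Hypothetical helper: decode, or describe the failure."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return f"UnicodeDecodeError: {exc.reason}"


result = try_decode(first_chunk)
print(char_count, byte_count)
print(first_chunk)
print(result)
```

Any cut that is made by byte count without checking character boundaries can fail this way, which is consistent with the error seen on the receiving side.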
msg393869 - Author: Huw Jones (huwcbjones) Date: 2021-05-18 11:18
The workaround I am using is to manually encode/decode.

For the MWE, this means encoding on the creation side:
shared_list = smm.ShareableList([s.encode() for s in strings])
and decoding before using the string:
for enc_str in shared_list:
    string = enc_str.decode()
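Put together, the workaround might look like the sketch below. The helper names `encode_all`, `decode_all` and `roundtrip` are hypothetical and not part of the MWE; the sketch assumes a `SharedMemoryManager` is used to create the list, as the `smm` name above suggests.

```python
from multiprocessing.managers import SharedMemoryManager


def encode_all(strings):
    """Encode each str to UTF-8 bytes before it goes into the list."""
    return [s.encode("utf-8") for s in strings]


def decode_all(items):
    """Decode each bytes item back to str after reading it out."""
    return [b.decode("utf-8") for b in items]


def roundtrip(strings):
    """Store strings as bytes in a ShareableList and read them back."""
    with SharedMemoryManager() as smm:
        shared_list = smm.ShareableList(encode_all(strings))
        return decode_all(shared_list)


if __name__ == "__main__":
    # Assumed example input, not taken from the attached MWE.
    print(roundtrip(["Boom 💥 💥 💥", "plain ascii"]))
```

Storing bytes means the list sees the encoded length rather than the character count. One caveat with this approach: consumers have to know that every item needs decoding, since the list no longer holds str values.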
msg408074 - Author: Andrei Kulakov (andrei.avk) * (Python triager) Date: 2021-12-09 06:19
We reserve the 'crash' type for segfaults and similar hard crashes, so I'm changing this to the 'behavior' type.
msg408075 - Author: Andrei Kulakov (andrei.avk) * (Python triager) Date: 2021-12-09 06:21
I've confirmed this issue is still present in 3.11.
Date                 User        Action  Args
2022-04-11 14:59:45  admin       set     github: 88336
2021-12-09 06:21:18  andrei.avk  set     messages: + msg408075
2021-12-09 06:19:17  andrei.avk  set     type: crash -> behavior; messages: + msg408074; nosy: + andrei.avk
2021-05-24 11:07:36  python-dev  set     keywords: + patch; nosy: + python-dev; pull_requests: + pull_request24920; stage: patch review
2021-05-18 11:18:26  huwcbjones  set     messages: + msg393869
2021-05-18 11:12:09  huwcbjones  create