Issue 38579: 'u' formatted arrays mostly prevent appends of 4 byte characters



This issue has been migrated to GitHub: https://github.com/python/cpython/issues/82760

classification

Title:	'u' formatted arrays mostly prevent appends of 4 byte characters
Type:	behavior
Stage:
Components:	Library (Lib)
Versions:	Python 3.8, Python 3.7, Python 3.6, Python 3.5

process

Created on 2019-10-24 10:31 by bup, last changed 2022-04-11 14:59 by admin.

Messages (2)
msg355319	Author: Dan Snider (bup)	Date: 2019-10-24 10:31
Unicode characters with code points above U+FFFF can only be added to the end of an array, and only through a call to the "fromunicode" method. This is because "fromunicode" modifies the array through a different procedure than __new__, __setitem__, append, and extend, all of which eventually call the u_setitem routine, which in turn calls PyArg_Parse with a format spec of "u#". The error occurs in that call and, at first glance, appears to stem from an incorrect length determination for unicode objects of the 4 byte kind.
msg407932	Author: Irit Katriel (iritkatriel)	Date: 2021-12-07 13:01
Can you include a code snippet to demonstrate the problem?
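
A minimal probe along the following lines might serve as such a snippet. It is not from the original reporter; it is a sketch that tries each of the operations named in the report with one character above U+FFFF and records what happens. The helper name probe and the sample character are illustrative choices. The result is expected to vary by build, since the array documentation describes 'u' items as wchar_t, which may be 16 or 32 bits wide depending on the platform, and the 'u' type code itself is deprecated, so no particular outcome is assumed here.

```python
import warnings
from array import array

# Illustrative sample: any code point above U+FFFF would do.
ASTRAL = "\U0001F600"


def probe(ch=ASTRAL):
    """Try each operation from the report on a 'u' array and record the result."""
    report = {}
    with warnings.catch_warnings():
        # The 'u' type code is deprecated; silence the warning for the probe.
        warnings.simplefilter("ignore", DeprecationWarning)
        try:
            report["itemsize"] = array("u").itemsize
        except ValueError as exc:
            # The type code may be unavailable on this interpreter.
            return {"typecode": f"{type(exc).__name__}: {exc}"}

        def set_first_item(a):
            a[0] = ch

        operations = {
            "constructor": lambda a: array("u", [ch]),
            "setitem": set_first_item,
            "append": lambda a: a.append(ch),
            "extend": lambda a: a.extend([ch]),
            "fromunicode": lambda a: a.fromunicode(ch),
        }
        for name, operation in operations.items():
            target = array("u", "a")  # fresh one-item array for every attempt
            try:
                operation(target)
            except Exception as exc:  # broad on purpose: this only reports
                report[name] = f"{type(exc).__name__}: {exc}"
            else:
                report[name] = "ok"
    return report


if __name__ == "__main__":
    for key, value in probe().items():
        print(f"{key}: {value}")
```

Running the script prints one line per operation, either "ok" or the exception type and message, together with the item size reported by the array. Comparing the output across platforms would show whether the failure is tied to a 2 byte item size.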

History
Date	User	Action	Args
2022-04-11 14:59:22	admin	set	github: 82760
2021-12-07 13:01:07	iritkatriel	set	nosy: + iritkatriel messages: + msg407932
2019-10-24 10:31:24	bup	create