classification
Title: 'u' formatted arrays mostly prevent appends of 4 byte characters
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.8, Python 3.7, Python 3.6, Python 3.5
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: bup
Priority: normal Keywords:

Created on 2019-10-24 10:31 by bup, last changed 2019-10-24 10:31 by bup.

Messages (1)
msg355319 - Author: Dan Snider (bup) Date: 2019-10-24 10:31
Unicode characters with code points above U+FFFF can only be added to the end of an array, and only by calling the "fromunicode" method. This is because "fromunicode" modifies the array through a different code path than the __new__, __setitem__, append, and extend array methods, all of which eventually call the u_setitem routine, which calls PyArg_Parse with a format spec of "u#". The error occurs in that call and, at first glance, appears to come from an incorrect length determination for unicode objects of the 4-byte kind.
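A minimal reproduction sketch of the behavior described above. The helper names (attempt, probe, ASTRAL) are made up for illustration and are not part of the array module. The script deliberately does not assume which calls fail: my understanding is that the result depends on the width of the platform's wchar_t (16-bit builds being the ones affected) and on the Python version, but the report does not state the platform, so treat that as an assumption.

```python
import warnings
from array import array

ASTRAL = "\U0001F600"  # hypothetical sample; any code point above U+FFFF would do


def attempt(label, op):
    """Run op() and record whether it raised, without assuming the outcome."""
    try:
        with warnings.catch_warnings():
            # Newer Pythons may warn that the 'u' typecode is deprecated.
            warnings.simplefilter("ignore")
            op()
    except (TypeError, ValueError) as exc:
        return (label, False, f"{type(exc).__name__}: {exc}")
    return (label, True, "")


def probe():
    """Try each way of putting ASTRAL into a 'u' array named in the report."""
    return [
        attempt("fromunicode", lambda: array("u").fromunicode(ASTRAL)),
        attempt("constructor", lambda: array("u", ASTRAL)),
        attempt("append", lambda: array("u").append(ASTRAL)),
        attempt("extend", lambda: array("u").extend(ASTRAL)),
        attempt("__setitem__", lambda: array("u", "x").__setitem__(0, ASTRAL)),
    ]


# Print one line per operation so the fromunicode path can be compared
# with the paths that go through u_setitem.
for label, ok, detail in probe():
    print(f"{label:12} {'ok' if ok else 'failed'} {detail}")
```

If the report applies to the interpreter running this, only the fromunicode line should show "ok"; on other builds all five may succeed.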
History
Date User Action Args
2019-10-24 10:31:24 bup create