Title: 'u' formatted arrays mostly prevent appends of 4 byte characters
Created on 2019-10-24 10:31 by bup, last changed 2019-10-24 10:31 by bup.

msg355319 - Author: Dan Snider (bup) * Date: 2019-10-24 10:31
Unicode characters with code points above U+FFFF can only be added to the end of a 'u' array, and only through the "fromunicode" method. This is because "fromunicode" modifies the array through a different code path than the __new__, __setitem__, append, and extend array methods, all of which eventually call the u_setitem routine, which in turn calls PyArg_Parse with a format spec of "u#". The error occurs in that call and, at first glance, appears to come from an incorrect length determination for unicode objects of the 4-byte kind.
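A minimal sketch for checking each insertion path with one character above U+FFFF. The helper name try_insert_paths and the choice of U+1F600 are illustrative only and not part of the array module. The constructor is given a list so that items are set one at a time. The outcome is likely to vary with the Python version and the platform's wchar_t size, so no expected output is shown. DeprecationWarning is silenced because newer releases warn about the 'u' typecode.

```python
import array
import warnings

ASTRAL = "\U0001F600"  # illustrative code point above U+FFFF


def try_insert_paths(ch=ASTRAL):
    """Try each way of putting `ch` into a 'u' array.

    Returns a dict mapping the operation name to "ok" or to the name of
    the exception class that was raised.
    """
    def attempt(fn):
        try:
            fn()
            return "ok"
        except (TypeError, ValueError) as exc:
            return type(exc).__name__

    with warnings.catch_warnings():
        # Newer Python versions emit a DeprecationWarning for 'u'.
        warnings.simplefilter("ignore", DeprecationWarning)
        return {
            # Per-item paths described in the report.
            "constructor": attempt(lambda: array.array("u", [ch])),
            "setitem": attempt(
                lambda: array.array("u", "a").__setitem__(0, ch)
            ),
            "append": attempt(lambda: array.array("u").append(ch)),
            "extend": attempt(lambda: array.array("u").extend(ch)),
            # The path the report says behaves differently.
            "fromunicode": attempt(lambda: array.array("u").fromunicode(ch)),
        }


if __name__ == "__main__":
    for name, outcome in try_insert_paths().items():
        print(f"{name:12} {outcome}")
```

On an affected build, the per-item operations would be expected to report an exception name while "fromunicode" reports "ok".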
