This issue tracker has been migrated to GitHub and is now read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

classification
Title: String with NUL characters truncated by ctypes when assigning to a char array
Type: behavior Stage:
Components: ctypes Versions: Python 3.9, Python 3.8, Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Rafal.Dowgird, vinay.sajip, vstinner
Priority: normal Keywords:

Created on 2011-08-17 13:06 by Rafal.Dowgird, last changed 2022-04-11 14:57 by admin.

Files
File name Uploaded Description
reproduce.py Rafal.Dowgird, 2011-08-17 13:06 Script to reproduce
output.txt Rafal.Dowgird, 2011-08-17 13:07
Messages (6)
msg142274 - (view) Author: Rafał Dowgird (Rafal.Dowgird) Date: 2011-08-17 13:06
The ctypes module seems to truncate NUL-containing strings when assigning to structure fields of type c_char*1024. Reproduced on a 2.7.2 compiled from tarball. Script to reproduce attached.
msg142275 - (view) Author: Rafał Dowgird (Rafal.Dowgird) Date: 2011-08-17 13:07
Attaching output of the script. 'x\000y\000' becomes 'x' after assigning to a char array.
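The attached reproduce.py is not shown in this thread. As a minimal sketch of the kind of assignment being described, assuming Python 3 (where the field takes bytes) and a hypothetical structure named Record with a single c_char * 1024 field:

```python
import ctypes


class Record(ctypes.Structure):
    # Hypothetical structure for illustration; the report mentions a
    # field of type c_char * 1024 but does not show its definition.
    _fields_ = [("data", ctypes.c_char * 1024)]


rec = Record()
rec.data = b"x\x00y\x00"

# Reading the field back stops at the first NUL byte.
print("rec.data=%r" % rec.data)

# Look at the first few bytes of the underlying memory to see what
# was actually stored in the structure.
head = bytes(rec)[:4]
print("head=%r" % head)
```

Comparing rec.data with head separates the two effects discussed in this issue: what the getter returns, and what was copied into the array in the first place.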
msg142276 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-08-17 13:16
I don't think that it's a bug, but a feature.

Example:

import ctypes

buffer = ctypes.create_string_buffer(4)
buffer.value = 'a\0bc'  # Python 2 str; on Python 3 use b'a\0bc'
print("buffer.value=%r" % buffer.value)
print("buffer.raw=%r" % buffer.raw)

displays

buffer.value='a'
buffer.raw='a\x00bc'

Sorry, I don't know how to get the raw value of a c_char array in a structure. You should maybe use another type.
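One possible way to reach the raw bytes of a c_char array inside a structure is to go through the field's address rather than its attribute. The sketch below assumes Python 3 and a hypothetical structure named Record. It uses ctypes.memmove to write and ctypes.string_at to read, with the offset and size taken from the field descriptor. It is a workaround sketch, not a statement of how ctypes intends such fields to be used.

```python
import ctypes


class Record(ctypes.Structure):
    # Hypothetical structure for illustration.
    _fields_ = [("data", ctypes.c_char * 8)]


rec = Record()
payload = b"x\x00y\x00"

# Address of the field inside the structure instance.
field_addr = ctypes.addressof(rec) + Record.data.offset

# Copy the payload byte for byte, bypassing the field setter.
ctypes.memmove(field_addr, payload, len(payload))

# Read the whole array back, embedded NUL bytes included.
raw = ctypes.string_at(field_addr, Record.data.size)
print("raw=%r" % raw)

# The ordinary attribute access still stops at the first NUL.
print("rec.data=%r" % rec.data)
```

Declaring the field as an array of c_ubyte instead of c_char may be another option when the data is binary rather than a C string, at the cost of reading it back as integers.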
msg142277 - (view) Author: Rafał Dowgird (Rafal.Dowgird) Date: 2011-08-17 13:19
The buffer output of the script suggests that the part after the '\000' has not been copied into the array at all. If that's the case, then the 'raw' output wouldn't print it anyway.
msg143169 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2011-08-29 18:05
This behaviour also occurs in 3.3, where this does appear to be a bug. In Modules/_ctypes/cfield.c, the setting code does a strlen(), which is in fact questioned in a comment. In function s_set():

size = strlen(data); /* XXX Why not Py_SIZE(value)? */

Why not, indeed? value is the bytes object passed in, and using Py_SIZE does indeed copy all the bytes. However, it's operating in string rather than buffer mode: for example, it adds a byte for a terminating NUL, so if the 5-byte value b'x\x00y\x00z' were passed, 6 bytes are actually copied. This doesn't seem right.

Even after changing s_set to use Py_SIZE, you can't see the copied bytes when you access the attribute, since the code in s_get() skips out at the first NUL byte and then constructs using PyBytes_FromStringAndSize and the truncated size. One can see the convenience of avoiding the display of lots of NUL chars, but it doesn't seem correct to do this.
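The two effects described above can be modelled in pure Python. This is only an illustrative model of the logic as described in this message, not the C code from Modules/_ctypes/cfield.c. The names s_set_model and s_get_model and the use_py_size flag are made up for the sketch, and the field is filled with 0xff beforehand so the copied bytes are easy to spot.

```python
def s_set_model(field: bytearray, value: bytes, use_py_size: bool = False) -> None:
    """Copy value into field the way this message describes s_set()."""
    # In memory, a bytes object is followed by an implicit NUL byte.
    data = value + b"\x00"
    # strlen() stops at the first NUL; Py_SIZE() gives the full length.
    size = len(value) if use_py_size else data.index(b"\x00")
    if size < len(field):
        size += 1  # there is room, so the terminating NUL is copied too
    elif size > len(field):
        raise ValueError("string too long")
    field[:size] = data[:size]


def s_get_model(field: bytearray) -> bytes:
    """Return the field contents up to, but not including, the first NUL."""
    end = field.find(b"\x00")
    return bytes(field if end < 0 else field[:end])


PAYLOAD = b"x\x00y\x00z"

with_strlen = bytearray(b"\xff" * 8)
s_set_model(with_strlen, PAYLOAD)

with_py_size = bytearray(b"\xff" * 8)
s_set_model(with_py_size, PAYLOAD, use_py_size=True)

print("strlen copy:  %r" % bytes(with_strlen))
print("Py_SIZE copy: %r" % bytes(with_py_size))
print("getter sees:  %r" % s_get_model(with_py_size))
```

In this model the Py_SIZE variant copies six bytes for the five-byte payload, and the getter returns the same truncated value in both cases, which matches the two observations made above.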

On 2.x it's a bit muddier, as arrays of c_char could be using ASCII strings, where a NUL terminator might be appropriate to consider.
msg143170 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2011-08-29 18:17
Seems related: #8161
History
Date User Action Args
2022-04-11 14:57:20 admin set github: 56978
2019-12-06 07:16:34 vinay.sajip set versions: + Python 3.7, Python 3.8, Python 3.9, - Python 2.7, Python 3.3
2011-08-29 18:17:32 vinay.sajip set messages: + msg143170
2011-08-29 18:05:11 vinay.sajip set nosy: + vinay.sajip; messages: + msg143169; versions: + Python 3.3
2011-08-17 13:19:07 Rafal.Dowgird set messages: + msg142277
2011-08-17 13:16:23 vstinner set nosy: + vstinner; messages: + msg142276
2011-08-17 13:07:49 Rafal.Dowgird set files: + output.txt; messages: + msg142275
2011-08-17 13:06:11 Rafal.Dowgird create