Title: ctypes string pointer fields should accept embedded null characters
Type: behavior
Stage: patch review
Components: ctypes
Versions: Python 3.10, Python 3.9, Python 3.8
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: ZackerySpytz, amaury.forgeotdarc, belopolsky, eryksun, meador.inge, ned.deily, serhiy.storchaka, theller
Priority: critical
Keywords: 3.6regression, patch

Created on 2018-02-01 19:54 by theller, last changed 2021-03-19 04:37 by eryksun.

File name Uploaded Description theller, 2018-02-01 19:54
Pull Requests
URL Status Linked
PR 8721 open ZackerySpytz, 2018-08-10 05:50
Messages (4)
msg311462 - Author: Thomas Heller (theller) (Python committer) Date: 2018-02-01 19:54
ctypes Structure fields of type c_char_p or c_wchar_p used to accept strings with embedded null characters.  I noticed that Python 3.6.4 now rejects them.  This seems to have changed in a recent release.

There ARE use-cases for this:  The Windows-API OPENFILENAME structure is one example.  The Microsoft docs for the lpstrFilter field:


    Type: LPCTSTR

    A buffer containing pairs of null-terminated filter strings. The last string in the buffer must be terminated by two NULL characters.
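
For illustration, here is a minimal sketch of how the contents of such a buffer could be assembled in Python. The helper name make_filter and the example filter pairs are hypothetical, chosen only to show the layout described in the docs:

```python
def make_filter(pairs):
    """Join (description, pattern) pairs into the lpstrFilter layout."""
    parts = []
    for description, pattern in pairs:
        parts.append(description)
        parts.append(pattern)
    # join() puts one null between consecutive strings; the two appended
    # nulls terminate the last string and then the buffer as a whole.
    return "\0".join(parts) + "\0\0"


filter_text = make_filter([("Text Files", "*.txt"), ("All Files", "*.*")])
print(repr(filter_text))
```

Every string in the result is followed by a null, and the buffer ends with two nulls, so the value necessarily contains embedded null characters.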

I have attached a simple script which demonstrates this new behaviour; the output with Python 3.6.4 is this:

Traceback (most recent call last):
  File "", line 8, in <module>
    t.unicode = u"foo\0bar"
ValueError: embedded null character
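
The attached script is not reproduced here. A minimal sketch that should exercise the same assignment is below; the structure T and the helper try_assign are assumptions, and only the field name "unicode" and the test string come from the traceback. Whether the assignment is accepted depends on the Python version in use:

```python
import ctypes


class T(ctypes.Structure):
    # Hypothetical structure: a single c_wchar_p field named as in the traceback.
    _fields_ = [("unicode", ctypes.c_wchar_p)]


def try_assign(text):
    """Assign text to the c_wchar_p field and report what happened."""
    t = T()
    try:
        t.unicode = text
    except ValueError as exc:
        return "rejected: {}".format(exc)
    return "accepted"


print(try_assign("foo\0bar"))
```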
msg311468 - Author: Eryk Sun (eryksun) (Python triager) Date: 2018-02-01 20:51
PyUnicode_AsWideCharString was updated to raise ValueError for embedded nulls if the `size` output parameter is NULL. Z_set in cfield.c should be updated to get the size, which can be ignored here. For example:

    Py_ssize_t size;  /* receives the length; not used afterwards */
    buffer = PyUnicode_AsWideCharString(value, &size);
msg314567 - Author: Ned Deily (ned.deily) (Python committer) Date: 2018-03-28 07:14
The change mentioned was made in GH-2462 for Issue13617 and was released in 3.6.3 (and in 3.5.4, which is now in security-fix-only mode).
msg314579 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2018-03-28 09:07
This is a regression. Eryk's solution LGTM. Would you mind creating a PR?

But u"foo\0bar" is not terminated by two NULL characters. If this is used in real code, it contains a bug. And the getter of this field will return the string only up to the first null character. More work is needed to make this more reliable.
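
To illustrate the getter behaviour, here is a minimal sketch that keeps the data in an explicit buffer instead of relying on the field. The filter text is a made-up example, and the sketch assumes that c_wchar_p.value reads the same way as the field getter:

```python
import ctypes

# Hypothetical filter text, terminated by two nulls as lpstrFilter requires.
text = "Text Files\0*.txt\0\0"

# The buffer holds every character of text, plus one more trailing null.
buf = ctypes.create_unicode_buffer(text)

# Reading through a c_wchar_p stops at the first null character.
ptr = ctypes.cast(buf, ctypes.c_wchar_p)
first = ptr.value

# Slicing the buffer returns the characters, embedded nulls included.
whole = buf[:len(text)]

print(repr(first))
print(repr(whole))
```

With this approach the caller owns the buffer and must keep it alive for as long as the pointer is in use.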
History
Date User Action Args
2021-03-19 04:37:09 eryksun set versions: + Python 3.9, Python 3.10, - Python 3.6, Python 3.7
2018-08-10 05:52:17 ZackerySpytz set nosy: + ZackerySpytz
2018-08-10 05:50:54 ZackerySpytz set keywords: + patch
    stage: test needed -> patch review
    pull_requests: + pull_request8206
2018-03-28 09:07:54 serhiy.storchaka set messages: + msg314579
2018-03-28 07:14:34 ned.deily set priority: normal -> critical
    nosy: + belopolsky, amaury.forgeotdarc, meador.inge, serhiy.storchaka, ned.deily
    messages: + msg314567
2018-02-01 20:51:07 eryksun set versions: + Python 3.7, Python 3.8
    nosy: + eryksun
    messages: + msg311468
    type: behavior
    stage: test needed
2018-02-01 19:54:50 theller create