Issue 24278: Docs on Parsing arguments should say something about mem mgmt for formatters returning C strings

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/68466

classification

Title:	Docs on Parsing arguments should say something about mem mgmt for formatters returning C strings
Type:		Stage:	resolved
Components:	Documentation	Versions:	Python 3.6, Python 3.5

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	docs@python	Nosy List:	asvetlov, blais, docs@python, martin.panter, python-dev
Priority:	normal	Keywords:	patch

Created on 2015-05-24 15:27 by blais, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
buffer-convert.patch	martin.panter, 2015-05-26 08:10		review

Messages (12)
msg243987 - (view)	Author: Martin Blais (blais) *	Date: 2015-05-24 15:27
Functions that parse arguments like PyArg_ParseTupleAndKeywords() have several formatters that fill in C strings, e.g. "s". In the C API doc: https://docs.python.org/3.5/c-api/arg.html#c.PyArg_ParseTupleAndKeywords There should be an explicit mention of whether this memory needs to be free (in this case: No) and if not, how this memory is managed (in this case: This refers to a buffer managed by the string object itself). Because of questions of encoding, it raises questions where this memory lies, and what its lifetime is (in this case: That of the owning string object from the caller). This deserves an explicit mention, even if brief.
msg243989 - (view)	Author: Andrew Svetlov (asvetlov) *	Date: 2015-05-24 15:35
Would you propose a patch for docs?
msg243990 - (view)	Author: Martin Blais (blais) *	Date: 2015-05-24 16:07
I don't think I'm the right person to propose a patch for this; I'm just guessing how it works so far, I haven't had time to look into the source code in detail, I'm sure there's a lot more context to it.
msg244006 - (view)	Author: Martin Panter (martin.panter) *	Date: 2015-05-24 22:40
At the top of the list <https://docs.python.org/3.5/c-api/arg.html#strings-and-buffers>, it says which cases have to be freed or not, and also mentions releasing buffers. Are you proposing to add this information to each entry in the list as well? Or just mention how the pointer lifetime relates to the Python object lifetime at the top? I think you will find most of the cases are the same, except for the ones that explicitly allocate extra memory.
msg244032 - (view)	Author: Martin Blais (blais) *	Date: 2015-05-25 12:32
Adding information that tells developers where the memory for those returned areas is stored and as you mention, its lifetime guarantees w.r.t. to the Python object, would go a long way towards making this more clear. The questions that immediately came to my mind were: - Is this memory attached to the object? - What if there is a conversion... is it still attached to the object? The converter for "s" says "Unicode objects are converted to C strings using 'utf-8' encoding." Where is the output of this conversion stored? Does it have the same lifetime as its PyObject as well or does it use a cache of recent conversions (e.g. like re/struct), or just static storage? And if so, is it thread-safe? I can find all these answers by looking at the source code for C/Python, or I can _guess_ that extra data is attached to some sort of 'extra' field in a PyObject (which would be a sensible thing to do), but my point is that an API user shouldn't have to dig in the source or have to guess for such important concerns. I think we should be a bit more transparent in the docs.
msg244033 - (view)	Author: Martin Blais (blais) *	Date: 2015-05-25 13:05
Oh, and yes, just adding this info at the top would be fine IMO. It shouldn't have to be repeated.
msg244083 - (view)	Author: Martin Panter (martin.panter) *	Date: 2015-05-26 05:53
This is my understanding of where the buffers would be managed and what governs their lifetimes, though I haven’t analyzed the code to verify: s, z: str -> UTF-8 cache -> pointer for object lifetime s, z: str -> UTF-8 cache -> buffer; bytes-like -> buffer; until release y: bytes-like -> buffer until release S: bytes borrowed object Y: bytearray borrowed object u, u#, Z, Z#: str -> Py_UNICODE cache -> pointer for object lifetime U: str borrowed object w: writable bytes-like -> buffer until release es, es#: str -> encoded -> pointer until free (or pre-allocated for es#) et, et#: str -> encoded -> pointer; bytes, bytearray -> pointer; until free (or pre-allocated for et#) One open question that has worried me a bit is the “s#”, “z#”, “y”, “y#” codes, which are documented as requiring immutable bytes-like objects, but do not return a buffer structure. I guess this is designed for objects like bytes(), where the pointer would remain valid for the object’s lifetime, even if it has been released according to the buffer protocol. But how is this guaranteed for arbitrary buffer objects? Some undocumented requirement of the buffer’s “readonly” flag perhaps? So I propose to add: * Lifetime of all Py_buffer return values is until PyBuffer_Release() is called * Unless otherwise documented, for conversions that return pointers to buffers, the buffer is managed or cached inside the Python object, and the lifetime of that buffer is the same as the whole Python object * Unconverted Python objects are borrowed references * For the four immutable bytes-like conversions I mentioned that return pointers, the buffer management and lifetime is not documented (unless somebody comes up with a better explanation)
msg244091 - (view)	Author: Martin Panter (martin.panter) *	Date: 2015-05-26 08:10
Here is a patch. Looking at the code, s#, z#, y and y# are the conversions that call convertbuffer(). These require that the object does not have a PyBufferProcs.bf_releasebuffer() method, which guarantees that the buffer’s lifetime is the same as the underlying object. In the patch I also removed a contradictory notice about nulls with the “u” conversion.
msg244984 - (view)	Author: Martin Blais (blais) *	Date: 2015-06-08 02:06
LGTM! Thanks for making the change.
msg271875 - (view)	Author: Martin Panter (martin.panter) *	Date: 2016-08-03 06:32
Will commit this soon, apart from dropping “such as bytearray” for s#, which I don’t remember the reasoning.
msg271935 - (view)	Author: Roundup Robot (python-dev)	Date: 2016-08-04 01:56
New changeset ad7da726bea6 by Martin Panter in branch '3.5': Issue #24278: Explain how argument parsing output buffers are managed https://hg.python.org/cpython/rev/ad7da726bea6 New changeset 49c318c6294e by Martin Panter in branch 'default': Issue #24278: Merge argument parsing docs from 3.5 https://hg.python.org/cpython/rev/49c318c6294e
msg272111 - (view)	Author: Martin Blais (blais) *	Date: 2016-08-07 04:18
Thank you Martin!

History
Date	User	Action	Args
2022-04-11 14:58:17	admin	set	github: 68466
2016-08-07 04:18:45	blais	set	messages: + msg272111
2016-08-04 02:55:03	martin.panter	link	issue24082 superseder
2016-08-04 02:53:17	martin.panter	set	status: open -> closed resolution: fixed stage: commit review -> resolved
2016-08-04 01:56:08	python-dev	set	nosy: + python-dev messages: + msg271935
2016-08-03 06:32:58	martin.panter	set	stage: patch review -> commit review messages: + msg271875 versions: - Python 3.4
2015-06-08 02:06:16	blais	set	messages: + msg244984
2015-05-26 08:10:40	martin.panter	set	files: + buffer-convert.patch versions: + Python 3.4, Python 3.5, Python 3.6 messages: + msg244091 keywords: + patch stage: needs patch -> patch review
2015-05-26 05:53:04	martin.panter	set	messages: + msg244083
2015-05-25 13:05:35	blais	set	messages: + msg244033
2015-05-25 12:32:10	blais	set	messages: + msg244032
2015-05-24 22:40:22	martin.panter	set	nosy: + martin.panter messages: + msg244006
2015-05-24 16:07:56	blais	set	messages: + msg243990
2015-05-24 15:35:55	asvetlov	set	nosy: + asvetlov messages: + msg243989 stage: needs patch
2015-05-24 15:27:38	blais	create