New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Docs on Parsing arguments should say something about mem mgmt for formatters returning C strings #68466
Comments
Functions that parse arguments like PyArg_ParseTupleAndKeywords() have several formatters that fill in C strings, e.g. "s". In the C API doc: There should be an explicit mention of whether this memory needs to be free (in this case: No) and if not, how this memory is managed (in this case: This refers to a buffer managed by the string object itself). Because of questions of encoding, it raises questions where this memory lies, and what its lifetime is (in this case: That of the owning string object from the caller). This deserves an explicit mention, even if brief. |
Would you propose a patch for docs? |
I don't think I'm the right person to propose a patch for this; I'm just guessing how it works so far, I haven't had time to look into the source code in detail, I'm sure there's a lot more context to it. |
At the top of the list <https://docs.python.org/3.5/c-api/arg.html#strings-and-buffers\>, it says which cases have to be freed or not, and also mentions releasing buffers. Are you proposing to add this information to each entry in the list as well? Or just mention how the pointer lifetime relates to the Python object lifetime at the top? I think you will find most of the cases are the same, except for the ones that explicitly allocate extra memory. |
Adding information that tells developers where the memory for those returned areas is stored and as you mention, its lifetime guarantees w.r.t. to the Python object, would go a long way towards making this more clear. The questions that immediately came to my mind were:
I can find all these answers by looking at the source code for C/Python, or I can _guess_ that extra data is attached to some sort of 'extra' field in a PyObject (which would be a sensible thing to do), but my point is that an API user shouldn't have to dig in the source or have to guess for such important concerns. I think we should be a bit more transparent in the docs. |
Oh, and yes, just adding this info at the top would be fine IMO. It shouldn't have to be repeated. |
This is my understanding of where the buffers would be managed and what governs their lifetimes, though I haven’t analyzed the code to verify: s, z: str -> UTF-8 cache -> pointer for object lifetime One open question that has worried me a bit is the “s#”, “z#”, “y”, “y#” codes, which are documented as requiring immutable bytes-like objects, but do not return a buffer structure. I guess this is designed for objects like bytes(), where the pointer would remain valid for the object’s lifetime, even if it has been released according to the buffer protocol. But how is this guaranteed for arbitrary buffer objects? Some undocumented requirement of the buffer’s “readonly” flag perhaps? So I propose to add:
|
Here is a patch. Looking at the code, s#, z#, y and y# are the conversions that call convertbuffer(). These require that the object does not have a PyBufferProcs.bf_releasebuffer() method, which guarantees that the buffer’s lifetime is the same as the underlying object. In the patch I also removed a contradictory notice about nulls with the “u” conversion. |
LGTM! Thanks for making the change. |
Will commit this soon, apart from dropping “such as bytearray” for s#, which I don’t remember the reasoning. |
New changeset ad7da726bea6 by Martin Panter in branch '3.5': New changeset 49c318c6294e by Martin Panter in branch 'default': |
Thank you Martin! |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: