classification
Title: PyObject_AsCharBuffer() should only accept read-only objects
Type: Stage:
Components: Interpreter Core Versions: Python 3.2
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: Nosy List: kermode, lemburg, loewis, pitrou, vstinner
Priority: normal Keywords: patch

Created on 2010-08-14 13:42 by vstinner, last changed 2010-09-29 05:18 by kermode. This issue is now closed.

Files
File name Uploaded Description Edit
PyObject_AsCharBuffer-2.7.patch vstinner, 2010-08-14 13:42
PyObject_AsReadBuffer.c kermode, 2010-09-28 16:45 Code snippet from abstract.c for Python 3.2a2 r84925
Messages (13)
msg113895 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-08-14 13:42
mmap, buffer, bytearray, string and unicode objects set the char buffer callback (bf_getcharbuffer). The bytearray object sets also the release buffer callback (bf_releasebuffer).

In Python 2.7, PyObject_AsCharBuffer() accepts bytearray objects, whereas the "t#" format of PyArg_Parse functions rejects byte bytearray objects (expect an "pinned buffer object").

In Python 3.2, PyObject_AsCharBuffer() releases the buffer.

PyObject_AsCharBuffer() documentation (in 2.7 and 3.2) says that the function only accepts read-only objects.

Something is wrong here. If the caller doesn't hold a kind of lock, the object cannot be protected against futher modifications. The caller has to ensure that the object is not modifiable until it finishs to use the char* pointer.

I think that it would be safer to respect the documentation: PyObject_AsCharBuffer() should only accept read-only objects. The most important change is that functions using PyObject_AsCharBuffer() will not accept bytearray objects anymore.

Attached patch (for Python 2.7) changes PyObject_AsCharBuffer() to reject modifiable objects. It removes also the character buffer callback from the bytearray type. To avoid breaking compatibility too much, I patched int(), long() and float() to still support bytearray objects.

Examples of functions rejecting bytearray with the patch:
 - int(), long(), float(), complex()
 - many str methods: split, partition, rpartition, rsplit, index, find, count, translate, replace, startswith, endswith
 - writelines() of file objects (eg. sys.stdout.writelines)
 - writelines() method of a bz2 file

--

My patch breaks backward compatibility, and I don't know that it is acceptable in Python 2.7.

I will write a similar patch for Python 3.2.
msg113898 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-08-14 13:59
I don't think we should change anything in 2.7 at this point. It risks breaking compatibility while we are at the end of the 2.x line, for little added benefit (the old buffer API has always been insecure with mutable buffers).

As for 3.2, PyObject_AsCharBuffer() should IMO be deprecated, as well as any other functions meant to query the old buffer API.
msg114046 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-08-16 14:51
STINNER Victor wrote:
> 
> New submission from STINNER Victor <victor.stinner@haypocalc.com>:
> 
> mmap, buffer, bytearray, string and unicode objects set the char buffer callback (bf_getcharbuffer). The bytearray object sets also the release buffer callback (bf_releasebuffer).
> 
> In Python 2.7, PyObject_AsCharBuffer() accepts bytearray objects, whereas the "t#" format of PyArg_Parse functions rejects byte bytearray objects (expect an "pinned buffer object").
> 
> In Python 3.2, PyObject_AsCharBuffer() releases the buffer.
> 
> PyObject_AsCharBuffer() documentation (in 2.7 and 3.2) says that the function only accepts read-only objects.
> 
> Something is wrong here. If the caller doesn't hold a kind of lock, the object cannot be protected against futher modifications. The caller has to ensure that the object is not modifiable until it finishs to use the char* pointer.
> 
> I think that it would be safer to respect the documentation: PyObject_AsCharBuffer() should only accept read-only objects. The most important change is that functions using PyObject_AsCharBuffer() will not accept bytearray objects anymore.
> 
> Attached patch (for Python 2.7) changes PyObject_AsCharBuffer() to reject modifiable objects. It removes also the character buffer callback from the bytearray type. To avoid breaking compatibility too much, I patched int(), long() and float() to still support bytearray objects.
> 
> Examples of functions rejecting bytearray with the patch:
>  - int(), long(), float(), complex()
>  - many str methods: split, partition, rpartition, rsplit, index, find, count, translate, replace, startswith, endswith
>  - writelines() of file objects (eg. sys.stdout.writelines)
>  - writelines() method of a bz2 file
> 
> --
> 
> My patch breaks backward compatibility, and I don't know that it is acceptable in Python 2.7.

Simple answer: no.

Long answer: The caller is responsible for making sure that the object
is not modified while in use.

> I will write a similar patch for Python 3.2.

Restricting the API to read-only buffers would seriously limit
it's functionality. I'm -1 on doing that.

Instead, it's better to clarify the documentation and mention the
fact that the used object may not change during use.

Note that the buffer interface release API is meant to protect
against such modifications, so I don't see why rejecting objects
that do implement this API should be rejected. It's object that
don't implement the release buffer slot which you'd have to worry
about. Then again, this has never really been an issue in practice
during the 10 years of the 2.x branch, so I wouldn't call this
a serious issue.

See PEP 3118... "All that is specifically required by the exporter, however, is to ensure that any
memory shared through the bufferinfo structure remains valid until releasebuffer is called on the
bufferinfo structure exporting that memory."
msg114053 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-08-16 17:02
> Note that the buffer interface release API is meant to protect
> against such modifications, so I don't see why rejecting objects
> that do implement this API should be rejected.

As I explained, the release API is *not* used by PyObject_AsCharBuffer() in Python 2.7 and 3.2.

Pseudo-code example:
---
PyObject_AsCharBuffer(obj, &str, &size)
... modify or destroy obj ...
str is no more valid here
---

> Restricting the API to read-only buffers would seriously limit
> it's functionality. I'm -1 on doing that.

PyObject_AsCharBuffer() is dangerous because the caller has to ensure that the object is not modified or destroyed. Antoine proposes to deprecated PyObject_AsCharBuffer().

PyObject_GetBuffer() can replace PyObject_AsCharBuffer(): it's easy to get the pointer to the buffer content (view.buf) and the size (view.len) using PyObject_GetBuffer(), and it does protect the buffer against modification or destuction thanks to the release API (PyBuffer_Release). But PyObject_GetBuffer() is maybe a little bit to low level, eg. it doesn't check that the buffer is contiguous, and it requires a flag argument. A new function is maybe needed to replace PyObject_AsCharBuffer(). Eg. PyObject_GetCharBuffer() which will call PyObject_GetBuffer() (the caller will then have to call PyBuffer_Release() to release the buffer).

Example:
---
PyObject_GetCharBuffer(obj, &view, &str, &size)
... use str and size ...
PyBuffer_Release(view);
---
or just
---
PyObject_GetCharBuffer(obj, &view)
... use view.buf and view.len ...
PyBuffer_Release(view);
---
msg114130 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-08-17 15:42
STINNER Victor wrote:
> 
> STINNER Victor <victor.stinner@haypocalc.com> added the comment:
> 
>> Note that the buffer interface release API is meant to protect
>> against such modifications, so I don't see why rejecting objects
>> that do implement this API should be rejected.
> 
> As I explained, the release API is *not* used by PyObject_AsCharBuffer() in Python 2.7 and 3.2.
> 
> Pseudo-code example:
> ---
> PyObject_AsCharBuffer(obj, &str, &size)
> ... modify or destroy obj ...
> str is no more valid here
> ---

Right, but as I explained before: this doesn't really happen in
practice, otherwise we would have had issues with these APIs
long before the Py_buffers were introduced.

Note that the same comment applies to PyObject_AsReadBuffer() and
PyObject_AsWriteBuffer().

For Python 2.7 you can't change anything anymore. For Python 3.2
you could start a deprecation process as outlined in PEP 4, if you
feel this really is such a big issue.

>> Restricting the API to read-only buffers would seriously limit
>> it's functionality. I'm -1 on doing that.
> 
> PyObject_AsCharBuffer() is dangerous because the caller has to ensure that the object is not modified or destroyed. Antoine proposes to deprecated PyObject_AsCharBuffer().
> 
> PyObject_GetBuffer() can replace PyObject_AsCharBuffer(): it's easy to get the pointer to the buffer content (view.buf) and the size (view.len) using PyObject_GetBuffer(), and it does protect the buffer against modification or destuction thanks to the release API (PyBuffer_Release). But PyObject_GetBuffer() is maybe a little bit to low level, eg. it doesn't check that the buffer is contiguous, and it requires a flag argument. A new function is maybe needed to replace PyObject_AsCharBuffer(). Eg. PyObject_GetCharBuffer() which will call PyObject_GetBuffer() (the caller will then have to call PyBuffer_Release() to release the buffer).
> 
> Example:
> ---
> PyObject_GetCharBuffer(obj, &view, &str, &size)
> ... use str and size ...
> PyBuffer_Release(view);
> ---
> or just
> ---
> PyObject_GetCharBuffer(obj, &view)
> ... use view.buf and view.len ...
> PyBuffer_Release(view);
> ---

Why add a new function ? Python 3.2 doesn't provide a way to access
a *character* buffer version of an object anymore. Not that this is
good, or shouldn't be readded at some point, but right now, we don't
have a need for such an API.
msg117502 - (view) Author: Lenard Lindstrom (kermode) Date: 2010-09-28 03:11
PEP 3118

"""
Rationale

...

3) There is no way for a consumer to tell the buffer-API-exporting object it is "finished" with its view of the memory and therefore no way for the exporting object to be sure that it is safe to reallocate the pointer to the memory that it owns (for example, the array object reallocating its memory after sharing it with the buffer object which held the original pointer led to the infamous buffer-object problem).
"""


Let's consider Pygame, and the SDL surface it wraps as a pygame.Surface. Pygame exposes a surface's data through the buffer protocol for manipulation by a NumPy array. Now some SDL surfaces need locking before accessing, so Pygame will lock the surface for the lifetime of the consumer array. How is the surface unlocked when the array is reclaimed? By a buffer proxy that is owned by the array. The proxy unlocks the surface when the proxy is garbage collected. The new buffer protocol suggests a simpler way, equivalent to contexts, but only if the buffer release mechanism works as advertised. This promise is broken when a consumer calls PyBuffer_Release before it is done with the buffer.

All the questionable buffer api functions use the PyBUF_SIMPLE access flag. So I suggest a PyBUF_SIMPLE access request requires a returned buffer to remain valid for the lifetime of the exporter, not the Py_buffer view. If an exporter cannot honor this request, due to locking requirements for instance, then it raises an exception. All other access request flags promise that PyBuffer_Release, and therefore bf_releasebuffer, is called only when the buffer is released, not before.
msg117511 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-09-28 11:49
Le mardi 28 septembre 2010 à 03:11 +0000, Lenard Lindstrom a écrit :
> 
> Let's consider Pygame, and the SDL surface it wraps as a
> pygame.Surface. Pygame exposes a surface's data through the buffer
> protocol for manipulation by a NumPy array. Now some SDL surfaces need
> locking before accessing, so Pygame will lock the surface for the
> lifetime of the consumer array. How is the surface unlocked when the
> array is reclaimed? By a buffer proxy that is owned by the array. The
> proxy unlocks the surface when the proxy is garbage collected. The new
> buffer protocol suggests a simpler way, equivalent to contexts, but
> only if the buffer release mechanism works as advertised. This promise
> is broken when a consumer calls PyBuffer_Release before it is done
> with the buffer.

I don't know why you're saying that. The purpose of PyBuffer_Release is
precisely to solve these kinds of use cases (where you want timely
release of a resource rather than rely on the garbage collector).

(you shouldn't even need a proxy anymore; just make the surface object
expose the buffer API, with bf_getbuffer() acquiring the lock and
bf_releasebuffer() releasing the lock. If you want to support several
buffer exports at once, just maintain some internal count of them)
msg117537 - (view) Author: Lenard Lindstrom (kermode) Date: 2010-09-28 16:45
>I don't know why you're saying that. The purpose of PyBuffer_Release is
>precisely to solve these kinds of use cases (where you want timely
>release of a resource rather than rely on the garbage collector).

Yes, I was unclear. This refers to Python 3.2, not the 2.x series. PyObject_AsReadBuffer (defined at line 270 in abstract.c, code of routine attached) calls bf_getbuffer with the PyBUF_SIMPLE flag to retrieve a bytes buffer. It then calls bf_releasebuffer before returning the buffer to the caller. PyObject_AsCharBuffer and PyObject_AsWriteBuffer do the same. It is not be exactly the same issue discussed so far, but is closely related.

Deprecating PyObject_AsReadBuffer is extreme, and doesn't solve the early release problem. Redefining the PyBUF_SIMPLE flag to be like the old buffer protocol warns those implementing a new buffer interface to  avoid processing PyBUF_SIMPLE requests when locking is required.
msg117546 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-09-28 20:24
> Yes, I was unclear. This refers to Python 3.2, not the 2.x series.
> PyObject_AsReadBuffer (defined at line 270 in abstract.c, code of
> routine attached) calls bf_getbuffer with the PyBUF_SIMPLE flag to
> retrieve a bytes buffer. It then calls bf_releasebuffer before
> returning the buffer to the caller. PyObject_AsCharBuffer and
> PyObject_AsWriteBuffer do the same. It is not be exactly the same
> issue discussed so far, but is closely related.
> 
> Deprecating PyObject_AsReadBuffer is extreme,

I disagree. PyObject_As*Buffer functions are remnants of the old buffer
API in Python 2.x. They are here only to ease porting of existing C
code, but carefully written 3.x code should switch to
PyObject_GetBuffer() (or one of the dedicated typecodes in
PyArg_ParseTuple: "y*", "w*", etc.).

> Redefining the PyBUF_SIMPLE flag to be like the old buffer protocol
> warns those implementing a new buffer interface to  avoid processing
> PyBUF_SIMPLE requests when locking is required.

Well, the new buffer API was designed precisely because the old API
wasn't appropriate, so your proposal to revive the old API doesn't sound
very compelling, to say the least.

To restate things a bit more clearly: you should use
PyObject_GetBuffer(), not any of the PyObject_As*Buffer functions.
Perhaps you should explain why you care about the latter rather than the
former.
msg117550 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-09-28 21:31
Le mardi 28 septembre 2010 22:24:56, vous avez écrit :
> I disagree. PyObject_As*Buffer functions are remnants of the old buffer
> API in Python 2.x. They are here only to ease porting of existing C
> code, but carefully written 3.x code should switch to
> PyObject_GetBuffer() (or one of the dedicated typecodes in
> PyArg_ParseTuple: "y*", "w*", etc.).

Do you mean that PyObject_As*Buffer functions are deprecated? The documentation 
doesn't tell that the new buffer API should be use instead.
msg117551 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-09-28 21:35
> Le mardi 28 septembre 2010 22:24:56, vous avez écrit :
> > I disagree. PyObject_As*Buffer functions are remnants of the old buffer
> > API in Python 2.x. They are here only to ease porting of existing C
> > code, but carefully written 3.x code should switch to
> > PyObject_GetBuffer() (or one of the dedicated typecodes in
> > PyArg_ParseTuple: "y*", "w*", etc.).
> 
> Do you mean that PyObject_As*Buffer functions are deprecated? The documentation 
> doesn't tell that the new buffer API should be use instead.

Well, the documentation should be updated then. Given that the old
buffer API doesn't exist anymore, the old functions don't have any
practical purpose except for ease of porting code.
msg117563 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-09-28 23:43
Ok, 3.x documentation is fixed to recommend the new buffer API funcs over the old compatibility funcs. On Victor's private suggestion, I'm now closing the issue.
msg117575 - (view) Author: Lenard Lindstrom (kermode) Date: 2010-09-29 05:18
That is perfectly fine with me.
History
Date User Action Args
2010-09-29 05:18:41kermodesetmessages: + msg117575
2010-09-28 23:43:42pitrousetstatus: open -> closed
resolution: wont fix
messages: + msg117563
2010-09-28 21:35:01pitrousetmessages: + msg117551
2010-09-28 21:31:15vstinnersetmessages: + msg117550
2010-09-28 20:24:54pitrousetmessages: + msg117546
2010-09-28 16:45:57kermodesetfiles: + PyObject_AsReadBuffer.c

messages: + msg117537
2010-09-28 11:49:08pitrousetmessages: + msg117511
2010-09-28 03:11:42kermodesetnosy: + kermode
messages: + msg117502
2010-08-20 18:44:35terry.reedysetversions: - Python 2.7
2010-08-17 15:42:58lemburgsetmessages: + msg114130
2010-08-16 17:02:31vstinnersetmessages: + msg114053
2010-08-16 14:51:22lemburgsetnosy: + lemburg
messages: + msg114046
2010-08-14 13:59:12pitrousetnosy: + loewis
messages: + msg113898
2010-08-14 13:42:30vstinnercreate