classification
Title: Document "suboffsets if needed" in 2.7
Type: enhancement Stage: resolved
Components: Documentation Versions: Python 3.5, Python 3.4, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: docs@python Nosy List: docs@python, pitrou, python-dev, rhansen, seberg, skrah
Priority: low Keywords: patch

Created on 2015-01-30 01:33 by rhansen, last changed 2015-02-02 17:05 by skrah. This issue is now closed.

Files
File name Uploaded Description Edit
PyBuffer_IsContiguous.patch rhansen, 2015-01-30 01:33 patch against 2.7 that fixes this bug
PyBuffer_IsContiguous_v2.patch rhansen, 2015-01-30 05:45 patch v2 (against master) that fixes this bug review
doc_c-api_buffer.patch rhansen, 2015-02-01 08:14 documentation patch based on msg235141 review
Messages (23)
msg235014 - (view) Author: Richard Hansen (rhansen) * Date: 2015-01-30 01:33
According to https://docs.python.org/2/c-api/buffer.html#the-new-style-py-buffer-struct if the suboffsets member of Py_buffer is non-NULL and all members of the array are negative, the buffer may be contiguous.

PyBuffer_IsContiguous() does not behave this way.  It assumes that if the suboffsets member is non-NULL then at least one member of the array is non-negative, and thus assumes that the buffer is non-contiguous.  This is not always correct.

One consequence of this bug is PyBuffer_ToContiguous() (and thus memoryview.tobytes()) falls back to slow (and currently buggy, see issue #23349) memory copying code.

Attached is a patch that fixes this bug.  The patch is against 2.7 but can be trivially modified to apply to 3.x.
msg235016 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-01-30 01:39
The patch has an obvious syntax error :-)
Other than that, adding a comment would be nice. Bonus points if you can write a test (3.x has infrastructure for that, see Lib/test/test_buffer.py).
msg235017 - (view) Author: Richard Hansen (rhansen) * Date: 2015-01-30 01:42
> The patch has an obvious syntax error :-)

Doh!

> Other than that, adding a comment would be nice.

Agreed, will do.

> Bonus points if you can write a test (3.x has infrastructure for that, see Lib/test/test_buffer.py).

I'll take a look.  Thanks!
msg235020 - (view) Author: Richard Hansen (rhansen) * Date: 2015-01-30 05:45
I've attached a new version of the patch.  Suggestions for simplifying the test code would be appreciated.
msg235032 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-01-30 13:10
PEP-3118 says:

Py_buffer.suboffsets:

"If all suboffsets are negative (i.e. no de-referencing is needed, then this must be NULL (the default value)."


I would be inclined to go with the PEP here and make a doc fix instead.
msg235034 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-01-30 14:29
For the record: I'm myself guilty of accepting all-negative suboffsets
in memoryview.c (see init_slice()) and I think there's even a test for it.

However, while it's ok for a specific function to accept this
corner case, I'm not sure about is_contiguous().


People might rely on the fact that contiguous implies
suboffsets==NULL.


Sebastian, did this special case ever come up in the context of
NumPy?
msg235052 - (view) Author: Richard Hansen (rhansen) * Date: 2015-01-30 20:10
> People might rely on the fact that contiguous implies suboffsets==NULL.

Cython (currently) relies on all-negatives being acceptable and equivalent to suboffsets==NULL.  See:
https://github.com/cython/cython/pull/367
http://thread.gmane.org/gmane.comp.python.cython.devel/15569
msg235053 - (view) Author: Sebastian Berg (seberg) * Date: 2015-01-30 20:36
Numpy does not understand suboffsets. The buffers we create will always have them NULL. The other way around.... To be honest, think it is probably ignoring the whole fact that they might exist at all :/, really needs to be fixed if it is the case.
msg235107 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-01-31 13:03
Cython doesn't follow the spec though (use Python 3):

from _testbuffer import *

cpdef foo():
    cdef unsigned char[:] v = bytearray(b"testing")
    nd = ndarray(v, getbuf=PyBUF_ND)
    print(nd.suboffsets)
    nd = ndarray(v, getbuf=PyBUF_FULL)
    print(nd.suboffsets)


Cython hands out suboffsets even for a PyBUF_ND request, which is
definitely wrong.  For a PyBUF_FULL request they should only be
provided *if needed*.
msg235111 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-01-31 14:53
To summarize, I think this is different from #22445, which also requests
relaxed contiguity tests. #22445 is motivated by the fact that slicing
can create a certain class of contiguous buffers that aren't recognized
as such.

No such reason exists here: All-negative suboffsets are only created
if a buffer provider chooses to hand them out instead of NULL. To my
knowledge only Cython does this, and it's against the PEP (though
strictly speaking not against the docs).


So I'm -0.5 on this change in general and -1 for 2.7.



I'd reject the issue, but let's wait for Antoine's opinion.
msg235122 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-01-31 17:53
Prepare for a partial reversal of opinion. :)


Indexing *can* actually create all-negative suboffsets in a natural way:


>>> from _testbuffer import *
>>> nd = ndarray([1,2,3,4,5,6,7,8,9,10,11,12], shape=[3,4], flags=ND_PIL)
>>> nd.tolist()
[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
>>> 
>>> x = nd[1]
>>> x.tolist()
[5, 6, 7, 8]
>>> x.suboffsets
(-1,)
>>> 


This leaves me +-0 for the change, with the caveat that applications
might break.
msg235125 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-01-31 18:51
I don't have an opinion on this. I've never seen suboffsets in use; but it seems reasonable to detect a "dummy suboffsets" array and recognize it as contiguous.
msg235141 - (view) Author: Richard Hansen (rhansen) * Date: 2015-02-01 02:29
(The following message is mostly off-topic but I think it is relevant to those interested in this issue.  This message is about the clarity of the documentation regarding flag semantics, and what I think the flags should mean.)

> Cython doesn't follow the spec though (use Python 3):
> 
>     from _testbuffer import *
>     cpdef foo():
>         cdef unsigned char[:] v = bytearray(b"testing")
>         nd = ndarray(v, getbuf=PyBUF_ND)
>         print(nd.suboffsets)
>         nd = ndarray(v, getbuf=PyBUF_FULL)
>         print(nd.suboffsets)

When I compile and run the above (latest Cython from Git master), I get:

    ()
    ()

Looking at the Python source code (Modules/_testbuffer.c ndarray_get_suboffsets() and ssize_array_as_tuple()) the above output is printed if the suboffsets member is NULL.

> Cython hands out suboffsets even for a PyBUF_ND request, which is
> definitely wrong.  For a PyBUF_FULL request they should only be
> provided *if needed*.

suboffsets appears to be NULL in both cases in your example, which seems acceptable to me given:
  * the user didn't request suboffsets information in the PyBUF_ND
    case, and
  * there are no indirections in the PyBUF_FULL case.

Even if suboffsets wasn't NULL, I believe the behavior would still be correct for the PyBUF_ND case.  My reading of <https://docs.python.org/3/c-api/buffer.html> is that flags=PyBUF_ND implies that shape member MUST NOT be NULL, but strides and suboffsets can be NULL or not (it doesn't matter).  This interpretation of mine is due to the sentence, "Note that each flag contains all bits of the flags below it."  If flags=PyBUF_ND meant that strides and suboffsets MUST be NULL, then PyBUF_ND and PyBUF_STRIDES would necessarily be mutually exclusive and flags=(PyBUF_ND|PyBUF_STRIDES) would not make sense (the strides member would have to be both NULL and non-NULL).

If (flags & PyBUF_INDIRECT) is false, then the consumer is not interested in the suboffsets member so it shouldn't matter what it points to.  (And the consumer should not dereference the pointer in case it points to a junk address.)

IMHO, if the buffer is C-style contiguous then the producer should be allowed to populate the shape, strides, and suboffsets members regardless of whether or not any of the PyBUF_ND, PyBUF_STRIDES, or PyBUF_INDIRECT flags are set.  In other words, for C-style contiguous buffers, the producer should be allowed to act as if PyBUF_INDIRECT was always provided because the consumer will always get an appropriate Py_buffer struct regardless of the actual state of the PyBUF_ND, PyBUF_STRIDES, and PyBUF_INDIRECT flags.

It *is* a bug, however, to dereference the strides or suboffsets members with ndarray(v, getbuf=PyBUF_ND) because the consumer didn't ask for strides or suboffsets information and the pointers might be bogus.

Stepping back a bit, I believe that the flags should be thought of as imposing requirements on the producer.  I think those requirements should be (assuming ndim > 0):

  * PyBUF_ND:  If (flags & PyBUF_ND) is true:
      - If (flags & PyBUF_STRIDES) is false *and* the producer is
        unable to produce a block of memory at [buf,buf+len)
        containing all (len/itemsize) entries in a C-style contiguous
        chunk, then the producer must raise an exception.
      - Otherwise, the producer must provide the shape of buffer.
    If (flags & PyBUF_ND) is false:
      - If the producer is unable to produce a contiguous chunk of
        (len/itemsize) entries (of any shape) in the block of memory
        at [buf,buf+len), then the producer must raise an exception.
      - Otherwise, the producer is permitted to do any of the
        following:
          + don't touch the shape member (don't set it to NULL or any
            other value; just leave it as-is)
          + set the shape member to NULL
          + set the shape member to point to an array describing the
            shape
          + set the shape member to point to any other location, even
            if dereferencing the pointer would result in a segfault
  * PyBUF_STRIDES:  If (flags & PyBUF_STRIDES) is true:
      - The producer must provide the appropriate strides information.
    If (flags & PyBUF_STRIDES) is false:
      - If the producer is unable to produce a block of memory at
        [buf,buf+len) containing all (len/itemsize) entries, the
        producer must raise an exception.
      - Otherwise, the producer is permitted to do any of the
        following;
          + don't touch the strides member (don't set it to NULL or
            any other value; just leave it as-is)
          + set the strides member to NULL
          + set the strides member to point to an array describing the
            strides
          + set the strides member to point to any other location,
            even if dereferencing the pointer would result in a
            segfault
  * PyBUF_INDIRECT:  If (flags & PyBUF_INDIRECT) is true:
      - If the buffer uses indirections then the producer must set the
        suboffsets member to point to an array with appropriate
        entries.
      - Otherwise, the producer can either set the suboffsets member
        to NULL or set it to point to an array of all negative
        entries.
    If (flags & PyBUF_INDIRECT) is false:
      - If the producer cannot produce a buffer that does not have
        indirections, then the producer must raise an exception.
      - Otherwise, the producer is permitted to do any of the
        following:
          + don't touch the suboffsets member (don't set it to NULL or
            any other value; just leave it as-is)
          + set the suboffsets member to NULL
          + set the suboffsets member to point to an array of all
            negative entries
          + set the suboffsets member to point to any other location,
            even if dereferencing the pointer would result in a
            segfault
msg235145 - (view) Author: Richard Hansen (rhansen) * Date: 2015-02-01 07:57
Attached is a documentation patch that adds what I said in msg235141.  I doubt everyone will agree with the changes, but maybe it will be a useful starting point.

(Despite not having an asterisk next to my username, I have signed the contributor agreement.  It just hasn't been processed yet.)
msg235146 - (view) Author: Richard Hansen (rhansen) * Date: 2015-02-01 08:45
> This leaves me +-0 for the change, with the caveat that applications
> might break.

How might an application break with this change?  Compared to the current PyBuffer_IsContiguous(), the patched version is the same except it returns true for a wider range of actually-contiguous buffers.
msg235148 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-02-01 08:59
> Richard Hansen added the comment:
> > Cython doesn't follow the spec though (use Python 3):
> > 
> >     from _testbuffer import *
> >     cpdef foo():
> >         cdef unsigned char[:] v = bytearray(b"testing")
> >         nd = ndarray(v, getbuf=PyBUF_ND)
> >         print(nd.suboffsets)
> >         nd = ndarray(v, getbuf=PyBUF_FULL)
> >         print(nd.suboffsets)
> 
> When I compile and run the above (latest Cython from Git master), I get:
> 
>     ()
>     ()

With Cython version 0.20.1post0 I get:

>>> foo.foo()
(-1,)
(-1,)

If you get the correct output from the latest Cython, it looks like this
issue has been fixed.
msg235150 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-02-01 09:10
> How might an application break with this change?

func1:
   if (!PyBuffer_IsContiguous(view, 'C'))
       error();

   func2(view);

func2:
   assert(view->suboffsets == NULL);
   ...

Yes, I *did* insert sanity checks like this in some places.
msg235151 - (view) Author: Richard Hansen (rhansen) * Date: 2015-02-01 09:21
>> When I compile and run the above (latest Cython from Git master), I
>> get:
>> 
>>     ()
>>     ()
> 
> With Cython version 0.20.1post0 I get:
> 
> >>> foo.foo()
> (-1,)
> (-1,)
> 
> If you get the correct output from the latest Cython, it looks like
> this issue has been fixed.

Oops, I was running a locally modified version of Cython that contained a patch meant to work around issue #23349.

When I run the *actual* upstream master I get the same behavior that you do.

But either way, I don't see why it's a problem that it prints (-1,) for the PyBUF_ND case.  The consumer didn't request suboffsets information so I would expect it to be OK for the producer to put whatever it wants in the suboffsets field, including a junk pointer that would segfault upon dereference.
msg235152 - (view) Author: Richard Hansen (rhansen) * Date: 2015-02-01 09:23
>> How might an application break with this change?
> 
>    assert(view->suboffsets == NULL);

Fair point.
msg235153 - (view) Author: Sebastian Berg (seberg) * Date: 2015-02-01 09:29
I do not have an opinion either way, but I do think there is a small difference here compared to the other issue. With the other issue, there are cases where we cannot set the strides correctly.

If you ask numpy directly whether the array is contiguous (or create it from numpy) and it says "yes", and then you get it without asking for a contiguous array it is, without the change, possible to get an array that would *not* be considered contiguous.

On the other hand, setting suboffsets to NULL when all are -1 is always possible, if tedious I admit.
msg235155 - (view) Author: Richard Hansen (rhansen) * Date: 2015-02-01 09:39
My preference is to apply the patch, of course.  There is a legitimate concern that it will break existing code, but I think there are more points in favor of applying the patch:
  * there exists code that the current behavior is known to break
  * it's easier to remove assert(view->suboffsets == NULL) than to
    change producers to not provide an all-negative array (though I
    admit there may be other more subtle ways the patch would break
    existing code)
  * I may be the only one who has spoken up about this, but I doubt
    I'm the only one who has experienced it
msg235161 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-02-01 11:36
But only Cython does not set suboffsets to NULL and you already have a small 
patch to fix that.

The Python 3 docs say "suboffsets only if needed" and the PEP says the same,
so the situation is not completely undocumented.


I think your doc patch goes too far.  Buffers propagate in unpredictable ways, 
and since the original consumer request is *not stored* in the buffer, setting
unneeded fields to NULL provides a way to figure out the original request.


It is actually *an optimization* to set suboffsets to NULL:  Compliant code
that uses arbitrary buffers always has to check for:

    if (suboffsets != NULL && suboffsets[i] >= 0)
        ...


If suboffsets are consistently NULL, at least hopefully you get branch 
prediction to kick in.


As Sebastian pointed out, it's relatively easy even for slicing/indexing
functions to check for all-negative suboffsets and handle that case.


All this is probably academic, since no one appears to be using suboffsets
at all. :)



Let's keep the status-quo and make this a doc issue for 2.7.
msg235194 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-02-01 18:50
New changeset c4c1d68b6301 by Stefan Krah in branch '2.7':
Issue #23352: Document that Py_buffer.suboffsets must be NULL if no suboffsets
https://hg.python.org/cpython/rev/c4c1d68b6301

New changeset de5c8ee002bf by Stefan Krah in branch '3.4':
Issue #23352: Document that Py_buffer.suboffsets must be NULL if no suboffsets
https://hg.python.org/cpython/rev/de5c8ee002bf

New changeset 5d097a74766f by Stefan Krah in branch 'default':
Issue #23352: Merge from 3.4.
https://hg.python.org/cpython/rev/5d097a74766f
History
Date User Action Args
2015-02-02 17:05:41skrahsetstatus: open -> closed
resolution: fixed
stage: resolved
2015-02-01 18:50:57python-devsetnosy: + python-dev
messages: + msg235194
2015-02-01 11:36:19skrahsetpriority: normal -> low

type: behavior -> enhancement
assignee: docs@python
components: + Documentation, - Interpreter Core
title: PyBuffer_IsContiguous() sometimes returns wrong result if suboffsets != NULL -> Document "suboffsets if needed" in 2.7
nosy: + docs@python

messages: + msg235161
stage: patch review -> (no value)
2015-02-01 09:39:42rhansensetmessages: + msg235155
2015-02-01 09:29:13sebergsetmessages: + msg235153
2015-02-01 09:23:25rhansensetmessages: + msg235152
2015-02-01 09:21:18rhansensetmessages: + msg235151
2015-02-01 09:10:04skrahsetmessages: + msg235150
2015-02-01 08:59:39skrahsetmessages: + msg235148
2015-02-01 08:45:35rhansensetmessages: + msg235146
2015-02-01 08:14:30rhansensetfiles: + doc_c-api_buffer.patch
2015-02-01 08:14:14rhansensetfiles: - doc_c-api_buffer.patch
2015-02-01 08:05:41rhansensetfiles: + doc_c-api_buffer.patch
2015-02-01 08:05:16rhansensetfiles: - doc_c-api_buffer.patch
2015-02-01 07:57:34rhansensetfiles: + doc_c-api_buffer.patch

messages: + msg235145
2015-02-01 02:29:22rhansensetmessages: + msg235141
2015-01-31 18:51:52pitrousetmessages: + msg235125
2015-01-31 17:53:23skrahsetmessages: + msg235122
2015-01-31 14:53:58skrahsetmessages: + msg235111
2015-01-31 13:03:16skrahsetmessages: + msg235107
2015-01-30 20:36:31sebergsetmessages: + msg235053
2015-01-30 20:10:03rhansensetmessages: + msg235052
2015-01-30 14:29:25skrahsetnosy: + seberg
messages: + msg235034
2015-01-30 13:10:57skrahsetmessages: + msg235032
2015-01-30 05:52:11rhansensetversions: - Python 3.2, Python 3.3
2015-01-30 05:51:37rhansensetversions: + Python 3.2, Python 3.3
2015-01-30 05:45:06rhansensetfiles: + PyBuffer_IsContiguous_v2.patch

messages: + msg235020
2015-01-30 01:42:25rhansensetmessages: + msg235017
2015-01-30 01:39:50pitrousetversions: - Python 3.2, Python 3.3, Python 3.6
nosy: + skrah, pitrou

messages: + msg235016

stage: patch review
2015-01-30 01:35:07rhansensettype: behavior
2015-01-30 01:33:45rhansencreate