classification
Title: memoryview.to_bytes() and PyBuffer_ToContiguous() incorrect for non-contiguous arrays
Type: behavior Stage: resolved
Components: Interpreter Core Versions: Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: belopolsky, benjamin.peterson, christian.heimes, georg.brandl, haypo, loewis, meador.inge, ncoghlan, pitrou, python-dev, skrah
Priority: high Keywords: patch

Created on 2011-08-24 18:45 by skrah, last changed 2015-01-31 15:17 by skrah. This issue is now closed.

Files
File name Uploaded Description Edit
issue12834.diff skrah, 2012-07-25 13:53 review
Messages (41)
msg142894 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2011-08-24 18:45
memoryview.tobytes() converts a non-contiguous array to
a contiguous representation. This result is not right:


>>> from numpy import *
>>> x = array([1,2,3,4,5], dtype="B")
>>> y = x[::-1]
>>> y
array([5, 4, 3, 2, 1], dtype=uint8)
>>> m = memoryview(y)
>>> m.tobytes()
'\x04\x03\x02\x01\x05'
>>>
msg142895 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-08-24 18:46
That's rather funky. What should the right result be?
msg142899 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2011-08-24 19:09
Antoine Pitrou <report@bugs.python.org> wrote:
> That's rather funky. What should the right result be?

Basically [5, 4, 3, 2, 1] as bytes:

'\x05\x04\x03\x02\x01'

Looks like an off-by-one error.

I was a bit surprised that tobytes() automatically converts anything
to a C-contiguous array. The result can be completely disconnected
from the raw memory.

[The bug also exists for forward strides.]

array([1, 3, 5], dtype=uint64)
>>> m = memoryview(y)
>>> m.tobytes()
'\x03\x00\x00\x00\x00\x00\x00\x00\x05\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00'
>>>
msg151358 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-01-16 13:09
Added 10181 as a dependency - as noted in my review comments on that issue, I think this becomes fairly trivial to fix (and test) given Stefan's other improvements.
msg151958 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-01-25 18:00
I removed the dependency since PyBuffer_ToContiguous() still needs to
be fixed in abstract.c while memoryview.tobytes() now works in the
PEP-3118 repo.
msg154241 - (view) Author: Roundup Robot (python-dev) Date: 2012-02-25 11:25
New changeset 3f9b3b6f7ff0 by Stefan Krah in branch 'default':
- Issue #10181: New memoryview implementation fixes multiple ownership
http://hg.python.org/cpython/rev/3f9b3b6f7ff0
msg165788 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2012-07-18 16:14
I think that I run into the same bug today. I've developing a PEP 3118 buffer interface for my wrapper of FreeImage. It returns the data as non-contiguous 3d array with the dimension height, width, color.

I've created a small test image with 5x7 RGB pixels. The first line black, second white, third grey values and the rest are red, green blue and cyan, magenta yellow in little endian (BGR order).

NumPy handles the buffer correctly:
>>> arr = numpy.asarray(img)
>>> print(arr)

[[[  0   0   0]
  [  0   0   0]
  [  0   0   0]
  [  0   0   0]
  [  0   0   0]]

 [[255 255 255]
  [255 255 255]
  [255 255 255]
  [255 255 255]
  [255 255 255]]

 [[ 80  80  80]
  [112 112 112]
  [160 160 160]
  [192 192 192]
  [240 240 240]]

 [[  0   0 255]
  [  0 255   0]
  [255   0   0]
  [  0   0 255]
  [  0 255   0]]

 [[255   0   0]
  [  0   0 255]
  [  0 255   0]
  [255   0   0]
  [  0   0 255]]

 [[  0 255   0]
  [255   0   0]
  [  0   0 255]
  [  0 255   0]
  [255   0   0]]

 [[255 255   0]
  [255   0 255]
  [  0 255 255]
  [255 255   0]
  [255   0 255]]]

but memoryview.tobytes() seems to have an off-by-one error:

>>> m = memoryview(img)
>>> data = m.tobytes()
>>> len(data) ==  5 * 7 * 3
True
>>> for i in range(7):
...     print(" ".join("%3i" % ord(v) for v in data[i * 5 * 3:(i + 1) * 5 * 3]))
  0   0   0   0   0   0   0   0   0   0   0   0   0   0 255
255 255 255 255 255 255 255 255 255 255 255 255 255 255  80
 80  80 112 112 112 160 160 160 192 192 192 240 240 240   0
  0 255   0 255   0 255   0   0   0   0 255   0 255   0 255
  0   0   0   0 255   0 255   0 255   0   0   0   0 255   0
255   0 255   0   0   0   0 255   0 255   0 255   0   0 255
255   0 255   0 255   0 255 255 255 255   0 255   0 255   0

As you can see the first byte is missing.

Python 2.7.3 and Python 3.2.3 with numpy 1.6.1 and https://bitbucket.org/tiran/smc.freeimage/changeset/3134ecee2984 on 64bit Ubuntu.
msg165828 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2012-07-19 09:47
It looks like Stefan has fixed the issue in Python 3.3 a while ago. tobytes() returns the correct values with a fresh build of Python 3.3. 

$ PYTHONPATH="." /home/heimes/dev/python/py3k/python smc/freeimage/tests/test_image.py
test_newbuffer (__main__.TestImageNewBuffer) ... 

  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
255 255 255 255 255 255 255 255 255 255 255 255 255 255 255
 80  80  80 112 112 112 160 160 160 192 192 192 240 240 240
  0   0 255   0 255   0 255   0   0   0   0 255   0 255   0
255   0   0   0   0 255   0 255   0 255   0   0   0   0 255
  0 255   0 255   0   0   0   0 255   0 255   0 255   0   0
255 255   0 255   0 255   0 255 255 255 255   0 255   0 255

However it's still broken in 3.2 (also up to date hg pull).

$ PYTHONPATH="." /home/heimes/dev/python/3.2/python smc/freeimage/tests/test_image.py
test_newbuffer (__main__.TestImageNewBuffer) ... 

  0   0   0   0   0   0   0   0   0   0   0   0   0   0 255
255 255 255 255 255 255 255 255 255 255 255 255 255 255  80
 80  80 112 112 112 160 160 160 192 192 192 240 240 240   0
  0 255   0 255   0 255   0   0   0   0 255   0 255   0 255
  0   0   0   0 255   0 255   0 255   0   0   0   0 255   0
255   0 255   0   0   0   0 255   0 255   0 255   0   0 255
255   0 255   0 255   0 255 255 255 255   0 255   0 255   0


Stefan, could you please port your fix to Python 3.2 and 3.3? Thanks!
msg165839 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-07-19 13:47
In Python 3.3 memoryobject.c is a complete rewrite. Porting the fix
separately would be quite a bit of work.

PyBuffer_ToContiguous(), which causes the problem in 2.7/3.2 and is
still broken in 3.3, could be fixed by using the recursive copy_buffer() 
function from the new memoryobject.c.


I don't know if I can fix it before the 3.3 release. When are the
next 2.7/3.2 releases?
msg165916 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-07-20 12:24
The fix would require all of these functions from memoryview.c (3.3):

last_dim_is_contiguous
cmp_structure
copy_base
copy_rec
copy_buffer


How to avoid code duplication? I could move them into abstract.c,
but conceptually they're really just low level buffer interface
functions. Also, they make a lot of assumptions (ndim >= 1,
PyBUF_FULL) that are a little dangerous for a general interface.
msg165917 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2012-07-20 12:25
You could move PyBuffer_ToContiguous() from abstract.c to memoryview.c.
msg165919 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-07-20 12:54
> You could move PyBuffer_ToContiguous() from abstract.c to memoryview.c.

For 3.3 that would be ideal, yes. I asked a while ago on python-dev
whether to backport the memoryview rewrite. The general mood was
against it.

So, for 2.7/3.2 I could add all these functions to abstract.c.
But an additional problem is that the whole test infrastructure of
Lib/test/test_buffer.py and Modules/_testbuffer.c would be missing.
msg166012 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-07-21 11:41
There is an additional problem with PyBuffer_ToContiguous():

Suppose 'view' is multi-dimensional, C-contiguous and initialized
according to PyBUF_ND, i.e. view->shape != NULL but view->strides == NULL.

Now if PyBuffer_ToContiguous() is called with 'F', PyBuffer_IsContiguous()
returns false and view->strides will be accessed.


This means that incomplete buffer information will have to be
reconstructed like it is done in the 3.3 memoryview.
msg166400 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-07-25 13:53
Here's a patch for 3.3, which consists mostly of tests. A couple of points:

  o I removed the len > view->len check in PyBuffer_ToContiguous(), since
    the function is not documented and silently accepting output buffers
    that are too large seems unusual to me.

  o Removed the comment in bytesobject.c "Better way to get to internal buffer?".
    IMO the answer is no: There isn't any better way that works in full generality.

  o Removed the "need to check for overflow" comment: ndim is now officially
    limited to 64.


I think this can go into 3.3: It's easy to see that for contiguous buffers
PyBuffer_ToContiguous() behaves in the same manner as before (except for
the len issue).

For memoryview, buffer_to_contiguous(x, y 'C') is literally the same
code as buffer_to_c_contiguous().
msg166550 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-07-27 09:34
Any objections to committing this before beta2? What about the
len > view->len change: Does that look reasonable?
msg166562 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-07-27 13:20
Georg, need a call on how close you are to cutting beta 2 and whether you want this to wait until rc1.
msg166564 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-07-27 13:28
Summary of my review for Georg's benefit: I had one minor formatting nit with the patch (which Stefan can easily fix before committing), but it otherwise looked fine to me.

I also agree that the old implicit truncation was unusable in practice, as the *actual* length is not returned from the function, and in fact cannot be returned because the return type is int rather than Py_ssize_t. A length mismatch therefore needs to be an error.
msg166638 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-07-28 08:50
Thanks, Nick! I'll move the function declaration back to abstract.h.

Waiting for Georg's input. -- It seems to me that #14330 is a blocker
that will only be fixed on Monday.
msg166639 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2012-07-28 09:15
With the beta delayed as you say, I'm okay with this going in now.
msg166644 - (view) Author: Roundup Robot (python-dev) Date: 2012-07-28 10:28
New changeset 8fbbc7c8748e by Stefan Krah in branch 'default':
Issue #12834: Fix PyBuffer_ToContiguous() for non-contiguous arrays.
http://hg.python.org/cpython/rev/8fbbc7c8748e
msg166645 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-07-28 10:38
All right, 3.3 is fixed. Re-targeting for 3.2 and 2.7.
msg166712 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-07-29 04:17
Was the point that memoryview.tobytes() has a known data corruption bug in 3.2 and 2.7 raised in the previous discussion? I'm pretty sure I had forgotten about it, and I don't remember it coming up in the thread.

The trickiest aspect of a backport of the new implementation is that we need to preserve the C ABI - extensions compiled against any maintenance release should work with all maintenance releases in that series.

The new APIs aren't a major problem - just sprinkle a few underscores around to mark them as private on the older versions (I've certainly done that before when a bug fix genuinely needed something that qualified as a new feature: implemented a private version to use in fixing the bug on the maintenance branch, then promote that to a public API on trunk)
msg166728 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-07-29 08:59
Christian's posts and my initial report were about memoryview.tobytes(). It's
good that you changed the title: memoryview.tobytes() is more meaningful than
the slightly obscure PyBuffer_ToContiguous().


BTW, I'm sure that PyBuffer_FromContiguous() and PyObject_CopyData() have the
same problem, but they aren't used in the Python source tree and they are
undocumented.
msg166754 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2012-07-29 12:49
Thanks Stefan and Nick!

I tried to find the off-by-one bug myself but gave up quickly. Stefan's rewrite is a better approach.
msg166764 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-07-29 14:25
Nick, are you talking about a complete backport or just about pulling
in all functions needed by PyBuffer_ToContiguous()?
msg166840 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-07-30 00:38
I suspect the need to preserve the C ABI would make a complete backport a challenge. I'm still tempted though, mainly to give third parties a robust core implementation of the buffer API to test against.

This is especially so with Python 2.7 still having a couple of years of full maintenance left - that's a long time to leave it with a known broken memoryview implementation. I'm less worried about 3.2, since the upgrade path to 3.3 is easier in that case, but even that version is likely to see widespread use for a long time.
msg166842 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-07-30 01:02
> This is especially so with Python 2.7 still having a couple of years
> of full maintenance left - that's a long time to leave it with a known
> broken memoryview implementation. I'm less worried about 3.2, since
> the upgrade path to 3.3 is easier in that case, but even that version
> is likely to see widespread use for a long time.

Well, there's a reason we don't backport features to bugfix branches,
especially when we're talking about a complete rewrite of the
implementation.  So I really don't agree this should be backported.
msg166891 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2012-07-30 12:25
Right now major parts of the buffer API are broken for non-trivial buffer definitions. IMHO a backport of the fix doesn't count as a new feature although it needs some new internal functions.

I don't quite understand why Nick thinks that ABI compatibility is a challenge. The structs, typedefs and function definitions aren't modified. The new functions aren't visible because they can be implemented as static functions if PyBuffer_ToContiguous() is moved to memoryview.c. That won't break the ABI eiter.

If we want to keep the function in its old place then we can prefix the new functions with _Py and include them in a private header file. That would export new function.
msg166894 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-07-30 13:28
I was musing about backporting *all* the memoryview fixes, including the buffer lifecycle ones. Antoine and Stefan rightly pointed out that was a bad idea, and not necessary to address this problem.

So +1 for backporting just this specific fix, with the necessary infrastructure to make it work.
msg166917 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-07-30 18:13
> Right now major parts of the buffer API are broken for non-trivial
> buffer definitions. IMHO a backport of the fix doesn't count as a new
> feature although it needs some new internal functions.

This particular bug fix for PyBuffer_ToContiguous() (which would automatically
fix memoryview.tobytes() in 2.7 and 3.2) is easy to backport.

For non-trivial buffers it's likely though that other problems
will show up in versions 2.7 and 3.2.


Backporting the *complete* rewrite would be relatively easy, too.
But that would change the layout of the MemoryView object etc.

Fixing all of #10181 without the drastic changes would be very time
consuming (if at all possible).
msg167761 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-08-09 06:24
Removing 3.1, since its addition was apparently done by mistake.

Again, I wonder why this is marked release-critical. It's not a regression from previous versions, is it? Please use release-critical only if not making a release at a certain point in the future is better than making the release (despite all the advantages that the release otherwise might have). If you merely think that the issue is "really important" and "should not be forgotten", use "critical" or "high".
msg167762 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-08-09 06:35
Since the next release of 3.2 is the *last* release of 3.2, yet it will remain supported by distros well beyond that, yes, I think this should block the final 3.2 release.
msg167764 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-08-09 06:59
> Since the next release of 3.2 is the *last* release of 3.2, yet it
> will remain supported by distros well beyond that, yes, I think this
> should block the final 3.2 release.

But the same will be true for any other bug that 3.2 has. If they
don't get fixed now, they will remain unfixed. Should we therefore
delay the last release of 3.2 until all bugs in it are fixed?

The consequence instead should be that people experiencing this
bug will have to move to Python 3.3 to get it fixed. Since it only
affects NumPy users (AFAICT), and then only those who use tobytes,
I wouldn't consider this bug even critical.
msg167765 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-08-09 07:44
> Removing 3.1, since its addition was apparently done by mistake.

I'm unable to set 2.7 and 3.2 in my browser without also setting
3.1 (using the Shift key to mark multiple fields).
msg169125 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-08-25 10:23
The last bugfix release of 3.2 will happen "shortly" after the release of 3.3, so in a not-too-far future (compared to the age of this issue, which just had its first birthday yesterday).

So if this issue should really block 3.2 (which I still think it should not), I'd urge possible contributors to start working on it now - especially if this is going to be a large patch. If such patch introduces new bugs, it won't be possible to ever fix them (unless they are security-critical). Also note that this is almost the only release blocker for the 3.2 release, next to a easy-to-implement request to update the expat code.
msg169126 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-08-25 10:29
Despite my earlier comments, I'm now inclined to agree with Martin here - upgrading to 3.3 fixes so many other problems with memoryview, that's a more compelling solution.

And, of course, using NumPy instead always remains an option for more robust buffer API support in older versions.
msg169127 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-08-25 10:30
(Not saying this shouldn't be fixed, just saying it's not a disaster if it isn't)
msg169128 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-08-25 10:38
I agree that for 3.2 this isn't so important given that non-contiguous
arrays have multiple issues there.

Christian, does a fix for 3.2 benefit FreeImage? Don't you run into
other problems with memoryview?


If it helps, I can try to write a patch for 3.2.
msg229295 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2014-10-14 14:53
2.7 is the only remaining candidate for the fix. I'm not going to
work on it: somehow seems too risky for 2.7 at this stage.
msg235113 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-01-31 15:16
Fixed for 2.7 in #22668.
msg235114 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-01-31 15:17
Sorry, #23349.
History
Date User Action Args
2015-01-31 15:17:31skrahsetmessages: + msg235114
2015-01-31 15:16:36skrahsetmessages: + msg235113
2014-10-14 15:02:51pitrousetstatus: pending -> closed
2014-10-14 14:53:24skrahsetstatus: open -> pending
versions: - Python 3.2
messages: + msg229295

assignee: skrah ->
resolution: fixed
stage: needs patch -> resolved
2012-08-25 10:38:16skrahsetmessages: + msg169128
2012-08-25 10:30:04ncoghlansetmessages: + msg169127
2012-08-25 10:29:14ncoghlansetpriority: release blocker -> high

messages: + msg169126
2012-08-25 10:23:18loewissetmessages: + msg169125
2012-08-21 03:08:24belopolskysetnosy: + belopolsky
2012-08-09 07:44:05skrahsetmessages: + msg167765
2012-08-09 06:59:17loewissetmessages: + msg167764
2012-08-09 06:35:54ncoghlansetmessages: + msg167762
2012-08-09 06:24:36loewissetnosy: + loewis

messages: + msg167761
versions: - Python 3.1
2012-07-30 18:13:11skrahsetmessages: + msg166917
2012-07-30 13:28:29ncoghlansetmessages: + msg166894
2012-07-30 12:25:09christian.heimessetmessages: + msg166891
2012-07-30 01:02:56pitrousetmessages: + msg166842
2012-07-30 00:38:23ncoghlansetpriority: high -> release blocker

messages: + msg166840
2012-07-29 14:25:45skrahsetmessages: + msg166764
2012-07-29 12:49:07christian.heimessetmessages: + msg166754
2012-07-29 09:59:23ncoghlansettitle: memorview.to_bytes() and PyBuffer_ToContiguous() incorrect for non-contiguous arrays -> memoryview.to_bytes() and PyBuffer_ToContiguous() incorrect for non-contiguous arrays
2012-07-29 08:59:17skrahsetmessages: + msg166728
2012-07-29 04:17:20ncoghlansetpriority: normal -> high

messages: + msg166712
title: PyBuffer_ToContiguous() incorrect for non-contiguous arrays -> memorview.to_bytes() and PyBuffer_ToContiguous() incorrect for non-contiguous arrays
2012-07-28 10:38:30skrahsetpriority: release blocker -> normal

messages: + msg166645
versions: + Python 3.1, - Python 3.3
2012-07-28 10:28:09python-devsetmessages: + msg166644
2012-07-28 09:15:26georg.brandlsetmessages: + msg166639
2012-07-28 08:50:19skrahsetmessages: + msg166638
2012-07-27 13:28:16ncoghlansetmessages: + msg166564
2012-07-27 13:20:32ncoghlansetpriority: normal -> release blocker
nosy: + benjamin.peterson, georg.brandl
messages: + msg166562

2012-07-27 09:34:08skrahsetmessages: + msg166550
2012-07-25 13:53:51skrahsetfiles: + issue12834.diff
keywords: + patch
messages: + msg166400
2012-07-21 11:41:15skrahsetmessages: + msg166012
2012-07-20 12:54:24skrahsetmessages: + msg165919
2012-07-20 12:25:50christian.heimessetmessages: + msg165917
2012-07-20 12:24:03skrahsetmessages: + msg165916
2012-07-19 13:47:04skrahsetmessages: + msg165839
2012-07-19 09:47:57christian.heimessetmessages: + msg165828
versions: + Python 3.2
2012-07-18 16:15:28meador.ingesetnosy: + meador.inge
2012-07-18 16:14:07christian.heimessetnosy: + christian.heimes

messages: + msg165788
versions: + Python 2.7
2012-02-25 11:25:30python-devsetnosy: + python-dev
messages: + msg154241
2012-01-25 18:00:18skrahsetdependencies: - Problems with Py_buffer management in memoryobject.c (and elsewhere?)
messages: + msg151958
title: memoryview.tobytes() incorrect for non-contiguous arrays -> PyBuffer_ToContiguous() incorrect for non-contiguous arrays
2012-01-16 13:09:27ncoghlansetnosy: + ncoghlan
dependencies: + Problems with Py_buffer management in memoryobject.c (and elsewhere?)
messages: + msg151358
2011-08-24 19:16:29hayposetnosy: + haypo
2011-08-24 19:09:57skrahsetmessages: + msg142899
2011-08-24 18:46:53pitrousetnosy: + pitrou
messages: + msg142895
2011-08-24 18:45:04skrahcreate