New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Memoryviews require more strict contiguous checks then necessary #66635
Comments
In NumPy we decided some time ago that if you have a multi dimensional buffer, shaped for example 1x10, then this buffer should be considered both C- and F-contiguous. Currently, some buffers which can be used validly in a contiguous fashion are rejected. CPython does not support this currently possibly creating smaller nuisance if/once we change it fully in NumPy, see for example numpy/numpy#5085 I have attached a patch which should (sorry I did not test this at all yet) relax the checks as much as possible. I think this is right, but we did some subtle breaks in user code (mostly cython code) when we first tried changing it, and while numpy arrays may be more prominently C/F-contiguous, compatibility issues with libraries checking for contiguity explicitly and then requesting a strided buffer are very possible. If someone could give me a hint about adding tests, that would be great. |
There is another oddity: bpo-12845. Does NumPy have a formal definition of |
BTW, if you have NumPy installed and run test_buffer in Python3.3+, |
bpo-12845 should be closed, seems like a bug in some old version. The definition now is simply that the array is contiguous if you can legally access it in a contiguous fashion. Which means first stride is itemsize, second is itemsize*shape[0] for Fortran, inverted for C-order. |
To be clear, the important part here, is that to me all elements can be accessed using that scheme. It is not correct to assume that In other words, you have to be able to pass the pointer to the start of a c-contiguous array into some C-library that knows nothing about strides without any further thinking. The 0-strides badly violate that. |
Thanks, bpo-12845 is indeed fixed in NumPy. Why does NumPy consider an array with a stride that will almost In CPython we try to eliminate these kinds of issues (though >>> import numpy as np
import io x = np.arange(10)
y = np.array([x])
print(y.strides)
(9223372036854775807, 8)
>>>
>>>
>>> y.flags
C_CONTIGUOUS : True
F_CONTIGUOUS : True
OWNDATA : True
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False |
Well, the 9223372036854775807 is certainly no good for production code and we would never have it in a release version, it is just there currently to expose if there are more problems. However I don't care what happens on overflow (as long as it is not an error). Note that the stride here is on a dimension with shape 1. The only valid index is thus always 0 and 0*9223372036854775807=0, so the stride value does not actually matter when calculating offsets into the array. You could simply set it to 80 to get something that would be considered C-contiguous or to 8 to get something that is considered F-contiguous. But both is the case in a way, so just "cleaning up" the strides does not actually get you all the way. |
Ok, so it is a debug thing in the current NumPy sources. IMO ultimately the getbufferproc needs to return valid strides, even For that matter, the getbufferproc is free to translate the multi- Does it matter if you lose some (irrelevant?) information about |
An extra dimension is certainly not irrelevant! The strides *are* valid So there are three options I currently see:
This leads to this maybe: |
I think it would help discussing your options if the patch passes test_buffer Then, it would help enormously if you give Python function definitions of I mean something like verify_structure() from Lib/test/test_buffer.py -- that |
I am very sorry. The attached patch fixes this (not sure if quite right, but if anything should be more general then necessary). One test fails, but it looks like exactly the intended change. |
Thanks! I still have to review the patch in depth, but generally Curiously enough the existing code already considered e.g. shape=[1], strides=[-5] as contiguous. |
Since the functions in abstract.c have been committed by Travis Oliphant: Could there have been a reason why the {shape=[1], strides=[-5]} Or is it generally accepted among the numpy devs that not considering |
Yeah, the code does much the same as the old numpy code (at least most of the same funny little things, though I seem to remember the old numpy code had something yet a bit weirder, would have to check). To be honest, I do not know. It isn't implausible that the original numpy code dates back 15 years or more to numeric. I doubt whoever originally wrote it thought much about it, but there may be some good reason, and there is the safety considerations that people use the strides in a way they should not, which may trip us here in any case. |
Ok, here's my take on the situation:
Regarding your option 2b): I think it may be confusing, the buffer protocol So, I think purity wins here. If you are sure that all future NumPy versions I've moved the checks for 0 in shape[i] to the beginning (len == 0). I hope |
FWIW, I think it would be good to make this change early in the Regarding the discussion here ... ... about the special stride marker: In the case of slicing it would be nice to use the "organic" value |
Numpy 1.9. was only released recently, so 1.10. might be a while. If no I will try to look at it more soon, but am completly overloaded at least |
Okay, the whole thing isn't that urgent either. Sorry for the confusion w.r.t slicing: I misremembered what the a) >>> x = np.array([[1,2,3,]])
>>> x.strides
(9223372036854775807, 8)
b)
>>> x = np.array([[1,2,3], [4,5,6]])[:1]
>>> x.strides
(24, 8) Somehow I thought that case b) would also produce the special marker, |
Is this related to the NPY_RELAXED_STRIDES_CHECKING compilation flag? |
@pitrou, yes of course. This would make python do the same thing as numpy does (currently only with that compile flag given). |
Like Stefan I think this would be good to go in 3.5. The PyBuffer APIs are relatively new so there shouldn't be a lot of breakage. |
Antoine, sounds good to me, I don't mind this being in python rather sooner then later, for NumPy itself it does not matter I think. I just wanted to warn that there were problems when we first tried to switch in NumPy, which, if I remember correctly, is now maybe 2 years ago (in a dev version), though. |
New changeset 369300948f3f by Stefan Krah in branch 'default': |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: