It appears the odd signature for the cmp_ functions extends to more than just
the functions affected by this patch. I must have missed that in the original
patch reviews for issue 10118.
Compared to what that patch fixed, it's a pretty minor quibble :)
http://bugs.python.org/review/15573/diff/5737/Objects/memoryobject.c
File Objects/memoryobject.c (right):
http://bugs.python.org/review/15573/diff/5737/Objects/memoryobject.c#newcode259
Objects/memoryobject.c:259: cmp_format(const Py_buffer *dest, const Py_buffer
*src)
This doesn't appear to be used anywhere at the moment. Isn't it intended for a
fast-path in the comparison code, allowing an exact format match to be compared
with memcmp instead of unpacking?
If this API *is* used, the comment about being confusing applies - "same_format"
(returning 0 if different) would be much easier to understand.
http://bugs.python.org/review/15573/diff/5737/Objects/memoryobject.c#newcode276
Objects/memoryobject.c:276: cmp_shape(const Py_buffer *dest, const Py_buffer
*src)
This API is rather confusing, since it is a simple boolean equivalent check
rather than a tri-valued ordering operation.
I'd prefer "same_shape" with a return of 0 indicating that the shapes are
different.
http://bugs.python.org/review/15573/diff/5737/Objects/memoryobject.c#newcode296
Objects/memoryobject.c:296: cmp_structure(const Py_buffer *dest, const Py_buffer
*src)
This appears to be unused now. Why not just delete it?
http://bugs.python.org/review/15573/diff/5737/Objects/memoryobject.c#newcode2627
Objects/memoryobject.c:2627:
This is where I would have expected to see a fast path (based on memcmp) for
*identical* format strings. If that's not a valid optimisation, then there's no
need for cmp_format.
Thanks for the comments. The cmp* functions could be renamed, but they are all
still in use.
Filtering out non-equivalent formats in memory_ass_sub() could be made more
readable, Once that happens, cmp_format() could indeed be dropped.
I didn't touch it because I wanted to keep the diff small.
http://bugs.python.org/review/15573/diff/5737/Objects/memoryobject.c
File Objects/memoryobject.c (right):
http://bugs.python.org/review/15573/diff/5737/Objects/memoryobject.c#newcode259
Objects/memoryobject.c:259: cmp_format(const Py_buffer *dest, const Py_buffer
*src)
OK, perhaps equiv_format() is appropriate since the function
does a crude check for equivalence ('@' prefix).
http://bugs.python.org/review/15573/diff/5737/Objects/memoryobject.c#newcode276
Objects/memoryobject.c:276: cmp_shape(const Py_buffer *dest, const Py_buffer
*src)
I agree that it doesn't follow the standard pattern for cmp functions. I think
the reason I didn't call it same_shape() or eq_shape() is a curious corner case
in NumPy with zeros in shape, where two non-equal shapes can be equivalent:
>>> x = numpy.ndarray(buffer=bytearray([1,2,3,4,5,6]), shape=[2,3,0,9],
dtype='B')
>>> y = numpy.ndarray(buffer=bytearray([1,2,3,4,5,6]), shape=[2,3,0,1],
dtype='B')
>>> z = numpy.ndarray(buffer=bytearray([1,2,3,4,5,6]), shape=[2,0,1,1],
dtype='B')
>>> x.tolist()
[[[], [], []], [[], [], []]]
>>> y.tolist()
[[[], [], []], [[], [], []]]
>>> z.tolist()
[[], []]
>>> x == y
array([], shape=(2, 3, 0, 9), dtype=bool)
>>> x == z
False
How about equiv_shape() and equiv_fmt()?
http://bugs.python.org/review/15573/diff/5737/Objects/memoryobject.c#newcode296
Objects/memoryobject.c:296: cmp_structure(const Py_buffer *dest, const Py_buffer
*src)
It's still used in copy_single() and copy_buffer(). In the cascade
memory_ass_sub() -> copy_single() -> cmp_structure() both cmp_format() and
cmp_shape() are required, since the structure of the rvalue must match the
structure of the lvalue.
At the call sites of copy_buffer() cmp_format() is *currently* not required,
since the two formats are known to be identical. I left it in for safety.
http://bugs.python.org/review/15573/diff/5737/Objects/memoryobject.c#newcode2627
Objects/memoryobject.c:2627:
The NaN test case wouldn't pass memcpy(). Also, Martin mentioned unnormalized
rationals. Then, I'm not sure (but could look it up of course) if the struct
module zeros padding bytes ('x') or leaves then undefined.
http://bugs.python.org/review/15573/diff/5737/Doc/library/stdtypes.rst File Doc/library/stdtypes.rst (right): http://bugs.python.org/review/15573/diff/5737/Doc/library/stdtypes.rst#newcode2508 Doc/library/stdtypes.rst:2508: in comparisons. Proposal for rephrasing the last sentence: "While ...
http://bugs.python.org/review/15573/diff/5737/Doc/library/stdtypes.rst File Doc/library/stdtypes.rst (right): http://bugs.python.org/review/15573/diff/5737/Doc/library/stdtypes.rst#newcode2508 Doc/library/stdtypes.rst:2508: in comparisons. Yes, I don't like the original either. ...
http://bugs.python.org/review/15573/diff/5737/Doc/library/stdtypes.rst
File Doc/library/stdtypes.rst (right):
http://bugs.python.org/review/15573/diff/5737/Doc/library/stdtypes.rst#newcod...
Doc/library/stdtypes.rst:2508: in comparisons.
Yes, I don't like the original either. Also, the definition is actually
quite concrete, since it uses tolist().
Perhaps:
"""Two memoryviews v and w are equal if their shapes are equal and for
all valid indices (i, j, ...), v[i][j]... == w[i][j]... . Determining
element-wise equality requires that both v.format and w.format are in
:mod:`struct` syntax.
Therefore, for unknown format strings the result of the comparison
is always "not equal". Example ::
For the subset of format codes that is currently supported by
:meth:`tolist`, v == w is equivalent to v.tolist() == w.tolist().
Example ::"""
The drawback of this definition is that users might try out v[i][j],
which is not implemented yet. The drawback of the tolist() definition
is that it refers to an ideal tolist() method while the actual
implementation is still incomplete.
This is starting to look pretty solid to me. http://bugs.python.org/review/15573/diff/5737/Doc/library/stdtypes.rst File Doc/library/stdtypes.rst (right): http://bugs.python.org/review/15573/diff/5737/Doc/library/stdtypes.rst#newcode2508 Doc/library/stdtypes.rst:2508: ...
This is starting to look pretty solid to me.
http://bugs.python.org/review/15573/diff/5737/Doc/library/stdtypes.rst
File Doc/library/stdtypes.rst (right):
http://bugs.python.org/review/15573/diff/5737/Doc/library/stdtypes.rst#newcod...
Doc/library/stdtypes.rst:2508: in comparisons.
Proposed wording that tries to be both accurate and reasonably concise without
being misleading:
======
Two memoryviews are equal if their shapes are equivalent and if all
corresponding values are equal when the views' respective format codes are
interpreted using :mod:`struct` syntax.
For the subset of :mod:`struct` format strings currently supported by
:meth:`tolist`, two memoryviews ``v`` and ``w`` are equal if ``v.tolist() ==
w.tolist()``::
...
If either format string is not supported by the :mod:`struct` module, then the
views will always compare as unequal (even if the format strings and view
contents are identical)::
...
Note that, as with floating point numbers, ``v is w`` does *not* imply ``v ==
w`` for memoryview objects.
.. versionchanged: 3.3
In earlier versions of Python, memorview comparisons treated all views as 1D
contiguous arrays. This resulted in both false positives (such as the 32-bit
float ``1.1`` matching the 32-bit integer ``1066192077`` or ``NaN`` matching
itself) and false negatives (such as equivalent integers being considered
different if they were stored using different numbers of bytes).
======
http://bugs.python.org/review/15573/diff/5737/Doc/whatsnew/3.3.rst
File Doc/whatsnew/3.3.rst (right):
http://bugs.python.org/review/15573/diff/5737/Doc/whatsnew/3.3.rst#newcode150
Doc/whatsnew/3.3.rst:150: module syntax are supported.
I would add "Views with unrecognised format strings are still permitted, but
will always compare as unequal, regardless of view contents"
http://bugs.python.org/review/15573/diff/5737/Objects/memoryobject.c
File Objects/memoryobject.c (right):
http://bugs.python.org/review/15573/diff/5737/Objects/memoryobject.c#newcode276
Objects/memoryobject.c:276: cmp_shape(const Py_buffer *dest, const Py_buffer
*src)
On 2012/08/14 16:31:58, skrah wrote:
> How about equiv_shape() and equiv_fmt()?
+1, that would be much easier to interpret. It's mainly the "not
equivalent/equivalent -> -1/0" mapping that I found surprising, since "not
equivalent" isn't really an error as such.
http://bugs.python.org/review/15573/diff/5737/Objects/memoryobject.c#newcode296
Objects/memoryobject.c:296: cmp_structure(const Py_buffer *dest, const Py_buffer
*src)
Yeah, this was a silly comment born of searching for usage in the wrong place.
Ignore! :)
http://bugs.python.org/review/15573/diff/5737/Objects/memoryobject.c#newcode2627
Objects/memoryobject.c:2627:
A comment here explaining why a memcpy() fast path isn't a valid optimisation
would be good. I'm sure I won't be the last person to think of it without
immediately realising why it's a bad idea.
http://bugs.python.org/review/15573/diff/5737/Doc/library/stdtypes.rst File Doc/library/stdtypes.rst (right): http://bugs.python.org/review/15573/diff/5737/Doc/library/stdtypes.rst#newcode2508 Doc/library/stdtypes.rst:2508: in comparisons. Sounds good to me. In the new ...
Issue 15573: Support unknown formats in memoryview comparisons
Created 9 months, 1 week ago by skrah
Modified 9 months ago
Reviewers: Nick Coghlan, loewis
Base URL: None
Comments: 18