This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: memoryview: equality-hash invariant
Type: behavior Stage: resolved
Components: Versions: Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: skrah Nosy List: Alexander.Belopolsky, Arfrever, BreamoreBoy, Dag.Sverre.Seljebotn, belopolsky, christian.heimes, georg.brandl, loewis, mark.dickinson, meador.inge, ncoghlan, pitrou, python-dev, skrah, vstinner
Priority: normal Keywords: patch

Created on 2012-08-29 17:07 by skrah, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
issue15814-doc.diff skrah, 2012-08-30 11:36 review
issue15814-2.diff skrah, 2012-09-01 11:22 review
Messages (51)
msg169398 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-08-29 17:07
The new PEP-3118 equality definition from #15573 that is based
on element-wise comparisons breaks the equality-hash invariant:

>>> from _testbuffer import ndarray
>>> x = ndarray([1,2,3], shape=[3], format='f')
>>> y = ndarray([1,2,3], shape=[3], format='B')
>>> a = memoryview(x)
>>> b = memoryview(y)
>>> a == b
True
>>> hash(a) == hash(b)
False
>>>
msg169403 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-08-29 17:54
I think the proper solution is to make memoryview objects unhashable. Any other approach will have flaws of some kind.
msg169404 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-08-29 18:01
> I think the proper solution is to make memoryview objects unhashable.

Disagreed. If memoryviews are to be bytes-like objects they should be
hashable (at least when readonly).

> Any other approach will have flaws of some kind.

Not more so than equality between memoryviews.
msg169408 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-08-29 18:59
Am 29.08.12 20:01, schrieb Antoine Pitrou:
>> I think the proper solution is to make memoryview objects unhashable.
>
> Disagreed. If memoryviews are to be bytes-like objects they should be
> hashable (at least when readonly).

So what specific hash algorithm do you propose?

>> Any other approach will have flaws of some kind.
>
> Not more so than equality between memoryviews.

Well, memoryviews now have a definition of equality as discussed in 
issue15573. You may disagree whether it's a useful definition, but I
don't believe it has actual flaws (in the sense that it is incorrect
or inconsistent).

My claim is that any hash definition for memoryviews will have a
*fundamental* flaw, failing to provide the basic property
that A==B must imply hash(A)==hash(B), making it actually work
incorrectly (when used for its primary application, namely keys
in dictionaries).
msg169409 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-08-29 19:06
> Am 29.08.12 20:01, schrieb Antoine Pitrou:
> >> I think the proper solution is to make memoryview objects unhashable.
> >
> > Disagreed. If memoryviews are to be bytes-like objects they should be
> > hashable (at least when readonly).
> 
> So what specific hash algorithm do you propose?

The current algorithm works well in conjunction with bytes objects.

> My claim is that any hash definition for memoryviews will have a
> *fundamental* flaw, failing to provide the basic property
> that A==B must imply hash(A)==hash(B), making it actually work
> incorrectly

Why is there such a fundamental flaw?
msg169411 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-08-29 19:33
Am 29.08.12 21:06, schrieb Antoine Pitrou:
>> So what specific hash algorithm do you propose?
>
> The current algorithm works well in conjunction with bytes objects.

That's about the only type if works for.

>> My claim is that any hash definition for memoryviews will have a
>> *fundamental* flaw, failing to provide the basic property
>> that A==B must imply hash(A)==hash(B), making it actually work
>> incorrectly
>
> Why is there such a fundamental flaw?

The current algorithm is flawed as described in Stefan's original
message: two objects compare equal, yet hash different. That means
that if you use them as dictionary keys, you may end up with two
different values for the "same" key, depending on the size of the
dictionary (as the modulo operation in the dictionary still may
map the different hashes to the same dictionary slot).

In general, since memoryview(obj)==obj, it would be necessary that
hash(memoryview(obj))==hash(obj). However, since memoryview cannot
know what hashing algorithm obj uses, it cannot compute the hash
value with the same algorithm.

IOW, hashing is mutually exclusive with comparison with the
underlying object, unless you know what the hash algorithm
of the underlying object is.

So restricting tp_hash to memoryview objects where the underlying
object is the bytes type would work.
msg169415 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-08-29 20:04
Martin v. L??wis <report@bugs.python.org> wrote:
> In general, since memoryview(obj)==obj, it would be necessary that
> hash(memoryview(obj))==hash(obj). However, since memoryview cannot
> know what hashing algorithm obj uses, it cannot compute the hash
> value with the same algorithm.

In the memoryview-hash thread on python-dev [1] this objection was
addressed by demanding from exporters that they all use:

   hash(x) == hash(x.tobytes())

Since the previous equality concept was also based on
x.tobytes() == y.tobytes(), this wasn't a problem.

The new equality definition and any possible new hash definition should
probably also be part of the buffer API documentation, since they
aren't memoryview specific.

[1] http://mail.python.org/pipermail/python-dev/2011-November/114459.html
msg169416 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-08-29 20:27
And since memory_richcompare() and any potentially compatible hash function
are quite tricky to implement by now, perhaps the generally useful parts
could be exposed as PyBuffer_RichCompareBool() and PyBuffer_Hash().
msg169417 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-08-29 20:50
Am 29.08.12 22:04, schrieb Stefan Krah:
> In the memoryview-hash thread on python-dev [1] this objection was
> addressed by demanding from exporters that they all use:
>
>     hash(x) == hash(x.tobytes())
>
> Since the previous equality concept was also based on
> x.tobytes() == y.tobytes(), this wasn't a problem.

In the light of this requirement, it's even more difficult to ask
people that they change their hashing, since some exporters may already
comply with that original request.

> The new equality definition and any possible new hash definition should
> probably also be part of the buffer API documentation, since they
> aren't memoryview specific.

That's not true: they *are* memoryview-specific. The notion of equality
is entirely one of memoryview objects, not of buffers.

I still maintain that specifying hashing for memoryviews under the
new equality definition is just not feasible, and that we should give
up on it (except perhaps supporting the hashing of bytes views).
I also question whether it is useful to hash arbitrarily-shaped
read-only buffers (along with questioning whether people will actually
*have* arbitrarily-shaped read-only buffers).

Stefan: please do propose a semantics also along with proposing interfaces.
msg169425 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-08-30 00:26
My perspective is that hashing a memoryview only makes sense when the memoryview is read-only and "m == m.tobytes()" (i.e. it's a C contiguous 1D view of bytes, either because that's what the original object exported as a buffer or because the view has been cast that way)

So, I think we should document that restriction in What's New and the memoryview docs for 3.3.0, and actually enforce it in 3.3.1 by raising ValueError from hash(m) when the assumption of a 1D C contiguous bytes view is violated (the "read-only" restriction is already enforced).

In previous versions, this constraint didn't need to be explicitly enforced, since memoryview basically treated *everything* as a 1D C contiguous view of bytes.
msg169440 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-08-30 09:24
Martin v. L??wis <report@bugs.python.org> wrote:
> >     hash(x) == hash(x.tobytes())
> In the light of this requirement, it's even more difficult to ask
> people that they change their hashing, since some exporters may already
> comply with that original request.

I don't think so. memoryview.__hash__() is new in 3.3 and the requirement
is not documented at all in the general PEP-3118 sections.

[Adding Stefan Behnel to nosy, since Cython is pretty quick to pick up
new features.]

> > The new equality definition and any possible new hash definition should
> > probably also be part of the buffer API documentation, since they
> > aren't memoryview specific.
> 
> That's not true: they *are* memoryview-specific. The notion of equality
> is entirely one of memoryview objects, not of buffers.

Could you name a part of the equality definition that is memoryview-specific?

> I still maintain that specifying hashing for memoryviews under the
> new equality definition is just not feasible, and that we should give
> up on it (except perhaps supporting the hashing of bytes views).
> I also question whether it is useful to hash arbitrarily-shaped
> read-only buffers (along with questioning whether people will actually
> *have* arbitrarily-shaped read-only buffers).

Useful, perhaps. I don't know if it is worth the effort though. We
could restrict hashing to contiguous bytes views in 3.3.1.
msg169450 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-08-30 11:35
Here's a doc patch restricting the hash to formats 'B', 'b' and 'c'.
I think non-contiguous views are fine: both __eq__() and tobytes()
handle these, so the equality-hash invariant is preserved.
msg169451 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-08-30 13:01
Am 30.08.12 11:24, schrieb Stefan Krah:
>>> The new equality definition and any possible new hash definition should
>>> probably also be part of the buffer API documentation, since they
>>> aren't memoryview specific.
>>
>> That's not true: they *are* memoryview-specific. The notion of equality
>> is entirely one of memoryview objects, not of buffers.
>
> Could you name a part of the equality definition that is memoryview-specific?

As literally spelled out, it starts with "a memoryview and a PEP 3118 
exporter". But I was actually thinking of the implementation...

In any case, I'm skeptical that this is a useful extension to the buffer 
interface. You would imply "two exporters are equal iff their exported 
buffers are equal". I think this is asking too much.
msg169497 - (view) Author: Dag Sverre Seljebotn (Dag.Sverre.Seljebotn) Date: 2012-08-31 03:31
It is perfectly possible for an object to export memory in a read-only way that may still change. Another object may have a writeable view:

x = obj.readonly_view
y = obj.writable_view
obj.move_to_next_image() # changes memory under x, y

So, hashing using tobytes() doesn't make any sense at all to me. A memoryview != its contents.

You could compare memoryviews simply by comparing the Py_buffer structure, but that's going to be confusing for a lot of users. I would really prefer unhashable (+ if needed a method for comparing contents).

(FWIW, not sure how relevant this is; in NumPy, == does 

In [1]: np.array([1,2,3]) == np.array([1,3,3])
Out[1]: array([ True, False,  True], dtype=bool)

Cython will follow this behaviour for its "typed memory views", which is Cython's builting mechanism for PEP 3118 which is not quite the same as CPython "memoryview". But I understand that following this behaviour is probably out of the question for CPython.)
msg169498 - (view) Author: Dag Sverre Seljebotn (Dag.Sverre.Seljebotn) Date: 2012-08-31 03:47
OK, I can understand the desire to make memoryviews be bytes-like objects (though to my mind, bytes is "frozen" in a very different way...)

If so, and it is deemed worth it, my suggestion is to add a new PyBUF_CONST flag to the buffer acquisition in that case (which can not be used together with PyBUF_WRITABLE). Given that flag, the exporter guarantees that the contents does not change (or fails to give away a buffer).

Perhaps it could be possible for hash() to try to re-acquire a const buffer, so that some buffers are hashable (by contents) and others not.

(I really think relying on buffers that are not writeable to be constant is dangerous. A readonly memoryview could for instance point straight to the live output of a webcam.)
msg169499 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2012-08-31 04:14
To add on Dag's comments, this essentially means that any caching of the hash value is dangerous, unless it can be assured that the underlying buffer definitely has not changed in the meantime. There is no way for users to explicitly tell a memoryview to rehash itself, so putting it into one dict, then modifying the content, then putting it into another is a perfectly reasonable use case for users, but fails when the memory view caches its hash value.

As far as I can see it, any "obvious" fix for this creates a whole bath tub full of worms: updating views transitively, adding new APIs, supporting new buffer parameters, ...
msg169508 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-08-31 08:26
Dag Sverre Seljebotn <report@bugs.python.org> wrote:
> It is perfectly possible for an object to export memory in a read-only
> way that may still change. Another object may have a writeable view:
> 
> x = obj.readonly_view
> y = obj.writable_view
> obj.move_to_next_image() # changes memory under x, y

PEP-3118 always means different things to different people. :)

Your distinction between constant, readonly, writable is one way to
interpret things. But looking at the use in the Python source tree,
readonly does mean immutable.

Unfortunately the PEP doesn't specify readonly: Is it the same as
"mount a partition readonly", "readonly CD" or O_RDONLY?

Also, existing practice was that in response to a PyBUF_FULL_RO request
the exporter was free to hand out a writable buffer!

This is now documented and subverts reliable tagging for consumers:

http://docs.python.org/dev/c-api/buffer.html#readonly-format

If any of this is a major inconvenience to NumPy or other users of the
buffer API, perhaps you could open a separate issue.

> (FWIW, not sure how relevant this is; in NumPy, == does 
> 
> In [1]: np.array([1,2,3]) == np.array([1,3,3])
> Out[1]: array([ True, False,  True], dtype=bool)

Since afa3dedfee18 we compare memoryviews and any exporter in the same manner
as NumPy, except that the return value is a single boolean. So I think the new
comparison is equivalent to all(a == b) in NumPy.
msg169510 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-08-31 08:42
Dag Sverre Seljebotn <report@bugs.python.org> wrote:
> OK, I can understand the desire to make memoryviews be bytes-like objects
> (though to my mind, bytes is "frozen" in a very different way...)

We have two desires that sometimes conflict: People who want to use
memoryview as a *buffer* legitimately want it to behave like bytes.

People who want to have an "arrayview" want it to behave like NumPy
or PIL arrays. The new implementation preserves fast paths for bytes
and otherwise behaves pretty much like a very restricted version of
NumPy's ndarray (with support for suboffsets), except that not all
planned multi-dimensional features have been implemented yet.

> If so, and it is deemed worth it, my suggestion is to add a new
> PyBUF_CONST flag to the buffer acquisition in that case

Note that memory_hash() raises an error if the exporter *itself* is
not hashable, so it only hashes immutable objects by design.

But the PyBUF_CONST flag might be worth considering (in another issue).
msg169515 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-08-31 10:01
Am 31.08.12 06:14, schrieb Stefan Behnel:
> To add on Dag's comments, this essentially means that any caching of
> the hash value is dangerous

Stefan Krah is right here: the proper test (which is already 
implemented) is whether the underlying object is hashable; this
is a sufficient test.

Caching of the hash is always correct, or else the hash function
is incorrect: a hash of an object must not change. If it would
change, you couldn't find the key anymore in a dictionary.

> unless it can be assured that the
> underlying buffer definitely has not changed in the meantime. There
> is no way for users to explicitly tell a memoryview to rehash itself,
> so putting it into one dict, then modifying the content, then putting
> it into another is a perfectly reasonable use case for users,

No, it's not:

d = {}
o = some_object
m = memoryview(o)
d[m] = -1
o.modify()  # so that hash(m) changes
print(d[m]) # keyerror, even though m in d.keys()

> As far as I can see it, any "obvious" fix for this creates a whole
> bath tub full of worms: updating views transitively, adding new APIs,
> supporting new buffer parameters, ...

So above (in this thread): the easiest fix is to just drop hashing
support for memoryview objects. This can be extended in steps:
- support hashing if the underlying object is bytes
- support hashing if the shape is 1D "B", and the underlying object
   is hashable
- support hashing if the exporter has somehow promised to not modify
   the memory
msg169622 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-09-01 11:22
Here's a patch that enforces byte formats.


Does everyone agree on (or tolerate at least) allowing 'B', 'b' and 'c'?
I think we should at least commit the doc patch for 3.3.0. Otherwise
implementors of exporting objects might waste time on a feature that's
deprecated.
msg169626 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-09-01 12:13
+1 for docs patch for 3.3.0 and then enforcing the format restriction for 3.3.1.
msg169627 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-09-01 12:40
New changeset 895e123d9476 by Stefan Krah in branch 'default':
Issue #15814: Document planned restrictions for memoryview hashes in 3.3.1.
http://hg.python.org/cpython/rev/895e123d9476
msg169633 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-09-01 13:52
Am 01.09.12 13:22, schrieb Stefan Krah:
> Does everyone agree on (or tolerate at least) allowing 'B', 'b' and 'c'?

Why be more permissive than necessary? -0 on the committed version;
it should IMO further restrict it to 1D contiguous byte arrays.
msg169636 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-09-01 14:24
Martin v. L??wis <report@bugs.python.org> wrote:
> Why be more permissive than necessary? -0 on the committed version;
> it should IMO further restrict it to 1D contiguous byte arrays.

Does "byte arrays" include 'b' and 'c' or just 'B'? I don't see a reason 
to allow 'B' but not the others.


I'm +-0 on allowing multi-dimensional arrays, but it would be
odd to restrict hashing to contiguous arrays:

>>> b = b'abcdefhhijkl'
>>> m = memoryview(b)
>>> b[::-1]
b'lkjihhfedcba'
>>> bytes(m[::-1]) 
b'lkjihhfedcba'
>>> hash(b[::-1]) == hash(m[::-1])
True


My reasoning was: If non-contiguous arrays are allowed (and I think they
should be), why not allow multi-dimensional arrays, too?


The definition hash(m) == hash(m.tobytes()) is pretty straightforward.
msg169638 - (view) Author: Alexander Belopolsky (Alexander.Belopolsky) Date: 2012-09-01 14:42
On Sep 1, 2012, at 10:24 AM, Stefan Krah <report@bugs.python.org> wrote:

> 
> The definition hash(m) == hash(m.tobytes()) is pretty straightforward.

I probably missed something from the early discussion, but doesn't this definition only work for 1d (or 0d) views? Shouldn't shape come into calculation for multidimensional views?
msg169641 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-09-01 15:06
tobytes() is the same as the flattened multi-dimensional list representation
with all elements converted to bytes. If I'm not mistaken, that's how NumPy's
tostring() behaves.
msg169642 - (view) Author: Alexander Belopolsky (Alexander.Belopolsky) Date: 2012-09-01 15:14
On Sep 1, 2012, at 11:06 AM, Stefan Krah <report@bugs.python.org> wrote:

> tobytes() is the same as the flattened multi-dimensional list representation
> with all elements converted to bytes.

This is correct, but why is it desirable to have deliberate hash collisions between views with different shapes?
msg169643 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-09-01 15:16
Keep in mind that its OK if hash(m) == hash(m.tobytes()) in some cases where "m != m.tobytes()". The only cases we really need to kill are those that break the hash invariant.

I don't like the idea of making the definition of hash more complicated just to rule out cases that aren't actually broken.
msg169644 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-09-01 15:29
> why is it desirable to have deliberate hash collisions between views with different shapes?

Since we're now restricting everything to bytes, the multi-dimensional case
is probably not useful at all. As I said above, I would leave it in because
it actually saves one check for ndim and as Nick said, the definition is
simpler. Now the user only has to remember that hashing works for 'B', 'b'
and 'c'.
msg169645 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-09-01 16:06
Am 01.09.12 16:24, schrieb Stefan Krah:
> Does "byte arrays" include 'b' and 'c' or just 'B'? I don't see a reason
> to allow 'B' but not the others.

Either type is fine with me. It's the multi-dimensional aspect I'd like
to ban.

> My reasoning was: If non-contiguous arrays are allowed (and I think they
> should be), why not allow multi-dimensional arrays, too?

I think neither should be allowed. The explicit request (from Antoine)
was that memoryviews of bytes objects should be hashable. I haven't
heard anybody asking for something more general (but I could see that
people want to hash other memory blocks as well, such as mmap views).

I can see that it is tempting to provide the most general definition,
but see where this approach got us. It's always easier to widen an
interface later than to narrow it.
msg169648 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-09-01 17:20
Disallowing non-contiguous arrays leads to very strange situations though.
I'm positive that there will be a bug report about this:

>>> x = memoryview(b'abc')[::-1]
>>> b = b'cba'
>>> d = {b'cba': 101}
>>> 
>>> b in d
True
>>> x == b
True
>>> x in d
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: memoryiew: hashing is restricted to C-contiguous arrays

Could we perhaps take a small poll? My own vote is:

1) Allow bytes hashing at all: +0.5

2) If 1) is allowed, then also non-contiguous hashing is allowed: +1

3) Allow multi-dimensional hashing: +-0
msg169654 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-09-01 17:44
Am 01.09.12 19:20, schrieb Stefan Krah:
> Disallowing non-contiguous arrays leads to very strange situations though.

I don't find that strange. That two object compare equal doesn't imply
that they both hash - only that *if* they hash, they should hash equal.

In any case, this can happen already:

py> x = memoryview(array.array('B',b'cba'))
py> b = b'cba'
py> d = {b'cba': 101}
py> b in d
True
py> x == b
True
py> x in d
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
ValueError: cannot hash writable memoryview object

It can also happen with other types (although I had to look around
a bit):

py> x=set((1,2,3))
py> b=frozenset(x)
py> d={b:101}
py> b in d
True
py> x==b
True
py> x in d
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'set'

> 1) Allow bytes hashing at all: +0.5

+0

> 2) If 1) is allowed, then also non-contiguous hashing is allowed: +1

-1

> 3) Allow multi-dimensional hashing: +-0

-1
msg169655 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-09-01 18:04
> Could we perhaps take a small poll? My own vote is:
> 
> 1) Allow bytes hashing at all: +0.5

+10. The buffer() object in 2.x was hashable, and a very important use
case of memoryview is replacing buffer().

> 2) If 1) is allowed, then also non-contiguous hashing is allowed: +1
> 3) Allow multi-dimensional hashing: +-0

I don't care about these. I kind of agree with Martin that hashability
should be motivated by an use case, so -0.
msg169657 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-09-01 18:06
> py> x = memoryview(array.array('B',b'cba'))

I find the array example is different. The user has to remember one thing:
memoryviews based on arrays don't hash.

For memoryviews based on bytes one would have to remember:

  - 'B', 'c' and 'b' hash

  - only C-contiguous arrays hash

  - b'abc'[::-1] hashes, but memoryview(b'abc')[::-1] does not
msg169659 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-09-01 18:22
Am 01.09.12 20:06, schrieb Stefan Krah:
>    - b'abc'[::-1] hashes, but memoryview(b'abc')[::-1] does not

I find that memoryview(b'abc')[::-1] is a strange thing to have,
anyway, so I'm not bothered by it behaving different. I can accept
that it needs to be supported for consistency, but I would recommend
against using it in applications. Who needs reversed memory, anyway?
The only case were I ever needed reversal is lists (typically sorted,
perhaps cronological).
msg169682 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-09-02 01:47
+1 for allowing bytes hashing. As Antoine noted, the 1D bytes variant of memoryview() fills the role previously handled by buffer().

+1 for allowing 1D non-contiguous hashing. This is from a simplicity perspective, as I don't want to have to explain to people why "hash(memoryview(b'12'[::2]))", "hash(memoryview(b'123'[::2]))" and "hash(memoryview(b'12')[::2])" all work, but "hash(memoryview(b'123')[::2])" fails.

-1 for allowing hashing when ndim > 1. We don't have a solid use case, and there's no odd value based corner cases to be considered.
msg169693 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-09-02 12:44
The totals are +11.5 :) for hashing, +1 for allowing non-contiguous and
-2 for multi-dimensional. I'll update the docs soon.
msg169694 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-09-02 12:57
New changeset 3b2597c1fe35 by Stefan Krah in branch 'default':
Issue #15814: Documentation: disallow hashing of multi-dimensional memoryviews.
http://hg.python.org/cpython/rev/3b2597c1fe35
msg169700 - (view) Author: Alexander Belopolsky (Alexander.Belopolsky) Date: 2012-09-02 14:21
On Sep 2, 2012, at 8:44 AM, Stefan Krah <report@bugs.python.org> wrote:

> The totals are +11.5 :) for hashing, +1 for allowing non-contiguous and
> -2 for multi-dimensional

I have refrained from voting because in my line of work buffers or memoryviews deal with large objects that rarely serve as dictionary keys.  As a result, I have zero experince with hashing of buffers.  This observation supports the current consensus to limit hashing to 1d and 0d cases.  My only concern is that with hash(m) == hash(m.tobytes()) implementing multidimensional restriction will require artificial  if ndim > 1 check and an extra sentence in the docs while not simplifying anything.
msg169754 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-09-03 11:42
The main issue is that it's not quite clear how to deal with problems like C-style vs FORTRAN-style memory layouts and strides vs suboffsets in defining multidimensional hash equality. Without a use case, it's easier to just punt on the question and declare it illegal.

The 1D hashing case really just comes from wanting to have 1D bytes views behave as much like a bytes object as is practical.
msg169755 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-09-03 11:46
New changeset c9c9d890400c by Nick Coghlan in branch 'default':
Issue #15814: Add NEWS entry regarding intended memoryview hashing restrictions
http://hg.python.org/cpython/rev/c9c9d890400c
msg169757 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-09-03 12:38
Small nitpick: multi-dimensional hashing wasn't really accidental,
it was perfectly aligned with the previous statically typed equality
definition.

When I suggested PyBuffer_Hash = hash(obj.tobytes()) on python-dev for
non-contiguous and multi-dimensional arrays, I got a +1 from Raymond
(which I now see was a private message).


I don't see what could possibly be ill-defined about using the
tobytes() definition for ND-arrays. In all places memoryview now
uses the logical array, which is displayed by tolist().

Leaving aside static typing, both the previous and the new equality
definitions regarded C and Fortran arrays with the same list outputs
as equal.
msg169760 - (view) Author: Alexander Belopolsky (Alexander.Belopolsky) Date: 2012-09-03 13:01
On Mon, Sep 3, 2012 at 8:38 AM, Stefan Krah <report@bugs.python.org> wrote:
> I don't see what could possibly be ill-defined about using the
> tobytes() definition for ND-arrays. In all places memoryview now
> uses the logical array, which is displayed by tolist().

+1.  The key restriction is that to byte views.  ND aspect does not
complicate the logic once .tobytes() is properly implemented.
msg169800 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-09-03 19:59
Am 02.09.2012 16:21, schrieb Alexander Belopolsky:
> I have refrained from voting because in my line of work buffers or
> memoryviews deal with large objects that rarely serve as dictionary
> keys.  As a result, I have zero experince with hashing of buffers.
> This observation supports the current consensus to limit hashing to
> 1d and 0d cases.  My only concern is that with hash(m) ==
> hash(m.tobytes()) implementing multidimensional restriction will
> require artificial  if ndim > 1 check and an extra sentence in the
> docs while not simplifying anything.

As for the "not simplifying argument": if hashing was restricted
to contiguous bytes, then the implementation would certainly be
simplified quite a bit: currently, if it's not contiguous, it needs
to make a separate copy and hash that. This code could go away
if hashing would only work for true memory "blocks".
msg169801 - (view) Author: Alexander Belopolsky (Alexander.Belopolsky) Date: 2012-09-03 20:21
On Mon, Sep 3, 2012 at 3:59 PM, Martin v. Löwis <report@bugs.python.org> wrote:
> if hashing was restricted
> to contiguous bytes, then the implementation would certainly be
> simplified quite a bit: currently, if it's not contiguous, it needs
> to make a separate copy and hash that.

I would be happy if this whole business of "strided" memory layouts
would go away.  I find it to be a failed attempt of
over-generalization.  It is failed because it does not cover many
important formats even in 2-d: sparse matrices, symmetrical matrices,
banded matrices, etc., etc.  (Check level 3 BLAS or NAG for a longer
list.)  It is an over-generalizarion because in practice it is often
more efficient to gather your data in contiguous memory, perform
vector calculations and then scatter the results than to operate on
scattered data to avoid copying.

I would much rather see a generalization of suboffsets that would
allow me to have views into arrays of pointers to variable width
arrays than strides.
msg170051 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-09-08 13:41
New changeset ca81b9a3a015 by Stefan Krah in branch 'default':
Issue #15814: Update whatsnew to the current state of hashing memoryviews.
http://hg.python.org/cpython/rev/ca81b9a3a015
msg170052 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-09-08 13:51
Georg, thanks for including all changes that I've asked for!

If you still have the patience for the constant stream of memoryview
doc changes, there are three new ones that might be worth including:

3b2597c1fe35
c9c9d890400c
ca81b9a3a015
msg170092 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-09-09 09:18
New changeset 7734eb2707a1 by Stefan Krah in branch 'default':
Issue #15814: Document planned restrictions for memoryview hashes in 3.3.1.
http://hg.python.org/cpython/rev/7734eb2707a1

New changeset 71f4d80400f2 by Stefan Krah in branch 'default':
Issue #15814: Documentation: disallow hashing of multi-dimensional memoryviews.
http://hg.python.org/cpython/rev/71f4d80400f2

New changeset 3ffd6ad93fe4 by Nick Coghlan in branch 'default':
Issue #15814: Add NEWS entry regarding intended memoryview hashing restrictions
http://hg.python.org/cpython/rev/3ffd6ad93fe4

New changeset 88a0792e8ba3 by Stefan Krah in branch 'default':
Issue #15814: Update whatsnew to the current state of hashing memoryviews.
http://hg.python.org/cpython/rev/88a0792e8ba3
msg173201 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2012-10-17 20:16
The 3.3.0 docs now state "Note Hashing of memoryviews with formats other than ‘B’, ‘b’ or ‘c’ as well as hashing of multi-dimensional memoryviews is possible in version 3.3.0, but will raise an error in 3.3.1 in order to be compatible with the new memoryview equality definition."  I assume 3.3.1 should read 3.4, or have I missed something?
msg173203 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-10-17 20:24
3.3.1 is correct, you apparently missed msg169425.
msg174542 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-11-02 17:07
New changeset 969069d464bc by Stefan Krah in branch '3.3':
Issue #15814: Use hash function that is compatible with the equality
http://hg.python.org/cpython/rev/969069d464bc
History
Date User Action Args
2022-04-11 14:57:35adminsetgithub: 60018
2012-11-02 17:25:55skrahsetstatus: open -> closed
resolution: fixed
stage: needs patch -> resolved
2012-11-02 17:07:02python-devsetmessages: + msg174542
2012-10-17 20:24:04loewissetmessages: + msg173203
2012-10-17 20:16:21BreamoreBoysetnosy: + BreamoreBoy
messages: + msg173201
2012-09-09 09:18:58python-devsetmessages: + msg170092
2012-09-08 13:51:43skrahsetmessages: + msg170052
2012-09-08 13:41:24python-devsetmessages: + msg170051
2012-09-03 20:21:49Alexander.Belopolskysetmessages: + msg169801
2012-09-03 19:59:02loewissetmessages: + msg169800
2012-09-03 13:04:14scodersetnosy: - scoder
2012-09-03 13:01:20Alexander.Belopolskysetmessages: + msg169760
2012-09-03 12:38:18skrahsetmessages: + msg169757
2012-09-03 11:46:44python-devsetmessages: + msg169755
2012-09-03 11:42:51ncoghlansetmessages: + msg169754
2012-09-02 14:21:49Alexander.Belopolskysetmessages: + msg169700
2012-09-02 12:57:19python-devsetmessages: + msg169694
2012-09-02 12:44:30skrahsetmessages: + msg169693
2012-09-02 01:47:15ncoghlansetmessages: + msg169682
2012-09-01 18:22:28loewissetmessages: + msg169659
2012-09-01 18:06:32skrahsetmessages: + msg169657
2012-09-01 18:04:54pitrousetmessages: + msg169655
2012-09-01 17:44:06loewissetmessages: + msg169654
2012-09-01 17:20:22skrahsetmessages: + msg169648
2012-09-01 16:06:18loewissetmessages: + msg169645
2012-09-01 15:29:49skrahsetmessages: + msg169644
2012-09-01 15:16:24ncoghlansetmessages: + msg169643
2012-09-01 15:14:48Alexander.Belopolskysetmessages: + msg169642
2012-09-01 15:06:29skrahsetmessages: + msg169641
2012-09-01 14:42:11Alexander.Belopolskysetnosy: + Alexander.Belopolsky
messages: + msg169638
2012-09-01 14:24:21skrahsetmessages: + msg169636
2012-09-01 13:52:15loewissetmessages: + msg169633
2012-09-01 12:40:19python-devsetnosy: + python-dev
messages: + msg169627
2012-09-01 12:13:24ncoghlansetmessages: + msg169626
2012-09-01 11:22:16skrahsetfiles: + issue15814-2.diff

messages: + msg169622
2012-08-31 10:01:43loewissetmessages: + msg169515
2012-08-31 08:42:29skrahsetmessages: + msg169510
2012-08-31 08:26:49skrahsetmessages: + msg169508
2012-08-31 04:14:01scodersetmessages: + msg169499
2012-08-31 03:47:32Dag.Sverre.Seljebotnsetmessages: + msg169498
2012-08-31 03:31:31Dag.Sverre.Seljebotnsetnosy: + Dag.Sverre.Seljebotn
messages: + msg169497
2012-08-30 13:01:51loewissetmessages: + msg169451
2012-08-30 11:36:02skrahsetfiles: + issue15814-doc.diff
keywords: + patch
2012-08-30 11:35:31skrahsetmessages: + msg169450
2012-08-30 09:26:10skrahsetnosy: + scoder
2012-08-30 09:24:48skrahsetmessages: + msg169440
2012-08-30 00:26:02ncoghlansetmessages: + msg169425
2012-08-29 23:36:08belopolskysetnosy: + belopolsky
2012-08-29 20:50:40loewissetmessages: + msg169417
2012-08-29 20:27:31skrahsetmessages: + msg169416
2012-08-29 20:04:09skrahsetmessages: + msg169415
2012-08-29 19:33:47loewissetmessages: + msg169411
2012-08-29 19:06:28pitrousetmessages: + msg169409
2012-08-29 18:59:10loewissetmessages: + msg169408
2012-08-29 18:01:27pitrousetmessages: + msg169404
2012-08-29 17:54:45loewissetmessages: + msg169403
2012-08-29 17:07:08skrahcreate