Issue15814
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2012-08-29 17:07 by skrah, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
issue15814-doc.diff | skrah, 2012-08-30 11:36 | review | ||
issue15814-2.diff | skrah, 2012-09-01 11:22 | review |
Messages (51) | |||
---|---|---|---|
msg169398 - (view) | Author: Stefan Krah (skrah) * | Date: 2012-08-29 17:07 | |
The new PEP-3118 equality definition from #15573 that is based on element-wise comparisons breaks the equality-hash invariant: >>> from _testbuffer import ndarray >>> x = ndarray([1,2,3], shape=[3], format='f') >>> y = ndarray([1,2,3], shape=[3], format='B') >>> a = memoryview(x) >>> b = memoryview(y) >>> a == b True >>> hash(a) == hash(b) False >>> |
|||
msg169403 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2012-08-29 17:54 | |
I think the proper solution is to make memoryview objects unhashable. Any other approach will have flaws of some kind. |
|||
msg169404 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2012-08-29 18:01 | |
> I think the proper solution is to make memoryview objects unhashable. Disagreed. If memoryviews are to be bytes-like objects they should be hashable (at least when readonly). > Any other approach will have flaws of some kind. Not more so than equality between memoryviews. |
|||
msg169408 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2012-08-29 18:59 | |
Am 29.08.12 20:01, schrieb Antoine Pitrou: >> I think the proper solution is to make memoryview objects unhashable. > > Disagreed. If memoryviews are to be bytes-like objects they should be > hashable (at least when readonly). So what specific hash algorithm do you propose? >> Any other approach will have flaws of some kind. > > Not more so than equality between memoryviews. Well, memoryviews now have a definition of equality as discussed in issue15573. You may disagree whether it's a useful definition, but I don't believe it has actual flaws (in the sense that it is incorrect or inconsistent). My claim is that any hash definition for memoryviews will have a *fundamental* flaw, failing to provide the basic property that A==B must imply hash(A)==hash(B), making it actually work incorrectly (when used for its primary application, namely keys in dictionaries). |
|||
msg169409 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2012-08-29 19:06 | |
> Am 29.08.12 20:01, schrieb Antoine Pitrou: > >> I think the proper solution is to make memoryview objects unhashable. > > > > Disagreed. If memoryviews are to be bytes-like objects they should be > > hashable (at least when readonly). > > So what specific hash algorithm do you propose? The current algorithm works well in conjunction with bytes objects. > My claim is that any hash definition for memoryviews will have a > *fundamental* flaw, failing to provide the basic property > that A==B must imply hash(A)==hash(B), making it actually work > incorrectly Why is there such a fundamental flaw? |
|||
msg169411 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2012-08-29 19:33 | |
Am 29.08.12 21:06, schrieb Antoine Pitrou: >> So what specific hash algorithm do you propose? > > The current algorithm works well in conjunction with bytes objects. That's about the only type if works for. >> My claim is that any hash definition for memoryviews will have a >> *fundamental* flaw, failing to provide the basic property >> that A==B must imply hash(A)==hash(B), making it actually work >> incorrectly > > Why is there such a fundamental flaw? The current algorithm is flawed as described in Stefan's original message: two objects compare equal, yet hash different. That means that if you use them as dictionary keys, you may end up with two different values for the "same" key, depending on the size of the dictionary (as the modulo operation in the dictionary still may map the different hashes to the same dictionary slot). In general, since memoryview(obj)==obj, it would be necessary that hash(memoryview(obj))==hash(obj). However, since memoryview cannot know what hashing algorithm obj uses, it cannot compute the hash value with the same algorithm. IOW, hashing is mutually exclusive with comparison with the underlying object, unless you know what the hash algorithm of the underlying object is. So restricting tp_hash to memoryview objects where the underlying object is the bytes type would work. |
|||
msg169415 - (view) | Author: Stefan Krah (skrah) * | Date: 2012-08-29 20:04 | |
Martin v. L??wis <report@bugs.python.org> wrote: > In general, since memoryview(obj)==obj, it would be necessary that > hash(memoryview(obj))==hash(obj). However, since memoryview cannot > know what hashing algorithm obj uses, it cannot compute the hash > value with the same algorithm. In the memoryview-hash thread on python-dev [1] this objection was addressed by demanding from exporters that they all use: hash(x) == hash(x.tobytes()) Since the previous equality concept was also based on x.tobytes() == y.tobytes(), this wasn't a problem. The new equality definition and any possible new hash definition should probably also be part of the buffer API documentation, since they aren't memoryview specific. [1] http://mail.python.org/pipermail/python-dev/2011-November/114459.html |
|||
msg169416 - (view) | Author: Stefan Krah (skrah) * | Date: 2012-08-29 20:27 | |
And since memory_richcompare() and any potentially compatible hash function are quite tricky to implement by now, perhaps the generally useful parts could be exposed as PyBuffer_RichCompareBool() and PyBuffer_Hash(). |
|||
msg169417 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2012-08-29 20:50 | |
Am 29.08.12 22:04, schrieb Stefan Krah: > In the memoryview-hash thread on python-dev [1] this objection was > addressed by demanding from exporters that they all use: > > hash(x) == hash(x.tobytes()) > > Since the previous equality concept was also based on > x.tobytes() == y.tobytes(), this wasn't a problem. In the light of this requirement, it's even more difficult to ask people that they change their hashing, since some exporters may already comply with that original request. > The new equality definition and any possible new hash definition should > probably also be part of the buffer API documentation, since they > aren't memoryview specific. That's not true: they *are* memoryview-specific. The notion of equality is entirely one of memoryview objects, not of buffers. I still maintain that specifying hashing for memoryviews under the new equality definition is just not feasible, and that we should give up on it (except perhaps supporting the hashing of bytes views). I also question whether it is useful to hash arbitrarily-shaped read-only buffers (along with questioning whether people will actually *have* arbitrarily-shaped read-only buffers). Stefan: please do propose a semantics also along with proposing interfaces. |
|||
msg169425 - (view) | Author: Nick Coghlan (ncoghlan) * | Date: 2012-08-30 00:26 | |
My perspective is that hashing a memoryview only makes sense when the memoryview is read-only and "m == m.tobytes()" (i.e. it's a C contiguous 1D view of bytes, either because that's what the original object exported as a buffer or because the view has been cast that way) So, I think we should document that restriction in What's New and the memoryview docs for 3.3.0, and actually enforce it in 3.3.1 by raising ValueError from hash(m) when the assumption of a 1D C contiguous bytes view is violated (the "read-only" restriction is already enforced). In previous versions, this constraint didn't need to be explicitly enforced, since memoryview basically treated *everything* as a 1D C contiguous view of bytes. |
|||
msg169440 - (view) | Author: Stefan Krah (skrah) * | Date: 2012-08-30 09:24 | |
Martin v. L??wis <report@bugs.python.org> wrote: > > hash(x) == hash(x.tobytes()) > In the light of this requirement, it's even more difficult to ask > people that they change their hashing, since some exporters may already > comply with that original request. I don't think so. memoryview.__hash__() is new in 3.3 and the requirement is not documented at all in the general PEP-3118 sections. [Adding Stefan Behnel to nosy, since Cython is pretty quick to pick up new features.] > > The new equality definition and any possible new hash definition should > > probably also be part of the buffer API documentation, since they > > aren't memoryview specific. > > That's not true: they *are* memoryview-specific. The notion of equality > is entirely one of memoryview objects, not of buffers. Could you name a part of the equality definition that is memoryview-specific? > I still maintain that specifying hashing for memoryviews under the > new equality definition is just not feasible, and that we should give > up on it (except perhaps supporting the hashing of bytes views). > I also question whether it is useful to hash arbitrarily-shaped > read-only buffers (along with questioning whether people will actually > *have* arbitrarily-shaped read-only buffers). Useful, perhaps. I don't know if it is worth the effort though. We could restrict hashing to contiguous bytes views in 3.3.1. |
|||
msg169450 - (view) | Author: Stefan Krah (skrah) * | Date: 2012-08-30 11:35 | |
Here's a doc patch restricting the hash to formats 'B', 'b' and 'c'. I think non-contiguous views are fine: both __eq__() and tobytes() handle these, so the equality-hash invariant is preserved. |
|||
msg169451 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2012-08-30 13:01 | |
Am 30.08.12 11:24, schrieb Stefan Krah: >>> The new equality definition and any possible new hash definition should >>> probably also be part of the buffer API documentation, since they >>> aren't memoryview specific. >> >> That's not true: they *are* memoryview-specific. The notion of equality >> is entirely one of memoryview objects, not of buffers. > > Could you name a part of the equality definition that is memoryview-specific? As literally spelled out, it starts with "a memoryview and a PEP 3118 exporter". But I was actually thinking of the implementation... In any case, I'm skeptical that this is a useful extension to the buffer interface. You would imply "two exporters are equal iff their exported buffers are equal". I think this is asking too much. |
|||
msg169497 - (view) | Author: Dag Sverre Seljebotn (Dag.Sverre.Seljebotn) | Date: 2012-08-31 03:31 | |
It is perfectly possible for an object to export memory in a read-only way that may still change. Another object may have a writeable view: x = obj.readonly_view y = obj.writable_view obj.move_to_next_image() # changes memory under x, y So, hashing using tobytes() doesn't make any sense at all to me. A memoryview != its contents. You could compare memoryviews simply by comparing the Py_buffer structure, but that's going to be confusing for a lot of users. I would really prefer unhashable (+ if needed a method for comparing contents). (FWIW, not sure how relevant this is; in NumPy, == does In [1]: np.array([1,2,3]) == np.array([1,3,3]) Out[1]: array([ True, False, True], dtype=bool) Cython will follow this behaviour for its "typed memory views", which is Cython's builting mechanism for PEP 3118 which is not quite the same as CPython "memoryview". But I understand that following this behaviour is probably out of the question for CPython.) |
|||
msg169498 - (view) | Author: Dag Sverre Seljebotn (Dag.Sverre.Seljebotn) | Date: 2012-08-31 03:47 | |
OK, I can understand the desire to make memoryviews be bytes-like objects (though to my mind, bytes is "frozen" in a very different way...) If so, and it is deemed worth it, my suggestion is to add a new PyBUF_CONST flag to the buffer acquisition in that case (which can not be used together with PyBUF_WRITABLE). Given that flag, the exporter guarantees that the contents does not change (or fails to give away a buffer). Perhaps it could be possible for hash() to try to re-acquire a const buffer, so that some buffers are hashable (by contents) and others not. (I really think relying on buffers that are not writeable to be constant is dangerous. A readonly memoryview could for instance point straight to the live output of a webcam.) |
|||
msg169499 - (view) | Author: Stefan Behnel (scoder) * | Date: 2012-08-31 04:14 | |
To add on Dag's comments, this essentially means that any caching of the hash value is dangerous, unless it can be assured that the underlying buffer definitely has not changed in the meantime. There is no way for users to explicitly tell a memoryview to rehash itself, so putting it into one dict, then modifying the content, then putting it into another is a perfectly reasonable use case for users, but fails when the memory view caches its hash value. As far as I can see it, any "obvious" fix for this creates a whole bath tub full of worms: updating views transitively, adding new APIs, supporting new buffer parameters, ... |
|||
msg169508 - (view) | Author: Stefan Krah (skrah) * | Date: 2012-08-31 08:26 | |
Dag Sverre Seljebotn <report@bugs.python.org> wrote: > It is perfectly possible for an object to export memory in a read-only > way that may still change. Another object may have a writeable view: > > x = obj.readonly_view > y = obj.writable_view > obj.move_to_next_image() # changes memory under x, y PEP-3118 always means different things to different people. :) Your distinction between constant, readonly, writable is one way to interpret things. But looking at the use in the Python source tree, readonly does mean immutable. Unfortunately the PEP doesn't specify readonly: Is it the same as "mount a partition readonly", "readonly CD" or O_RDONLY? Also, existing practice was that in response to a PyBUF_FULL_RO request the exporter was free to hand out a writable buffer! This is now documented and subverts reliable tagging for consumers: http://docs.python.org/dev/c-api/buffer.html#readonly-format If any of this is a major inconvenience to NumPy or other users of the buffer API, perhaps you could open a separate issue. > (FWIW, not sure how relevant this is; in NumPy, == does > > In [1]: np.array([1,2,3]) == np.array([1,3,3]) > Out[1]: array([ True, False, True], dtype=bool) Since afa3dedfee18 we compare memoryviews and any exporter in the same manner as NumPy, except that the return value is a single boolean. So I think the new comparison is equivalent to all(a == b) in NumPy. |
|||
msg169510 - (view) | Author: Stefan Krah (skrah) * | Date: 2012-08-31 08:42 | |
Dag Sverre Seljebotn <report@bugs.python.org> wrote: > OK, I can understand the desire to make memoryviews be bytes-like objects > (though to my mind, bytes is "frozen" in a very different way...) We have two desires that sometimes conflict: People who want to use memoryview as a *buffer* legitimately want it to behave like bytes. People who want to have an "arrayview" want it to behave like NumPy or PIL arrays. The new implementation preserves fast paths for bytes and otherwise behaves pretty much like a very restricted version of NumPy's ndarray (with support for suboffsets), except that not all planned multi-dimensional features have been implemented yet. > If so, and it is deemed worth it, my suggestion is to add a new > PyBUF_CONST flag to the buffer acquisition in that case Note that memory_hash() raises an error if the exporter *itself* is not hashable, so it only hashes immutable objects by design. But the PyBUF_CONST flag might be worth considering (in another issue). |
|||
msg169515 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2012-08-31 10:01 | |
Am 31.08.12 06:14, schrieb Stefan Behnel: > To add on Dag's comments, this essentially means that any caching of > the hash value is dangerous Stefan Krah is right here: the proper test (which is already implemented) is whether the underlying object is hashable; this is a sufficient test. Caching of the hash is always correct, or else the hash function is incorrect: a hash of an object must not change. If it would change, you couldn't find the key anymore in a dictionary. > unless it can be assured that the > underlying buffer definitely has not changed in the meantime. There > is no way for users to explicitly tell a memoryview to rehash itself, > so putting it into one dict, then modifying the content, then putting > it into another is a perfectly reasonable use case for users, No, it's not: d = {} o = some_object m = memoryview(o) d[m] = -1 o.modify() # so that hash(m) changes print(d[m]) # keyerror, even though m in d.keys() > As far as I can see it, any "obvious" fix for this creates a whole > bath tub full of worms: updating views transitively, adding new APIs, > supporting new buffer parameters, ... So above (in this thread): the easiest fix is to just drop hashing support for memoryview objects. This can be extended in steps: - support hashing if the underlying object is bytes - support hashing if the shape is 1D "B", and the underlying object is hashable - support hashing if the exporter has somehow promised to not modify the memory |
|||
msg169622 - (view) | Author: Stefan Krah (skrah) * | Date: 2012-09-01 11:22 | |
Here's a patch that enforces byte formats. Does everyone agree on (or tolerate at least) allowing 'B', 'b' and 'c'? I think we should at least commit the doc patch for 3.3.0. Otherwise implementors of exporting objects might waste time on a feature that's deprecated. |
|||
msg169626 - (view) | Author: Nick Coghlan (ncoghlan) * | Date: 2012-09-01 12:13 | |
+1 for docs patch for 3.3.0 and then enforcing the format restriction for 3.3.1. |
|||
msg169627 - (view) | Author: Roundup Robot (python-dev) | Date: 2012-09-01 12:40 | |
New changeset 895e123d9476 by Stefan Krah in branch 'default': Issue #15814: Document planned restrictions for memoryview hashes in 3.3.1. http://hg.python.org/cpython/rev/895e123d9476 |
|||
msg169633 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2012-09-01 13:52 | |
Am 01.09.12 13:22, schrieb Stefan Krah: > Does everyone agree on (or tolerate at least) allowing 'B', 'b' and 'c'? Why be more permissive than necessary? -0 on the committed version; it should IMO further restrict it to 1D contiguous byte arrays. |
|||
msg169636 - (view) | Author: Stefan Krah (skrah) * | Date: 2012-09-01 14:24 | |
Martin v. L??wis <report@bugs.python.org> wrote: > Why be more permissive than necessary? -0 on the committed version; > it should IMO further restrict it to 1D contiguous byte arrays. Does "byte arrays" include 'b' and 'c' or just 'B'? I don't see a reason to allow 'B' but not the others. I'm +-0 on allowing multi-dimensional arrays, but it would be odd to restrict hashing to contiguous arrays: >>> b = b'abcdefhhijkl' >>> m = memoryview(b) >>> b[::-1] b'lkjihhfedcba' >>> bytes(m[::-1]) b'lkjihhfedcba' >>> hash(b[::-1]) == hash(m[::-1]) True My reasoning was: If non-contiguous arrays are allowed (and I think they should be), why not allow multi-dimensional arrays, too? The definition hash(m) == hash(m.tobytes()) is pretty straightforward. |
|||
msg169638 - (view) | Author: Alexander Belopolsky (Alexander.Belopolsky) | Date: 2012-09-01 14:42 | |
On Sep 1, 2012, at 10:24 AM, Stefan Krah <report@bugs.python.org> wrote: > > The definition hash(m) == hash(m.tobytes()) is pretty straightforward. I probably missed something from the early discussion, but doesn't this definition only work for 1d (or 0d) views? Shouldn't shape come into calculation for multidimensional views? |
|||
msg169641 - (view) | Author: Stefan Krah (skrah) * | Date: 2012-09-01 15:06 | |
tobytes() is the same as the flattened multi-dimensional list representation with all elements converted to bytes. If I'm not mistaken, that's how NumPy's tostring() behaves. |
|||
msg169642 - (view) | Author: Alexander Belopolsky (Alexander.Belopolsky) | Date: 2012-09-01 15:14 | |
On Sep 1, 2012, at 11:06 AM, Stefan Krah <report@bugs.python.org> wrote: > tobytes() is the same as the flattened multi-dimensional list representation > with all elements converted to bytes. This is correct, but why is it desirable to have deliberate hash collisions between views with different shapes? |
|||
msg169643 - (view) | Author: Nick Coghlan (ncoghlan) * | Date: 2012-09-01 15:16 | |
Keep in mind that its OK if hash(m) == hash(m.tobytes()) in some cases where "m != m.tobytes()". The only cases we really need to kill are those that break the hash invariant. I don't like the idea of making the definition of hash more complicated just to rule out cases that aren't actually broken. |
|||
msg169644 - (view) | Author: Stefan Krah (skrah) * | Date: 2012-09-01 15:29 | |
> why is it desirable to have deliberate hash collisions between views with different shapes? Since we're now restricting everything to bytes, the multi-dimensional case is probably not useful at all. As I said above, I would leave it in because it actually saves one check for ndim and as Nick said, the definition is simpler. Now the user only has to remember that hashing works for 'B', 'b' and 'c'. |
|||
msg169645 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2012-09-01 16:06 | |
Am 01.09.12 16:24, schrieb Stefan Krah: > Does "byte arrays" include 'b' and 'c' or just 'B'? I don't see a reason > to allow 'B' but not the others. Either type is fine with me. It's the multi-dimensional aspect I'd like to ban. > My reasoning was: If non-contiguous arrays are allowed (and I think they > should be), why not allow multi-dimensional arrays, too? I think neither should be allowed. The explicit request (from Antoine) was that memoryviews of bytes objects should be hashable. I haven't heard anybody asking for something more general (but I could see that people want to hash other memory blocks as well, such as mmap views). I can see that it is tempting to provide the most general definition, but see where this approach got us. It's always easier to widen an interface later than to narrow it. |
|||
msg169648 - (view) | Author: Stefan Krah (skrah) * | Date: 2012-09-01 17:20 | |
Disallowing non-contiguous arrays leads to very strange situations though. I'm positive that there will be a bug report about this: >>> x = memoryview(b'abc')[::-1] >>> b = b'cba' >>> d = {b'cba': 101} >>> >>> b in d True >>> x == b True >>> x in d Traceback (most recent call last): File "<stdin>", line 1, in <module> ValueError: memoryiew: hashing is restricted to C-contiguous arrays Could we perhaps take a small poll? My own vote is: 1) Allow bytes hashing at all: +0.5 2) If 1) is allowed, then also non-contiguous hashing is allowed: +1 3) Allow multi-dimensional hashing: +-0 |
|||
msg169654 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2012-09-01 17:44 | |
Am 01.09.12 19:20, schrieb Stefan Krah: > Disallowing non-contiguous arrays leads to very strange situations though. I don't find that strange. That two object compare equal doesn't imply that they both hash - only that *if* they hash, they should hash equal. In any case, this can happen already: py> x = memoryview(array.array('B',b'cba')) py> b = b'cba' py> d = {b'cba': 101} py> b in d True py> x == b True py> x in d Traceback (most recent call last): File "<stdin>", line 1, in <module> ValueError: cannot hash writable memoryview object It can also happen with other types (although I had to look around a bit): py> x=set((1,2,3)) py> b=frozenset(x) py> d={b:101} py> b in d True py> x==b True py> x in d Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: unhashable type: 'set' > 1) Allow bytes hashing at all: +0.5 +0 > 2) If 1) is allowed, then also non-contiguous hashing is allowed: +1 -1 > 3) Allow multi-dimensional hashing: +-0 -1 |
|||
msg169655 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2012-09-01 18:04 | |
> Could we perhaps take a small poll? My own vote is: > > 1) Allow bytes hashing at all: +0.5 +10. The buffer() object in 2.x was hashable, and a very important use case of memoryview is replacing buffer(). > 2) If 1) is allowed, then also non-contiguous hashing is allowed: +1 > 3) Allow multi-dimensional hashing: +-0 I don't care about these. I kind of agree with Martin that hashability should be motivated by an use case, so -0. |
|||
msg169657 - (view) | Author: Stefan Krah (skrah) * | Date: 2012-09-01 18:06 | |
> py> x = memoryview(array.array('B',b'cba')) I find the array example is different. The user has to remember one thing: memoryviews based on arrays don't hash. For memoryviews based on bytes one would have to remember: - 'B', 'c' and 'b' hash - only C-contiguous arrays hash - b'abc'[::-1] hashes, but memoryview(b'abc')[::-1] does not |
|||
msg169659 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2012-09-01 18:22 | |
Am 01.09.12 20:06, schrieb Stefan Krah: > - b'abc'[::-1] hashes, but memoryview(b'abc')[::-1] does not I find that memoryview(b'abc')[::-1] is a strange thing to have, anyway, so I'm not bothered by it behaving different. I can accept that it needs to be supported for consistency, but I would recommend against using it in applications. Who needs reversed memory, anyway? The only case were I ever needed reversal is lists (typically sorted, perhaps cronological). |
|||
msg169682 - (view) | Author: Nick Coghlan (ncoghlan) * | Date: 2012-09-02 01:47 | |
+1 for allowing bytes hashing. As Antoine noted, the 1D bytes variant of memoryview() fills the role previously handled by buffer(). +1 for allowing 1D non-contiguous hashing. This is from a simplicity perspective, as I don't want to have to explain to people why "hash(memoryview(b'12'[::2]))", "hash(memoryview(b'123'[::2]))" and "hash(memoryview(b'12')[::2])" all work, but "hash(memoryview(b'123')[::2])" fails. -1 for allowing hashing when ndim > 1. We don't have a solid use case, and there's no odd value based corner cases to be considered. |
|||
msg169693 - (view) | Author: Stefan Krah (skrah) * | Date: 2012-09-02 12:44 | |
The totals are +11.5 :) for hashing, +1 for allowing non-contiguous and -2 for multi-dimensional. I'll update the docs soon. |
|||
msg169694 - (view) | Author: Roundup Robot (python-dev) | Date: 2012-09-02 12:57 | |
New changeset 3b2597c1fe35 by Stefan Krah in branch 'default': Issue #15814: Documentation: disallow hashing of multi-dimensional memoryviews. http://hg.python.org/cpython/rev/3b2597c1fe35 |
|||
msg169700 - (view) | Author: Alexander Belopolsky (Alexander.Belopolsky) | Date: 2012-09-02 14:21 | |
On Sep 2, 2012, at 8:44 AM, Stefan Krah <report@bugs.python.org> wrote: > The totals are +11.5 :) for hashing, +1 for allowing non-contiguous and > -2 for multi-dimensional I have refrained from voting because in my line of work buffers or memoryviews deal with large objects that rarely serve as dictionary keys. As a result, I have zero experince with hashing of buffers. This observation supports the current consensus to limit hashing to 1d and 0d cases. My only concern is that with hash(m) == hash(m.tobytes()) implementing multidimensional restriction will require artificial if ndim > 1 check and an extra sentence in the docs while not simplifying anything. |
|||
msg169754 - (view) | Author: Nick Coghlan (ncoghlan) * | Date: 2012-09-03 11:42 | |
The main issue is that it's not quite clear how to deal with problems like C-style vs FORTRAN-style memory layouts and strides vs suboffsets in defining multidimensional hash equality. Without a use case, it's easier to just punt on the question and declare it illegal. The 1D hashing case really just comes from wanting to have 1D bytes views behave as much like a bytes object as is practical. |
|||
msg169755 - (view) | Author: Roundup Robot (python-dev) | Date: 2012-09-03 11:46 | |
New changeset c9c9d890400c by Nick Coghlan in branch 'default': Issue #15814: Add NEWS entry regarding intended memoryview hashing restrictions http://hg.python.org/cpython/rev/c9c9d890400c |
|||
msg169757 - (view) | Author: Stefan Krah (skrah) * | Date: 2012-09-03 12:38 | |
Small nitpick: multi-dimensional hashing wasn't really accidental, it was perfectly aligned with the previous statically typed equality definition. When I suggested PyBuffer_Hash = hash(obj.tobytes()) on python-dev for non-contiguous and multi-dimensional arrays, I got a +1 from Raymond (which I now see was a private message). I don't see what could possibly be ill-defined about using the tobytes() definition for ND-arrays. In all places memoryview now uses the logical array, which is displayed by tolist(). Leaving aside static typing, both the previous and the new equality definitions regarded C and Fortran arrays with the same list outputs as equal. |
|||
msg169760 - (view) | Author: Alexander Belopolsky (Alexander.Belopolsky) | Date: 2012-09-03 13:01 | |
On Mon, Sep 3, 2012 at 8:38 AM, Stefan Krah <report@bugs.python.org> wrote: > I don't see what could possibly be ill-defined about using the > tobytes() definition for ND-arrays. In all places memoryview now > uses the logical array, which is displayed by tolist(). +1. The key restriction is that to byte views. ND aspect does not complicate the logic once .tobytes() is properly implemented. |
|||
msg169800 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2012-09-03 19:59 | |
Am 02.09.2012 16:21, schrieb Alexander Belopolsky: > I have refrained from voting because in my line of work buffers or > memoryviews deal with large objects that rarely serve as dictionary > keys. As a result, I have zero experince with hashing of buffers. > This observation supports the current consensus to limit hashing to > 1d and 0d cases. My only concern is that with hash(m) == > hash(m.tobytes()) implementing multidimensional restriction will > require artificial if ndim > 1 check and an extra sentence in the > docs while not simplifying anything. As for the "not simplifying argument": if hashing was restricted to contiguous bytes, then the implementation would certainly be simplified quite a bit: currently, if it's not contiguous, it needs to make a separate copy and hash that. This code could go away if hashing would only work for true memory "blocks". |
|||
msg169801 - (view) | Author: Alexander Belopolsky (Alexander.Belopolsky) | Date: 2012-09-03 20:21 | |
On Mon, Sep 3, 2012 at 3:59 PM, Martin v. Löwis <report@bugs.python.org> wrote: > if hashing was restricted > to contiguous bytes, then the implementation would certainly be > simplified quite a bit: currently, if it's not contiguous, it needs > to make a separate copy and hash that. I would be happy if this whole business of "strided" memory layouts would go away. I find it to be a failed attempt of over-generalization. It is failed because it does not cover many important formats even in 2-d: sparse matrices, symmetrical matrices, banded matrices, etc., etc. (Check level 3 BLAS or NAG for a longer list.) It is an over-generalizarion because in practice it is often more efficient to gather your data in contiguous memory, perform vector calculations and then scatter the results than to operate on scattered data to avoid copying. I would much rather see a generalization of suboffsets that would allow me to have views into arrays of pointers to variable width arrays than strides. |
|||
msg170051 - (view) | Author: Roundup Robot (python-dev) | Date: 2012-09-08 13:41 | |
New changeset ca81b9a3a015 by Stefan Krah in branch 'default': Issue #15814: Update whatsnew to the current state of hashing memoryviews. http://hg.python.org/cpython/rev/ca81b9a3a015 |
|||
msg170052 - (view) | Author: Stefan Krah (skrah) * | Date: 2012-09-08 13:51 | |
Georg, thanks for including all changes that I've asked for! If you still have the patience for the constant stream of memoryview doc changes, there are three new ones that might be worth including: 3b2597c1fe35 c9c9d890400c ca81b9a3a015 |
|||
msg170092 - (view) | Author: Roundup Robot (python-dev) | Date: 2012-09-09 09:18 | |
New changeset 7734eb2707a1 by Stefan Krah in branch 'default': Issue #15814: Document planned restrictions for memoryview hashes in 3.3.1. http://hg.python.org/cpython/rev/7734eb2707a1 New changeset 71f4d80400f2 by Stefan Krah in branch 'default': Issue #15814: Documentation: disallow hashing of multi-dimensional memoryviews. http://hg.python.org/cpython/rev/71f4d80400f2 New changeset 3ffd6ad93fe4 by Nick Coghlan in branch 'default': Issue #15814: Add NEWS entry regarding intended memoryview hashing restrictions http://hg.python.org/cpython/rev/3ffd6ad93fe4 New changeset 88a0792e8ba3 by Stefan Krah in branch 'default': Issue #15814: Update whatsnew to the current state of hashing memoryviews. http://hg.python.org/cpython/rev/88a0792e8ba3 |
|||
msg173201 - (view) | Author: Mark Lawrence (BreamoreBoy) * | Date: 2012-10-17 20:16 | |
The 3.3.0 docs now state "Note Hashing of memoryviews with formats other than ‘B’, ‘b’ or ‘c’ as well as hashing of multi-dimensional memoryviews is possible in version 3.3.0, but will raise an error in 3.3.1 in order to be compatible with the new memoryview equality definition." I assume 3.3.1 should read 3.4, or have I missed something? |
|||
msg173203 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2012-10-17 20:24 | |
3.3.1 is correct, you apparently missed msg169425. |
|||
msg174542 - (view) | Author: Roundup Robot (python-dev) | Date: 2012-11-02 17:07 | |
New changeset 969069d464bc by Stefan Krah in branch '3.3': Issue #15814: Use hash function that is compatible with the equality http://hg.python.org/cpython/rev/969069d464bc |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:57:35 | admin | set | github: 60018 |
2012-11-02 17:25:55 | skrah | set | status: open -> closed resolution: fixed stage: needs patch -> resolved |
2012-11-02 17:07:02 | python-dev | set | messages: + msg174542 |
2012-10-17 20:24:04 | loewis | set | messages: + msg173203 |
2012-10-17 20:16:21 | BreamoreBoy | set | nosy:
+ BreamoreBoy messages: + msg173201 |
2012-09-09 09:18:58 | python-dev | set | messages: + msg170092 |
2012-09-08 13:51:43 | skrah | set | messages: + msg170052 |
2012-09-08 13:41:24 | python-dev | set | messages: + msg170051 |
2012-09-03 20:21:49 | Alexander.Belopolsky | set | messages: + msg169801 |
2012-09-03 19:59:02 | loewis | set | messages: + msg169800 |
2012-09-03 13:04:14 | scoder | set | nosy:
- scoder |
2012-09-03 13:01:20 | Alexander.Belopolsky | set | messages: + msg169760 |
2012-09-03 12:38:18 | skrah | set | messages: + msg169757 |
2012-09-03 11:46:44 | python-dev | set | messages: + msg169755 |
2012-09-03 11:42:51 | ncoghlan | set | messages: + msg169754 |
2012-09-02 14:21:49 | Alexander.Belopolsky | set | messages: + msg169700 |
2012-09-02 12:57:19 | python-dev | set | messages: + msg169694 |
2012-09-02 12:44:30 | skrah | set | messages: + msg169693 |
2012-09-02 01:47:15 | ncoghlan | set | messages: + msg169682 |
2012-09-01 18:22:28 | loewis | set | messages: + msg169659 |
2012-09-01 18:06:32 | skrah | set | messages: + msg169657 |
2012-09-01 18:04:54 | pitrou | set | messages: + msg169655 |
2012-09-01 17:44:06 | loewis | set | messages: + msg169654 |
2012-09-01 17:20:22 | skrah | set | messages: + msg169648 |
2012-09-01 16:06:18 | loewis | set | messages: + msg169645 |
2012-09-01 15:29:49 | skrah | set | messages: + msg169644 |
2012-09-01 15:16:24 | ncoghlan | set | messages: + msg169643 |
2012-09-01 15:14:48 | Alexander.Belopolsky | set | messages: + msg169642 |
2012-09-01 15:06:29 | skrah | set | messages: + msg169641 |
2012-09-01 14:42:11 | Alexander.Belopolsky | set | nosy:
+ Alexander.Belopolsky messages: + msg169638 |
2012-09-01 14:24:21 | skrah | set | messages: + msg169636 |
2012-09-01 13:52:15 | loewis | set | messages: + msg169633 |
2012-09-01 12:40:19 | python-dev | set | nosy:
+ python-dev messages: + msg169627 |
2012-09-01 12:13:24 | ncoghlan | set | messages: + msg169626 |
2012-09-01 11:22:16 | skrah | set | files:
+ issue15814-2.diff messages: + msg169622 |
2012-08-31 10:01:43 | loewis | set | messages: + msg169515 |
2012-08-31 08:42:29 | skrah | set | messages: + msg169510 |
2012-08-31 08:26:49 | skrah | set | messages: + msg169508 |
2012-08-31 04:14:01 | scoder | set | messages: + msg169499 |
2012-08-31 03:47:32 | Dag.Sverre.Seljebotn | set | messages: + msg169498 |
2012-08-31 03:31:31 | Dag.Sverre.Seljebotn | set | nosy:
+ Dag.Sverre.Seljebotn messages: + msg169497 |
2012-08-30 13:01:51 | loewis | set | messages: + msg169451 |
2012-08-30 11:36:02 | skrah | set | files:
+ issue15814-doc.diff keywords: + patch |
2012-08-30 11:35:31 | skrah | set | messages: + msg169450 |
2012-08-30 09:26:10 | skrah | set | nosy:
+ scoder |
2012-08-30 09:24:48 | skrah | set | messages: + msg169440 |
2012-08-30 00:26:02 | ncoghlan | set | messages: + msg169425 |
2012-08-29 23:36:08 | belopolsky | set | nosy:
+ belopolsky |
2012-08-29 20:50:40 | loewis | set | messages: + msg169417 |
2012-08-29 20:27:31 | skrah | set | messages: + msg169416 |
2012-08-29 20:04:09 | skrah | set | messages: + msg169415 |
2012-08-29 19:33:47 | loewis | set | messages: + msg169411 |
2012-08-29 19:06:28 | pitrou | set | messages: + msg169409 |
2012-08-29 18:59:10 | loewis | set | messages: + msg169408 |
2012-08-29 18:01:27 | pitrou | set | messages: + msg169404 |
2012-08-29 17:54:45 | loewis | set | messages: + msg169403 |
2012-08-29 17:07:08 | skrah | create |