classification
Title: memoryview needlessly (?) requires represented object to be hashable
Type: enhancement Stage:
Components: Library (Lib) Versions: Python 3.8, Python 3.7
process
Status: open Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: Kentzo, mark.dickinson, rhettinger, skrah
Priority: normal Keywords:

Created on 2018-12-21 06:18 by Kentzo, last changed 2018-12-22 23:37 by rhettinger.

Messages (8)
msg332280 - (view) Author: Ilya Kulakov (Kentzo) Date: 2018-12-21 06:18
Implementation of memoryview's hashing method [1] imposes the following constraints in order to be hashable (per documentation):

> One-dimensional memoryviews of hashable (read-only) types with formats ‘B’, ‘b’ or ‘c’ are also hashable. The hash is defined as hash(m) == hash(m.tobytes()):

However it's not clear why original type needs to be hashable given that memoryview deals with 1-dimensional read-only bytes representation of an object. Not only it requires the developer to make an extra copy of C-bytes, but also calls __hash__ of a represented object without using the result other than to detect an error.

My particular use case involves a memory view of a readonly numpy's ndarray. My view satisfies the following constraints:

>>> print(data.format, data.readonly, data.shape, data.c_contiguous)
b True (25,) True

But nevertheless the hashing fails because ndarray itself is not hashable.

Stefan Krah wrote [2]:

> Note that memory_hash() raises an error if the exporter *itself* is
not hashable, so it only hashes immutable objects by design.

But while __hash__ indeed tells that object is (supposed to be) immutable, there is no requirement for all immutable objects to have __hash__.

Both threads I have found ([3], [4]) are quite lengthy and show certain discord in opinions regarding the issue. Perhaps after 6 years since the release of the feature the view on the problem has changed?

1: https://github.com/python/cpython/blob/d1e717588728a23d576c4ead775f7dbd68149696/Objects/memoryobject.c#L2829-L2876
2: https://bugs.python.org/issue15814#msg169510
3: https://bugs.python.org/issue15573
4: https://bugs.python.org/issue15573
msg332305 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2018-12-21 22:40
The reason is that unfortunately readonly != immutable, as the
following example shows:

>>> import numpy as np
>>> x = np.array([1,2,3], dtype='B')
>>> y = x[:]
>>> y.flags['WRITEABLE'] = False
>>> m = memoryview(y)
>>> m.readonly
True
>>> m.tolist()
[1, 2, 3]
>>> x[0] = 100
>>> m.tolist()
[100, 2, 3]


An object like 'y' is allowed to re-export memory, using its own flags.
msg332316 - (view) Author: Ilya Kulakov (Kentzo) Date: 2018-12-22 00:15
True, but perhaps it's too strict to require both memoryview and the represented object to be immutable?

The logic is as follows: 

Every object in Python can be seen as a view of some outside data (in memory, on disk etc.). And while Python's runtime may attempt to guarantee immutability of its objects, it cannot guarantee immutability of the outside data. Therefore a hashable object is an object immutable from the perspective of Python's runtime.
Memory views connect Python objects with outside data. They can be marked as immutable by Python's runtime. Therefore it should be sufficient to be hashable regardless of volatility of outside data.

In your example the immutability of the ndarray is indeed bypassed via a view created beforehand. But that's implementation detail and may actually be a flaw (i.e. being unable to propagate the change). The documentation [1] somewhat warns your that this behavior might change:

> However, **currently**, locking a base object does not lock any views that already reference it, so under that circumstance it is possible to alter the contents of a locked array via a previously created writeable view onto it.

1: https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.ndarray.flags.html
msg332343 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2018-12-22 11:24
The feature would violate fundamental Python invariants. If you modify the example above:

>>> t = (m,)
>>> b"\001\002\003" in t
True
>>> x[0] = 100
>>> b"\001\002\003" in t
False


This is simply never supposed to happen in Python. If an immutable
object (Bytes) is regarded as being a member of a tuple, it should
stay in that tuple.


The issue has to be fixed on the NumPy side: They could implement
a scheme that allows hashing of a specific ndarray if a new flag
"IMMUTABLE" is set that locks the exporter and all exports.


I don't thing NumPy's current behavior can be regarded as a bug.
As I said, "readonly" never meant "immutable", it was always a
property of a single view.
msg332344 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2018-12-22 11:38
Sorry I meant the above example to use a dict, which currently
does not work:

>>> d = {m: "1"}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'numpy.ndarray'


Then the feature would allow to change the dict key.
msg332345 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2018-12-22 12:16
I'll leave the issue up for a couple of days in case someone supports it, but I think this one of the rare cases where all core devs would reject the feature unanimously.
msg332373 - (view) Author: Ilya Kulakov (Kentzo) Date: 2018-12-22 21:52
Perhaps another path is optionally allow hashing of memoryviews (all current conditions - hashability of the original object) via a parameter? Like unsafe_hash like in dataclass.
msg332375 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2018-12-22 23:37
Mark, do you have thoughts on the subject?

> Perhaps another path is optionally allow hashing of memoryviews
>  (all current conditions - hashability of the original object)
>  via a parameter? Like unsafe_hash like in dataclass.

I suspect this would open a can worms and that we would regret it.

If a user intentionally creates an unsafe tool using dataclasses, they have to do so explicitly.  Built-in tools such as memoryview shouldn't cross that line (especially as a default behavior).  

Also, tools like memoryview() are implemented in C and generally have tighter requirements (thread-safety, protecting invariants, not segfaulting, etc) than pure python code that can't kill the interpreter or affect C extensions.


> I'll leave the issue up for a couple of days in case someone supports
> it, but I think this one of the rare cases where all core devs 
> would reject the feature unanimously.

I concur with Stefan.

> My particular use case involves a memory view of a readonly numpy's ndarray.

As Stefan mentioned, this could be "fixed" on the Numpy side if they thought it was a useful behavior (I suspect not).
History
Date User Action Args
2018-12-22 23:37:23rhettingersetnosy: + mark.dickinson, rhettinger
messages: + msg332375
2018-12-22 21:52:51Kentzosetstatus: pending -> open

messages: + msg332373
2018-12-22 12:16:54skrahsetstatus: open -> pending
type: enhancement
resolution: not a bug
messages: + msg332345
2018-12-22 11:38:56skrahsetmessages: + msg332344
2018-12-22 11:24:08skrahsetmessages: + msg332343
2018-12-22 00:15:52Kentzosetmessages: + msg332316
2018-12-21 22:40:41skrahsetmessages: + msg332305
2018-12-21 21:47:33terry.reedysetversions: + Python 3.8, - Python 3.4, Python 3.5, Python 3.6
2018-12-21 06:18:19Kentzocreate