Issue 35548: memoryview needlessly (?) requires represented object to be hashable

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/79729

classification

Title:	memoryview needlessly (?) requires represented object to be hashable
Type:	enhancement	Stage:
Components:	Library (Lib)	Versions:	Python 3.8, Python 3.7

process

Status:	open	Resolution:	not a bug
Dependencies:		Superseder:
Assigned To:		Nosy List:	Kentzo, mark.dickinson, rhettinger, skrah
Priority:	normal	Keywords:

Created on 2018-12-21 06:18 by Kentzo, last changed 2022-04-11 14:59 by admin.

Messages (8)
msg332280 - (view)	Author: Ilya Kulakov (Kentzo)	Date: 2018-12-21 06:18
Implementation of memoryview's hashing method [1] imposes the following constraints in order to be hashable (per documentation): > One-dimensional memoryviews of hashable (read-only) types with formats ‘B’, ‘b’ or ‘c’ are also hashable. The hash is defined as hash(m) == hash(m.tobytes()): However it's not clear why original type needs to be hashable given that memoryview deals with 1-dimensional read-only bytes representation of an object. Not only it requires the developer to make an extra copy of C-bytes, but also calls __hash__ of a represented object without using the result other than to detect an error. My particular use case involves a memory view of a readonly numpy's ndarray. My view satisfies the following constraints: >>> print(data.format, data.readonly, data.shape, data.c_contiguous) b True (25,) True But nevertheless the hashing fails because ndarray itself is not hashable. Stefan Krah wrote [2]: > Note that memory_hash() raises an error if the exporter itself is not hashable, so it only hashes immutable objects by design. But while __hash__ indeed tells that object is (supposed to be) immutable, there is no requirement for all immutable objects to have __hash__. Both threads I have found ([3], [4]) are quite lengthy and show certain discord in opinions regarding the issue. Perhaps after 6 years since the release of the feature the view on the problem has changed? 1: https://github.com/python/cpython/blob/d1e717588728a23d576c4ead775f7dbd68149696/Objects/memoryobject.c#L2829-L2876 2: https://bugs.python.org/issue15814#msg169510 3: https://bugs.python.org/issue15573 4: https://bugs.python.org/issue15573
msg332305 - (view)	Author: Stefan Krah (skrah) *	Date: 2018-12-21 22:40
The reason is that unfortunately readonly != immutable, as the following example shows: >>> import numpy as np >>> x = np.array([1,2,3], dtype='B') >>> y = x[:] >>> y.flags['WRITEABLE'] = False >>> m = memoryview(y) >>> m.readonly True >>> m.tolist() [1, 2, 3] >>> x[0] = 100 >>> m.tolist() [100, 2, 3] An object like 'y' is allowed to re-export memory, using its own flags.
msg332316 - (view)	Author: Ilya Kulakov (Kentzo)	Date: 2018-12-22 00:15
True, but perhaps it's too strict to require both memoryview and the represented object to be immutable? The logic is as follows: Every object in Python can be seen as a view of some outside data (in memory, on disk etc.). And while Python's runtime may attempt to guarantee immutability of its objects, it cannot guarantee immutability of the outside data. Therefore a hashable object is an object immutable from the perspective of Python's runtime. Memory views connect Python objects with outside data. They can be marked as immutable by Python's runtime. Therefore it should be sufficient to be hashable regardless of volatility of outside data. In your example the immutability of the ndarray is indeed bypassed via a view created beforehand. But that's implementation detail and may actually be a flaw (i.e. being unable to propagate the change). The documentation [1] somewhat warns your that this behavior might change: > However, currently, locking a base object does not lock any views that already reference it, so under that circumstance it is possible to alter the contents of a locked array via a previously created writeable view onto it. 1: https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.ndarray.flags.html
msg332343 - (view)	Author: Stefan Krah (skrah) *	Date: 2018-12-22 11:24
The feature would violate fundamental Python invariants. If you modify the example above: >>> t = (m,) >>> b"\001\002\003" in t True >>> x[0] = 100 >>> b"\001\002\003" in t False This is simply never supposed to happen in Python. If an immutable object (Bytes) is regarded as being a member of a tuple, it should stay in that tuple. The issue has to be fixed on the NumPy side: They could implement a scheme that allows hashing of a specific ndarray if a new flag "IMMUTABLE" is set that locks the exporter and all exports. I don't thing NumPy's current behavior can be regarded as a bug. As I said, "readonly" never meant "immutable", it was always a property of a single view.
msg332344 - (view)	Author: Stefan Krah (skrah) *	Date: 2018-12-22 11:38
Sorry I meant the above example to use a dict, which currently does not work: >>> d = {m: "1"} Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: unhashable type: 'numpy.ndarray' Then the feature would allow to change the dict key.
msg332345 - (view)	Author: Stefan Krah (skrah) *	Date: 2018-12-22 12:16
I'll leave the issue up for a couple of days in case someone supports it, but I think this one of the rare cases where all core devs would reject the feature unanimously.
msg332373 - (view)	Author: Ilya Kulakov (Kentzo)	Date: 2018-12-22 21:52
Perhaps another path is optionally allow hashing of memoryviews (all current conditions - hashability of the original object) via a parameter? Like unsafe_hash like in dataclass.
msg332375 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2018-12-22 23:37
Mark, do you have thoughts on the subject? > Perhaps another path is optionally allow hashing of memoryviews > (all current conditions - hashability of the original object) > via a parameter? Like unsafe_hash like in dataclass. I suspect this would open a can worms and that we would regret it. If a user intentionally creates an unsafe tool using dataclasses, they have to do so explicitly. Built-in tools such as memoryview shouldn't cross that line (especially as a default behavior). Also, tools like memoryview() are implemented in C and generally have tighter requirements (thread-safety, protecting invariants, not segfaulting, etc) than pure python code that can't kill the interpreter or affect C extensions. > I'll leave the issue up for a couple of days in case someone supports > it, but I think this one of the rare cases where all core devs > would reject the feature unanimously. I concur with Stefan. > My particular use case involves a memory view of a readonly numpy's ndarray. As Stefan mentioned, this could be "fixed" on the Numpy side if they thought it was a useful behavior (I suspect not).

History
Date	User	Action	Args
2022-04-11 14:59:09	admin	set	github: 79729
2018-12-22 23:37:23	rhettinger	set	nosy: + mark.dickinson, rhettinger messages: + msg332375
2018-12-22 21:52:51	Kentzo	set	status: pending -> open messages: + msg332373
2018-12-22 12:16:54	skrah	set	status: open -> pending type: enhancement resolution: not a bug messages: + msg332345
2018-12-22 11:38:56	skrah	set	messages: + msg332344
2018-12-22 11:24:08	skrah	set	messages: + msg332343
2018-12-22 00:15:52	Kentzo	set	messages: + msg332316
2018-12-21 22:40:41	skrah	set	messages: + msg332305
2018-12-21 21:47:33	terry.reedy	set	versions: + Python 3.8, - Python 3.4, Python 3.5, Python 3.6
2018-12-21 06:18:19	Kentzo	create