Issue 15944: memoryviews and ctypes

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/60148

classification

Title:	memoryviews and ctypes
Type:	behavior	Stage:	resolved
Components:	Interpreter Core	Versions:	Python 3.6, Python 3.5

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	skrah	Nosy List:	cblp, dabeaz, eryksun, josh.r, martin.panter, pitrou, python-dev, skrah
Priority:	normal	Keywords:	patch

Created on 2012-09-14 15:05 by dabeaz, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description
cast-bytes.patch	martin.panter, 2015-08-06 16:50	Universal cast to bytes

Messages (31)
msg170477 - (view)	Author: David Beazley (dabeaz)	Date: 2012-09-14 15:05
I've been playing with the interaction of ctypes and memoryviews and am curious about intended behavior. Consider the following:

>>> import ctypes
>>> d = ctypes.c_double()
>>> m = memoryview(d)
>>> m.ndim
0
>>> m.shape
()
>>> m.readonly
False
>>> m.itemsize
8
>>>

As you can see, you have a memory view for the ctypes double object. However, the fact that it has a 0-dimension and no shape seems to cause all sorts of weird behavior. For instance, indexing and slicing don't work:

>>> m[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: invalid indexing of 0-dim memory
>>> m[:]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: invalid indexing of 0-dim memory
>>>

As such, you can't really seem to do anything interesting with the resulting memory view. For example, you can't pull data out of it. Nor can you overwrite the contents (i.e., replacing the contents with an 8-byte byte string).

Attempting to cast the memory view to something else doesn't work either.

>>> d = ctypes.c_double()
>>> m = memoryview(d)
>>> m2 = m.cast('c')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: memoryview: source format must be a native single character format prefixed with an optional '@'
>>>

I must be missing something really obvious here. Is there no way to get access to the memory behind a ctypes object?
msg170480 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2012-09-14 15:37
You can still read the underlying representation:

>>> d = ctypes.c_double(0.6)
>>> m = memoryview(d)
>>> bytes(m)
b'333333\xe3?'
>>> d.value = 0.7
>>> bytes(m)
b'ffffff\xe6?'
msg170481 - (view)	Author: David Beazley (dabeaz)	Date: 2012-09-14 15:43
I don't want to read the representation by copying it into a bytes object. I want direct access to the underlying memory--including the ability to modify it. As it stands now, it's completely useless.
msg170482 - (view)	Author: Stefan Krah (skrah) *	Date: 2012-09-14 16:01
0-dim memory is indexed by x[()]. The ctypes example has an additional problem, because format="<d" is not yet implemented in memoryview. Only native single character formats in struct module syntax are implemented, and "<d" in struct module syntax means "standard size, little endian".

To demonstrate 0-dim indexing, here's an example using _testbuffer:

>>> from _testbuffer import ndarray, ND_WRITABLE
>>> x = ndarray(3.14, shape=[], format='d', flags=ND_WRITABLE)
>>> x[()]
3.14
>>> tau = 6.28
>>> x[()] = tau
>>> x[()]
6.28
>>> m = memoryview(x)
>>> m[()]
6.28
>>> m[()] = 100.111
>>> m[()]
100.111
msg170483 - (view)	Author: Stefan Krah (skrah) *	Date: 2012-09-14 16:08
BTW, if c_double means "native machine double", then ctypes should fill in Py_buffer.format with "d" and not "<d" in order to be PEP-3118 compatible.
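The difference between a native format ("d" or "@d") and a standard one ("<d") can be seen from the struct module alone. The following is a minimal sketch, not part of the original thread; the variable names are illustrative, and the format string that ctypes reports depends on the platform's byte order.

```python
import ctypes
import struct
import sys

# Format string that ctypes places in the buffer it exports for c_double.
reported_format = memoryview(ctypes.c_double()).format

# "@d" (the same as plain "d") means native size, alignment and byte order.
native_bytes = struct.pack("@d", 0.6)

# "<d" and ">d" mean standard size with an explicit byte order.
explicit_prefix = "<" if sys.byteorder == "little" else ">"
explicit_bytes = struct.pack(explicit_prefix + "d", 0.6)

# Assumption: the platform's C double is an 8-byte IEEE 754 value, so the
# two spellings describe the same bytes and differ only in the label.
same_encoding = native_bytes == explicit_bytes
print(reported_format, same_encoding)
```

The bytes are identical in both cases, which is why the thread treats the prefix as a labeling problem in ctypes rather than a difference in the stored data.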
msg170484 - (view)	Author: David Beazley (dabeaz)	Date: 2012-09-14 16:19
Even with the <d format, I'm not sure why it can't be cast to simple byte-view. None of that seems to work at all.
msg170487 - (view)	Author: Stefan Krah (skrah) *	Date: 2012-09-14 16:42
The decision was made in order to be able to cast back and forth between known formats. Otherwise one would be able to cast from '<d' to 'B' but not from 'B' to '<d'.

Python 3.4 will have support for all formats in struct module syntax, but all non-native formats will be far slower than the native ones.

You can still pack/unpack directly using the struct module:

>>> import ctypes, struct
>>> d = ctypes.c_double()
>>> m = memoryview(d)
>>> struct.pack_into(m.format, m, 0, 22.7)
>>> struct.unpack_from(m.format, m, 0)[0]
22.7
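The same struct-based approach can also be applied to the ctypes object itself, since ctypes instances export a writable buffer. This is a minimal sketch under that assumption; it uses the native format "d" instead of m.format so that it does not depend on the byte-order prefix ctypes reports.

```python
import ctypes
import struct

d = ctypes.c_double()

# Write the native encoding of 22.7 straight into the memory behind d.
struct.pack_into("d", d, 0, 22.7)

# Read the value back from the same memory.
(value,) = struct.unpack_from("d", d, 0)
print(value, d.value)
```

Both the unpacked value and d.value reflect the write, because no copy of the underlying memory is involved.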
msg170488 - (view)	Author: David Beazley (dabeaz)	Date: 2012-09-14 16:47
I don't think memoryviews should be imposing any casting restrictions at all. It's low level. Get out of the way.
msg170489 - (view)	Author: Stefan Krah (skrah) *	Date: 2012-09-14 16:53
So you want to be able to segfault the core interpreter using the builtins?
msg170490 - (view)	Author: David Beazley (dabeaz)	Date: 2012-09-14 17:00
No, I want to be able to access the raw bytes sitting behind a memoryview as bytes without all of this casting and reinterpretation. Just show me the raw bytes. Not doubles, not ints, not structure packing, not copying into byte strings, or whatever. Is this really impossible? It sure seems so.
msg170492 - (view)	Author: David Beazley (dabeaz)	Date: 2012-09-14 17:08
Just to be specific, why is something like this not possible?

>>> d = ctypes.c_double()
>>> m = memoryview(d)
>>> m[0:8] = b'abcdefgh'
>>> d.value
8.540883223036124e+194
>>>

(Doesn't have to be exactly like this, but what's wrong with overwriting bytes with bytes of a compatible size?).
msg170494 - (view)	Author: David Beazley (dabeaz)	Date: 2012-09-14 17:30
I should add that 0-dim indexing doesn't work as described either:

>>> import ctypes
>>> d = ctypes.c_double()
>>> m = memoryview(d)
>>> m[()]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NotImplementedError: memoryview: unsupported format <d
>>>
msg170496 - (view)	Author: Stefan Krah (skrah) *	Date: 2012-09-14 18:11
Please read msg170482. It even contains a copy and paste example!
msg170795 - (view)	Author: Stefan Krah (skrah) *	Date: 2012-09-20 09:23
As I understand it, you prefer memoryviews where the format is purely informational, whereas we now have typed memoryviews.

Typed memoryviews are certainly useful, in fact they are present in Cython, see here for examples:

http://docs.cython.org/src/userguide/memoryviews.html

I can see only one obvious benefit of ignoring the format: All possible formats are accepted. What I don't understand is why this ...

    m[0] = b'\x00\x00\x00\x01'

... should be preferable to:

    m[0] = 1

If you think that typed memoryviews are a mistake, I suggest raising the issue on python-dev as soon as possible (3.3 is due soon). All memoryview operations are now based on values instead of bit patterns, see for example #15573.
msg170818 - (view)	Author: David Beazley (dabeaz)	Date: 2012-09-20 15:42
There's probably a bigger discussion about memoryviews for a rainy day. However, the number one thing that would save all of this in my book would be to make sure cast('B') is universally supported regardless of format including endianness--especially in the standard library. For example, being able to do this:

>>> a = array.array('d',[1.0, 2.0, 3.0, 4.0])
>>> m = memoryview(a).cast('B')
>>> m[0:4] = b'\x00\x01\x02\x03'
>>> a
array('d', [1.0000000112050316, 2.0, 3.0, 4.0])
>>>

Right now, it doesn't work for ctypes. For example:

>>> import ctypes
>>> a = (ctypes.c_double * 4)(1,2,3,4)
>>> a
<__main__.c_double_Array_4 object at 0x1006a7cb0>
>>> m = memoryview(a).cast('B')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: memoryview: source format must be a native single character format prefixed with an optional '@'
>>>

As some background, being able to work with a "byte" view of memory is important for a lot of problems involving I/O, data interchange, and related problems where being able to accurately construct/deconstruct the underlying memory buffers is more useful than actually interpreting their contents.
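For comparison, here is a sketch of the requested behavior on a ctypes array. It assumes Python 3.5 or later, where this issue's fix allows cast('B') from any C-contiguous buffer. struct.pack is used instead of a literal byte string so that the result does not depend on byte order.

```python
import ctypes
import struct

a = (ctypes.c_double * 4)(1, 2, 3, 4)

# Assumes Python 3.5+: the source format ("<d" or ">d" here) no longer
# blocks a cast to a flat byte view.
byte_view = memoryview(a).cast("B")

# Overwrite the 8 bytes of the first element with the encoding of 9.0.
byte_view[0:8] = struct.pack("d", 9.0)
print(len(byte_view), list(a))
```

The byte view has one item per byte of the array (32 here), and writes through it are visible in the ctypes array because both share the same memory.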
msg170819 - (view)	Author: David Beazley (dabeaz)	Date: 2012-09-20 15:49
One followup note---I think it's fine to punt on cast('B') if the memoryview is non-contiguous. That's a rare case that's probably not as common.
msg229560 - (view)	Author: Stefan Krah (skrah) *	Date: 2014-10-16 21:46
We could add a flag memoryview(x, raw=True) to the constructor. This view would behave exactly like the regular one except that it ignores buf.format entirely. So you could do assignments like:

    m[10] = b'\x00\x00\x00\x01'

This would be more flexible in general since memoryview currently only supports native struct formats (complex formats slow down certain operations dramatically).

I think the feature would not add much additional complexity to the code. The question is: Is this a general need? Are many people using memoryviews for bit-twiddling?
msg248037 - (view)	Author: Yuriy Syrovetskiy (cblp)	Date: 2015-08-05 13:19
You don't need `raw=True`; `.cast('b')` should already do this. Unfortunately, it is not implemented yet.
msg248089 - (view)	Author: Martin Panter (martin.panter) *	Date: 2015-08-06 00:44
In my experience, I tend to only use memoryview() for “bytes-like” buffers (but see Issue 23756 about clarifying what this means). Example from /Lib/_compression.py:67:

def readinto(self, b):
    with memoryview(b) as view, view.cast("B") as byte_view:
        data = self.read(len(byte_view))
        byte_view[:len(data)] = data
    return len(data)

Fixing cast("B") or adding a memoryview(raw=True) mode could probably help when all you want is a byte buffer.
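The readinto() pattern quoted above can be tried outside the standard library. The sketch below is illustrative only: readinto_any is a hypothetical helper, not a standard library function, and the target is an array.array of doubles, which already supported cast("B") before this issue was fixed.

```python
import array
import io


def readinto_any(stream, target):
    """Fill a writable C-contiguous buffer with bytes read from stream.

    Hypothetical helper that mirrors the readinto() pattern quoted above.
    """
    with memoryview(target) as view, view.cast("B") as byte_view:
        data = stream.read(len(byte_view))
        byte_view[:len(data)] = data
    return len(data)


# Two doubles serialized to bytes, then read back into a typed array.
payload = array.array("d", [1.5, 2.5]).tobytes()
buf = array.array("d", [0.0, 0.0])
n = readinto_any(io.BytesIO(payload), buf)
print(n, buf)
```

The cast to "B" is what makes len(byte_view) a byte count; without it, len() on a typed view would report the number of items.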
msg248111 - (view)	Author: Eryk Sun (eryksun) *	Date: 2015-08-06 06:28
A functional memoryview for ctypes objects would avoid having to use workarounds, such as the following:

>>> d = ctypes.c_double()
>>> b = (ctypes.c_char * ctypes.sizeof(d)).from_buffer(d)
>>> b[:] = b'abcdefgh'
>>> d.value
8.540883223036124e+194

or using numpy.frombuffer as a bridge:

>>> d = ctypes.c_double()
>>> m = memoryview(numpy.frombuffer(d, 'B'))
>>> m[:] = b'abcdefgh'
>>> d.value
8.540883223036124e+194

David's request that cast('B') should be made to work for all contiguous buffers seems reasonable.

That said, the ctypes format strings also need fixing. Let's see what happens when "@d" is used instead of "<d":

>>> double_stgdict = stgdict(ctypes.c_double)
>>> double_stgdict
dict:
    ob_base:
        ob_refcnt: 1
        ob_type: py_object(<class 'StgDict'>)
    ma_used: 7
    ma_keys: LP_PyDictKeysObject(0x1aa5750)
    ma_values: LP_LP_PyObject(<NULL>)
size: 8
align: 8
length: 0
ffi_type_pointer:
    size: 8
    alignment: 8
    type: 3
    elements: <NULL>
proto: py_object('d')
setfunc: SETFUNC(0x7f9f9b6e3e60)
getfunc: GETFUNC(0x7f9f9b6e3d90)
paramfunc: PARAMFUNC(0x7f9f9b6e31d0)
argtypes: py_object(<NULL>)
converters: py_object(<NULL>)
restype: py_object(<NULL>)
checker: py_object(<NULL>)
flags: 4096
format: b'<d'
ndim: 0
shape: LP_c_long(<NULL>)

>>> double_stgdict.format = b'@d'

>>> d = ctypes.c_double(3.14)
>>> m = memoryview(d)
>>> m[()]
3.14
>>> m[()] = 6.28
>>> d.value
6.28
>>> m = m.cast('B')
>>> m[:] = b'abcdefgh'
>>> d.value
8.540883223036124e+194

This shows that changing the format string (set by PyCSimpleType_new in _ctypes.c) to use "@" makes the memoryview work normally. OTOH, the swapped type (e.g. c_double.__ctype_be__) would need to continue to use a standard little-endian ("<") or big-endian (">") format.
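The c_char workaround from the start of this message can be written as a small self-contained script. This is a sketch of that workaround only; it uses struct.pack rather than the literal b'abcdefgh' so that the resulting value is the same on any byte order.

```python
import ctypes
import struct

d = ctypes.c_double()

# Build a c_char array that shares d's memory (from_buffer makes no copy).
raw = (ctypes.c_char * ctypes.sizeof(d)).from_buffer(d)

# Slice assignment on the c_char array writes into d's memory.
raw[:] = struct.pack("d", 1.5)
print(d.value, raw.raw == struct.pack("d", 1.5))
```

The array created by from_buffer keeps a reference to d, so the shared memory stays alive for as long as raw does.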
msg248127 - (view)	Author: Stefan Krah (skrah) *	Date: 2015-08-06 13:45
Yuriy: cast() does not do this. What's requested is that e.g. a single float is represented as a bytes object instead of a float. Thus, you'd be able to do:

    m[0] = b'\x00\x00\x00\x01'

This has other implications, for example, two NaNs would compare equal. Hence the suggestion memoryview(raw=True).
msg248134 - (view)	Author: Martin Panter (martin.panter) *	Date: 2015-08-06 16:50
Here is a patch that allows any “C-contiguous” memoryview() to be cast to a byte view. Apart from the test that was explicitly checking that this wasn’t supported, the rest of the test suite still passes. I basically removed the check that was generating the “source format must be a native single character” error. If two NANs are represented by the same byte sequence, I would expect their byte views to compare equal, which is the case with my patch.
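The NaN point can be illustrated with array.array, which supports both typed views and byte casts. This is a sketch, not code from the patch; it relies on memoryview comparison being value-based for typed views, as described earlier in the thread.

```python
import array

nan = float("nan")
a = array.array("d", [nan])
b = array.array("d", [nan])

# Typed views compare item values, and NaN is never equal to NaN.
typed_equal = memoryview(a) == memoryview(b)

# Byte views compare raw bytes; both arrays store the same NaN bit pattern.
bytes_equal = memoryview(a).cast("B") == memoryview(b).cast("B")
print(typed_equal, bytes_equal)
```

Because the cast to bytes has to be requested explicitly, the value-based comparison of the original typed view is unaffected.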
msg248135 - (view)	Author: Stefan Krah (skrah) *	Date: 2015-08-06 17:04
The question is whether we want this behavior.
msg248164 - (view)	Author: Martin Panter (martin.panter) *	Date: 2015-08-07 01:55
Assuming Issue 23756 is resolved and various standard library functions are meant to work with any C-contiguous buffer, then it makes sense to me for memoryview.cast("B") to work for any C-contiguous buffer. I also got the impression that David, Yuriy, and Eryksun all support this. I don’t understand why you wouldn’t want this behaviour. It seems pointless just to maintain symmetry with being unable to cast back to “<d”. And casting from e.g. floating point to bytes to integers already disregards the original data type, so casting from unsupported types to bytes should be no worse.
msg248182 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2015-08-07 09:00
The proposal sounds reasonable to me.
msg248186 - (view)	Author: Stefan Krah (skrah) *	Date: 2015-08-07 12:57
If people are content with writing m[124:128] = b'abcd' and accept that tolist() etc. won't represent the original structure of the object, then let's do it. On the bright side, it is less work. -- I'll review the patch.
msg248189 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2015-08-07 13:12
On 07/08/2015 14:57, Stefan Krah wrote:
>
> If people are content with writing m[124:128] = b'abcd' and accept
> that tolist() etc. won't represent the original structure of the
> object, then let's do it.

As long as the casting has to be explicit, this sounds ok to me.
msg248190 - (view)	Author: Stefan Krah (skrah) *	Date: 2015-08-07 13:15
Ok, shall we sneak this past Larry for 3.5?
msg248191 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2015-08-07 13:22
Why not :)
msg248261 - (view)	Author: Roundup Robot (python-dev)	Date: 2015-08-08 11:39
New changeset e33f2b8b937f by Stefan Krah in branch '3.5':
Issue #15944: memoryview: Allow arbitrary formats when casting to bytes.
https://hg.python.org/cpython/rev/e33f2b8b937f

New changeset c7c4b8411037 by Stefan Krah in branch 'default':
Merge #15944.
https://hg.python.org/cpython/rev/c7c4b8411037
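With this change, the example from the opening message can be handled without workarounds. The following is a minimal sketch assuming Python 3.5 or later; casting the byte view back to the native format "d" is one way to regain typed access, since the original "<d" view still cannot be indexed directly.

```python
import ctypes
import struct

d = ctypes.c_double(3.14)

# 0-dim "<d" view -> flat byte view (allowed since this change).
byte_view = memoryview(d).cast("B")
byte_view[:] = struct.pack("d", 6.28)
after_bytes = d.value

# Byte view -> native double view with shape (1,), for typed access.
typed_view = byte_view.cast("d")
typed_view[0] = 100.5
print(after_bytes, d.value)
```

Both views write through to the ctypes object, so d.value reflects each assignment in turn.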
msg248262 - (view)	Author: Stefan Krah (skrah) *	Date: 2015-08-08 11:45
Done. Thanks for the patch.

History
Date	User	Action	Args
2022-04-11 14:57:36	admin	set	github: 60148
2015-08-08 11:45:15	skrah	set	status: open -> closed versions: + Python 3.5 messages: + msg248262 components: + Interpreter Core resolution: fixed stage: patch review -> resolved
2015-08-08 11:39:43	python-dev	set	nosy: + python-dev messages: + msg248261
2015-08-07 13:22:12	pitrou	set	messages: + msg248191
2015-08-07 13:15:49	skrah	set	messages: + msg248190
2015-08-07 13:12:37	pitrou	set	messages: + msg248189 title: memoryview: allow all casts to bytes -> memoryviews and ctypes
2015-08-07 13:09:27	skrah	set	title: memoryviews and ctypes -> memoryview: allow all casts to bytes
2015-08-07 12:57:56	skrah	set	messages: + msg248186
2015-08-07 09:00:24	pitrou	set	messages: + msg248182
2015-08-07 01:55:16	martin.panter	set	messages: + msg248164
2015-08-06 17:04:45	skrah	set	assignee: skrah messages: + msg248135
2015-08-06 16:50:23	martin.panter	set	files: + cast-bytes.patch versions: + Python 3.6, - Python 3.5 messages: + msg248134 keywords: + patch stage: patch review
2015-08-06 13:45:34	skrah	set	messages: + msg248127
2015-08-06 06:28:48	eryksun	set	nosy: + eryksun messages: + msg248111
2015-08-06 00:44:05	martin.panter	set	messages: + msg248089
2015-08-05 13:19:39	cblp	set	nosy: + cblp messages: + msg248037
2014-10-18 04:58:33	martin.panter	set	nosy: + martin.panter
2014-10-17 02:07:14	josh.r	set	nosy: + josh.r
2014-10-16 21:46:31	skrah	set	versions: + Python 3.5, - Python 3.3
2014-10-16 21:46:19	skrah	set	messages: + msg229560
2013-03-23 13:07:43	skrah	link	issue16204 superseder
2012-09-20 15:49:59	dabeaz	set	messages: + msg170819
2012-09-20 15:42:42	dabeaz	set	messages: + msg170818
2012-09-20 09:23:19	skrah	set	messages: + msg170795
2012-09-14 18:11:46	skrah	set	messages: + msg170496
2012-09-14 17:30:49	dabeaz	set	messages: + msg170494
2012-09-14 17:08:57	dabeaz	set	messages: + msg170492
2012-09-14 17:00:06	dabeaz	set	messages: + msg170490
2012-09-14 16:53:20	skrah	set	messages: + msg170489
2012-09-14 16:47:05	dabeaz	set	messages: + msg170488
2012-09-14 16:42:02	skrah	set	messages: + msg170487
2012-09-14 16:19:16	dabeaz	set	messages: + msg170484
2012-09-14 16:08:09	skrah	set	messages: + msg170483
2012-09-14 16:01:23	skrah	set	messages: + msg170482
2012-09-14 15:43:06	dabeaz	set	messages: + msg170481
2012-09-14 15:37:41	pitrou	set	nosy: + skrah, pitrou messages: + msg170480
2012-09-14 15:05:29	dabeaz	create