classification
Title: memoryviews and ctypes
Type: behavior Stage: resolved
Components: Interpreter Core Versions: Python 3.6, Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: skrah Nosy List: cblp, dabeaz, eryksun, josh.r, martin.panter, pitrou, python-dev, skrah
Priority: normal Keywords: patch

Created on 2012-09-14 15:05 by dabeaz, last changed 2015-08-08 11:45 by skrah. This issue is now closed.

Files
File name Uploaded Description
cast-bytes.patch martin.panter, 2015-08-06 16:50 Universal cast to bytes
Messages (31)
msg170477 - (view) Author: David Beazley (dabeaz) Date: 2012-09-14 15:05
I've been playing with the interaction of ctypes and memoryviews and am curious about intended behavior.  Consider the following:

>>> import ctypes
>>> d = ctypes.c_double()
>>> m = memoryview(d)
>>> m.ndim
0
>>> m.shape
()
>>> m.readonly
False
>>> m.itemsize
8
>>>

As you can see, you have a memory view for the ctypes double object.  However, the fact that it is 0-dimensional and has an empty shape seems to cause all sorts of weird behavior.  For instance, indexing and slicing don't work:

>>> m[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: invalid indexing of 0-dim memory
>>> m[:]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: invalid indexing of 0-dim memory
>>> 

As such, you can't really seem to do anything interesting with the resulting memory view.  For example, you can't pull data out of it.  Nor can you overwrite the contents (i.e., replacing the contents with an 8-byte byte string).

Attempting to cast the memory view to something else doesn't work either.

>>> d = ctypes.c_double()
>>> m = memoryview(d)
>>> m2 = m.cast('c')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: memoryview: source format must be a native single character format prefixed with an optional '@'
>>> 

I must be missing something really obvious here.  Is there no way to get access to the memory behind a ctypes object?
msg170480 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-09-14 15:37
You can still read the underlying representation:

>>> d = ctypes.c_double(0.6)
>>> m = memoryview(d)
>>> bytes(m)
b'333333\xe3?'
>>> d.value = 0.7
>>> bytes(m)
b'ffffff\xe6?'
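Note that bytes(m) takes a snapshot of the memory rather than exposing it. A minimal sketch using only the standard library (the variable names are illustrative) shows that the copy does not follow later changes to d:

```python
import ctypes
import struct

d = ctypes.c_double(0.6)
m = memoryview(d)

snapshot = bytes(m)          # copies the 8 bytes out of the view
d.value = 0.7                # mutate the ctypes object afterwards

# The snapshot still holds the old representation; a fresh copy sees the new one.
old_value = struct.unpack("d", snapshot)[0]
new_value = struct.unpack("d", bytes(m))[0]
print(old_value, new_value)
```

Reading works this way, but writing back through the snapshot is not possible because it is an independent bytes object.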
msg170481 - (view) Author: David Beazley (dabeaz) Date: 2012-09-14 15:43
I don't want to read the representation by copying it into a bytes object.  I want direct access to the underlying memory--including the ability to modify it.  As it stands now, it's completely useless.
msg170482 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-09-14 16:01
0-dim memory is indexed by x[()]. The ctypes example has an additional
problem, because format="<d" is not yet implemented in memoryview.

Only native single character formats in struct module syntax are
implemented, and "<d" in struct module syntax means "standard size,
little endian".

To demonstrate 0-dim indexing, here's an example using _testbuffer:

>>> x = ndarray(3.14, shape=[], format='d', flags=ND_WRITABLE)
>>> x[()]
3.14
>>> tau = 6.28
>>> x[()] = tau
>>> x[()]
6.28
>>> m = memoryview(x)
>>> m[()]
6.28
>>> m[()] = 100.111
>>> m[()]
100.111
msg170483 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-09-14 16:08
BTW, if c_double means "native machine double", then ctypes should
fill in Py_buffer.format with "d" and not "<d" in order to be PEP-3118
compatible.
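For readers unfamiliar with the distinction, here is a small sketch of how the struct module treats native formats ("d" or "@d") versus standard formats ("<d", ">d"). It uses only documented struct calls; the variable names are illustrative:

```python
import struct
import sys

# Native mode ('@' or no prefix): native byte order, size, and alignment.
# Standard mode ('<', '>', '=', '!'): fixed sizes, no alignment padding.
native_long = struct.calcsize("l")     # platform dependent (often 4 or 8)
standard_long = struct.calcsize("<l")  # always 4

little = struct.pack("<d", 1.0)
big = struct.pack(">d", 1.0)
native = struct.pack("d", 1.0)

print(sys.byteorder, native_long, standard_long)
# On any given machine the native packing matches exactly one of the two.
print(native == (little if sys.byteorder == "little" else big))
```

For a double the byte layout happens to coincide with one of the standard formats, but the format strings are still distinct, and memoryview only interprets the native ones.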
msg170484 - (view) Author: David Beazley (dabeaz) Date: 2012-09-14 16:19
Even with the <d format, I'm not sure why it can't be cast to a simple byte view.  None of that seems to work at all.
msg170487 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-09-14 16:42
The decision was made in order to be able to cast back and forth between
known formats. Otherwise one would be able to cast from '<d' to 'B'
but not from 'B' to '<d'.

Python 3.4 will have support for all formats in struct module syntax,
but all non-native formats will be *far* slower than the native ones.

You can still pack/unpack directly using the struct module:

>>> import ctypes, struct
>>> d = ctypes.c_double()
>>> m = memoryview(d)
>>> struct.pack_into(m.format, m, 0, 22.7)
>>> struct.unpack_from(m.format, m, 0)[0]
22.7
msg170488 - (view) Author: David Beazley (dabeaz) Date: 2012-09-14 16:47
I don't think memoryviews should be imposing any casting restrictions at all. It's low level.  Get out of the way.
msg170489 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-09-14 16:53
So you want to be able to segfault the core interpreter using the
builtins?
msg170490 - (view) Author: David Beazley (dabeaz) Date: 2012-09-14 17:00
No, I want to be able to access the raw bytes sitting behind a memoryview as bytes without all of this casting and reinterpretation.  Just show me the raw bytes.  Not doubles, not ints, not structure packing, not copying into byte strings, or whatever.   Is this really impossible?   It sure seems so.
msg170492 - (view) Author: David Beazley (dabeaz) Date: 2012-09-14 17:08
Just to be specific, why is something like this not possible?

>>> d = ctypes.c_double()
>>> m = memoryview(d)
>>> m[0:8] = b'abcdefgh'
>>> d.value
8.540883223036124e+194
>>>

(Doesn't have to be exactly like this, but what's wrong with overwriting bytes with bytes of a compatible size?).
msg170494 - (view) Author: David Beazley (dabeaz) Date: 2012-09-14 17:30
I should add that 0-dim indexing doesn't work as described either:

>>> import ctypes
>>> d = ctypes.c_double()
>>> m = memoryview(d)
>>> m[()]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NotImplementedError: memoryview: unsupported format <d
>>>
msg170496 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-09-14 18:11
Please read msg170482. It even contains a copy and paste example!
msg170795 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-09-20 09:23
As I understand it, you prefer memoryviews where the format is
purely informational, whereas we now have typed memoryviews.

Typed memoryviews are certainly useful, in fact they are
present in Cython, see here for examples:

http://docs.cython.org/src/userguide/memoryviews.html


I can see only one obvious benefit of ignoring the format: All possible
formats are accepted. What I don't understand is why this ...

  m[0] = b'\x00\x00\x00\x01'

... should be preferable to:

  m[0] = 1



If you think that typed memoryviews are a mistake, I suggest raising
the issue on python-dev as soon as possible (3.3 is due soon). All
memoryview operations are now based on values instead of bit patterns,
see for example #15573.
msg170818 - (view) Author: David Beazley (dabeaz) Date: 2012-09-20 15:42
There's probably a bigger discussion about memoryviews for a rainy day.  However, the number one thing that would save all of this in my book would be to make sure cast('B') is universally supported regardless of format including endianness--especially in the standard library. For example, being able to do this:

>>> a = array.array('d',[1.0, 2.0, 3.0, 4.0])
>>> m = memoryview(a).cast('B')
>>> m[0:4] = b'\x00\x01\x02\x03'
>>> a
array('d', [1.0000000112050316, 2.0, 3.0, 4.0])
>>> 

Right now, it doesn't work for ctypes.  For example:

>>> import ctypes
>>> a = (ctypes.c_double * 4)(1,2,3,4)
>>> a
<__main__.c_double_Array_4 object at 0x1006a7cb0>
>>> m = memoryview(a).cast('B')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: memoryview: source format must be a native single character format prefixed with an optional '@'
>>> 

As some background, being able to work with a "byte" view of memory is important for a lot of problems involving I/O, data interchange, and related problems where being able to accurately construct/deconstruct the underlying memory buffers is more useful than actually interpreting their contents.
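The changesets that eventually closed this issue target the 3.5 branch, so on Python 3.5 or later the byte-view I/O pattern described above should also work for ctypes arrays. A minimal sketch under that assumption, with io.BytesIO standing in for a real file or socket:

```python
import ctypes
import io
import struct

a = (ctypes.c_double * 4)()
# Stand-in for data arriving from a file or socket.
payload = struct.pack("4d", 1.0, 2.0, 3.0, 4.0)

byte_view = memoryview(a).cast("B")   # assumes Python 3.5+ for ctypes objects
count = io.BytesIO(payload).readinto(byte_view)

print(count, list(a))
```

The array is filled in place; no intermediate bytes object is needed on the receiving side.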
msg170819 - (view) Author: David Beazley (dabeaz) Date: 2012-09-20 15:49
One followup note: I think it's fine to punt on cast('B') if the memoryview is non-contiguous.  That case is probably rare.
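To illustrate the non-contiguous case, here is a short sketch using a strided view of a bytearray. The variable names are illustrative; the TypeError is what current CPython raises for casts of views that are not C-contiguous:

```python
data = bytearray(range(8))
strided = memoryview(data)[::2]       # every other byte: not C-contiguous

print(strided.c_contiguous)           # False

cast_error = None
try:
    strided.cast("B")
except TypeError as exc:
    cast_error = exc
    print("cast refused:", exc)

# Copying first yields a contiguous buffer that can be viewed as bytes.
compact = memoryview(strided.tobytes())
print(compact.c_contiguous, list(compact))
```

The copy loses the link to the original storage, which is the price of working with a non-contiguous source.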
msg229560 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2014-10-16 21:46
We could add a flag memoryview(x, raw=True) to the constructor.  This view 
would behave exactly like the regular one except that it ignores buf.format 
entirely.

So you could do assignments like:

   m[10] = b'\x00\x00\x00\x01'


This would be more flexible in general since memoryview currently only supports
native struct formats (complex formats slow down certain operations dramatically).

I think the feature would not add much additional complexity to the code.


The question is:  Is this a general need?  Are many people using memoryviews
for bit-twiddling?
msg248037 - (view) Author: Yuriy Syrovetskiy (cblp) Date: 2015-08-05 13:19
You don't need `raw=True`; `.cast('b')` should already do this. Unfortunately, it is not implemented yet.
msg248089 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-08-06 00:44
In my experience, I tend to only use memoryview() for “bytes-like” buffers (but see Issue 23756 about clarifying what this means). Example from /Lib/_compression.py:67:

def readinto(self, b):
    with memoryview(b) as view, view.cast("B") as byte_view:
        data = self.read(len(byte_view))
        byte_view[:len(data)] = data
    return len(data)

Fixing cast("B") or adding a memoryview(raw=True) mode could probably help when all you want is a byte buffer.
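A self-contained variant of the same pattern may make the role of the byte view clearer. copy_into is a hypothetical helper written for this illustration, not a standard library function:

```python
import array

def copy_into(buffer, data):
    """Copy as many bytes of data as fit into buffer; return the count."""
    # The inner cast gives a view whose length is measured in bytes,
    # regardless of the item size of the underlying buffer.
    with memoryview(buffer) as view, view.cast("B") as byte_view:
        count = min(len(byte_view), len(data))
        byte_view[:count] = data[:count]
    return count

target = array.array("H", [0, 0, 0, 0])     # 4 items, each at least 2 bytes
written = copy_into(target, b"\x01\x02\x03\x04")
print(len(target), target.itemsize, written, target.tobytes())
```

Without the cast, len(view) would report the number of items (4 here) rather than the number of bytes available.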
msg248111 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2015-08-06 06:28
A functional memoryview for ctypes objects would avoid having to use workarounds, such as the following:

    >>> d = ctypes.c_double()
    >>> b = (ctypes.c_char * ctypes.sizeof(d)).from_buffer(d)
    >>> b[:] = b'abcdefgh'
    >>> d.value
    8.540883223036124e+194

or using numpy.frombuffer as a bridge:

    >>> d = ctypes.c_double()
    >>> m = memoryview(numpy.frombuffer(d, 'B'))
    >>> m[:] = b'abcdefgh'
    >>> d.value
    8.540883223036124e+194

David's request that cast('B') should be made to work for all contiguous buffers seems reasonable. That said, the ctypes format strings also need fixing. Let's see what happens when "@d" is used instead of "<d":

    >>> double_stgdict = stgdict(ctypes.c_double)
    >>> double_stgdict
    dict: 
        ob_base: 
            ob_refcnt: 1
            ob_type: py_object(<class 'StgDict'>)
        ma_used: 7
        ma_keys: LP_PyDictKeysObject(0x1aa5750)
        ma_values: LP_LP_PyObject(<NULL>)
    size: 8
    align: 8
    length: 0
    ffi_type_pointer: 
        size: 8
        alignment: 8
        type: 3
        elements: <NULL>
    proto: py_object('d')
    setfunc: SETFUNC(0x7f9f9b6e3e60)
    getfunc: GETFUNC(0x7f9f9b6e3d90)
    paramfunc: PARAMFUNC(0x7f9f9b6e31d0)
    argtypes: py_object(<NULL>)
    converters: py_object(<NULL>)
    restype: py_object(<NULL>)
    checker: py_object(<NULL>)
    flags: 4096
    format: b'<d'
    ndim: 0
    shape: LP_c_long(<NULL>)

    >>> double_stgdict.format = b'@d'

    >>> d = ctypes.c_double(3.14)
    >>> m = memoryview(d)
    >>> m[()]
    3.14
    >>> m[()] = 6.28
    >>> d.value
    6.28

    >>> m = m.cast('B')
    >>> m[:] = b'abcdefgh'
    >>> d.value
    8.540883223036124e+194

This shows that changing the format string (set by PyCSimpleType_new in _ctypes.c) to use "@" makes the memoryview work normally. OTOH, the swapped type (e.g. c_double.__ctype_be__) would need to continue to use a standard little-endian ("<") or big-endian (">") format.
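The difference between the native type and its byte-swapped counterpart can be seen from plain Python, without touching the StgDict. A minimal sketch; the format strings printed at the end depend on the platform and Python version, so nothing is assumed about them here:

```python
import ctypes
import sys

native = ctypes.c_double(1.0)
# Pick whichever swapped type is the opposite of this machine's byte order.
if sys.byteorder == "little":
    swapped_type = ctypes.c_double.__ctype_be__
else:
    swapped_type = ctypes.c_double.__ctype_le__
swapped = swapped_type(1.0)

native_bytes = bytes(memoryview(native))
swapped_bytes = bytes(memoryview(swapped))

# Same value, opposite byte order in memory.
print(native.value == swapped.value)
print(native_bytes == swapped_bytes[::-1])

# The format strings reported through the buffer protocol.
print(memoryview(native).format, memoryview(swapped).format)
```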
msg248127 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-08-06 13:45
Yuriy: cast() does not do this.  What's requested is that e.g. a
single float is represented as a bytes object instead of a float.

Thus, you'd be able to do:

  m[0] = b'\x00\x00\x00\x01'

This has other implications, for example, two NaNs would compare
equal.  Hence the suggestion memoryview(raw=True).
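The NaN point can be reproduced with array.array, whose native 'd' format memoryview already supports. A small sketch (the names are illustrative) contrasting value comparison with byte comparison:

```python
import array

nan = float("nan")
a = array.array("d", [nan])
b = array.array("d", [nan])

typed_a, typed_b = memoryview(a), memoryview(b)
typed_equal = typed_a == typed_b            # value comparison: nan != nan

# Byte views compare the stored bit patterns instead of the float values.
bytes_equal = typed_a.cast("B") == typed_b.cast("B")
print(typed_equal, bytes_equal)
```

Both arrays are filled from the same float object, so their bit patterns are identical even though the values never compare equal.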
msg248134 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-08-06 16:50
Here is a patch that allows any “C-contiguous” memoryview() to be cast to a byte view. Apart from the test that was explicitly checking that this wasn’t supported, the rest of the test suite still passes. I basically removed the check that was generating the “source format must be a native single character” error.

If two NANs are represented by the same byte sequence, I would expect their byte views to compare equal, which is the case with my patch.
msg248135 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-08-06 17:04
The question is whether we want this behavior.
msg248164 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-08-07 01:55
Assuming Issue 23756 is resolved and various standard library functions are meant to work with any C-contiguous buffer, then it makes sense to me for memoryview.cast("B") to work for any C-contiguous buffer. I also got the impression that David, Yuriy, and Eryksun all support this.

I don’t understand why you wouldn’t want this behaviour. It seems pointless just to maintain symmetry with being unable to cast back to “<d”. And casting from e.g. floating point to bytes to integers already disregards the original data type, so casting from unsupported types to bytes should be no worse.
msg248182 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-08-07 09:00
The proposal sounds reasonable to me.
msg248186 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-08-07 12:57
If people are content with writing m[124:128] = b'abcd' and accept
that tolist() etc. won't represent the original structure of the
object, then let's do it.

On the bright side, it is less work. -- I'll review the patch.
msg248189 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-08-07 13:12
On 07/08/2015 14:57, Stefan Krah wrote:
> 
> If people are content with writing m[124:128] = b'abcd' and accept
> that tolist() etc. won't represent the original structure of the
> object, then let's do it.

As long as the casting has to be explicit, this sounds ok to me.
msg248190 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-08-07 13:15
Ok, shall we sneak this past Larry for 3.5?
msg248191 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-08-07 13:22
Why not :)
msg248261 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-08-08 11:39
New changeset e33f2b8b937f by Stefan Krah in branch '3.5':
Issue #15944: memoryview: Allow arbitrary formats when casting to bytes.
https://hg.python.org/cpython/rev/e33f2b8b937f

New changeset c7c4b8411037 by Stefan Krah in branch 'default':
Merge #15944.
https://hg.python.org/cpython/rev/c7c4b8411037
msg248262 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-08-08 11:45
Done.  Thanks for the patch.
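With this change in place, the original request can be sketched as follows. This assumes Python 3.5 or later; the comparison uses struct so that it holds regardless of the machine's byte order:

```python
import ctypes
import struct

d = ctypes.c_double()
m = memoryview(d)                  # 0-dim view; ctypes reports a format such as '<d'
raw = m.cast("B")                  # 1-dim byte view over the same 8 bytes

raw[0:8] = b"abcdefgh"             # overwrite the storage in place, no copy
expected = struct.unpack("d", b"abcdefgh")[0]
print(d.value == expected)

d.value = 3.14                     # changes made through ctypes show up in the view
print(bytes(raw) == struct.pack("d", 3.14))
print(m.ndim, raw.ndim, raw.tolist()[:2])
```

As discussed above, the cast is explicit and one-way: tolist() on the byte view returns integers, not the original double.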
History
Date User Action Args
2015-08-08 11:45:15 skrah set status: open -> closed
    versions: + Python 3.5
    messages: + msg248262
    components: + Interpreter Core
    resolution: fixed
    stage: patch review -> resolved
2015-08-08 11:39:43 python-dev set nosy: + python-dev
    messages: + msg248261
2015-08-07 13:22:12 pitrou set messages: + msg248191
2015-08-07 13:15:49 skrah set messages: + msg248190
2015-08-07 13:12:37 pitrou set messages: + msg248189
    title: memoryview: allow all casts to bytes -> memoryviews and ctypes
2015-08-07 13:09:27 skrah set title: memoryviews and ctypes -> memoryview: allow all casts to bytes
2015-08-07 12:57:56 skrah set messages: + msg248186
2015-08-07 09:00:24 pitrou set messages: + msg248182
2015-08-07 01:55:16 martin.panter set messages: + msg248164
2015-08-06 17:04:45 skrah set assignee: skrah
    messages: + msg248135
2015-08-06 16:50:23 martin.panter set files: + cast-bytes.patch
    versions: + Python 3.6, - Python 3.5
    messages: + msg248134
    keywords: + patch
    stage: patch review
2015-08-06 13:45:34 skrah set messages: + msg248127
2015-08-06 06:28:48 eryksun set nosy: + eryksun
    messages: + msg248111
2015-08-06 00:44:05 martin.panter set messages: + msg248089
2015-08-05 13:19:39 cblp set nosy: + cblp
    messages: + msg248037
2014-10-18 04:58:33 martin.panter set nosy: + martin.panter
2014-10-17 02:07:14 josh.r set nosy: + josh.r
2014-10-16 21:46:31 skrah set versions: + Python 3.5, - Python 3.3
2014-10-16 21:46:19 skrah set messages: + msg229560
2013-03-23 13:07:43 skrah link issue16204 superseder
2012-09-20 15:49:59 dabeaz set messages: + msg170819
2012-09-20 15:42:42 dabeaz set messages: + msg170818
2012-09-20 09:23:19 skrah set messages: + msg170795
2012-09-14 18:11:46 skrah set messages: + msg170496
2012-09-14 17:30:49 dabeaz set messages: + msg170494
2012-09-14 17:08:57 dabeaz set messages: + msg170492
2012-09-14 17:00:06 dabeaz set messages: + msg170490
2012-09-14 16:53:20 skrah set messages: + msg170489
2012-09-14 16:47:05 dabeaz set messages: + msg170488
2012-09-14 16:42:02 skrah set messages: + msg170487
2012-09-14 16:19:16 dabeaz set messages: + msg170484
2012-09-14 16:08:09 skrah set messages: + msg170483
2012-09-14 16:01:23 skrah set messages: + msg170482
2012-09-14 15:43:06 dabeaz set messages: + msg170481
2012-09-14 15:37:41 pitrou set nosy: + skrah, pitrou
    messages: + msg170480
2012-09-14 15:05:29 dabeaz create