memoryviews and ctypes #60148

dabeaz · 2012-09-14T15:05:30Z

BPO	15944
Nosy	@pitrou, @skrah, @vadmium, @eryksun, @MojoVampire
Files	cast-bytes.patch: Universal cast to bytes

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = 'https://github.com/skrah'
closed_at = <Date 2015-08-08.11:45:15.291>
created_at = <Date 2012-09-14.15:05:29.975>
labels = ['interpreter-core', 'type-bug']
title = 'memoryviews and ctypes'
updated_at = <Date 2015-08-08.11:45:15.195>
user = 'https://bugs.python.org/dabeaz'

bugs.python.org fields:

activity = <Date 2015-08-08.11:45:15.195>
actor = 'skrah'
assignee = 'skrah'
closed = True
closed_date = <Date 2015-08-08.11:45:15.291>
closer = 'skrah'
components = ['Interpreter Core']
creation = <Date 2012-09-14.15:05:29.975>
creator = 'dabeaz'
dependencies = []
files = ['40139']
hgrepos = []
issue_num = 15944
keywords = ['patch']
message_count = 31.0
messages = ['170477', '170480', '170481', '170482', '170483', '170484', '170487', '170488', '170489', '170490', '170492', '170494', '170496', '170795', '170818', '170819', '229560', '248037', '248089', '248111', '248127', '248134', '248135', '248164', '248182', '248186', '248189', '248190', '248191', '248261', '248262']
nosy_count = 8.0
nosy_names = ['pitrou', 'skrah', 'dabeaz', 'python-dev', 'martin.panter', 'cblp', 'eryksun', 'josh.r']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue15944'
versions = ['Python 3.5', 'Python 3.6']

dabeaz · 2012-09-14T15:05:29Z

I've been playing with the interaction of ctypes and memoryviews and am curious about intended behavior. Consider the following:

>>> import ctypes
>>> d = ctypes.c_double()
>>> m = memoryview(d)
>>> m.ndim
0
>>> m.shape
()
>>> m.readonly
False
>>> m.itemsize
8
>>>

As you can see, you have a memory view for the ctypes double object. However, the fact that it is 0-dimensional with an empty shape seems to cause all sorts of weird behavior. For instance, indexing and slicing don't work:

>>> m[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: invalid indexing of 0-dim memory
>>> m[:]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: invalid indexing of 0-dim memory
>>>

As such, you can't really seem to do anything interesting with the resulting memory view. For example, you can't pull data out of it. Nor can you overwrite the contents (i.e., replace them with an 8-byte byte string).

Attempting to cast the memory view to something else doesn't work either.

>>> d = ctypes.c_double()
>>> m = memoryview(d)
>>> m2 = m.cast('c')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: memoryview: source format must be a native single character format prefixed with an optional '@'
>>>

I must be missing something really obvious here. Is there no way to get access to the memory behind a ctypes object?

pitrou · 2012-09-14T15:37:42Z

You can still read the underlying representation:

>>> d = ctypes.c_double(0.6)
>>> m = memoryview(d)
>>> bytes(m)
b'333333\xe3?'
>>> d.value = 0.7
>>> bytes(m)
b'ffffff\xe6?'
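The distinction dabeaz raises next (reading a copy versus having direct access) can be seen in a short sketch that uses only ctypes: `bytes(m)` takes a detached snapshot, while the memoryview itself keeps referring to the ctypes object's memory.

```python
import ctypes

d = ctypes.c_double(0.6)
m = memoryview(d)

snapshot = bytes(m)   # copies the 8 bytes as they are right now
d.value = 0.7         # change the ctypes object afterwards

# The memoryview still looks at d's memory, so a fresh copy reflects 0.7 ...
print(bytes(m) == bytes(ctypes.c_double(0.7)))   # True
# ... while the earlier bytes object is a detached copy of 0.6.
print(snapshot == bytes(ctypes.c_double(0.6)))   # True
```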

dabeaz · 2012-09-14T15:43:06Z

I don't want to read the representation by copying it into a bytes object. I want direct access to the underlying memory--including the ability to modify it. As it stands now, it's completely useless.

skrah · 2012-09-14T16:01:23Z

0-dim memory is indexed by x[()]. The ctypes example has an additional
problem, because format="<d" is not yet implemented in memoryview.

Only native single character formats in struct module syntax are
implemented, and "<d" in struct module syntax means "standard size,
little endian".

To demonstrate 0-dim indexing, here's an example using _testbuffer:

>>> from _testbuffer import ndarray, ND_WRITABLE
>>> x = ndarray(3.14, shape=[], format='d', flags=ND_WRITABLE)
>>> x[()]
3.14
>>> tau = 6.28
>>> x[()] = tau
>>> x[()]
6.28
>>> m = memoryview(x)
>>> m[()]
6.28
>>> m[()] = 100.111
>>> m[()]
100.111
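_testbuffer is an internal CPython test module. A comparable 0-dim view can be sketched with the public API alone, assuming that `memoryview.cast()` with an empty `shape` produces a 0-dim view (it does in the CPython 3.3+ implementation of cast). Indexing then uses the empty tuple, exactly as in the example above.

```python
import struct

buf = bytearray(struct.calcsize('d'))      # 8 zero bytes
m = memoryview(buf).cast('d', shape=[])    # 0-dim view of one native double

print(m.ndim, m.shape)   # 0 ()
m[()] = 3.14             # 0-dim memory is indexed with the empty tuple
print(m[()])             # 3.14

try:
    m[0]                 # ordinary integer indexing is rejected for 0-dim views
except TypeError as exc:
    print('TypeError:', exc)
```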

skrah · 2012-09-14T16:08:09Z

BTW, if c_double means "native machine double", then ctypes should
fill in Py_buffer.format with "d" and not "<d" in order to be PEP-3118
compatible.
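To make the format prefixes in the two comments above concrete, here is a small sketch with the standard struct and array modules (no ctypes involved). It only illustrates struct syntax: 'd' and '@d' mean native size, alignment and byte order; '<d' means standard size, little-endian; '=d' means standard size, native order. It does not claim anything about what ctypes reports on a given version.

```python
import array
import struct
import sys

for fmt in ('d', '@d', '=d', '<d', '>d'):
    print(fmt, struct.calcsize(fmt), struct.pack(fmt, 0.6))

# A native exporter such as array.array('d') reports the plain 'd' format.
print(memoryview(array.array('d', [0.6])).format)   # d

# On a little-endian machine the bytes are identical; only the label differs.
if sys.byteorder == 'little':
    assert struct.pack('d', 0.6) == struct.pack('<d', 0.6)
```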

dabeaz · 2012-09-14T16:19:17Z

Even with the <d format, I'm not sure why it can't be cast to a simple byte view. None of that seems to work at all.

skrah · 2012-09-14T16:42:02Z

The decision was made in order to be able to cast back and forth between
known formats. Otherwise one would be able to cast from '<d' to 'B'
but not from 'B' to '<d'.
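The asymmetry being avoided is on the destination side: a sketch using a plain bytearray (so the source is already 'B') shows that cast() accepts native single-character formats, with or without '@', and rejects '<d' with the same ValueError wording quoted in this thread. The test below checks this on whatever CPython runs it; it is not a statement about every release.

```python
m = memoryview(bytearray(8))

as_native = m.cast('d')     # native single-character format: accepted
as_at = m.cast('@d')        # the '@' prefix is explicitly allowed
try:
    m.cast('<d')            # standard size with explicit byte order: rejected
    rejected = False
except ValueError as exc:
    rejected = True
    print('ValueError:', exc)
```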

Python 3.4 will have support for all formats in struct module syntax,
but all non-native formats will be *far* slower than the native ones.

You can still pack/unpack directly using the struct module:

>>> import ctypes, struct
>>> d = ctypes.c_double()
>>> m = memoryview(d)
>>> struct.pack_into(m.format, m, 0, 22.7)
>>> struct.unpack_from(m.format, m, 0)[0]
22.7
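The same workaround can be wrapped in two small helpers. The names `read_scalar` and `write_scalar` are hypothetical; they simply hand the exporter's own format string to struct, which parses '<d' even though memoryview indexing does not. This assumes the format is one struct understands, which holds for simple ctypes scalars such as c_double.

```python
import ctypes
import struct

# Hypothetical helper: unpack one value using the exporter's format string.
def read_scalar(obj):
    m = memoryview(obj)
    return struct.unpack_from(m.format, m, 0)[0]

# Hypothetical helper: pack one value in place, writing through to obj's memory.
def write_scalar(obj, value):
    m = memoryview(obj)
    struct.pack_into(m.format, m, 0, value)

d = ctypes.c_double()
write_scalar(d, 22.7)
print(d.value, read_scalar(d))   # 22.7 22.7
```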

dabeaz · 2012-09-14T16:47:05Z

I don't think memoryviews should be imposing any casting restrictions at all. It's low level. Get out of the way.

skrah · 2012-09-14T16:53:20Z

So you want to be able to segfault the core interpreter using the
builtins?

dabeaz · 2012-09-14T17:00:06Z

No, I want to be able to access the raw bytes sitting behind a memoryview as bytes without all of this casting and reinterpretation. Just show me the raw bytes. Not doubles, not ints, not structure packing, not copying into byte strings, or whatever. Is this really impossible? It sure seems so.

dabeaz · 2012-09-14T17:08:58Z

Just to be specific, why is something like this not possible?

>>> d = ctypes.c_double()
>>> m = memoryview(d)
>>> m[0:8] = b'abcdefgh'
>>> d.value
8.540883223036124e+194
>>>

(Doesn't have to be exactly like this, but what's wrong with overwriting bytes with bytes of a compatible size?).
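For comparison, this is roughly what the request looks like with the change that eventually closed this issue, assuming Python 3.5 or later, where casting to bytes accepts any C-contiguous source regardless of its format (including the '<d' that ctypes reported here). The expected value is computed with struct so the check does not depend on byte order.

```python
import ctypes
import struct

d = ctypes.c_double()
raw = memoryview(d).cast('B')   # flat unsigned-byte view of the double
print(raw.format, raw.nbytes)   # B 8

raw[0:8] = b'abcdefgh'          # overwrite the double's bytes in place
print(d.value == struct.unpack('=d', b'abcdefgh')[0])   # True
```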

dabeaz · 2012-09-14T17:30:49Z

I should add that 0-dim indexing doesn't work as described either:

>>> import ctypes
>>> d = ctypes.c_double()
>>> m = memoryview(d)
>>> m[()]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NotImplementedError: memoryview: unsupported format <d
>>>

skrah · 2012-09-14T18:11:46Z

Please read msg170482. It even contains a copy and paste example!

skrah · 2012-09-20T09:23:19Z

As I understand it, you prefer memoryviews where the format is
purely informational, whereas we now have typed memoryviews.

Typed memoryviews are certainly useful, in fact they are
present in Cython, see here for examples:

http://docs.cython.org/src/userguide/memoryviews.html

I can see only one obvious benefit of ignoring the format: All possible
formats are accepted. What I don't understand is why this ...

m[0] = b'\x00\x00\x00\x01'

... should be preferable to:

m[0] = 1

If you think that typed memoryviews are a mistake, I suggest raising
the issue on python-dev as soon as possible (3.3 is due soon). All
memoryview operations are now based on values instead of bit patterns,
see for example bpo-15573.

dabeaz · 2012-09-20T15:42:42Z

There's probably a bigger discussion about memoryviews for a rainy day. However, the number one thing that would save all of this in my book would be to make sure cast('B') is universally supported regardless of format including endianness--especially in the standard library. For example, being able to do this:

>>> import array
>>> a = array.array('d',[1.0, 2.0, 3.0, 4.0])
>>> m = memoryview(a).cast('B')
>>> m[0:4] = b'\x00\x01\x02\x03'
>>> a
array('d', [1.0000000112050316, 2.0, 3.0, 4.0])
>>>

Right now, it doesn't work for ctypes. For example:

>>> import ctypes
>>> a = (ctypes.c_double * 4)(1,2,3,4)
>>> a
<__main__.c_double_Array_4 object at 0x1006a7cb0>
>>> m = memoryview(a).cast('B')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: memoryview: source format must be a native single character format prefixed with an optional '@'
>>>

As some background, being able to work with a "byte" view of memory is important for a lot of problems involving I/O, data interchange, and related problems where being able to accurately construct/deconstruct the underlying memory buffers is more useful than actually interpreting their contents.
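As an illustration of the I/O use case, the following sketch fills a ctypes array straight from a file-like object through a byte view, then patches one element by byte offset. It assumes Python 3.5+ for the byte cast of a ctypes array; `io.BytesIO` stands in for a real file or socket, and the payload is generated with struct.

```python
import ctypes
import io
import struct

values = (ctypes.c_double * 4)()                   # destination buffer
payload = struct.pack('=4d', 1.0, 2.0, 3.0, 4.0)   # stand-in for data read from I/O

# Flat byte view of the ctypes array (needs Python 3.5+ for ctypes formats).
byte_view = memoryview(values).cast('B')
n = io.BytesIO(payload).readinto(byte_view)
print(n, list(values))   # 32 [1.0, 2.0, 3.0, 4.0]

# Individual records can then be patched by byte offset.
byte_view[8:16] = struct.pack('=d', 9.5)
print(values[1])         # 9.5
```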

dabeaz · 2012-09-20T15:50:00Z

One follow-up note: I think it's fine to punt on cast('B') if the memoryview is non-contiguous. That case is probably rare.

skrah · 2014-10-16T21:46:19Z

We could add a flag memoryview(x, raw=True) to the constructor. This view
would behave exactly like the regular one except that it ignores buf.format
entirely.

So you could do assignments like:

m[10] = b'\x00\x00\x00\x01'

This would be more flexible in general since memoryview currently only supports
native struct formats (complex formats slow down certain operations dramatically).

I think the feature would not add much additional complexity to the code.

The question is: Is this a general need? Are many people using memoryviews
for bit-twiddling?
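The proposed `memoryview(x, raw=True)` was never added. A rough approximation in pure Python is sketched below; `RawItemView` is a hypothetical name, and the sketch treats the buffer as a flat sequence of items of the exporter's itemsize, reading and writing each item as a bytes object while otherwise ignoring the format. It relies on `cast('B')`, which needs Python 3.5+ for non-native formats.

```python
import array
import struct

class RawItemView:
    """Hypothetical stand-in for the proposed memoryview(x, raw=True)."""

    def __init__(self, obj):
        view = memoryview(obj)
        self.itemsize = view.itemsize
        self._bytes = view.cast('B')   # byte cast; 3.5+ for non-native formats

    def __len__(self):
        return self._bytes.nbytes // self.itemsize

    def _span(self, index):
        # Only non-negative indices are supported in this sketch.
        if not 0 <= index < len(self):
            raise IndexError(index)
        start = index * self.itemsize
        return slice(start, start + self.itemsize)

    def __getitem__(self, index):
        return bytes(self._bytes[self._span(index)])

    def __setitem__(self, index, value):
        # memoryview raises ValueError if len(value) != itemsize.
        self._bytes[self._span(index)] = value


a = array.array('d', [0.0, 0.0, 0.0])
r = RawItemView(a)
r[1] = struct.pack('=d', 2.5)   # bytes in; the view does no float conversion
print(a.tolist())               # [0.0, 2.5, 0.0]
print(r[0] == bytes(8))         # True
```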

cblp · 2015-08-05T13:19:37Z

You don't need raw=True; .cast('b') already must do this. But unfortunately, it is not implemented yet.

vadmium · 2015-08-06T00:44:05Z

In my experience, I tend to only use memoryview() for “bytes-like” buffers (but see bpo-23756 about clarifying what this means). Example from /Lib/_compression.py:67:

def readinto(self, b):
    with memoryview(b) as view, view.cast("B") as byte_view:
        data = self.read(len(byte_view))
        byte_view[:len(data)] = data
    return len(data)

Fixing cast("B") or adding a memoryview(raw=True) mode could probably help when all you want is a byte buffer.
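To show how the quoted `readinto` pattern behaves with different buffers, here is a sketch built around a hypothetical minimal `Reader` class (only `read()` is required). It accepts both a bytearray and a typed `array.array`, because the byte view hides the element type.

```python
import array
import io

class Reader:
    """Hypothetical minimal reader; readinto() follows the quoted pattern."""

    def __init__(self, data):
        self._src = io.BytesIO(data)

    def read(self, size=-1):
        return self._src.read(size)

    def readinto(self, b):
        # View any writable buffer as flat bytes, fill it, report the count.
        with memoryview(b) as view, view.cast("B") as byte_view:
            data = self.read(len(byte_view))
            byte_view[:len(data)] = data
        return len(data)


buf = bytearray(4)
print(Reader(b'hello').readinto(buf), buf)   # 4 bytearray(b'hell')

arr = array.array('H', [0, 0, 0])            # typed buffer, 6 bytes
print(Reader(b'\x01\x00').readinto(arr))     # 2
```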

eryksun · 2015-08-06T06:28:47Z

A functional memoryview for ctypes objects would avoid having to use workarounds, such as the following:

    >>> d = ctypes.c_double()
    >>> b = (ctypes.c_char * ctypes.sizeof(d)).from_buffer(d)
    >>> b[:] = b'abcdefgh'
    >>> d.value
    8.540883223036124e+194

or using numpy.frombuffer as a bridge:

    >>> import numpy
    >>> d = ctypes.c_double()
    >>> m = memoryview(numpy.frombuffer(d, 'B'))
    >>> m[:] = b'abcdefgh'
    >>> d.value
    8.540883223036124e+194
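The first workaround generalizes to any ctypes instance. A sketch with a hypothetical helper `byte_alias` and a hypothetical `Point` structure: `from_buffer` creates a `c_char` array that shares memory with the original object, so writing bytes into it changes the structure's fields.

```python
import ctypes
import struct

def byte_alias(obj):
    """Hypothetical helper: a c_char array sharing memory with a ctypes instance."""
    return (ctypes.c_char * ctypes.sizeof(obj)).from_buffer(obj)

class Point(ctypes.Structure):   # hypothetical example type, no padding
    _fields_ = [('x', ctypes.c_int32), ('y', ctypes.c_int32)]

p = Point()
raw = byte_alias(p)
raw[:] = struct.pack('=ii', 3, 4)   # slice assignment needs exactly sizeof(p) bytes
print(p.x, p.y)   # 3 4
```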

David's request that cast('B') should be made to work for all contiguous buffers seems reasonable. That said, the ctypes format strings also need fixing. Let's see what happens when "@d" is used instead of "<d":

    >>> double_stgdict = stgdict(ctypes.c_double)
    >>> double_stgdict
    dict: 
        ob_base: 
            ob_refcnt: 1
            ob_type: py_object(<class 'StgDict'>)
        ma_used: 7
        ma_keys: LP_PyDictKeysObject(0x1aa5750)
        ma_values: LP_LP_PyObject(<NULL>)
    size: 8
    align: 8
    length: 0
    ffi_type_pointer: 
        size: 8
        alignment: 8
        type: 3
        elements: <NULL>
    proto: py_object('d')
    setfunc: SETFUNC(0x7f9f9b6e3e60)
    getfunc: GETFUNC(0x7f9f9b6e3d90)
    paramfunc: PARAMFUNC(0x7f9f9b6e31d0)
    argtypes: py_object(<NULL>)
    converters: py_object(<NULL>)
    restype: py_object(<NULL>)
    checker: py_object(<NULL>)
    flags: 4096
    format: b'<d'
    ndim: 0
    shape: LP_c_long(<NULL>)

    >>> double_stgdict.format = b'@d'

    >>> d = ctypes.c_double(3.14)
    >>> m = memoryview(d)
    >>> m[()]
    3.14
    >>> m[()] = 6.28
    >>> d.value
    6.28

    >>> m = m.cast('B')
    >>> m[:] = b'abcdefgh'
    >>> d.value
    8.540883223036124e+194

This shows that changing the format string (set by PyCSimpleType_new in _ctypes.c) to use "@" makes the memoryview work normally. OTOH, the swapped type (e.g. c_double.__ctype_be__) would need to continue to use a standard little-endian ("<") or big-endian (">") format.
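Whatever the exact format strings are, the swapped types really do store their bytes in the stated order. The sketch below assumes Python 3.5+ (so the byte cast works for any format) and uses the `__ctype_be__`/`__ctype_le__` attributes of c_double mentioned above; the format strings are only printed, not relied on, because they may differ by platform and version.

```python
import ctypes
import struct

native = ctypes.c_double(1.5)
big = ctypes.c_double.__ctype_be__(1.5)
little = ctypes.c_double.__ctype_le__(1.5)

for obj in (native, big, little):
    print(type(obj).__name__, memoryview(obj).format)

def stored_bytes(obj):
    # A byte cast shows the representation actually held in memory.
    return bytes(memoryview(obj).cast('B'))

print(stored_bytes(native) == struct.pack('=d', 1.5))   # True
print(stored_bytes(big) == struct.pack('>d', 1.5))      # True
print(stored_bytes(little) == struct.pack('<d', 1.5))   # True
```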

skrah · 2015-08-06T13:45:32Z

Yuriy: cast() does not do this. What's requested is that e.g. a
single float is represented as a bytes object instead of a float.

Thus, you'd be able to do:

m[0] = b'\x00\x00\x00\x01'

This has other implications, for example, two NaNs would compare
equal. Hence the suggestion memoryview(raw=True).
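The NaN point can be checked with `array.array`, since casting a native 'd' view to bytes was already allowed: typed memoryviews compare element values (and NaN never equals NaN), whereas byte views compare the stored bytes.

```python
import array

a = array.array('d', [float('nan')])

# Typed views compare element values, and NaN never equals NaN.
print(memoryview(a) == memoryview(a))                      # False
# Byte views compare the stored bytes, which are identical.
print(memoryview(a).cast('B') == memoryview(a).cast('B'))  # True
```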

vadmium · 2015-08-06T16:50:17Z

Here is a patch that allows any “C-contiguous” memoryview() to be cast to a byte view. Apart from the test that was explicitly checking that this wasn’t supported, the rest of the test suite still passes. I basically removed the check that was generating the “source format must be a native single character” error.

If two NaNs are represented by the same byte sequence, I would expect their byte views to compare equal, which is the case with my patch.

skrah · 2015-08-06T17:04:45Z

The question is whether we want this behavior.

vadmium · 2015-08-07T01:55:15Z

Assuming bpo-23756 is resolved and various standard library functions are meant to work with any C-contiguous buffer, then it makes sense to me for memoryview.cast("B") to work for any C-contiguous buffer. I also got the impression that David, Yuriy, and Eryksun all support this.

I don’t understand why you wouldn’t want this behaviour. It seems pointless just to maintain symmetry with being unable to cast back to “<d”. And casting from e.g. floating point to bytes to integers already disregards the original data type, so casting from unsupported types to bytes should be no worse.
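The observation that a byte cast already discards the element type can be demonstrated with `array.array` alone: once a view is cast to 'B', it can be cast on to any other native format, here 'Q' (unsigned 64-bit), and writes through that view bypass the float type entirely. The hex value in the comment assumes IEEE 754 doubles.

```python
import array
import struct

a = array.array('d', [1.0])
# Reinterpret the same 8 bytes as an unsigned 64-bit integer via a byte view.
bits = memoryview(a).cast('B').cast('Q')
print(hex(bits[0]))   # 0x3ff0000000000000 on IEEE 754 platforms

# Writing through the integer view changes the float, with no type check.
bits[0] = struct.unpack('=Q', struct.pack('=d', 2.0))[0]
print(a[0])           # 2.0
```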

pitrou · 2015-08-07T09:00:25Z

The proposal sounds reasonable to me.

skrah · 2015-08-07T12:57:56Z

If people are content with writing m[124:128] = b'abcd' and accept
that tolist() etc. won't represent the original structure of the
object, then let's do it.

On the bright side, it is less work. -- I'll review the patch.
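What that trade-off means in practice: after the cast, `tolist()` returns one integer per byte instead of the original elements, and element-sized edits become explicit byte ranges in the style of `m[124:128] = b'abcd'`. A sketch with `array.array`:

```python
import array
import struct

a = array.array('d', [1.0, 2.0])
typed = memoryview(a)
raw = typed.cast('B')

print(typed.tolist())      # [1.0, 2.0]
print(len(raw.tolist()))   # 16: one int per byte, the doubles are gone

# Element-sized edits become explicit byte ranges.
raw[8:16] = struct.pack('=d', 5.0)
print(a.tolist())          # [1.0, 5.0]
```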

pitrou · 2015-08-07T13:12:37Z

On 07/08/2015 14:57, Stefan Krah wrote:

> If people are content with writing m[124:128] = b'abcd' and accept
> that tolist() etc. won't represent the original structure of the
> object, then let's do it.

As long as the casting has to be explicit, this sounds ok to me.

skrah · 2015-08-07T13:15:49Z

Ok, shall we sneak this past Larry for 3.5?

pitrou · 2015-08-07T13:22:12Z

Why not :)

python-dev · 2015-08-08T11:39:43Z

New changeset e33f2b8b937f by Stefan Krah in branch '3.5':
Issue bpo-15944: memoryview: Allow arbitrary formats when casting to bytes.
https://hg.python.org/cpython/rev/e33f2b8b937f

New changeset c7c4b8411037 by Stefan Krah in branch 'default':
Merge bpo-15944.
https://hg.python.org/cpython/rev/c7c4b8411037

skrah · 2015-08-08T11:45:15Z

Done. Thanks for the patch.

dabeaz mannequin added the type-bug label Sep 14, 2012

skrah mannequin self-assigned this Aug 6, 2015

skrah mannequin changed the title from "memoryviews and ctypes" to "memoryview: allow all casts to bytes" Aug 7, 2015

pitrou changed the title from "memoryview: allow all casts to bytes" back to "memoryviews and ctypes" Aug 7, 2015

skrah mannequin added the interpreter-core label Aug 8, 2015

skrah mannequin closed this as completed Aug 8, 2015

ezio-melotti transferred this issue from another repository Apr 10, 2022
