memoryviews and ctypes #60148

Closed
dabeaz mannequin opened this issue Sep 14, 2012 · 31 comments

@dabeaz
Mannequin

dabeaz mannequin commented Sep 14, 2012

BPO 15944
Nosy @pitrou, @skrah, @vadmium, @eryksun, @MojoVampire
Files
  • cast-bytes.patch: Universal cast to bytes
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = 'https://github.com/skrah'
    closed_at = <Date 2015-08-08.11:45:15.291>
    created_at = <Date 2012-09-14.15:05:29.975>
    labels = ['interpreter-core', 'type-bug']
    title = 'memoryviews and ctypes'
    updated_at = <Date 2015-08-08.11:45:15.195>
    user = 'https://bugs.python.org/dabeaz'

    bugs.python.org fields:

    activity = <Date 2015-08-08.11:45:15.195>
    actor = 'skrah'
    assignee = 'skrah'
    closed = True
    closed_date = <Date 2015-08-08.11:45:15.291>
    closer = 'skrah'
    components = ['Interpreter Core']
    creation = <Date 2012-09-14.15:05:29.975>
    creator = 'dabeaz'
    dependencies = []
    files = ['40139']
    hgrepos = []
    issue_num = 15944
    keywords = ['patch']
    message_count = 31.0
    messages = ['170477', '170480', '170481', '170482', '170483', '170484', '170487', '170488', '170489', '170490', '170492', '170494', '170496', '170795', '170818', '170819', '229560', '248037', '248089', '248111', '248127', '248134', '248135', '248164', '248182', '248186', '248189', '248190', '248191', '248261', '248262']
    nosy_count = 8.0
    nosy_names = ['pitrou', 'skrah', 'dabeaz', 'python-dev', 'martin.panter', 'cblp', 'eryksun', 'josh.r']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue15944'
    versions = ['Python 3.5', 'Python 3.6']

    @dabeaz
    Mannequin Author

    dabeaz mannequin commented Sep 14, 2012

    I've been playing with the interaction of ctypes and memoryviews and am curious about intended behavior. Consider the following:

    >>> import ctypes
    >>> d = ctypes.c_double()
    >>> m = memoryview(d)
    >>> m.ndim
    0
    >>> m.shape
    ()
    >>> m.readonly
    False
    >>> m.itemsize
    8
    >>>

    As you can see, you have a memory view for the ctypes double object. However, the fact that it has a 0-dimension and no shape seems to cause all sorts of weird behavior. For instance, indexing and slicing don't work:

    >>> m[0]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: invalid indexing of 0-dim memory
    >>> m[:]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: invalid indexing of 0-dim memory
    >>> 

    As such, you can't really seem to do anything interesting with the resulting memory view. For example, you can't pull data out of it. Nor can you overwrite the contents (i.e., replacing the contents with an 8-byte byte string).

    Attempting to cast the memory view to something else doesn't work either.

    >>> d = ctypes.c_double()
    >>> m = memoryview(d)
    >>> m2 = m.cast('c')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: memoryview: source format must be a native single character format prefixed with an optional '@'
    >>> 

    I must be missing something really obvious here. Is there no way to get access to the memory behind a ctypes object?

    @dabeaz dabeaz mannequin added the type-bug An unexpected behavior, bug, or error label Sep 14, 2012
    @pitrou
    Member

    pitrou commented Sep 14, 2012

    You can still read the underlying representation:

    >>> d = ctypes.c_double(0.6)
    >>> m = memoryview(d)
    >>> bytes(m)
    b'333333\xe3?'
    >>> d.value = 0.7
    >>> bytes(m)
    b'ffffff\xe6?'
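A small sketch of what the `bytes(m)` approach above does and does not give you: each call copies the buffer at that moment, so the result is a snapshot rather than a live window. The variable names are illustrative only.

```python
import ctypes
import struct

d = ctypes.c_double(0.6)
view = memoryview(d)

# bytes() copies the 8 bytes out of the buffer at this moment.
snapshot = bytes(view)

# Mutating the ctypes object changes the live buffer, not the earlier copy.
d.value = 0.7
live = bytes(view)

# Decode both copies with the native double format to compare them.
snapshot_value = struct.unpack("d", snapshot)[0]
live_value = struct.unpack("d", live)[0]
print(snapshot_value, live_value)
```

This is read-only access by copy, which is why it does not satisfy a request for in-place modification.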

    @dabeaz
    Mannequin Author

    dabeaz mannequin commented Sep 14, 2012

    I don't want to read the representation by copying it into a bytes object. I want direct access to the underlying memory--including the ability to modify it. As it stands now, it's completely useless.

    @skrah
    Mannequin

    skrah mannequin commented Sep 14, 2012

    0-dim memory is indexed by x[()]. The ctypes example has an additional
    problem, because format="<d" is not yet implemented in memoryview.

    Only native single character formats in struct module syntax are
    implemented, and "<d" in struct module syntax means "standard size,
    little endian".

    To demonstrate 0-dim indexing, here's an example using _testbuffer:

    >>> x = ndarray(3.14, shape=[], format='d', flags=ND_WRITABLE)
    >>> x[()]
    3.14
    >>> tau = 6.28
    >>> x[()] = tau
    >>> x[()]
    6.28
    >>> m = memoryview(x)
    >>> m[()]
    6.28
    >>> m[()] = 100.111
    >>> m[()]
    100.111

    @skrah
    Mannequin

    skrah mannequin commented Sep 14, 2012

    BTW, if c_double means "native machine double", then ctypes should
    fill in Py_buffer.format with "d" and not "<d" in order to be PEP-3118
    compatible.
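To illustrate the distinction between native and standard struct formats mentioned above, here is a minimal sketch using only the `struct` module. It assumes a mainstream platform where the native C double is an 8-byte IEEE 754 value; on such a platform "d" and "<d" (or ">d" on big-endian machines) describe the same bytes, even though they are different format strings.

```python
import struct
import sys

# Standard sizes are fixed by the struct module, independent of the platform.
standard_double = struct.calcsize("<d")   # 8
standard_long = struct.calcsize("<l")     # 4

# Native sizes follow the C compiler; "l" may be 4 or 8 bytes.
native_long = struct.calcsize("@l")

# For a double, native and explicit-endian packing usually agree byte for byte.
native_bytes = struct.pack("@d", 1.5)
order_prefix = "<" if sys.byteorder == "little" else ">"
explicit_bytes = struct.pack(order_prefix + "d", 1.5)
same_layout = native_bytes == explicit_bytes

print(standard_double, standard_long, native_long, same_layout)
```

The layouts match for a double, but memoryview decides what it supports by looking at the format string, which is why the "<" prefix matters in this thread.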

    @dabeaz
    Mannequin Author

    dabeaz mannequin commented Sep 14, 2012

    Even with the <d format, I'm not sure why it can't be cast to a simple byte view. None of that seems to work at all.

    @skrah
    Mannequin

    skrah mannequin commented Sep 14, 2012

    The decision was made in order to be able to cast back and forth between
    known formats. Otherwise one would be able to cast from '<d' to 'B'
    but not from 'B' to '<d'.

    Python 3.4 will have support for all formats in struct module syntax,
    but all non-native formats will be *far* slower than the native ones.

    You can still pack/unpack directly using the struct module:

    >>> import ctypes, struct
    >>> d = ctypes.c_double()
    >>> m = memoryview(d)
    >>> struct.pack_into(m.format, m, 0, 22.7)
    >>> struct.unpack_from(m.format, m, 0)[0]
    22.7
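The "cast back and forth" property described above can be seen with a buffer that already exports a native format. This sketch uses `array.array('d')` as a stand-in, since it reports the native format "d":

```python
import array

a = array.array("d", [1.0, 2.0])
typed = memoryview(a)            # format 'd', two items

# Native format -> bytes: always allowed for a C-contiguous view.
as_bytes = typed.cast("B")

# Bytes -> native format: allowed because 'd' is a known native format.
back = as_bytes.cast("d")
round_trip = back.tolist()

# All three views share the same memory, so a write is visible everywhere.
back[0] = 3.5
first_item = a[0]
print(round_trip, first_item)
```

With a format such as "<d" the second cast is the step that could not be offered, which is the asymmetry the restriction was meant to avoid.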

    @dabeaz
    Mannequin Author

    dabeaz mannequin commented Sep 14, 2012

    I don't think memoryviews should be imposing any casting restrictions at all. It's low level. Get out of the way.

    @skrah
    Mannequin

    skrah mannequin commented Sep 14, 2012

    So you want to be able to segfault the core interpreter using the
    builtins?

    @dabeaz
    Mannequin Author

    dabeaz mannequin commented Sep 14, 2012

    No, I want to be able to access the raw bytes sitting behind a memoryview as bytes without all of this casting and reinterpretation. Just show me the raw bytes. Not doubles, not ints, not structure packing, not copying into byte strings, or whatever. Is this really impossible? It sure seems so.

    @dabeaz
    Mannequin Author

    dabeaz mannequin commented Sep 14, 2012

    Just to be specific, why is something like this not possible?

    >>> d = ctypes.c_double()
    >>> m = memoryview(d)
    >>> m[0:8] = b'abcdefgh'
    >>> d.value
    8.540883223036124e+194
    >>>

    (Doesn't have to be exactly like this, but what's wrong with overwriting bytes with bytes of a compatible size?).

    @dabeaz
    Mannequin Author

    dabeaz mannequin commented Sep 14, 2012

    I should add that 0-dim indexing doesn't work as described either:

    >>> import ctypes
    >>> d = ctypes.c_double()
    >>> m = memoryview(d)
    >>> m[()]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    NotImplementedError: memoryview: unsupported format <d
    >>>

    @skrah
    Mannequin

    skrah mannequin commented Sep 14, 2012

    Please read msg170482. It even contains a copy and paste example!

    @skrah
    Mannequin

    skrah mannequin commented Sep 20, 2012

    As I understand it, you prefer memoryviews where the format is
    purely informational, whereas we now have typed memoryviews.

    Typed memoryviews are certainly useful, in fact they are
    present in Cython, see here for examples:

    http://docs.cython.org/src/userguide/memoryviews.html

    I can see only one obvious benefit of ignoring the format: All possible
    formats are accepted. What I don't understand is why this ...

    m[0] = b'\x00\x00\x00\x01'

    ... should be preferable to:

    m[0] = 1

    If you think that typed memoryviews are a mistake, I suggest raising
    the issue on python-dev as soon as possible (3.3 is due soon). All
    memoryview operations are now based on values instead of bit patterns,
    see for example bpo-15573.
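The two assignment styles contrasted above can be put side by side. This is only a sketch: it uses `array.array('i')` as a stand-in for a typed buffer and writes the bit pattern through a byte view rather than through a format-ignoring view, which does not exist.

```python
import array
import struct

a = array.array("i", [0, 0])
typed = memoryview(a)

# Value-based assignment: the view packs the int using its format.
typed[0] = 1

# Bit-pattern assignment: the caller packs the bytes and writes them
# through a byte view of the same memory.
raw = typed.cast("B")
size = typed.itemsize
raw[size:2 * size] = struct.pack("i", 1)

result = a.tolist()
print(result)
```

Both writes produce the same item here; the difference is whether the view or the caller is responsible for the encoding.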

    @dabeaz
    Mannequin Author

    dabeaz mannequin commented Sep 20, 2012

    There's probably a bigger discussion about memoryviews for a rainy day. However, the number one thing that would save all of this in my book would be to make sure cast('B') is universally supported regardless of format including endianness--especially in the standard library. For example, being able to do this:

    >>> a = array.array('d',[1.0, 2.0, 3.0, 4.0])
    >>> m = memoryview(a).cast('B')
    >>> m[0:4] = b'\x00\x01\x02\x03'
    >>> a
    array('d', [1.0000000112050316, 2.0, 3.0, 4.0])
    >>> 

    Right now, it doesn't work for ctypes. For example:

    >>> import ctypes
    >>> a = (ctypes.c_double * 4)(1,2,3,4)
    >>> a
    <__main__.c_double_Array_4 object at 0x1006a7cb0>
    >>> m = memoryview(a).cast('B')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: memoryview: source format must be a native single character format prefixed with an optional '@'
    >>> 

    As some background, being able to work with a "byte" view of memory is important for a lot of problems involving I/O, data interchange, and related problems where being able to accurately construct/deconstruct the underlying memory buffers is more useful than actually interpreting their contents.
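As one illustration of the I/O use case described above, a byte view lets a typed buffer be written out in fixed-size chunks without first copying the whole thing into a bytes object. The chunk size and the use of `io.BytesIO` as the sink are assumptions for the sake of a runnable sketch.

```python
import array
import io

a = array.array("d", [1.0, 2.0, 3.0, 4.0])
view = memoryview(a).cast("B")   # 32 bytes, no copy

sink = io.BytesIO()
chunk_size = 16                  # arbitrary size chosen for the example

# Slicing a memoryview yields another view, so each write reads
# directly from the array's memory.
for start in range(0, len(view), chunk_size):
    sink.write(view[start:start + chunk_size])

written = sink.getvalue()
print(len(written))
```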

    @dabeaz
    Mannequin Author

    dabeaz mannequin commented Sep 20, 2012

    One follow-up note: I think it's fine to punt on cast('B') if the memoryview is non-contiguous. That case is probably rare.
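For reference, this is roughly what the non-contiguous case looks like. A strided slice of a bytearray is used as the example; `cast()` refuses it, while `tobytes()` still produces a packed copy.

```python
buf = bytearray(range(8))

# Every second byte: a valid view, but not C-contiguous.
strided = memoryview(buf)[::2]
is_contiguous = strided.c_contiguous

# cast() is restricted to C-contiguous views and raises TypeError otherwise.
try:
    strided.cast("B")
    cast_failed = False
except TypeError:
    cast_failed = True

# A packed copy is still available when contiguous bytes are needed.
packed = strided.tobytes()
print(is_contiguous, cast_failed, packed)
```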

    @skrah
    Mannequin

    skrah mannequin commented Oct 16, 2014

    We could add a flag memoryview(x, raw=True) to the constructor. This view
    would behave exactly like the regular one except that it ignores buf.format
    entirely.

    So you could do assignments like:

    m[10] = b'\x00\x00\x00\x01'

    This would be more flexible in general since memoryview currently only supports
    native struct formats (complex formats slow down certain operations dramatically).

    I think the feature would not add much additional complexity to the code.

    The question is: Is this a general need? Are many people using memoryviews
    for bit-twiddling?

    @cblp
    Mannequin

    cblp mannequin commented Aug 5, 2015

    You don't need raw=True; .cast('b') should already do this. Unfortunately, it is not implemented yet.

    @vadmium
    Member

    vadmium commented Aug 6, 2015

    In my experience, I tend to only use memoryview() for “bytes-like” buffers (but see bpo-23756 about clarifying what this means). Example from /Lib/_compression.py:67:

    def readinto(self, b):
        with memoryview(b) as view, view.cast("B") as byte_view:
            data = self.read(len(byte_view))
            byte_view[:len(data)] = data
        return len(data)

    Fixing cast("B") or adding a memoryview(raw=True) mode could probably help when all you want is a byte buffer.
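To show the `readinto()` pattern quoted above in a runnable form, here is a sketch with a hypothetical `ChunkReader` class (not part of the standard library) wrapped around an in-memory stream. The target is an `array.array('d')`, whose native format can already be cast to bytes.

```python
import array
import io


class ChunkReader:
    """Hypothetical reader used only to exercise the readinto() pattern."""

    def __init__(self, raw):
        self._raw = raw

    def read(self, size):
        return self._raw.read(size)

    def readinto(self, b):
        # View the caller's buffer as plain bytes, whatever its item type.
        with memoryview(b) as view, view.cast("B") as byte_view:
            data = self.read(len(byte_view))
            byte_view[:len(data)] = data
        return len(data)


source = array.array("d", [1.5, 2.5]).tobytes()
reader = ChunkReader(io.BytesIO(source))

target = array.array("d", [0.0, 0.0])
count = reader.readinto(target)
print(count, target.tolist())
```

The cast is what lets `len(byte_view)` mean "number of bytes" rather than "number of items".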

    @eryksun
    Contributor

    eryksun commented Aug 6, 2015

    A functional memoryview for ctypes objects would avoid having to use workarounds, such as the following:

        >>> d = ctypes.c_double()
        >>> b = (ctypes.c_char * ctypes.sizeof(d)).from_buffer(d)
        >>> b[:] = b'abcdefgh'
        >>> d.value
        8.540883223036124e+194

    or using numpy.frombuffer as a bridge:

        >>> d = ctypes.c_double()
        >>> m = memoryview(numpy.frombuffer(d, 'B'))
        >>> m[:] = b'abcdefgh'
        >>> d.value
        8.540883223036124e+194
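The first workaround above generalizes to any ctypes object. The helper name `raw_bytes_of` and the `Point` structure below are made up for this sketch; only `ctypes.sizeof` and `from_buffer` come from ctypes itself.

```python
import ctypes
import struct


def raw_bytes_of(obj):
    """Return a writable c_char array sharing memory with a ctypes object."""
    return (ctypes.c_char * ctypes.sizeof(obj)).from_buffer(obj)


class Point(ctypes.Structure):
    # Hypothetical structure used only for illustration.
    _fields_ = [("x", ctypes.c_int32), ("y", ctypes.c_int32)]


p = Point(0, 0)
raw = raw_bytes_of(p)

# "=ii" packs two 4-byte ints in native byte order without padding.
packed = struct.pack("=ii", 3, 4)
raw[:] = packed

print(p.x, p.y)
```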

    David's request that cast('B') should be made to work for all contiguous buffers seems reasonable. That said, the ctypes format strings also need fixing. Let's see what happens when "@d" is used instead of "<d":

        >>> double_stgdict = stgdict(ctypes.c_double)
        >>> double_stgdict
        dict:
            ob_base:
                ob_refcnt: 1
                ob_type: py_object(<class 'StgDict'>)
            ma_used: 7
            ma_keys: LP_PyDictKeysObject(0x1aa5750)
            ma_values: LP_LP_PyObject(<NULL>)
        size: 8
        align: 8
        length: 0
        ffi_type_pointer:
            size: 8
            alignment: 8
            type: 3
            elements: <NULL>
        proto: py_object('d')
        setfunc: SETFUNC(0x7f9f9b6e3e60)
        getfunc: GETFUNC(0x7f9f9b6e3d90)
        paramfunc: PARAMFUNC(0x7f9f9b6e31d0)
        argtypes: py_object(<NULL>)
        converters: py_object(<NULL>)
        restype: py_object(<NULL>)
        checker: py_object(<NULL>)
        flags: 4096
        format: b'<d'
        ndim: 0
        shape: LP_c_long(<NULL>)
        >>> double_stgdict.format = b'@d'
    
        >>> d = ctypes.c_double(3.14)
        >>> m = memoryview(d)
        >>> m[()]
        3.14
        >>> m[()] = 6.28
        >>> d.value
        6.28
    
        >>> m = m.cast('B')
        >>> m[:] = b'abcdefgh'
        >>> d.value
        8.540883223036124e+194

    This shows that changing the format string (set by PyCSimpleType_new in _ctypes.c) to use "@" makes the memoryview work normally. OTOH, the swapped type (e.g. c_double.__ctype_be__) would need to continue to use a standard little-endian ("<") or big-endian (">") format.

    @skrah
    Mannequin

    skrah mannequin commented Aug 6, 2015

    Yuriy: cast() does not do this. What's requested is that e.g. a
    single float is represented as a bytes object instead of a float.

    Thus, you'd be able to do:

    m[0] = b'\x00\x00\x00\x01'

    This has other implications, for example, two NaNs would compare
    equal. Hence the suggestion memoryview(raw=True).
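The NaN point above can be checked directly. In this sketch two separate arrays hold the same NaN value; typed views compare by value, byte views compare by bit pattern.

```python
import array

nan = float("nan")
a1 = array.array("d", [nan])
a2 = array.array("d", [nan])

# Typed comparison unpacks the doubles, and NaN != NaN.
typed_equal = memoryview(a1) == memoryview(a2)

# Byte comparison sees two identical 8-byte sequences.
bytes_equal = memoryview(a1).cast("B") == memoryview(a2).cast("B")

print(typed_equal, bytes_equal)
```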

    @vadmium
    Member

    vadmium commented Aug 6, 2015

    Here is a patch that allows any “C-contiguous” memoryview() to be cast to a byte view. Apart from the test that was explicitly checking that this wasn’t supported, the rest of the test suite still passes. I basically removed the check that was generating the “source format must be a native single character” error.

    If two NANs are represented by the same byte sequence, I would expect their byte views to compare equal, which is the case with my patch.

    @skrah
    Mannequin

    skrah mannequin commented Aug 6, 2015

    The question is whether we want this behavior.

    @skrah skrah mannequin self-assigned this Aug 6, 2015
    @vadmium
    Member

    vadmium commented Aug 7, 2015

    Assuming bpo-23756 is resolved and various standard library functions are meant to work with any C-contiguous buffer, then it makes sense to me for memoryview.cast("B") to work for any C-contiguous buffer. I also got the impression that David, Yuriy, and Eryksun all support this.

    I don’t understand why you wouldn’t want this behaviour. It seems pointless just to maintain symmetry with being unable to cast back to “<d”. And casting from e.g. floating point to bytes to integers already disregards the original data type, so casting from unsupported types to bytes should be no worse.
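The float-to-bytes-to-integer cast mentioned above looks like this in practice. The sketch assumes a platform where C float and unsigned int are both 4 bytes and floats are IEEE 754, which holds on mainstream systems. One side of every cast has to be a byte format, hence the intermediate "B" view.

```python
import array

floats = array.array("f", [1.0])

# float -> bytes -> unsigned int; the original item type is forgotten
# after the first cast.
as_bytes = memoryview(floats).cast("B")
as_uint = as_bytes.cast("I")

bits = as_uint[0]
print(hex(bits))
```

The integer view simply reinterprets the float's bit pattern, which is the sense in which the original data type is already disregarded.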

    @pitrou
    Member

    pitrou commented Aug 7, 2015

    The proposal sounds reasonable to me.

    @skrah
    Mannequin

    skrah mannequin commented Aug 7, 2015

    If people are content with writing m[124:128] = b'abcd' and accept
    that tolist() etc. won't represent the original structure of the
    object, then let's do it.

    On the bright side, it is less work. -- I'll review the patch.
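A brief sketch of the trade-off accepted above, assuming an interpreter that includes the change discussed in this thread (3.5 or later): once a ctypes array of doubles is cast to bytes, `tolist()` and the shape describe bytes, not doubles.

```python
import ctypes

doubles = (ctypes.c_double * 4)(1, 2, 3, 4)
original_shape = memoryview(doubles).shape

# After the cast, the view is a flat run of unsigned bytes.
byte_view = memoryview(doubles).cast("B")
flat = byte_view.tolist()

print(original_shape, byte_view.shape, len(flat))
```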

    @skrah skrah mannequin changed the title memoryviews and ctypes memoryview: allow all casts to bytes Aug 7, 2015
    @pitrou
    Member

    pitrou commented Aug 7, 2015

    On 07/08/2015 14:57, Stefan Krah wrote:

    If people are content with writing m[124:128] = b'abcd' and accept
    that tolist() etc. won't represent the original structure of the
    object, then let's do it.

    As long as the casting has to be explicit, this sounds ok to me.

    @pitrou pitrou changed the title memoryview: allow all casts to bytes memoryviews and ctypes Aug 7, 2015
    @skrah
    Mannequin

    skrah mannequin commented Aug 7, 2015

    Ok, shall we sneak this past Larry for 3.5?

    @pitrou
    Member

    pitrou commented Aug 7, 2015

    Why not :)

    @python-dev
    Mannequin

    python-dev mannequin commented Aug 8, 2015

    New changeset e33f2b8b937f by Stefan Krah in branch '3.5':
    Issue bpo-15944: memoryview: Allow arbitrary formats when casting to bytes.
    https://hg.python.org/cpython/rev/e33f2b8b937f

    New changeset c7c4b8411037 by Stefan Krah in branch 'default':
    Merge bpo-15944.
    https://hg.python.org/cpython/rev/c7c4b8411037
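With these changesets in place, the request from earlier in the thread should work through an explicit cast. A minimal sketch, assuming Python 3.5 or later; the expected value is computed with `struct` in native byte order rather than hard-coded, so it does not depend on endianness.

```python
import ctypes
import struct

d = ctypes.c_double()

# The 0-dim ctypes view can now be cast to a 1-D byte view.
m = memoryview(d).cast("B")
m[0:8] = b"abcdefgh"

# Interpret the same 8 bytes as a native double for comparison.
expected = struct.unpack("d", b"abcdefgh")[0]
print(d.value, expected)
```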

    @skrah
    Mannequin

    skrah mannequin commented Aug 8, 2015

    Done. Thanks for the patch.

    @skrah skrah mannequin added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Aug 8, 2015
    @skrah skrah mannequin closed this as completed Aug 8, 2015
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022