Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ctypes: memoryview gives incorrect PEP3118 format strings for both packed and unpacked structs #76961

Closed
eric-wieser mannequin opened this issue Feb 6, 2018 · 16 comments · Fixed by #5561
Closed

ctypes: memoryview gives incorrect PEP3118 format strings for both packed and unpacked structs #76961

eric-wieser mannequin opened this issue Feb 6, 2018 · 16 comments · Fixed by #5561
Assignees
Labels
3.7 (EOL) end of life 3.8 only security fixes topic-ctypes type-bug An unexpected behavior, bug, or error

Comments

@eric-wieser
Copy link
Mannequin

eric-wieser mannequin commented Feb 6, 2018

BPO 32780
Nosy @terryjreedy, @vsajip, @amauryfa, @abalkin, @aerojockey, @skrah, @meadori, @mattip, @eric-wieser
PRs
  • gh-76961: Fix the PEP3118 format string for ctypes.Structure #5561
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/abalkin'
    closed_at = None
    created_at = <Date 2018-02-06.05:27:59.358>
    labels = ['3.8', 'ctypes', 'type-bug', '3.7']
    title = 'ctypes: memoryview gives incorrect PEP3118 format strings for both packed and unpacked structs'
    updated_at = <Date 2019-12-23.17:24:49.037>
    user = 'https://github.com/eric-wieser'

    bugs.python.org fields:

    activity = <Date 2019-12-23.17:24:49.037>
    actor = 'ned.deily'
    assignee = 'belopolsky'
    closed = False
    closed_date = None
    closer = None
    components = ['ctypes']
    creation = <Date 2018-02-06.05:27:59.358>
    creator = 'Eric.Wieser'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 32780
    keywords = ['patch']
    message_count = 8.0
    messages = ['311705', '321279', '321280', '321730', '322225', '340234', '340603', '358806']
    nosy_count = 10.0
    nosy_names = ['terry.reedy', 'teoliphant', 'vinay.sajip', 'amaury.forgeotdarc', 'belopolsky', 'aerojockey', 'skrah', 'meador.inge', 'mattip', 'Eric.Wieser']
    pr_nums = ['5561']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue32780'
    versions = ['Python 2.7', 'Python 3.7', 'Python 3.8']

    Linked PRs

    @eric-wieser
    Copy link
    Mannequin Author

    eric-wieser mannequin commented Feb 6, 2018

    Discovered here

    Consider the following structure, and a memoryview created from it:

        class foo(ctypes.Structure):
            _fields_ = [('one', ctypes.c_uint8),
                        ('two', ctypes.c_uint32)]
        f = foo()
        mf = memoryview(f)

    We'd expect this to insert padding, and it does:

    >>> mf.itemsize
    8

    But that padding doesn't show up in the format string:

    >>> mf.format
    'T{<B:one:<I:two:}'

    That format string describes the packed version of the struct, with the two field starting at offset 1, based on the struct documentation on how < should be interpreted:

    No padding is added when using non-native size and alignment, e.g. with ‘<’, ‘>’

    But ctypes doesn't even get it right for packed structs:

        class foop(ctypes.Structure):
            _fields_ = [('one', ctypes.c_uint8),
                        ('two', ctypes.c_uint32)]
            _pack_ = 1
        f = foo()
        mf = memoryview(f)

    The size is what we'd expect:

    >>> mf.itemsize
    5

    But the format is garbage:

    >>> mf.format
    'B'  # sizeof(byte) == 5!?

    @eric-wieser eric-wieser mannequin added 3.8 only security fixes 3.7 (EOL) end of life topic-ctypes type-bug An unexpected behavior, bug, or error labels Feb 6, 2018
    @eric-wieser
    Copy link
    Mannequin Author

    eric-wieser mannequin commented Jul 8, 2018

    Pinging, as recommended by https://devguide.python.org/pullrequest/#reviewing. PEP-3118 as a protocol is far less useful if the canonical implementation is non-compliant.

    @abalkin
    Copy link
    Member

    abalkin commented Jul 8, 2018

    Unfortunately, the PEP authors did very little in terms of implementing the PEP and neither CPython nor numpy has a fully compliant implementation.

    @abalkin abalkin self-assigned this Jul 8, 2018
    @aerojockey
    Copy link
    Mannequin

    aerojockey mannequin commented Jul 16, 2018

    I guess I'll weigh in since I was pinged.

    I agree with the approach in the patch. All the memoryview does is to use the format field verbatim from the underlying buffer, so if the format field is inaccurate then the only thing to do is to fix the object providing the buffer.

    Since the format field is only there for interpretation of the data, and is not used to calculate of itemsizes or strides anywhere as far as I know, it's a fairly low-risk change.

    However, the patch still leaves ctypes inaccurate for the case of unions. It should be fairly simple to modify the code to use a format of "B<size>" for unions, so that it at least matches the itemsize, even if the type information is lost.

    (As an aside, let me point out that I did not actually write or advocate for the PEP; for some reason my name was added to it even though I all I did was to provide feedback.)

    @eric-wieser
    Copy link
    Mannequin Author

    eric-wieser mannequin commented Jul 23, 2018

    It should be fairly simple to modify the code to use a format of "B<size>" for unions, so that it at least matches the itemsize

    Seems reasonable, although:

    • I think it should be "<size>B" or "(<size>)B"
    • I'd rather leave that for a later patch. While it would be correct, it's still not correct enough to be that useful, since ultimately the union layout is still lost. I'd prefer to focus on fixing the part of the PEPE3118 implementation that is most useful, rather than committing to fixing the whole thing all at once.

    @terryjreedy
    Copy link
    Member

    Fixing one case is better than fixing no cases.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Apr 21, 2019

    Since Terry added me: Yes, this is clearly a bug, but it is a ctypes issue and not a memoryview issue.

    ctypes issues unfortunately tend to take some time until someone reviews.

    @mattip
    Copy link
    Contributor

    mattip commented Dec 23, 2019

    Is there a ctypes or PEP-3118 core-dev who could review the issue and the PR solution?

    @eric-wieser
    Copy link
    Contributor

    eric-wieser commented Jul 7, 2022

    @vsajip, perhaps you're qualified to review this (or rather the PR at #5561) given your work in #16589?

    @vsajip
    Copy link
    Member

    vsajip commented Jul 8, 2022

    I don't think I'm especially qualified to review this, unfortunately. I did work on some ctypes issues in the past, but they've required altogether too much time than I have available at the moment 😞 Perhaps one of the other committers who works closer to the C code level would be more appropriate (most of my work is at the Python stdlib level, though I can program in C).

    @eric-wieser
    Copy link
    Contributor

    Thanks for the response.

    Do you have any specific committers in mind?

    I guess I could ping Travis, the other PEP author; but what #5561 needs is a review from someone with commit bit.

    @vsajip
    Copy link
    Member

    vsajip commented Jul 8, 2022

    Sadly, ctypes-related expertise is thin on the ground, even among committers. I've had to revert a change that I made which I thought would improve things, but didn't. One problem we all have is that any fix has to work across many architectures, and most of us don't have access to those architectures to check fixes, diagnose problems etc. You could maybe use git blame to throw up some possibilities.

    @eric-wieser
    Copy link
    Contributor

    eric-wieser commented Jul 8, 2022

    Sadly, ctypes-related expertise is thin on the ground, even among committers.

    Yes, I've noticed that too. PEP3118 expertise seems even more sparse, with my impression being that numpy developers are more familiar with it that CPython ones (#5561 is both submitted and approved by a numpy maintainer). Is there anything I could do to this issue or PR to make it more accessible to reviewers who are familiar with the C code but not with ctypes?

    One problem we all have is that any fix has to work across many architectures

    My claim is that this particular fix shouldn't change any architecture-specific behavior; the logic for laying out the ctypes fields in memory hasn't changed; all that's changed is the PEP3118 reporting of the layout that was used.
    The existing reporting behavior is wrong for any struct with any padding on every architecture, so no regression regarding PEP3118 should be possible.

    @vsajip
    Copy link
    Member

    vsajip commented Jul 8, 2022

    Sure. I was only giving my perceived reason as to why core developers don't have familiarity with ctypes internals - it seems to be outside most people's comfort zone 😞

    @abalkin
    Copy link
    Member

    abalkin commented Aug 14, 2022

    @eric-wieser - it looks like I dropped the ball on this quite some time ago. I will see it through this time. Feel free to bug me here if I don't. :)

    @eric-wieser
    Copy link
    Contributor

    @abalkin 🐛

    mdickinson pushed a commit that referenced this issue Feb 5, 2023
    The summary of this diff is that it:
    
    * adds a `_ctypes_alloc_format_padding` function to append strings like `37x` to a format string to indicate 37 padding bytes
    * removes the branches that amount to "give up on producing a valid format string if the struct is packed"
    * combines the resulting adjacent `if (isStruct) {`s now that neither is `if (isStruct && !isPacked) {`
    * invokes `_ctypes_alloc_format_padding` to add padding between structure fields, and after the last structure field. The computation used for the total size is unchanged from ctypes already used.
    
    This patch does not affect any existing aligment computation; all it does is use subtraction to deduce the amount of paddnig introduced by the existing code.
    
    ---
    
    Without this fix, it would never include padding bytes - an assumption that was only
    valid in the case when `_pack_` was set - and this case was explicitly not implemented.
    
    This should allow conversion from ctypes structs to numpy structs
    
    Fixes numpy/numpy#10528
    mdickinson added a commit that referenced this issue Feb 6, 2023
    This PR fixes the buildbot failures introduced by the merge of #5561, by restricting the relevant tests to something that should work on both 32-bit and 64-bit platforms. It also silences some compiler warnings introduced in that PR.
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life 3.8 only security fixes topic-ctypes type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    Successfully merging a pull request may close this issue.

    5 participants