classification
Title: usage of bitfields in ctypes structures changed between 3.7.5 and 3.7.6
Type: behavior Stage: resolved
Components: ctypes Versions: Python 3.8, Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Matthew Newville, cheryl.sabella, eryksun, vinay.sajip
Priority: normal Keywords: 3.7regression, 3.8regression

Created on 2020-01-10 22:16 by Matthew Newville, last changed 2020-01-23 05:47 by vinay.sajip. This issue is now closed.

Messages (9)
msg359763 - (view) Author: Matthew Newville (Matthew Newville) Date: 2020-01-10 22:16
We have a library (https://github.com/pyepics/pyepics) that wraps several C structures for a communication protocol library that involves many C->Python callbacks.  One of the simpler structures we wrap with ctypes is defined with

typedef struct ca_access_rights {
    unsigned    read_access:1;
    unsigned    write_access:1; } caar;

struct  access_rights_handler_args {
    long  chanId; /* channel id */
    caar  access; /* access rights state */
};

which we had wrapped (perhaps naively) as 

class access_rights_handler_args(ctypes.Structure):
    "access rights arguments"
    _fields_ = [('chid', ctypes.c_long),
                ('read_access', ctypes.c_uint, 1),
                ('write_access', ctypes.c_uint, 1)]

which we would then this structure as the function argument of a callback function that the underlying library would call, using

    _Callback = ctypes.CFUNCTYPE(None, ctypes.POINTER(access_rights_handler_args))(access_rights_handler)

and the python function `access_righte_handler` would be able to unpack and use this structure.  This worked for Python 2.7, 3.3 - 3.7.5 on 64-bit Linux, Windows, and MacOS.  This code was well-tested and was used in production code on very many systems. It did not cause segfaults.

With Python 3.7.6 this raises an exception at the ctypes.CFUNCTYPE() call with


...../lib/python3.7/ctypes/__init__.py", line 99, in CFUNCTYPE
    class CFunctionType(_CFuncPtr):
TypeError: item 1 in _argtypes_ passes a struct/union with a bitfield by value, which is unsupported.


We were able to find a quick work-around this by changing the structure definition to be

class access_rights_handler_args(ctypes.Structure):
    "access rights arguments"
    _fields_ = [('chid', ctypes.c_long),
                ('access', ctypes.c_ubyte)]

and then explicitly extract the 2 desired bits from the byte. Of course, that byte is more data than is being sent in the structure, so there is trailing garbage.

This change seems to have been related to https://bugs.python.org/issue16576.

Is there any way to restore the no-really-I'm-not-making-it-up-it-was-most-definitely-working-for-us behavior of Python 3.7.5 and earlier?  

If this is not possible, what would be the right way to wrap this sort of structure? Thanks
msg359852 - (view) Author: Cheryl Sabella (cheryl.sabella) * (Python committer) Date: 2020-01-12 14:35
Adding @vinay.sajip to the nosy list as he worked on issue16576.
msg359877 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2020-01-12 21:08
The change which caused this breakage has been reverted, at least for now. bpo-16576 is related to bpo-16575 (which is about unions rather than bitfields, but the cases have things in common).

Refer to bpo-16575 for PRs relating to the reversion. This includes 3.7. You are welcome to try the latest version (will need building from source, or waiting for the next release).

There is still an underlying ctypes/libffi issue, and the check was only supposed to catch instances being passed by value, rather than pointers to such structures/unions being passed. It may be that the check is faulty - your example will be used to investigate.
msg359884 - (view) Author: Matthew Newville (Matthew Newville) Date: 2020-01-13 02:47
Thanks for the reply and the fix -- I have not tried the master branch, but will try to do that soon. If I understand correctly, we will have to stick with our kludgy "workaround" version in order to work with Python 3.7.6 and 3.8.1.  Or is there a better approach than our workaround of using

class access_rights_handler_args(ctypes.Structure):
    "access rights arguments"
    _fields_ = [('chid', ctypes.c_long),
                ('access', ctypes.c_ubyte)]

?   

As a long-time (20 years) Python user and first-time reporter of a bug to main Python, I'm both very appreciative of the effort and slightly alarmed by reading the messages related to #16575.  From far outside the Python dev world, it appears that an old, poorly verified bug report inspired a change that was actually not well tested and so incorrectly broke valid code without deprecation. Trying to be as polite as possible, this appears to indicate a poor testing process, if not a poor understanding of the actual code in question. 

Trust is an important aspect of open source software, and much easier to lose than gain.  I strongly encourage you and other Python devs to carefully assess what went wrong here and to work out (and write down) what will be done going forward to avoid such problems. Simply rolling this change back and saying "sorry, but we're overworked volunteers and stuff happens" is not going to regain lost trust. In fact, it's pretty close to a promise that this sort of issue will happen again. I think that you may want to make sure that it is not the take-away message here.
Sorry if that sounds in any way unappreciative.  Thanks.
msg359889 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2020-01-13 05:53
> it appears that an old, poorly verified bug report 

Why do you say the bug report is poorly verified? The libffi maintainers accept it is an issue.

https://github.com/libffi/libffi/issues/33

> inspired a change that was actually not well tested and so incorrectly broke valid code without deprecation. Trying to be as polite as possible, this appears to indicate a poor testing process, if not a poor understanding of the actual code in question.

Well, the original developer of ctypes is no longer involved in maintaining it. Those of us who try to address ctypes issues are perhaps not as well-versed in the code as the original developer and maintainer, but we do our best. This of course applies to other areas in CPython, and many other projects besides.

The change was accompanied by tests, which have been no doubt found wanting, but do *you* write software which guarantees no bugs ever in released versions? Using your example above, I will look into what was missed and try to improve the checking. The underlying libffi issue is a real one. The change wasn't introduced in a cavalier manner, as you seem to be suggesting.

> I strongly encourage you and other Python devs to carefully assess what went wrong here and to work out (and write down) what will be done going forward to avoid such problems. Simply rolling this change back and saying "sorry, but we're overworked volunteers and stuff happens" is not going to regain lost trust. In fact, it's pretty close to a promise that this sort of issue will happen again. I think that you may want to make sure that it is not the take-away message here.
Sorry if that sounds in any way unappreciative.  Thanks.

Well, "stuff happens" is true, and I don't mean to trivialise anything. But almost all released non-trivial software has bugs. If we didn't have a good process, then we could perhaps be held to task, but that's not the case. That things sometimes slip through the cracks is pretty much a given in software development - not confined to CPython.

My backporting the changes to 3.8 and 3.7 were premature, and I have learned that lesson. It's often a matter of judgement, and sometimes that can go wrong.

Do you regularly test your code with Python alpha and beta versions? I ask because I may reinstate the check for Python 3.9 after seeing what false-positive cases were missed here. Python 3.9 is at alpha 2 level right now. Continued feedback could help to minimise the chances of future surprises.
msg359947 - (view) Author: Matthew Newville (Matthew Newville) Date: 2020-01-14 05:54
So, again, I'm trying to understand what the best workaround for this change is.  I asked "can this workaround be improved" twice and got no reply, while getting plenty of responses to questions about the development process.  I take this to mean that the workaround we have is the best available. That's unfortunate.

>> it appears that an old, poorly verified bug report
>
> Why do you say the bug report is poorly verified? The libffi maintainers
> accept it is an issue.
>
> https://github.com/libffi/libffi/issues/33

Well, it is more than six years old. It did not appear that lots of people were saying "yeah me too, please fix!".  Maybe I missed those calls of urgency? 


>> inspired a change that was actually not well tested and so incorrectly 
>> broke valid code without deprecation. Trying to be as polite as possible, >> this appears to indicate a poor testing process, if not a poor 
>> understanding of the actual code in question.

> Well, the original developer of ctypes is no longer involved in maintaining > it. Those of us who try to address ctypes issues are perhaps not as well-
> versed in the code as the original developer and maintainer, but we do our
> best. This of course applies to other areas in CPython, and many other
> projects besides.

I think you are agreeing with me ;).  That worries me.

> The change was accompanied by tests, which have been no doubt found 
> wanting, 

It appears that the change was *intended* to fix a problem with Unions, but had the unintended consequence of not allowing any structures with bitfields. That suggests that the ctypes tests don't include structures with bitfields, These seem sort of common to me, so it appears that testing of ctypes is far less comprehensive than I had imagined.

> but do *you* write software which guarantees no bugs ever in released 
> versions? 

Of course not, and that is not the expectation.  It's a bit alarming to hear Python devs be so defensive and using such straw man arguments.  What is expected is that working code does not break without warning or deprecation and that testing is sort of complete. It is expected that when changes unintentionally break working code that the devs take a step back and say "wait, how did that happen? what are we not testing that our users are expecting to work?". It is also expected that problems are acknowledged and fixed in a timely manner.  

And yes, to the extent possible, we try to do those things. With Py 3.7.6 available and installed with `conda update`, this break was a very urgent problem (library failed to import!) for us and our downstream users. Within 36 hours of the first report of the problem, we had a workaround verified and pushed to PyPI. The response of Python team to the original problem and to the unintended breakage were much slower.  
 

> Using your example above, I will look into what was missed 
> and try to improve the checking. The underlying libffi issue is a real
> one. The change wasn't introduced in a cavalier manner, as you seem to
> be suggesting.

Well, the change did wait six years from the first report and yet did not include a deprecation cycle. If that TypeError had been a Warning for a few releases, you would have no doubt heard questions like "why is this working code going to be deprecated" instead of "why did Python break by library".

> Do you regularly test your code with Python alpha and beta versions? 
> I ask because I may reinstate the check for Python 3.9 after seeing what
> false-positive cases were missed here. Python 3.9 is at alpha 2 level 
> right now. Continued feedback could help to minimise the chances of 
> future surprises.

I have never tested an `alpha` versions, and not used many `beta` version either (code development is not actually my full-time job) -- certainly not in the Python 3 series. I have trusted the Python devels.  This has worked well for many years and Python versions. But that trust, especially concerning ctypes, is diminished (a structure with bitfields was unexpected usage??) and we probably should be testing beta versions regularly.
msg359956 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2020-01-14 08:51
The change has now been reverted, including on 3.8 and 3.7, so I think that this issue can be closed. Any naysayers?
msg359998 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2020-01-14 21:00
> With Python 3.7.6 this raises an exception at the ctypes.CFUNCTYPE() 
> call with
> ...
> TypeError: item 1 in _argtypes_ passes a struct/union with a bitfield
> by value, which is unsupported.

I cannot reproduce the problem as stated in 3.7.6 in Windows. The TypeError only occurs if I use the struct type directly in argtypes, not a pointer to the struct type. For example:

    >>> class A(ctypes.Structure):
    ...     _fields_ = (('x', ctypes.c_uint, 1),)

    >>> class P_A(ctypes._Pointer):
    ...     _type_ = A
    ...

allowed:

    >>> class FP_P_A(ctypes._CFuncPtr):
    ...     _flags_ = ctypes._FUNCFLAG_CDECL
    ...     _argtypes_ = (P_A,)

disallowed:

    >>> class FP_A(ctypes._CFuncPtr):
    ...     _flags_ = ctypes._FUNCFLAG_CDECL
    ...     _argtypes_ = (A,)
    ...
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: item 1 in _argtypes_ passes a struct/union with a bitfield by value, which is unsupported.

This seems rights to me. There is no problem passing a pointer as a function parameter.
msg360013 - (view) Author: Matthew Newville (Matthew Newville) Date: 2020-01-14 23:40
@eryksun Sorry for the imprecision -- I was mixing what we do on Linux and Windows. A minimum verifiable example for Linux/MacOS would be

    import ctypes
    class bitstruct(ctypes.Structure):
        _fields_ = [('addr', ctypes.c_long),
                    ('rbit', ctypes.c_uint, 1),
                    ('wbit', ctypes.c_uint, 1)]

    def handler(args):
        print("handler: ", args.addr, args.rbit, args.wbit)

    callback = ctypes.CFUNCTYPE(None, bitstruct)(handler)

This works with 3.7.5 but raises a TypeError with 3.7.6.

For Windows (or, well, 64-bit Windows, the only kind we bother to support), we find that we have to wrap the function and use a POINTER to the struct, so what we really use is more like

    import os, functools
    def make_callback(args, func):
        """ make callback function"""
        @functools.wraps(func)
        def wrapped(arg, **kwargs):
            if hasattr(arg, 'contents'):
                return func(arg.contents, **kwargs)
            return func(arg, **kwargs)
        if os.name =='nt': # also check for 64-bit
            cb = ctypes.CFUNCTYPE(None, ctypes.POINTER(args))(wrapped)
        else:
            cb = ctypes.CFUNCTYPE(None, bitstruct)(handler)
        return cb

   callback = make_callback(bitstruct, handler)


> ...
> This seems rights to me. There is no problem passing a pointer 
> as a function parameter.

The problem here is that code that worked with 3.7.5 raises a TypeError with 3.7.6.

I don't know that the solution we came up with is actually the best approach.  I've asked for such guidance a few times now.  I don't know why using a pointer would be required for a structure containing a "u_int, 1", but not for other structures, but any guidance would be much appreciated.
History
Date User Action Args
2020-01-23 05:47:34vinay.sajipsetstatus: open -> closed
resolution: fixed
stage: resolved
2020-01-14 23:40:54Matthew Newvillesetmessages: + msg360013
2020-01-14 21:00:11eryksunsetnosy: + eryksun
messages: + msg359998
2020-01-14 08:51:04vinay.sajipsetmessages: + msg359956
2020-01-14 06:01:52ned.deilysetkeywords: + 3.7regression, 3.8regression
versions: + Python 3.8
2020-01-14 05:54:28Matthew Newvillesetmessages: + msg359947
2020-01-13 05:53:55vinay.sajipsetmessages: + msg359889
2020-01-13 02:47:44Matthew Newvillesetmessages: + msg359884
2020-01-12 21:08:39vinay.sajipsetmessages: + msg359877
2020-01-12 14:35:46cheryl.sabellasetnosy: + vinay.sajip, cheryl.sabella
messages: + msg359852
2020-01-10 22:16:07Matthew Newvillecreate