classification
Title: ctypes: clearly document how structure bit fields are allocated
Type: enhancement Stage: needs patch
Components: ctypes, Documentation Versions: Python 3.3
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: docs@python Nosy List: docs@python, meador.inge, terry.reedy, vladris
Priority: normal Keywords: patch

Created on 2011-09-02 02:34 by meador.inge, last changed 2011-10-04 23:51 by terry.reedy.

Files
File name Uploaded Description Edit
bitfield_doc.diff vladris, 2011-09-30 17:19 review
Messages (10)
msg143369 - (view) Author: Meador Inge (meador.inge) * (Python committer) Date: 2011-09-02 02:34
As issues like issue6069 and issue11920 allude to, figuring out how 'ctypes' allocates bit-fields is not very clear.  The documentation should be enhanced to flesh this out in more detail.  As an example, Microsoft documents the VC++ implementation in a reasonably clear manner ( http://msdn.microsoft.com/en-us/library/ewwyfdbe(v=vs.71).aspx ).
msg143846 - (view) Author: Meador Inge (meador.inge) * (Python committer) Date: 2011-09-11 01:39
Another example of desperately needed documentation: issue12945.
msg144696 - (view) Author: Vlad Riscutia (vladris) Date: 2011-09-30 17:19
Attached doc update against tip (though I still hope my patch for configurable allocation strategies will make it in). 

This is my first doc patch so let me know if I missed something. I am basically explaining that bit field allocation is compiler-specific and no assumptions should be made of how a bitfield is allocated. 

I believe this is the better thing to do rather than detailing how GCC and MSVC allocated their bitfields because that would just encourage people to use this feature incorrectly. Most bugs opened on bit fields are because people are toying with the underlying buffer and get other results than what they expect. IMO, when using bitfields one should only access the structure members at a high level and not go read/write the raw memory underneath.
msg144854 - (view) Author: Meador Inge (meador.inge) * (Python committer) Date: 2011-10-04 04:21
On Fri, Sep 30, 2011 at 12:19 PM, Vlad Riscutia <report@bugs.python.org> wrote:

> I believe this is the better thing to do rather than detailing how GCC and MSVC allocated their bitfields because that would just
> encourage people to use this feature incorrectly.

So clearly documenting how a feature works will cause people to use
the feature incorrectly?  I think not.  In any case, I agree that
documenting the low-level specifics of each compiler's algorithm is too much.

> Most bugs opened on bit fields are because people are toying with the underlying buffer and get other results than what they expect.

The issues that I have looked at (issue6069, issue11920, and
issue11920) all involve fundamental misunderstandings of *how* the
structure layout is determined.  I don't know if I would generalize
these misunderstanding as "toying with the underlying buffer".  Some
times people need to know the exact layout for proper C interop.  In
some of the bugs reported folks are casting buffers in an attempt
to discover the structure layout since it is not clearly documented.

The general content of your patch seems reasonable.  I will provide
more specific comments shortly.
msg144855 - (view) Author: Meador Inge (meador.inge) * (Python committer) Date: 2011-10-04 04:49
Added some comments in rietveld.  P.S. watch out for trailing whitespace
when writing patches.  Use 'make patchcheck' to help find bad whitespace
formatting.
msg144894 - (view) Author: Vlad Riscutia (vladris) Date: 2011-10-04 15:21
Thanks for the "make patchcheck" tip, I didn't know about that. I will update the patch soon.

In the mean time, I want to point out a couple of things:
First, I'm saying "toying with the underlying buffer" because none of the bugs are actual issues of the form "I created this bitfield structure with Python, passed it to C function but C structure was different". That would be a bitfield bug. All of these bugs are people setting raw memory to some bytes, then looking at bitfield members and not seeing what they expect. 

Since this is platform dependent, they shouldn't worry about the raw memory as long as C interop works fine. Bitfield layout is complex as it involves both allocation algorithm and structure packing and same Python code will work differently on Windows and Unix.

My point is that documenting all this low-level stuff will encourage people to work with the raw memory which will open the door for other issues. I believe it would be better to encourage users to stick to declaring members and accessing them by name as raw memory WILL be different for the same code on different OSes.

Second, one of your review comments is: "GCC is used for most Unix systems and Microsoft VC++ is used on Windows.". This is not how ctypes works. Ctypes implements the bitfield allocation algorithm itself, it doesn't use the compiler with which it is built. Basically it says #ifdef WIN32 - allocate like VC++ - #else - allocate like GCC. So it doesn't really matter with which compiler you are building Python. It will still do GCC style allocation on Solaris.
msg144911 - (view) Author: Meador Inge (meador.inge) * (Python committer) Date: 2011-10-04 17:44
On Tue, Oct 4, 2011 at 10:21 AM, Vlad Riscutia <report@bugs.python.org> wrote:

> First, I'm saying "toying with the underlying buffer" because none of the bugs are actual issues of the form "I created this bitfield
> structure with Python, passed it to C function but C structure was different". That would be a bitfield bug. All of these bugs are people
> setting raw memory to some bytes, then looking at bitfield members and not seeing what they expect.

Please qualify "all" instead of generalizing.  I can point to two
issues (issue11990 "I'm generating python code from real c code.",
issue12945 "We have raw data packages from some tools. These packages
contains bitfields, arrays, simple data and so on.") where C
code or raw data was, in fact, involved and the reporters just don't
understand what layout algorithm is being used.  They may not need
to know the specifics of the algorithm, but they *do* need to know if
it matches the compiler they are using to do interop or the one
that generated the raw data.

The reason that we are seeing folks cast raw memory into a cyptes
bitfield structure is because they do not understand how the structure
layout algorithm works and are trying to figure it out via these
examples.

> Second, one of your review comments is: "GCC is used for most Unix systems and Microsoft VC++ is used on Windows.". This is not
> how ctypes works. Ctypes implements the bitfield allocation algorithm itself, it doesn't use the compiler with which it is built. Basically
> it says #ifdef WIN32 - allocate like VC++ - #else - allocate like GCC. So it doesn't really matter with which compiler you are building
> Python. It will still do GCC style allocation on Solaris.

I understand how it works.  This quote is taken somewhat out of
context as the preceding sentence is important.  Perhaps saying GCC-
style and VC++-style would have been more clear.  The reason that I
mentioned the compiler used to build Python is that it is an easy
reference point and more times than not the bitfield allocation and
layout *do* match that of the compiler used to build the interpreter.
Anyway, I am fine with dropping the "used to build the Python
interpreter" and going with something similar to what you originally
had.

Also, in general, the compiler used to build the ctypes extension
*does* matter.  Look in 'cfield.c' where all of the native alignments
are
computed at compile time.  These alignments affect the structure
layout and are defined by the compiler building the ctypes extension.
msg144913 - (view) Author: Meador Inge (meador.inge) * (Python committer) Date: 2011-10-04 17:46
> Look in 'cfield.c' where all of the native alignments

Well, not *all* the native alignments, but many of them.
msg144919 - (view) Author: Vlad Riscutia (vladris) Date: 2011-10-04 18:39
I agree compiler matters for alignment but if you look at PyCField_FromDesc, you will see the layout is pretty much #ifdef MS_WIN32 - #else.

Sorry for generalizing, "all" indeed is not the right word. My point is that we should set expectation correctly - VC++-style on Windows, GCC-style everywhere else and encourage users to access structure members by name, not raw memory. Issues opened for bitfields *usually* are of the form I mentioned - setting raw memory to some bytes then seeing members are not what user expected, even if ctypes algorithm works correctly.

As I said, I will revise the patch and maybe make it more clear that users should look up how bitfield allocation works for their compiler instead of trying to understand this via structure raw memory.
msg144932 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-10-04 23:51
If I understand correctly, this doc patch would apply to 2.7 and 3.2 also. I have two style comments. I believe

"It is important to note that bit field allocation and layout in memory is not defined as a standard, rather its implementation is compiler-specific." 

could be shortened to

"Bit field allocation and memory layout is compiler-specific."

To me, this leads nicely into the proposed sentence that follows.

"it is recommended that no assumptions are made about the structure size and layout."

I do not like 'it is recommended'. Let us state the fact.

"any assumptions about the structure size and layout may be wrong."
History
Date User Action Args
2011-10-04 23:51:15terry.reedysetmessages: + msg144932
versions: - Python 3.4
2011-10-04 18:39:26vladrissetmessages: + msg144919
2011-10-04 17:46:35meador.ingesetmessages: + msg144913
2011-10-04 17:44:29meador.ingesetmessages: + msg144911
2011-10-04 15:21:45vladrissetmessages: + msg144894
2011-10-04 04:49:47meador.ingesetmessages: + msg144855
2011-10-04 04:21:33meador.ingesetmessages: + msg144854
2011-09-30 17:19:10vladrissetfiles: + bitfield_doc.diff
keywords: + patch
messages: + msg144696
2011-09-11 01:39:49meador.ingesetmessages: + msg143846
components: + ctypes
2011-09-02 02:34:11meador.ingecreate