classification
Title: Correct __sizeof__ support for struct
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.2, Python 3.3, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: meador.inge Nosy List: asvetlov, gregory.p.smith, jcea, loewis, mark.dickinson, meador.inge, python-dev, serhiy.storchaka, skrah
Priority: normal Keywords: patch

Created on 2012-07-20 06:49 by serhiy.storchaka, last changed 2012-07-29 03:36 by meador.inge. This issue is now closed.

Files
File name Uploaded Description Edit
struct_sizeof-2.patch serhiy.storchaka, 2012-07-20 06:49 review
issue15402-redux-v1.patch meador.inge, 2012-07-26 03:59 review
Messages (40)
msg165902 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-07-20 06:49
Here is a patch that implements __sizeof__ for struct.Struct.

See also issue14596.
msg165936 - (view) Author: Meador Inge (meador.inge) * (Python committer) Date: 2012-07-20 15:13
Module a few cosmetic changes (variable names and doc string tweaks), the patch looks good.  Having a correct answer for `sys.getsizeof(struct.Struct('100B'))` is definitely better.  Per the documentation for 'sys.getsizeof' [1]:

"""
All built-in objects will return correct results, but this does not have to hold true for third-party extensions as it is implementation specific.
"""

If we consider extension modules as built-in objects, then this is a bug and should be fixed in 2.7, 3.2, and 3.3.  If we don't, then the documentation for 'sys.getsizeof' should be adjusted and this patch should be applied after the default branch is unfrozen for enhancements.

[1] http://docs.python.org/library/sys.html
msg165945 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2012-07-20 16:38
The patch is easy. I want this in 3.3.

Reluctant to touch 2.7 and 3.2, thought. I would be +0 to it.
msg165946 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2012-07-20 16:56
makes sense for 3.3 as i would consider it a bug.

i think it is reasonable for 2.7 and 3.2 as well, it is an actual bug that the value reported to getsizeof on struct.Struct is meaningless.
msg166125 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2012-07-22 11:29
+1 for patch applying in 3.3 at least
msg166226 - (view) Author: Meador Inge (meador.inge) * (Python committer) Date: 2012-07-23 15:20
OK, I will commit this sometime today.
msg166227 - (view) Author: Roundup Robot (python-dev) Date: 2012-07-23 15:25
New changeset dbe7f39ff341 by Meador Inge in branch '2.7':
Issue #15402: Add a __sizeof__ method to struct.Struct.
http://hg.python.org/cpython/rev/dbe7f39ff341

New changeset 3e7b517e1b68 by Meador Inge in branch '3.2':
Issue #15402: Add a __sizeof__ method to struct.Struct.
http://hg.python.org/cpython/rev/3e7b517e1b68

New changeset 03063e718f5f by Meador Inge in branch 'default':
Issue #15402: Add a __sizeof__ method to struct.Struct.
http://hg.python.org/cpython/rev/03063e718f5f
msg166228 - (view) Author: Meador Inge (meador.inge) * (Python committer) Date: 2012-07-23 15:26
Thanks for the patch Serhiy!
msg166231 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2012-07-23 16:16
We collide mid-air, Meador. I was just checking-in this :-). I have changed the tests to actually verify the changes :-). Also added to "Doc/ACKS.txt".

Could I suggest you to take care of issue #15424 too?.
msg166232 - (view) Author: Roundup Robot (python-dev) Date: 2012-07-23 16:17
New changeset b1d85a44f149 by Jesus Cea in branch '2.7':
Better test for Issue #15402: Add a __sizeof__ method to struct.Struct
http://hg.python.org/cpython/rev/b1d85a44f149

New changeset 1911e192af0d by Jesus Cea in branch '3.2':
Better test for Issue #15402: Add a __sizeof__ method to struct.Struct
http://hg.python.org/cpython/rev/1911e192af0d

New changeset b9a3ed1b14b9 by Jesus Cea in branch 'default':
MERGE: Better test for Issue #15402: Add a __sizeof__ method to struct.Struct
http://hg.python.org/cpython/rev/b9a3ed1b14b9
msg166235 - (view) Author: Meador Inge (meador.inge) * (Python committer) Date: 2012-07-23 16:29
Hi Jesús,

I replied to python-dev, but the Doc/ACKS.txt changes aren't necessary and I was OK with the way Serhiy submitted the tests.
msg166250 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-07-23 22:04
The patch that Meador committed is incorrect: METH_NOARGS functions still take a PyObject* args argument, which will be NULL. I'm puzzled, as Serhiy's original patch was correct.

As for the tests, I really wish there were tests that tested for *actual* values.
msg166252 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-07-23 22:29
Also, I wonder why this loops over s_codes, instead of just looking at s_len+1.
msg166257 - (view) Author: Meador Inge (meador.inge) * (Python committer) Date: 2012-07-24 01:05
On Mon, Jul 23, 2012 at 5:04 PM, Martin v. Löwis <report@bugs.python.org> wrote:

> The patch that Meador committed is incorrect: METH_NOARGS functions
> still take a PyObject* args argument, which will be NULL. I'm puzzled, as
> Serhiy's original patch was correct.

I almost committed the two-argument version with the 'unused' parameter, but
then I had the bright idea to look at how '__sizeof__' is implemented elsewhere
in the interpreter:

  static PyObject *
  list_sizeof(PyListObject *self);
  static PyObject *
  dict_sizeof(PyDictObject *mp);
  static PyObject *
  set_sizeof(PySetObject *so);

  etc ...

So I dropped the 'unused' parmeter for the 'struct' implementation:

   static PyObject *
   s_sizeof(PyStructObject *self);

I will happily fix it, but if it is wrong one place, then it is wrong
everywhere.
msg166354 - (view) Author: Meador Inge (meador.inge) * (Python committer) Date: 2012-07-25 04:52
I agree with Martin on having tests for the exact values and the s_len + 1 point.  I will fix both issues.
msg166363 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-07-25 08:11
[Martin]
> The patch that Meador committed is incorrect: METH_NOARGS functions
> still take a PyObject* args argument, which will be NULL.

Hmm.  I see this usage in a lot of places---e.g. see unicode_capitalize, unicode_casefold, unicode_title etc. in Objects/unicodeobject.c.  So it looks like we're relying on the (PyCFunction) cast to convert from a one-argument function pointer to a two-argument function pointer, which sounds a bit worrisome---I guess it just happens to work with common ABI calling conventions. I'm a bit surprised that we're not seeing compiler warnings about this sort of thing.

[Meador]
> I will happily fix it, but if it is wrong one place, then it is wrong
> everywhere.

It sounds like 'wrong everywhere' is accurate, unfortunately.
msg166371 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-07-25 09:01
> I will happily fix it, but if it is wrong one place, then it is wrong
> everywhere.

Yes, it is wrong everywhere. METH_NOARGS functions do take an
args argument, see ceval.c:call_function.
msg166372 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-07-25 09:06
> Hmm.  I see this usage in a lot of places---e.g. see
> unicode_capitalize, unicode_casefold, unicode_title etc. in
> Objects/unicodeobject.c.  So it looks like we're relying on the
> (PyCFunction) cast to convert from a one-argument function pointer to
> a two-argument function pointer, which sounds a bit worrisome---I
> guess it just happens to work with common ABI calling conventions.
> I'm a bit surprised that we're not seeing compiler warnings about
> this sort of thing.

The compiler has no chance to find out. You cast the pointer to
PyCFunction, telling the compiler that it really is a PyCFunction.

I really wish we could put a ban on function pointer casts, and try
to make this all statically type-correct. This, of course, would
require the sizeof function to take PyObject*, and cast it to
PyStructObject * locally. My idiom for that is

static PyObject *
s_sizeof(PyStructObject *_self, PyObject *unused)
{
  PyStructObject *self = (PyStructObject *)_self;

> It sounds like 'wrong everywhere' is accurate, unfortunately.

"Everywhere" is nowhere close to the truth. There are tons of
NOARGS functions which have the correct signature.
msg166380 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-07-25 10:29
> The compiler has no chance to find out. You cast the pointer to
> PyCFunction, telling the compiler that it really is a PyCFunction.

True; I was thinking that the compiler should have the necessary information to warn about the suspicious (PyCFunction) cast.  But then again the function pointer cast is perfectly legal---it's the subsequent call that invokes undefined behaviour, and that's in a different file, so the compiler can't help.

> "Everywhere" is nowhere close to the truth.

Yep, sorry;  bad wording on my part.  I didn't intend to imply that all uses of METH_NOARGS had this problem.  'Everywhere' for very small values of 'everywhere'. :-)
msg166384 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-07-25 11:36
""Everywhere" is nowhere close to the truth. There are tons of
NOARGS functions which have the correct signature."

When I started writing C-extensions, I used the PyObject *unused
idiom. Then I saw Meador's version in so many places in the source
tree that I assumed it's correct.

I think raising this issue on python-dev would be beneficial.
msg166389 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2012-07-25 12:02
I think that C standard actually documents the parameter order placement, so you can left out the last ones in the actual function definition.

So, that this is working is not an accident, it is a C standard requirement.

I think...
msg166391 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-07-25 13:11
Jesús: this is a (common) mistake. Standard C makes it undefined behavior to call a function with an incorrect number of arguments, see 6.5.2.2p9

# If the function is defined with a type that is not compatible with the 
# type (of the expression) pointed to by the expression that denotes the 
# called function, the behavior is undefined.

Two function pointers are compatible if the function types are compatible (6.7.5.1p2), which in turn is defined in 6.7.5.3p15,
which is too long to quote here.

Your understanding of the parameter passing convention is not part of the language definition. The only way to have more parameters in the call than are declared is by means of an ellipsis. 

There is an exception for "old style" (K&R) functions and declarations, but it doesn't apply here.
msg166393 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-07-25 13:22
Stefan: not sure whether raising this to python-dev really helps; see also  msg42177, msg58196, issue1648268, and msg75239. Feel free to raise it, though.
msg166405 - (view) Author: Meador Inge (meador.inge) * (Python committer) Date: 2012-07-25 15:11
While looking this up in the C Standard (ISO/IEC 9899:TC3) I also noticed 6.3.2.3p8:

"""
A pointer to a function of one type may be converted to a pointer to a function of another
type and back again; the result shall compare equal to the original pointer. If a converted
pointer is used to call a function whose type is not compatible with the pointed-to type, the behavior is undefined.
"""

I will add the 'unused' parameter back with my refresh patch.
msg166433 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2012-07-25 22:38
So the good old "int main(void)" behaviour is "undefined" :-)
msg166434 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2012-07-25 22:41
Talking about this, anybody has an electronic version of C Standard to share with me, or should I hunt around the dark corners of Internet? :-)
msg166435 - (view) Author: Meador Inge (meador.inge) * (Python committer) Date: 2012-07-25 22:49
On Wed, Jul 25, 2012 at 5:38 PM, Jesús Cea Avión <report@bugs.python.org> wrote:

> So the good old "int main(void)" behaviour is "undefined" :-)

Actually the C Standard says that is OK.  See 5.1.2.2.1 of ISO/IEC 9899:TC3.
msg166454 - (view) Author: Meador Inge (meador.inge) * (Python committer) Date: 2012-07-26 03:59
The attached patch fixes: (1) the test cases to make them exact (inspired by 'test_sys.py'), (2) the definition of 's_sizeof' to add the dummy parameter, and (3) simplifies the sizeof calculation using 's_len'.
msg166456 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-07-26 04:10
> Talking about this, anybody has an electronic version of C Standard to
> share with me, or should I hunt around the dark corners of Internet? :-)

http://www.open-std.org/jtc1/sc22/wg14/www/standards (see n1256.pdf).
msg166457 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-07-26 04:36
> So the good old "int main(void)" behaviour is "undefined" :-)

No, it's not; see 5.1.2.2.1p1.
msg166458 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-07-26 04:37
issue15402-redux-v1.patch LGTM.
msg166460 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-07-26 05:55
> +        # The size of 'PyStructObject'
> +        totalsize = size(self.header + '5P')

What if use totalsize = object.__sizeof__(struct_obj) ?
msg166485 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-07-26 14:41
> What if use totalsize = object.__sizeof__(struct_obj) ?

That would defeat the purpose of the test. We want to test whether __sizeof__ is correct, so we shouldn't use __sizeof__ in the test to compute the expected result. I understand that object.__sizeof__ is actually a different implementation, but still: there might be errors e.g. in the type definition that cancel out errors in the sizeof implementation. The more "directly" the expected result is computed, the better.

I also realize that such tests will be fragile if the the structures change. This is a good thing, IMO: anybody changing the layout of some object should *have* to verify that the size computation is still correct, so it's good that the test breaks if the structures change.
msg166508 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-07-26 19:46
> That would defeat the purpose of the test. We want to test whether
> __sizeof__ is correct, so we shouldn't use __sizeof__ in the test to
> compute the expected result. I understand that object.__sizeof__ is
> actually a different implementation, but still: there might be errors e.g.
> in the type definition that cancel out errors in the sizeof
> implementation. The more "directly" the expected result is computed, the
> better.

I do not think that the purpose of testing is a testing of object.__sizeof__. 
Memory consumption consists of two parts -- memory for C structure (and the 
base object implementation works for this) and extra memory, for which we 
write a specialized __sizeof__ method. If we doubt object.__sizeof__, then we 
are obligated to implement and test __sizeof__ methods for all C-implemented 
classes, not using the base object implementation.

> I also realize that such tests will be fragile if the the structures
> change. This is a good thing, IMO: anybody changing the layout of some
> object should *have* to verify that the size computation is still correct,
> so it's good that the test breaks if the structures change.

Such tests is too fragile. They force the programmer to write unnecessary code 
in cases when it can be done automatically. We write in C code 
sizeof(SomeStruct), and not the sum of sizes (+paddings) of the structure 
fields.

Let's focus on the differences, on the extra memory usage that not allows us to 
simply use inherited base object implementation.
msg166511 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-07-26 20:03
> I do not think that the purpose of testing is a testing of object.__sizeof__. 
> Memory consumption consists of two parts -- memory for C structure (and the 
> base object implementation works for this)

Note that object.__sizeof__ does something slightly different, though:
it uses basicsize (which may or may not contain the sizeof() invocation
of the correct C structure), and it considers tp_itemsize (which may or
may not have a correct value).

> 
>> I also realize that such tests will be fragile if the the structures
>> change. This is a good thing, IMO: anybody changing the layout of some
>> object should *have* to verify that the size computation is still correct,
>> so it's good that the test breaks if the structures change.
> 
> Such tests is too fragile. They force the programmer to write unnecessary code 
> in cases when it can be done automatically.

That's not the definition of "fragile", though. What you describe is
that writing the test this way is "tedious" (утомительный); it isn't
(necessarily) "fragile" (хрупкий). I (clearly) disagree that this
approach is "too tedious".
msg166517 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-07-26 20:29
> Note that object.__sizeof__ does something slightly different, though:
> it uses basicsize (which may or may not contain the sizeof() invocation
> of the correct C structure), and it considers tp_itemsize (which may or
> may not have a correct value).

All such cases are bugs (memory manager works with tp_basicsize and 
tp_itemsize, not with __sizeof__ result) and tests do not test it. In 
paranoidal mode we should tests both __sizeof__ and object.__sizeof__. For all 
classes, even for those that do not use the extra memory. I think it is really 
tedious.
msg166518 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-07-26 20:46
> All such cases are bugs (memory manager works with tp_basicsize and 
> tp_itemsize, not with __sizeof__ result) and tests do not test it. In 
> paranoidal mode we should tests both __sizeof__ and object.__sizeof__. For all 
> classes, even for those that do not use the extra memory. I think it is really 
> tedious.

It's clearly a tradeoff. The question is whether a more paranoid
formulation of the test might detect any real bugs. issue15456
efficiently demonstrates that the current style can detect bugs
which testing with object.__sizeof__ can't. This is not theoretical:
it's an *actual* bug that did get detected with the current style
of testing, but would not have been detected with object.__sizeof__.
This, IMO, makes the more tedious formulation worthwhile.

Of course, developers need to be educated how to deal with any
breakage of these tests: it may be that they really just added
or removed fields to the structure, in which case they just need
to update the struct specs. In many cases, I claim, addition of
new fields (in particular of struct type "P") corresponds to the
allocation of additional memory blocks.
msg166636 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-07-28 08:27
> issue15456
> efficiently demonstrates that the current style can detect bugs
> which testing with object.__sizeof__ can't.

Hmm, I see this as a counterexample. The bug *was not detected* with the 
current style of testing.

But if you insist, I began to translation tests to more tedious style. I hope 
that issue 15467 patch will be accepted, it is slightly reduce tediousity. 
However, not all of the tests can be written in this style.
msg166664 - (view) Author: Meador Inge (meador.inge) * (Python committer) Date: 2012-07-28 16:35
On Sat, Jul 28, 2012 at 3:27 AM, Serhiy Storchaka
<report@bugs.python.org> wrote:

> Serhiy Storchaka <storchaka@gmail.com> added the comment:
>
>> issue15456
>> efficiently demonstrates that the current style can detect bugs
>> which testing with object.__sizeof__ can't.
>
> Hmm, I see this as a counterexample. The bug *was not detected* with the
> current style of testing.

I disagree.  It wasn't *directly* detected -- the test broke because of the new
structure field that was added and not because of the new dynamic memory
allocation that was added.  Even so, the fact that the test broke *at all*
should have thrown a warning flag up in the developer's mind to reevaluate
how the size is calculated.  If the test were written using your
object.__sizeof__
method, then the test would not have broken *at all* and therefore it might not
have even crossed the developers mind to verify whether the sizeof calculation
is correct.
msg166709 - (view) Author: Roundup Robot (python-dev) Date: 2012-07-29 03:36
New changeset 37554bda2014 by Meador Inge in branch '2.7':
Issue #15402: Simplify Struct.__sizeof__ and make tests more precise.
http://hg.python.org/cpython/rev/37554bda2014

New changeset 97dd2090a36c by Meador Inge in branch '3.2':
Issue #15402: Simplify Struct.__sizeof__ and make tests more precise.
http://hg.python.org/cpython/rev/97dd2090a36c

New changeset 0eedf56f9a38 by Meador Inge in branch 'default':
Issue #15402: Simplify Struct.__sizeof__ and make tests more precise.
http://hg.python.org/cpython/rev/0eedf56f9a38
History
Date User Action Args
2012-07-29 03:36:54meador.ingesetstatus: open -> closed
resolution: fixed
stage: needs patch -> resolved
2012-07-29 03:36:15python-devsetnosy: + python-dev
messages: + msg166709
2012-07-28 16:35:18meador.ingesetmessages: + msg166664
2012-07-28 08:27:01serhiy.storchakasetmessages: + msg166636
2012-07-26 20:46:17loewissetmessages: + msg166518
2012-07-26 20:29:21serhiy.storchakasetmessages: + msg166517
2012-07-26 20:03:48loewissetmessages: + msg166511
2012-07-26 19:46:54serhiy.storchakasetmessages: + msg166508
2012-07-26 14:41:11loewissetmessages: + msg166485
2012-07-26 05:55:40serhiy.storchakasetmessages: + msg166460
2012-07-26 04:37:00loewissetmessages: + msg166458
2012-07-26 04:36:30loewissetmessages: + msg166457
2012-07-26 04:10:54serhiy.storchakasetmessages: + msg166456
2012-07-26 03:59:16meador.ingesetfiles: + issue15402-redux-v1.patch

messages: + msg166454
2012-07-25 22:49:04meador.ingesetmessages: + msg166435
2012-07-25 22:41:26jceasetmessages: + msg166434
2012-07-25 22:38:50jceasetmessages: + msg166433
2012-07-25 15:11:36meador.ingesetmessages: + msg166405
2012-07-25 13:22:27loewissetmessages: + msg166393
2012-07-25 13:11:19loewissetmessages: + msg166391
2012-07-25 12:02:09jceasetmessages: + msg166389
2012-07-25 11:36:27skrahsetnosy: + skrah
messages: + msg166384
2012-07-25 10:29:11mark.dickinsonsetmessages: + msg166380
2012-07-25 09:06:26loewissetmessages: + msg166372
2012-07-25 09:01:17loewissetmessages: + msg166371
2012-07-25 08:11:49mark.dickinsonsetmessages: + msg166363
2012-07-25 04:52:25meador.ingesetnosy: - python-dev
messages: + msg166354

assignee: meador.inge
stage: patch review -> needs patch
2012-07-24 01:05:20meador.ingesetmessages: + msg166257
2012-07-23 22:29:48loewissetmessages: + msg166252
2012-07-23 22:04:39loewissetnosy: + loewis
messages: + msg166250
2012-07-23 16:29:22meador.ingesetmessages: + msg166235
2012-07-23 16:17:21python-devsetmessages: + msg166232
2012-07-23 16:16:25jceasetstatus: closed -> open
resolution: fixed -> (no value)
messages: + msg166231

stage: resolved -> patch review
2012-07-23 15:26:48meador.ingesetstatus: open -> closed
resolution: fixed
messages: + msg166228

stage: patch review -> resolved
2012-07-23 15:25:45python-devsetnosy: + python-dev
messages: + msg166227
2012-07-23 15:20:45meador.ingesetmessages: + msg166226
2012-07-22 11:29:41asvetlovsetnosy: + asvetlov
messages: + msg166125
2012-07-20 16:56:24gregory.p.smithsetnosy: + gregory.p.smith
messages: + msg165946
2012-07-20 16:38:22jceasetmessages: + msg165945
2012-07-20 15:13:05meador.ingesetmessages: + msg165936
stage: patch review
2012-07-20 10:40:55jceasetnosy: + jcea
2012-07-20 06:49:26serhiy.storchakacreate