classification
Title: sys.getsizeof() gives an AttributeError for _sre objects.
Type: behavior Stage:
Components: Interpreter Core Versions: Python 3.0, Python 2.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: schuppenies Nosy List: amaury.forgeotdarc, benjamin.peterson, facundobatista, schuppenies
Priority: normal Keywords: patch

Created on 2008-06-16 10:51 by schuppenies, last changed 2008-07-16 07:16 by schuppenies. This issue is now closed.

Files
File name Uploaded Description Edit
_sre_sizeof.patch schuppenies, 2008-06-16 10:51 Patch against 2.6 trunk, revision 64299
sizeof.patch amaury.forgeotdarc, 2008-06-26 22:08
sizeof2.patch schuppenies, 2008-07-03 13:31
Messages (15)
msg68266 - (view) Author: Robert Schuppenies (schuppenies) * (Python committer) Date: 2008-06-16 10:51
>>> import re
>>> import sys
>>> r = re.compile('')
>>> sys.getsizeof(r)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: __sizeof__

This applies to objects of the types _sre.SRE_Pattern,
_sre.SRE_Scanner, and _sre.SRE_Match.

The attached patch addresses this issue.
msg68788 - (view) Author: Facundo Batista (facundobatista) * (Python committer) Date: 2008-06-26 15:47
Robert, do you have a test suite for the sizeof functionality?  If not,
you should start one ASAP, ;)

This test should be included in that suit...
msg68789 - (view) Author: Robert Schuppenies (schuppenies) * (Python committer) Date: 2008-06-26 16:01
What would be a good way to identify *all* possible types? 
When I started, I included all objects in /Objects, but obviously this
is not sufficient.
msg68801 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2008-06-26 21:10
I think it would be better to give a TypeError rather than an
AttributeError for objects that don't support __sizeof__ as per other
special methods.
msg68802 - (view) Author: Robert Schuppenies (schuppenies) * (Python committer) Date: 2008-06-26 21:40
The attribute error is caused by pattern_getattr, which tries to find
__sizeof__, fails and then sets the error message. I don't know if
casting the error is the right thing to do. Actually, sys.getsizeof()
should work on any type.

Another suggestion was a that sys.getsizeof allows one optional
argument for a default size. If the default argument is provided,
sys.getsizeof will not throw an exception (if the __sizeof__ method is
missing or for any other error), but instead return the given default
size.
Still, an agreement on the right error message is needed.
msg68804 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-06-26 22:08
I wondered why getsizeof fails for _sre.SRE_Pattern objects when it
succeeds for _socket.socket or struct.Struct.

It turns out that _sre.SRE_Pattern defines the tp_getattr slot, and this
prevents attribute lookup from searching the base class (object).
This is a pity, because the base implementation (which use tp_basicsize)
does the right thing for many objects.

So I borrowed some code from the __format__ method, and here is a patch.
Now classes are not handled specially any more; the distinction is
between old-style instances, and all other objects.

All tests pass, but I may have missed something important...
msg68805 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-06-26 22:24
Robert,
looking at your patch for the _sre module, I noticed that
MatchObject.__sizeof__ includes the sizeof of one of its member (regs: a
tuple of (begin, end) pairs).
Why this one and not the others? I thought the rule (if there is a rule)
was to include the used memory for members that are *not* python objects.
msg68818 - (view) Author: Robert Schuppenies (schuppenies) * (Python committer) Date: 2008-06-27 07:12
You are right, the rule is to not include referenced objects.
But, and this has been rather informal up to now, I handled transparently
cached information as something that is added to the memory used by an
object (see unicode.defenc). The same I applied to
MatchObject.regs. The rational being that the user cannot know wether
the match positions are cached or not.

What do you think?
msg68819 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-06-27 07:52
I don't understand the relation between "the member is cached" and "it
counts in the object's sizeof". 
What does "cached" mean? 
Does 'self.x = 3' create a cached member?
msg68820 - (view) Author: Robert Schuppenies (schuppenies) * (Python committer) Date: 2008-06-27 09:05
I was refering to the fact that something on the C level is cached
without the Python user ever noticing that it wasn't there before.
Caching itself is no criteria, but allocating memory without giving the
user a chance to find out should be (in this context).
Maybe I am missing something here, but calling match.regs creates a
tuple which is not there before, but cannot be removed
afterwards. This is why I handled it separately.
msg68821 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-06-27 09:49
> Caching itself is no criteria, but allocating memory without giving
> the user a chance to find out should be (in this context).
> ... calling match.regs creates a
> tuple which is not there before, but cannot be removed
> afterwards. This is why I handled it separately.

Well, so why only include the tuple, and not objects inside the tuple?
They may also count in allocated memory (not often: small numbers are
shared)

Does the same criteria apply to function.func_defaults and function.doc
members? Both can be None, sizeof(None) would be added twice.

Would you say the same for property members?

class C(object):
   def setx(self): self.__x = 42
   x = property(lambda self: self.__x)

the value is not there before you call o.setx(), and cannot be removed
afterwards.

IMO, the criteria (to decide whether a container should include a
particular PyObject member in its sizeof) should not include the way the
member  is created, or who created it, but only the current layout in
memory. For example: can other objects hold references to this member,
does it appear in gc.objects...

And I propose this simple algorithm: do not include any referenced
PyObject :-)
msg68826 - (view) Author: Robert Schuppenies (schuppenies) * (Python committer) Date: 2008-06-27 12:22
Okay, I get the point. With including unicode.defenc I already
included a referenced object which was ruled out in the first
place. And this for a good reason.

What bugs me, though, is that this leaves out a potentially
significant amount of memory. I know that this is already the case
for shared objects (e.g. the above mentioned numbers) or malloc
overhead, but adding yet another exception bothers me.
On the other hand, since it's hidden behind the C API, I don't know
how to address this problem. Maybe just give it some text in the
documentation is sufficient.
msg69148 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-07-02 23:47
It now works for SRE objects in the py3k branch since r64672, but my
patch is still needed for types that define tp_getattr.
msg69199 - (view) Author: Robert Schuppenies (schuppenies) * (Python committer) Date: 2008-07-03 13:31
Amaury,
I was testing your patch and it turns out, that it will ignore
any __sizeof__ attribute which may be available through getattr. I
adapted it a bit, so now getsizeof will try to call the method on the
passed object first, and if it fails or the object is a type, the code
proposed by you will be executed. This also deals with old-style class
instances. The match_sizeof function in the patch is just to showcase
the example. What do you think?
msg69767 - (view) Author: Robert Schuppenies (schuppenies) * (Python committer) Date: 2008-07-16 07:16
Fixed in r64842.
History
Date User Action Args
2008-07-16 07:16:12schuppeniessetstatus: open -> closed
resolution: fixed
messages: + msg69767
keywords: patch, patch
2008-07-03 13:31:48schuppeniessetkeywords: patch, patch
files: + sizeof2.patch
messages: + msg69199
2008-07-02 23:47:41amaury.forgeotdarcsetkeywords: patch, patch
messages: + msg69148
2008-06-27 12:22:09schuppeniessetkeywords: patch, patch
messages: + msg68826
2008-06-27 09:49:01amaury.forgeotdarcsetkeywords: patch, patch
messages: + msg68821
2008-06-27 09:05:38schuppeniessetkeywords: patch, patch
messages: + msg68820
2008-06-27 07:52:24amaury.forgeotdarcsetkeywords: patch, patch
messages: + msg68819
2008-06-27 07:12:14schuppeniessetkeywords: patch, patch
messages: + msg68818
2008-06-26 22:24:29amaury.forgeotdarcsetkeywords: patch, patch
messages: + msg68805
2008-06-26 22:08:16amaury.forgeotdarcsetkeywords: patch, patch
files: + sizeof.patch
messages: + msg68804
nosy: + amaury.forgeotdarc
2008-06-26 21:40:52schuppeniessetkeywords: patch, patch
messages: + msg68802
2008-06-26 21:10:33benjamin.petersonsetkeywords: patch, patch
nosy: + benjamin.peterson
messages: + msg68801
2008-06-26 16:01:55schuppeniessetkeywords: patch, patch
messages: + msg68789
2008-06-26 15:47:08facundobatistasetkeywords: patch, patch
nosy: + facundobatista
messages: + msg68788
2008-06-16 10:51:25schuppeniescreate